{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "![Logo](figures/GOE_Logo_Quer_IPC_Farbe_RGB.png)\n",
    "# High-Dimensional Neural Network Potentials\n",
    "**[Alexander L. M. Knoll](mailto:aknoll@chemie.uni-goettingen.de), and [Moritz R. Schäfer](mailto:moritzrichard.schaefer@uni-goettingen.de)** \n",
    "\n",
    "Behler Group, Theoretische Chemie, Institut für Physikalische Chemie\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "This tutorial is intended to be used with the RuNNer release version 1.2, which is available on conda-forge. \n",
    "The most recent version of RuNNer is [hosted on GitLab](https://gitlab.com/TheochemGoettingen/RuNNer). For access, please contact Prof. Dr. Jörg Behler (joerg.behler@uni-goettingen.de)."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "scrolled": true,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/kalm/.pyenv/versions/pyiron/lib/python3.10/site-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated\n",
      "  \"class\": algorithms.Blowfish,\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "import ipympl\n",
    "import ipywidgets as widgets\n",
    "\n",
    "from pyiron_atomistics import Project\n",
    "from pyiron_contrib.atomistics.runner.job import RunnerFit\n",
    "from pyiron_contrib.atomistics.runner.utils import container_to_ase\n",
    "\n",
    "from ase.geometry import get_distances\n",
    "\n",
    "from runnerase import generate_symmetryfunctions\n",
    "\n",
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Background\n",
    "### Architecture of an HDNNP"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "**RuNNer** is a stand-alone Fortran program for the construction of high-dimensional neural network potentials (HDNNPs), written mainly by Jörg Behler. The central assumption made in constructing a HDNNP is that the total energy of the system $E_{\\mathrm{tot}}$ [can be separated into atomic contributions $E_i$](https://www.doi.org/10.1103/PhysRevLett.98.146401). A HDNNP relates the local environment of each atom to its atomic energy $E_i$, and the sum of all $N$ atomic energies yields the total energy of the system $E_\\mathrm{tot}$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "\\begin{align}\n",
    "E_\\mathrm{tot} = \\sum_{i}^{N}E_i\\notag\n",
    "\\end{align}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Every atomic energy is described by an atomic neural network (NN), which is element-specific. The entirety of all atomic NNs composes a HDNNP, whose general architecture is shown below for a binary system."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "<img src=\"figures/2g.png\" class=\"center\" width=\"500\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "As you can see, the atomic contributions in this model are predicted independently of each other. Therefore, the model can easily describe systems with differing numbers of atoms: adding or removing an atom corresponds to adding or removing a row in the figure shown above. This ability is what puts the \"high-dimensional\" into the name \"HDNNP\". \n",
    "\n",
    "Each atomic neural network receives input information about the local atomic environment up to a certain cutoff radius $R_{\\mathrm{c}}$. This information is derived from the Cartesian coordinates and encoded in many-body descriptors, so-called [atom-centered symmetry functions (ACSF or just SF)](https://www.doi.org/10.1063/1.3553717). More details about this are shown below. For each atom, the values of multiple SFs compose an SF vector $G$ which is the input layer of the atomic NN.\n",
    "\n",
    "Atomic NNs look like this:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "<img src=\"figures/ann.png\" width=\"500\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Every value in the SF vector $G$ serves as one piece of input information to the atomic NN. We refer to the circles in the figure as _nodes_ (from graph theory) or _neurons_ (from neuroscience). The information from the input nodes flows through the atomic NN from left to right: the input layer is followed by a configurable number of hidden layers which consist, in turn, of an arbitrary number of _hidden nodes_. At the end, all information is collected in the output layer, which in our case is interpreted as the atomic energy contribution of the atom under consideration. Nodes in adjacent layers, beginning with the input nodes and the hidden nodes in the first layer, are connected by weights. Moreover, the hidden and output nodes carry a bias value.\n",
    "\n",
    "During training, the weights and biases are optimized using backpropagation to best represent the data in the training data set."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "SFs provide the input for the NN and describe the local atomic environment of each atom. In principle, one could also use Cartesian coordinates to capture the atomic positions in a structure. However, as Cartesian coordinates describe the absolute positions of atoms, the numerical input to the atomic NNs would change with translation or rotation of the system. These operations do not influence the energy of the system, and different numerical inputs belonging to the same NN output lead to large training errors.\n",
    "\n",
    "In contrast, SFs describe the positions of the atoms relative to each other and are hence translationally and rotationally invariant. We distinguish two types of SFs: radial SFs depend on the distances between atom pairs and serve as a measure of their bond order. Angular SFs additionally depend on the interatomic angles."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "The cutoff function $f_{\\mathrm{c}}$ ensures that only the neighbors within the local atomic environment count towards the symmetry function values. The cutoff radius $R_\\mathrm{c}$ (usually $12\\,\\mathrm{bohr}$) defines how much of the local atomic environment is considered. All SFs and their derivatives will decrease to zero if the pairwise distance is larger than $R_\\mathrm{c}$. There are several cutoff functions defined in **RuNNer**; here we will use"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "\\begin{align}\n",
    "    f_{\\mathrm{c}}(R_{ij}) = \n",
    "    \\begin{cases}\n",
    "    0.5 \\cdot [\\cos(\\pi x) + 1]& ~ \\text{for $R_{ij} \\leq R_\\mathrm{c}$},\\\\\n",
    "    0& ~ \\text{for $R_{ij} > R_\\mathrm{c}$}\n",
    "    \\end{cases}\\notag\n",
    "\\end{align}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "with the interatomic distance $R_{ij}$, the cutoff radius $R_\\mathrm{c}$, and $x = \\frac{R_{ij}}{R_{\\mathrm{c}}}$."
   ]
  },
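  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "To get a feeling for the cutoff function, it can be evaluated with a few lines of NumPy. The following cell is only a minimal sketch of the cosine form given above and not the **RuNNer** implementation; the function name `cosine_cutoff` and the example distances are chosen here purely for illustration."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def cosine_cutoff(r_ij, r_c=12.0):\n",
    "    \"\"\"Evaluate f_c = 0.5 * [cos(pi * x) + 1] with x = r_ij / r_c (illustrative helper).\"\"\"\n",
    "    r_ij = np.asarray(r_ij, dtype=float)\n",
    "    x = r_ij / r_c\n",
    "    # Inside the cutoff sphere the cosine form applies, outside the value is zero.\n",
    "    return np.where(r_ij <= r_c, 0.5 * (np.cos(np.pi * x) + 1.0), 0.0)\n",
    "\n",
    "\n",
    "# Example distances in bohr (arbitrary values, not taken from the dataset).\n",
    "print(cosine_cutoff([0.0, 6.0, 12.0, 15.0]))"
   ]
  },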
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Take a look at the figure below for a graphical representation of the cutoff radius in a periodic system: the red atom is the central atom for which the SF values will be calculated. All yellow atoms lie within the cutoff radius and will therefore contribute to the SF values."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "<img src=\"figures/Rc.png\" width=\"500\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "To define the parameters for the radial SFs, it is important to know the shortest bond distance for each element combination in your data set. Usually, 5-6 radial SFs are used for any element pair, with different $\\eta$ values to increase the resolution of the structure description. It is possible to shift the maximum of the radial SF $G^2$ by $R_{s}$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "\\begin{align}\n",
    "    G_{i}^{2} = \\sum_{j}e^{-\\eta (R_{ij} - R_{s})^2} \\cdot f_{\\mathrm{c}}(R_{ij}).\\notag\n",
    "\\end{align}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "In most applications, the Gaussian exponents $\\eta$ for the radial SFs are chosen such that the SF turning points are equally distributed between the cutoff radius and the specific minimum pairwise distance in the training dataset (the largest $\\eta$ corresponds to the maximum contraction). In RuNNer, you can either define element-pair-specific SFs or global SFs which are used for every element combination. It is also possible to define different cutoff radii for the SFs, even though this is rarely helpful and therefore not recommended.\n",
    "\n",
    "Below, you can see a graphical representation of the radial symmetry functions including the cutoff function for a cutoff radius of 12 bohr."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"figures/radials.png\" width=\"800\"/>"
   ]
  },
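  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "The effect of $\\eta$ on a radial SF can also be explored numerically. The cell below is a minimal sketch of the $G^2$ formula for a single central atom, assuming the cosine cutoff function introduced earlier. The helper `radial_sf`, the neighbor distances, and the $\\eta$ values are hypothetical and only serve as an illustration; they are not part of **RuNNer** or of the training dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def radial_sf(distances, eta, r_s=0.0, r_c=12.0):\n",
    "    \"\"\"Radial SF of one central atom from its neighbor distances (illustrative helper).\"\"\"\n",
    "    r = np.asarray(distances, dtype=float)\n",
    "    # Cosine cutoff: neighbors beyond r_c do not contribute.\n",
    "    f_c = np.where(r <= r_c, 0.5 * (np.cos(np.pi * r / r_c) + 1.0), 0.0)\n",
    "    # Sum of shifted Gaussians, each damped by the cutoff function.\n",
    "    return float(np.sum(np.exp(-eta * (r - r_s) ** 2) * f_c))\n",
    "\n",
    "\n",
    "# Hypothetical neighbor distances in bohr; the last one lies outside the cutoff sphere.\n",
    "neighbor_distances = [5.0, 5.5, 8.0, 13.0]\n",
    "for eta in (0.0, 0.01, 0.1):\n",
    "    print(eta, radial_sf(neighbor_distances, eta))"
   ]
  },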
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "The same rules apply to the angular SFs. Here, however, three atomic positions are included in the calculation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "\\begin{align}\n",
    "    G_{i}^{3} = 2^{1 - \\zeta}\\sum_{j} \\sum_{k} \\left[( 1 + \\lambda \\cdot \\cos \\theta_{ijk})^{\\zeta} \\cdot e^{-\\eta (R_{ij}^2 + R_{ik}^2 + R_{jk}^2)} \\cdot f_{\\mathrm{c}}(R_{ij}) \\cdot f_{\\mathrm{c}}(R_{ik}) \\cdot f_{\\mathrm{c}}(R_{jk}) \\right]\\notag\n",
    "\\end{align}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "The angle $\\theta_{ijk} = \\arccos\\left(\\frac{\\mathbf{R}_{ij} \\cdot \\mathbf{R}_{ik}}{R_{ij} \\cdot R_{ik}}\\right)$ is centered at atom $i$. For most systems, we use permutations of $\\zeta = \\{1, 2, 4, 16\\}$, $\\eta = 0$, and $\\lambda = \\{+1, -1\\}$. If many atoms of each element are present, angular SFs are usually not critical and a default set of SFs can be used."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"figures/angulars.png\" width=\"800\"/>"
   ]
  },
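  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "For completeness, the angular SF can be sketched in the same way. The cell below is a plain NumPy illustration, not the **RuNNer** implementation. It assumes the cosine cutoff function, the normalization $2^{1-\\zeta}$ used in the ACSF paper cited above, and a sum over all ordered neighbor pairs $j \\neq k$. The helper names and the atomic positions are hypothetical."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "import numpy as np\n",
    "\n",
    "\n",
    "def cutoff(r, r_c):\n",
    "    \"\"\"Cosine cutoff function, zero beyond r_c (illustrative helper).\"\"\"\n",
    "    return 0.5 * (np.cos(np.pi * r / r_c) + 1.0) if r <= r_c else 0.0\n",
    "\n",
    "\n",
    "def angular_sf(center, neighbors, eta=0.0, zeta=1.0, lam=1.0, r_c=12.0):\n",
    "    \"\"\"Angular SF of one central atom; sums over all ordered pairs j != k.\"\"\"\n",
    "    center = np.asarray(center, dtype=float)\n",
    "    neighbors = np.asarray(neighbors, dtype=float)\n",
    "    total = 0.0\n",
    "    for j, pos_j in enumerate(neighbors):\n",
    "        for k, pos_k in enumerate(neighbors):\n",
    "            if j == k:\n",
    "                continue\n",
    "            vec_ij, vec_ik = pos_j - center, pos_k - center\n",
    "            r_ij, r_ik = np.linalg.norm(vec_ij), np.linalg.norm(vec_ik)\n",
    "            r_jk = np.linalg.norm(pos_k - pos_j)\n",
    "            # Cosine of the angle centered at atom i.\n",
    "            cos_theta = np.dot(vec_ij, vec_ik) / (r_ij * r_ik)\n",
    "            total += (\n",
    "                (1.0 + lam * cos_theta) ** zeta\n",
    "                * np.exp(-eta * (r_ij**2 + r_ik**2 + r_jk**2))\n",
    "                * cutoff(r_ij, r_c) * cutoff(r_ik, r_c) * cutoff(r_jk, r_c)\n",
    "            )\n",
    "    return 2.0 ** (1.0 - zeta) * total\n",
    "\n",
    "\n",
    "# Hypothetical positions in bohr: central atom at the origin, two neighbors at 90 degrees.\n",
    "center = [0.0, 0.0, 0.0]\n",
    "neighbors = [[5.0, 0.0, 0.0], [0.0, 5.0, 0.0]]\n",
    "for lam in (+1.0, -1.0):\n",
    "    print(lam, angular_sf(center, neighbors, zeta=2.0, lam=lam))"
   ]
  },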
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Due to time limitations, we will only focus on the model described above, which is known as the second generation of HDNNPs (see [here](https://www.doi.org/10.1103/PhysRevLett.98.146401), [here](https://www.doi.org/10.1002/anie.201703114), and [here](https://www.doi.org/10.1002/qua.24890)). However, in recent years third- and fourth-generation HDNNPs have been developed by the Behler group."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"figures/2g.png\" width=\"500\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "In second-generation HDNNPs only atomic interactions inside the cutoff sphere are taken into account. The resulting short-ranged potentials are well-suited to describe local bonding even for complex atomic environments. However, it can be expected that for many systems long-range interactions, primarily electrostatics, will be important.\n",
    "\n",
    "To overcome this limitation, third-generation NNs (see [here](https://www.doi.org/10.1103/PhysRevB.83.153101) and [here](https://www.doi.org/10.1063/1.3682557)) define a second set of atomic neural networks to construct environment-dependent atomic charges. These charges can then be used to calculate the long-range electrostatic energy without truncation. The total energy of the system is then given by the sum of the short-range and the electrostatic energies."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "<img src=\"figures/3g.png\" width=\"800\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "While the use of environment-dependent charges is a clear step forward, they are not sufficient if long-range charge transfer is present.\n",
    "\n",
    "When dealing with systems where long-range charge transfer is present, the use of fourth-generation NNs (see [here](https://www.doi.org/10.1038/s41467-020-20427-2) and [here](https://www.doi.org/10.1021/acs.accounts.0c00689)) is recommended. Here, environment-dependent electronegativities $\\chi$ are computed first. They are then used in a charge equilibration scheme to determine the atomic charges $Q$. Again, we can compute the electrostatic energy from these charges. Moreover, the atomic charges serve as an additional input neuron to train the short-range energy and forces. As was the case for the third-generation NNs, the total energy is then given by the sum of the short-range and the electrostatic energies."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "<img src=\"figures/4g.png\" width=\"800\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Using RuNNer via the pyiron Interface "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In general, training a HDNNP with **RuNNer** can be separated into three different stages - so-called modes - in which different types of calculation are performed."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- **Mode 1:** calculation of the SF values and separation of the dataset into a training and testing set.\n",
    "- **Mode 2:** training of the model to construct the HDNNP.\n",
    "- **Mode 3:** prediction of energy and forces (stress and charges can also be predicted)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "All these steps are performed consecutively beginning with mode 1."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "The creation of a meaningful neural network potential lives and dies with high-quality training data. Therefore, we will begin by inspecting the full training dataset. \n",
    "The dataset has been stored prior to the workshop in the form of a `TrainingContainer`. \n",
    "In pyiron, `TrainingContainer`s are jobs which take a set of structures and properties like energies, forces, ... and store them in HDF5 format.\n",
    "\n",
    "Go ahead and open up the project:"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "outputs": [],
   "source": [
    "pr = Project('../../introduction/training')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "The project already contains several jobs:"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },

   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>status</th>\n",
       "      <th>chemicalformula</th>\n",
       "      <th>job</th>\n",
       "      <th>subjob</th>\n",
       "      <th>projectpath</th>\n",
       "      <th>project</th>\n",
       "      <th>timestart</th>\n",
       "      <th>timestop</th>\n",
       "      <th>totalcputime</th>\n",
       "      <th>computer</th>\n",
       "      <th>hamilton</th>\n",
       "      <th>hamversion</th>\n",
       "      <th>parentid</th>\n",
       "      <th>masterid</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>212</td>\n",
       "      <td>finished</td>\n",
       "      <td>None</td>\n",
       "      <td>full</td>\n",
       "      <td>/full</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/</td>\n",
       "      <td>2022-06-03 00:02:34.816533</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>zora@cmti001#1</td>\n",
       "      <td>TrainingContainer</td>\n",
       "      <td>0.4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>213</td>\n",
       "      <td>finished</td>\n",
       "      <td>None</td>\n",
       "      <td>basic</td>\n",
       "      <td>/basic</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/</td>\n",
       "      <td>2022-06-03 00:02:47.281055</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>zora@cmti001#1</td>\n",
       "      <td>TrainingContainer</td>\n",
       "      <td>0.4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>214</td>\n",
       "      <td>finished</td>\n",
       "      <td>None</td>\n",
       "      <td>Al_basic_atomicrex</td>\n",
       "      <td>/Al_basic_atomicrex</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/</td>\n",
       "      <td>2022-06-03 02:00:15.887059</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>zora@cmti001#1</td>\n",
       "      <td>TrainingContainer</td>\n",
       "      <td>0.4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>226</td>\n",
       "      <td>finished</td>\n",
       "      <td>None</td>\n",
       "      <td>data_lithium</td>\n",
       "      <td>/data_lithium</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/</td>\n",
       "      <td>2022-07-21 08:18:19.193090</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>TrainingContainer</td>\n",
       "      <td>0.4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>227</td>\n",
       "      <td>finished</td>\n",
       "      <td>None</td>\n",
       "      <td>fit_mode1</td>\n",
       "      <td>/fit_mode1</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/fit_lithium/</td>\n",
       "      <td>2022-07-21 08:18:22.251691</td>\n",
       "      <td>2022-07-21 08:18:28.328624</td>\n",
       "      <td>6.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>RunnerFit</td>\n",
       "      <td>0.4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>228</td>\n",
       "      <td>finished</td>\n",
       "      <td>None</td>\n",
       "      <td>fit_mode2</td>\n",
       "      <td>/fit_mode2</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/fit_lithium/</td>\n",
       "      <td>2022-07-21 08:18:30.158483</td>\n",
       "      <td>2022-07-21 08:22:02.862973</td>\n",
       "      <td>212.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>RunnerFit</td>\n",
       "      <td>0.4</td>\n",
       "      <td>227.0</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>229</td>\n",
       "      <td>finished</td>\n",
       "      <td>None</td>\n",
       "      <td>fit_mode3</td>\n",
       "      <td>/fit_mode3</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/fit_lithium/</td>\n",
       "      <td>2022-07-21 08:22:04.136475</td>\n",
       "      <td>2022-07-21 08:22:11.189763</td>\n",
       "      <td>7.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>RunnerFit</td>\n",
       "      <td>0.4</td>\n",
       "      <td>228.0</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>230</td>\n",
       "      <td>finished</td>\n",
       "      <td>Li</td>\n",
       "      <td>job_a_3_0</td>\n",
       "      <td>/job_a_3_0</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/</td>\n",
       "      <td>2022-07-21 08:22:12.207784</td>\n",
       "      <td>2022-07-21 08:22:15.498472</td>\n",
       "      <td>3.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>Lammps</td>\n",
       "      <td>0.1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>231</td>\n",
       "      <td>finished</td>\n",
       "      <td>Li</td>\n",
       "      <td>job_a_3_167</td>\n",
       "      <td>/job_a_3_167</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/</td>\n",
       "      <td>2022-07-21 08:22:15.639460</td>\n",
       "      <td>2022-07-21 08:22:15.965042</td>\n",
       "      <td>0.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>Lammps</td>\n",
       "      <td>0.1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>232</td>\n",
       "      <td>finished</td>\n",
       "      <td>Li</td>\n",
       "      <td>job_a_3_333</td>\n",
       "      <td>/job_a_3_333</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/</td>\n",
       "      <td>2022-07-21 08:22:16.102696</td>\n",
       "      <td>2022-07-21 08:22:16.395679</td>\n",
       "      <td>0.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>Lammps</td>\n",
       "      <td>0.1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>233</td>\n",
       "      <td>finished</td>\n",
       "      <td>Li</td>\n",
       "      <td>job_a_3_5</td>\n",
       "      <td>/job_a_3_5</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/</td>\n",
       "      <td>2022-07-21 08:22:16.572264</td>\n",
       "      <td>2022-07-21 08:22:16.831195</td>\n",
       "      <td>0.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>Lammps</td>\n",
       "      <td>0.1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>234</td>\n",
       "      <td>finished</td>\n",
       "      <td>Li</td>\n",
       "      <td>job_a_3_667</td>\n",
       "      <td>/job_a_3_667</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/</td>\n",
       "      <td>2022-07-21 08:22:16.975377</td>\n",
       "      <td>2022-07-21 08:22:17.282200</td>\n",
       "      <td>0.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>Lammps</td>\n",
       "      <td>0.1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>235</td>\n",
       "      <td>finished</td>\n",
       "      <td>Li</td>\n",
       "      <td>job_a_3_833</td>\n",
       "      <td>/job_a_3_833</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/</td>\n",
       "      <td>2022-07-21 08:22:17.434798</td>\n",
       "      <td>2022-07-21 08:22:17.694902</td>\n",
       "      <td>0.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>Lammps</td>\n",
       "      <td>0.1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>236</td>\n",
       "      <td>finished</td>\n",
       "      <td>Li</td>\n",
       "      <td>job_a_4_0</td>\n",
       "      <td>/job_a_4_0</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/</td>\n",
       "      <td>2022-07-21 08:22:17.828141</td>\n",
       "      <td>2022-07-21 08:22:18.089258</td>\n",
       "      <td>0.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>Lammps</td>\n",
       "      <td>0.1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     id    status chemicalformula                 job               subjob  \\\n",
       "0   212  finished            None                full                /full   \n",
       "1   213  finished            None               basic               /basic   \n",
       "2   214  finished            None  Al_basic_atomicrex  /Al_basic_atomicrex   \n",
       "3   226  finished            None        data_lithium        /data_lithium   \n",
       "4   227  finished            None           fit_mode1           /fit_mode1   \n",
       "5   228  finished            None           fit_mode2           /fit_mode2   \n",
       "6   229  finished            None           fit_mode3           /fit_mode3   \n",
       "7   230  finished              Li           job_a_3_0           /job_a_3_0   \n",
       "8   231  finished              Li         job_a_3_167         /job_a_3_167   \n",
       "9   232  finished              Li         job_a_3_333         /job_a_3_333   \n",
       "10  233  finished              Li           job_a_3_5           /job_a_3_5   \n",
       "11  234  finished              Li         job_a_3_667         /job_a_3_667   \n",
       "12  235  finished              Li         job_a_3_833         /job_a_3_833   \n",
       "13  236  finished              Li           job_a_4_0           /job_a_4_0   \n",
       "\n",
       "   projectpath  \\\n",
       "0         None   \n",
       "1         None   \n",
       "2         None   \n",
       "3         None   \n",
       "4         None   \n",
       "5         None   \n",
       "6         None   \n",
       "7         None   \n",
       "8         None   \n",
       "9         None   \n",
       "10        None   \n",
       "11        None   \n",
       "12        None   \n",
       "13        None   \n",
       "\n",
       "                                                                                                                          project  \\\n",
       "0               /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/   \n",
       "1               /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/   \n",
       "2               /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/   \n",
       "3               /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/   \n",
       "4   /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/fit_lithium/   \n",
       "5   /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/fit_lithium/   \n",
       "6   /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/fit_lithium/   \n",
       "7     /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/   \n",
       "8     /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/   \n",
       "9     /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/   \n",
       "10    /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/   \n",
       "11    /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/   \n",
       "12    /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/   \n",
       "13    /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/   \n",
       "\n",
       "                    timestart                   timestop  totalcputime  \\\n",
       "0  2022-06-03 00:02:34.816533                        NaT           NaN   \n",
       "1  2022-06-03 00:02:47.281055                        NaT           NaN   \n",
       "2  2022-06-03 02:00:15.887059                        NaT           NaN   \n",
       "3  2022-07-21 08:18:19.193090                        NaT           NaN   \n",
       "4  2022-07-21 08:18:22.251691 2022-07-21 08:18:28.328624           6.0   \n",
       "5  2022-07-21 08:18:30.158483 2022-07-21 08:22:02.862973         212.0   \n",
       "6  2022-07-21 08:22:04.136475 2022-07-21 08:22:11.189763           7.0   \n",
       "7  2022-07-21 08:22:12.207784 2022-07-21 08:22:15.498472           3.0   \n",
       "8  2022-07-21 08:22:15.639460 2022-07-21 08:22:15.965042           0.0   \n",
       "9  2022-07-21 08:22:16.102696 2022-07-21 08:22:16.395679           0.0   \n",
       "10 2022-07-21 08:22:16.572264 2022-07-21 08:22:16.831195           0.0   \n",
       "11 2022-07-21 08:22:16.975377 2022-07-21 08:22:17.282200           0.0   \n",
       "12 2022-07-21 08:22:17.434798 2022-07-21 08:22:17.694902           0.0   \n",
       "13 2022-07-21 08:22:17.828141 2022-07-21 08:22:18.089258           0.0   \n",
       "\n",
       "          computer           hamilton hamversion  parentid masterid  \n",
       "0   zora@cmti001#1  TrainingContainer        0.4       NaN     None  \n",
       "1   zora@cmti001#1  TrainingContainer        0.4       NaN     None  \n",
       "2   zora@cmti001#1  TrainingContainer        0.4       NaN     None  \n",
       "3   pyiron@lap2p#1  TrainingContainer        0.4       NaN     None  \n",
       "4   pyiron@lap2p#1          RunnerFit        0.4       NaN     None  \n",
       "5   pyiron@lap2p#1          RunnerFit        0.4     227.0     None  \n",
       "6   pyiron@lap2p#1          RunnerFit        0.4     228.0     None  \n",
       "7   pyiron@lap2p#1             Lammps        0.1       NaN     None  \n",
       "8   pyiron@lap2p#1             Lammps        0.1       NaN     None  \n",
       "9   pyiron@lap2p#1             Lammps        0.1       NaN     None  \n",
       "10  pyiron@lap2p#1             Lammps        0.1       NaN     None  \n",
       "11  pyiron@lap2p#1             Lammps        0.1       NaN     None  \n",
       "12  pyiron@lap2p#1             Lammps        0.1       NaN     None  \n",
       "13  pyiron@lap2p#1             Lammps        0.1       NaN     None  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pr.job_table()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "The training data is stored in the project node `initial`."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "In order to get a feeling for the data, we inspect its energy-volume curve:"
   ]
  },
  {
   "cell_type": "code",
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 1008x432 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "fig, ax = plt.subplots(figsize=(14, 6))\n",
    "data_full.plot.energy_volume()\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "As you can see in this plot, some regions of configuration space are much more densily sampled than others. The dataset consists of approximately 4000 structures, ranging from bulk lithium and aluminum to off-stochiometric liquid phases of LiAl alloy. \n",
    "\n",
    "Training a potential for such a large dataset to high accuracy takes a few hours. Therefore, we are going to focus on a case study: the subset of pure lithium structures in the dataset. \n",
    "\n",
    "We extract a sample from the full dataset using `TrainingContainer`s convenient `sample` function. It creates a new `TrainingContainer` job (here we give it the name `data_lithium`) using a simple filter function. The filter function will remove:\n",
    "* structures that contain Al.\n",
    "* structures with a positive energy.\n",
    "* structures in which atoms do not have any neighbors within a cutoff radius of 12 Bohr."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "def filter_lithium(container, idx):\n",
    "    \"\"\"Filter a given `container` for the most useful lithium structures.\"\"\"\n",
    "    # Get the elements, total energy and atomic positions.\n",
    "    elements = container.get_array('symbols', idx)\n",
    "    energy = container.get_array('energy', idx)\n",
    "    positions = container.get_array('positions', idx)\n",
    "\n",
    "    # Build the distance matrix.\n",
    "    distmatrix = get_distances(positions, positions)[1]\n",
    "\n",
    "    # Check if every atom has at least one neighbor in a 12 Bohr = 6.35 Ang                                                                                                          \n",
    "    # cutoff radius.\n",
    "    no_neighbors = False\n",
    "    for idx, row in enumerate(distmatrix):\n",
    "\n",
    "        # Remove self interaction.                                                                                                                                                        \n",
    "        row_no_selfinteraction = row[row > 0.0]\n",
    "\n",
    "        if all(row_no_selfinteraction > 6.35):\n",
    "            no_neighbors = True\n",
    "    \n",
    "    return 'Al' not in elements and energy < 0.0 and no_neighbors is False"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "scrolled": true,
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "DataContainer({'save_neighbors': True, 'num_neighbors': 12})\n",
      "The job data_lithium was saved and received the ID: 237\n"
     ]
    }
   ],
   "source": [
    "# Remove the job if it already exists.\n",
    "if 'data_lithium' in pr.list_nodes():\n",
    "    pr.remove_job('data_lithium')\n",
    "\n",
    "data_lithium = data_full.sample('data_lithium', filter_lithium)"
   ]
  },
  {
   "cell_type": "code",
   "source": [
    "print(len(list(data_lithium.iter_structures())))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "When inspecting the list of jobs in the project again, you will find that an additional `TrainingContainer` has been created."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>status</th>\n",
       "      <th>chemicalformula</th>\n",
       "      <th>job</th>\n",
       "      <th>subjob</th>\n",
       "      <th>projectpath</th>\n",
       "      <th>project</th>\n",
       "      <th>timestart</th>\n",
       "      <th>timestop</th>\n",
       "      <th>totalcputime</th>\n",
       "      <th>computer</th>\n",
       "      <th>hamilton</th>\n",
       "      <th>hamversion</th>\n",
       "      <th>parentid</th>\n",
       "      <th>masterid</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>212</td>\n",
       "      <td>finished</td>\n",
       "      <td>None</td>\n",
       "      <td>full</td>\n",
       "      <td>/full</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/</td>\n",
       "      <td>2022-06-03 00:02:34.816533</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>zora@cmti001#1</td>\n",
       "      <td>TrainingContainer</td>\n",
       "      <td>0.4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>213</td>\n",
       "      <td>finished</td>\n",
       "      <td>None</td>\n",
       "      <td>basic</td>\n",
       "      <td>/basic</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/</td>\n",
       "      <td>2022-06-03 00:02:47.281055</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>zora@cmti001#1</td>\n",
       "      <td>TrainingContainer</td>\n",
       "      <td>0.4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>214</td>\n",
       "      <td>finished</td>\n",
       "      <td>None</td>\n",
       "      <td>Al_basic_atomicrex</td>\n",
       "      <td>/Al_basic_atomicrex</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/</td>\n",
       "      <td>2022-06-03 02:00:15.887059</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>zora@cmti001#1</td>\n",
       "      <td>TrainingContainer</td>\n",
       "      <td>0.4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>227</td>\n",
       "      <td>finished</td>\n",
       "      <td>None</td>\n",
       "      <td>fit_mode1</td>\n",
       "      <td>/fit_mode1</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/fit_lithium/</td>\n",
       "      <td>2022-07-21 08:18:22.251691</td>\n",
       "      <td>2022-07-21 08:18:28.328624</td>\n",
       "      <td>6.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>RunnerFit</td>\n",
       "      <td>0.4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>228</td>\n",
       "      <td>finished</td>\n",
       "      <td>None</td>\n",
       "      <td>fit_mode2</td>\n",
       "      <td>/fit_mode2</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/fit_lithium/</td>\n",
       "      <td>2022-07-21 08:18:30.158483</td>\n",
       "      <td>2022-07-21 08:22:02.862973</td>\n",
       "      <td>212.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>RunnerFit</td>\n",
       "      <td>0.4</td>\n",
       "      <td>227.0</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>229</td>\n",
       "      <td>finished</td>\n",
       "      <td>None</td>\n",
       "      <td>fit_mode3</td>\n",
       "      <td>/fit_mode3</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/fit_lithium/</td>\n",
       "      <td>2022-07-21 08:22:04.136475</td>\n",
       "      <td>2022-07-21 08:22:11.189763</td>\n",
       "      <td>7.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>RunnerFit</td>\n",
       "      <td>0.4</td>\n",
       "      <td>228.0</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>230</td>\n",
       "      <td>finished</td>\n",
       "      <td>Li</td>\n",
       "      <td>job_a_3_0</td>\n",
       "      <td>/job_a_3_0</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/</td>\n",
       "      <td>2022-07-21 08:22:12.207784</td>\n",
       "      <td>2022-07-21 08:22:15.498472</td>\n",
       "      <td>3.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>Lammps</td>\n",
       "      <td>0.1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>231</td>\n",
       "      <td>finished</td>\n",
       "      <td>Li</td>\n",
       "      <td>job_a_3_167</td>\n",
       "      <td>/job_a_3_167</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/</td>\n",
       "      <td>2022-07-21 08:22:15.639460</td>\n",
       "      <td>2022-07-21 08:22:15.965042</td>\n",
       "      <td>0.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>Lammps</td>\n",
       "      <td>0.1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>232</td>\n",
       "      <td>finished</td>\n",
       "      <td>Li</td>\n",
       "      <td>job_a_3_333</td>\n",
       "      <td>/job_a_3_333</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/</td>\n",
       "      <td>2022-07-21 08:22:16.102696</td>\n",
       "      <td>2022-07-21 08:22:16.395679</td>\n",
       "      <td>0.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>Lammps</td>\n",
       "      <td>0.1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>233</td>\n",
       "      <td>finished</td>\n",
       "      <td>Li</td>\n",
       "      <td>job_a_3_5</td>\n",
       "      <td>/job_a_3_5</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/</td>\n",
       "      <td>2022-07-21 08:22:16.572264</td>\n",
       "      <td>2022-07-21 08:22:16.831195</td>\n",
       "      <td>0.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>Lammps</td>\n",
       "      <td>0.1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>234</td>\n",
       "      <td>finished</td>\n",
       "      <td>Li</td>\n",
       "      <td>job_a_3_667</td>\n",
       "      <td>/job_a_3_667</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/</td>\n",
       "      <td>2022-07-21 08:22:16.975377</td>\n",
       "      <td>2022-07-21 08:22:17.282200</td>\n",
       "      <td>0.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>Lammps</td>\n",
       "      <td>0.1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>235</td>\n",
       "      <td>finished</td>\n",
       "      <td>Li</td>\n",
       "      <td>job_a_3_833</td>\n",
       "      <td>/job_a_3_833</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/</td>\n",
       "      <td>2022-07-21 08:22:17.434798</td>\n",
       "      <td>2022-07-21 08:22:17.694902</td>\n",
       "      <td>0.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>Lammps</td>\n",
       "      <td>0.1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>236</td>\n",
       "      <td>finished</td>\n",
       "      <td>Li</td>\n",
       "      <td>job_a_4_0</td>\n",
       "      <td>/job_a_4_0</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/</td>\n",
       "      <td>2022-07-21 08:22:17.828141</td>\n",
       "      <td>2022-07-21 08:22:18.089258</td>\n",
       "      <td>0.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>Lammps</td>\n",
       "      <td>0.1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>237</td>\n",
       "      <td>finished</td>\n",
       "      <td>None</td>\n",
       "      <td>data_lithium</td>\n",
       "      <td>/data_lithium</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/</td>\n",
       "      <td>2022-07-21 08:24:51.905888</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>TrainingContainer</td>\n",
       "      <td>0.4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     id    status chemicalformula                 job               subjob  \\\n",
       "0   212  finished            None                full                /full   \n",
       "1   213  finished            None               basic               /basic   \n",
       "2   214  finished            None  Al_basic_atomicrex  /Al_basic_atomicrex   \n",
       "3   227  finished            None           fit_mode1           /fit_mode1   \n",
       "4   228  finished            None           fit_mode2           /fit_mode2   \n",
       "5   229  finished            None           fit_mode3           /fit_mode3   \n",
       "6   230  finished              Li           job_a_3_0           /job_a_3_0   \n",
       "7   231  finished              Li         job_a_3_167         /job_a_3_167   \n",
       "8   232  finished              Li         job_a_3_333         /job_a_3_333   \n",
       "9   233  finished              Li           job_a_3_5           /job_a_3_5   \n",
       "10  234  finished              Li         job_a_3_667         /job_a_3_667   \n",
       "11  235  finished              Li         job_a_3_833         /job_a_3_833   \n",
       "12  236  finished              Li           job_a_4_0           /job_a_4_0   \n",
       "13  237  finished            None        data_lithium        /data_lithium   \n",
       "\n",
       "   projectpath  \\\n",
       "0         None   \n",
       "1         None   \n",
       "2         None   \n",
       "3         None   \n",
       "4         None   \n",
       "5         None   \n",
       "6         None   \n",
       "7         None   \n",
       "8         None   \n",
       "9         None   \n",
       "10        None   \n",
       "11        None   \n",
       "12        None   \n",
       "13        None   \n",
       "\n",
       "                                                                                                                          project  \\\n",
       "0               /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/   \n",
       "1               /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/   \n",
       "2               /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/   \n",
       "3   /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/fit_lithium/   \n",
       "4   /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/fit_lithium/   \n",
       "5   /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/fit_lithium/   \n",
       "6     /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/   \n",
       "7     /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/   \n",
       "8     /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/   \n",
       "9     /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/   \n",
       "10    /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/   \n",
       "11    /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/   \n",
       "12    /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/   \n",
       "13              /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/   \n",
       "\n",
       "                    timestart                   timestop  totalcputime  \\\n",
       "0  2022-06-03 00:02:34.816533                        NaT           NaN   \n",
       "1  2022-06-03 00:02:47.281055                        NaT           NaN   \n",
       "2  2022-06-03 02:00:15.887059                        NaT           NaN   \n",
       "3  2022-07-21 08:18:22.251691 2022-07-21 08:18:28.328624           6.0   \n",
       "4  2022-07-21 08:18:30.158483 2022-07-21 08:22:02.862973         212.0   \n",
       "5  2022-07-21 08:22:04.136475 2022-07-21 08:22:11.189763           7.0   \n",
       "6  2022-07-21 08:22:12.207784 2022-07-21 08:22:15.498472           3.0   \n",
       "7  2022-07-21 08:22:15.639460 2022-07-21 08:22:15.965042           0.0   \n",
       "8  2022-07-21 08:22:16.102696 2022-07-21 08:22:16.395679           0.0   \n",
       "9  2022-07-21 08:22:16.572264 2022-07-21 08:22:16.831195           0.0   \n",
       "10 2022-07-21 08:22:16.975377 2022-07-21 08:22:17.282200           0.0   \n",
       "11 2022-07-21 08:22:17.434798 2022-07-21 08:22:17.694902           0.0   \n",
       "12 2022-07-21 08:22:17.828141 2022-07-21 08:22:18.089258           0.0   \n",
       "13 2022-07-21 08:24:51.905888                        NaT           NaN   \n",
       "\n",
       "          computer           hamilton hamversion  parentid masterid  \n",
       "0   zora@cmti001#1  TrainingContainer        0.4       NaN     None  \n",
       "1   zora@cmti001#1  TrainingContainer        0.4       NaN     None  \n",
       "2   zora@cmti001#1  TrainingContainer        0.4       NaN     None  \n",
       "3   pyiron@lap2p#1          RunnerFit        0.4       NaN     None  \n",
       "4   pyiron@lap2p#1          RunnerFit        0.4     227.0     None  \n",
       "5   pyiron@lap2p#1          RunnerFit        0.4     228.0     None  \n",
       "6   pyiron@lap2p#1             Lammps        0.1       NaN     None  \n",
       "7   pyiron@lap2p#1             Lammps        0.1       NaN     None  \n",
       "8   pyiron@lap2p#1             Lammps        0.1       NaN     None  \n",
       "9   pyiron@lap2p#1             Lammps        0.1       NaN     None  \n",
       "10  pyiron@lap2p#1             Lammps        0.1       NaN     None  \n",
       "11  pyiron@lap2p#1             Lammps        0.1       NaN     None  \n",
       "12  pyiron@lap2p#1             Lammps        0.1       NaN     None  \n",
       "13  pyiron@lap2p#1  TrainingContainer        0.4       NaN     None  "
      ]
     },
     "execution_count": 9,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pr.job_table()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "For comparison, here is the energy-volume curve from before, overlayed with the structures in the reduced dataset."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "image/png": "\n",
      "text/plain": [
       "<Figure size 1008x432 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "fig, ax = plt.subplots(figsize=(14, 6))\n",
    "data_full.plot.energy_volume()\n",
    "data_lithium.plot.energy_volume()\n",
    "plt.legend(['Full dataset', 'Lithium'])\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "As you can see, we have selected a very small part of the dataset for our demonstration (176 of ~4000 structures). Nevertheless, the following chapters will demonstrate all the relevant RuNNer concepts to create a similar potential with more training data. "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "pyiron and the RuNNer Fortran program communicate via a custom job type called `RunnerFit`. Here, we add a new job to the project via `create_job` and give it the name `fit_data_lithium`."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "pr_fit = pr.create_group('fit_lithium')"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "mode1 = pr_fit.create.job.RunnerFit('fit_mode1', delete_existing_job=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Every `RunnerFit` job is initialized with a sensible choice of input parameters for RuNNer (`parameters`) and an empty storage for training structures (`training_data`). This information can easily be accessed through the `input` property.  "
   ]
  },
  {
   "cell_type": "code",
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "application/json": {
       "parameters": {
        "bond_threshold": "0.5",
        "center_symmetry_functions": "True",
        "cutoff_type": "1",
        "elements": "None",
        "epochs": "30",
        "global_activation_short": "[['t', 't', 'l']]",
        "global_hidden_layers_short": "2",
        "global_nodes_short": "[[15, 15]]",
        "kalman_lambda_short": "0.98",
        "kalman_nue_short": "0.9987",
        "mix_all_points": "True",
        "nguyen_widrow_weights_short": "True",
        "nn_type_short": "1",
        "number_of_elements": "0",
        "optmode_charge": "1",
        "optmode_short_energy": "1",
        "optmode_short_force": "1",
        "points_in_memory": "1000",
        "precondition_weights": "True",
        "repeated_energy_update": "True",
        "runner_mode": "1",
        "scale_symmetry_functions": "True",
        "short_energy_error_threshold": "0.1",
        "short_energy_fraction": "1.0",
        "short_force_error_threshold": "1.0",
        "short_force_fraction": "0.1",
        "symfunction_short": "[]",
        "test_fraction": "0.1",
        "use_old_weights_charge": "False",
        "use_old_weights_short": "False",
        "use_short_forces": "True",
        "use_short_nn": "True",
        "write_weights_epoch": "5"
       },
       "training_data": "<pyiron_contrib.atomistics.atomistics.job.trainingcontainer.TrainingStorage object at 0x7ff3c9504460>"
      },
      "text/html": [
       "<pre>DataContainer({\n",
       "  \"parameters\": {\n",
       "    \"runner_mode\": \"1\",\n",
       "    \"symfunction_short\": \"[]\",\n",
       "    \"elements\": \"None\",\n",
       "    \"number_of_elements\": \"0\",\n",
       "    \"bond_threshold\": \"0.5\",\n",
       "    \"nn_type_short\": \"1\",\n",
       "    \"use_short_nn\": \"True\",\n",
       "    \"optmode_charge\": \"1\",\n",
       "    \"optmode_short_energy\": \"1\",\n",
       "    \"optmode_short_force\": \"1\",\n",
       "    \"points_in_memory\": \"1000\",\n",
       "    \"scale_symmetry_functions\": \"True\",\n",
       "    \"cutoff_type\": \"1\",\n",
       "    \"test_fraction\": \"0.1\",\n",
       "    \"use_short_forces\": \"True\",\n",
       "    \"epochs\": \"30\",\n",
       "    \"kalman_lambda_short\": \"0.98\",\n",
       "    \"kalman_nue_short\": \"0.9987\",\n",
       "    \"mix_all_points\": \"True\",\n",
       "    \"nguyen_widrow_weights_short\": \"True\",\n",
       "    \"repeated_energy_update\": \"True\",\n",
       "    \"short_energy_error_threshold\": \"0.1\",\n",
       "    \"short_energy_fraction\": \"1.0\",\n",
       "    \"short_force_error_threshold\": \"1.0\",\n",
       "    \"short_force_fraction\": \"0.1\",\n",
       "    \"use_old_weights_charge\": \"False\",\n",
       "    \"use_old_weights_short\": \"False\",\n",
       "    \"write_weights_epoch\": \"5\",\n",
       "    \"center_symmetry_functions\": \"True\",\n",
       "    \"precondition_weights\": \"True\",\n",
       "    \"global_activation_short\": \"[['t', 't', 'l']]\",\n",
       "    \"global_hidden_layers_short\": \"2\",\n",
       "    \"global_nodes_short\": \"[[15, 15]]\"\n",
       "  },\n",
       "  \"training_data\": \"<pyiron_contrib.atomistics.atomistics.job.trainingcontainer.TrainingStorage object at 0x7ff3c9504460>\"\n",
       "})</pre>"
      ],
      "text/plain": [
       "DataContainer({'parameters': DataContainer({'runner_mode': 1, 'symfunction_short': [], 'elements': None, 'number_of_elements': 0, 'bond_threshold': 0.5, 'nn_type_short': 1, 'use_short_nn': True, 'optmode_charge': 1, 'optmode_short_energy': 1, 'optmode_short_force': 1, 'points_in_memory': 1000, 'scale_symmetry_functions': True, 'cutoff_type': 1, 'test_fraction': 0.1, 'use_short_forces': True, 'epochs': 30, 'kalman_lambda_short': 0.98, 'kalman_nue_short': 0.9987, 'mix_all_points': True, 'nguyen_widrow_weights_short': True, 'repeated_energy_update': True, 'short_energy_error_threshold': 0.1, 'short_energy_fraction': 1.0, 'short_force_error_threshold': 1.0, 'short_force_fraction': 0.1, 'use_old_weights_charge': False, 'use_old_weights_short': False, 'write_weights_epoch': 5, 'center_symmetry_functions': True, 'precondition_weights': True, 'global_activation_short': [['t', 't', 'l']], 'global_hidden_layers_short': 2, 'global_nodes_short': [[15, 15]]}), 'training_data': <pyiron_contrib.atomistics.atomistics.job.trainingcontainer.TrainingStorage object at 0x7ff3c9504460>})"
      ]
     },
     "execution_count": 13,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "mode1.input"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Here, we will only explain the global keywords which are relevant for all modes. The remaining keywords will be explained in the following chapters. If a keyword is not specified on the pyiron side, the **RuNNer** Fortran program uses its default value, if one exists.\n",
    "For a more detailed explanation of all RuNNer keywords, take a look at [the RuNNer documentation](https://theochemgoettingen.gitlab.io/RuNNer)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "| Keyword | Default | Description |\n",
    "|---------|---------|:-------------|\n",
    "|runner_mode | 1| Choose the operating mode of RuNNer.\n",
    "|symfunction_short | empty `SymmetryFunctionSet` |Specification of the symmetry functions for a specific element with a specific neighbor element combination for the short-range NN.\n",
    "|elements | None| The element symbols of all elements in the system in arbitrary order. The number of specified elements must match the value of the keyword `number_of_elements`. Will be automatically set by `pyiron`.\n",
    "|number_of_elements | 0| Specify the number of chemical elements in the system. Will be automatically set by `pyiron`.\n",
    "|bond_threshold | 0.5| Threshold for the shortest bond in the structure in Bohr units. If a shorter bond occurs, RuNNer will stop with an error message in `runner_mode` 2 and 3. In `runner_mode` 1, the structure will be eliminated from the data set.\n",
    "|nn_type_short | 1| Specify the NN type of the short-range part (atomic or pair-based energy expression).\n",
    "|use_short_nn | True| Use a short-range NN.\n",
    "|points_in_memory | 1000| This keyword controls memory consumption and IO and is therefore important to achieve optimum performance of RuNNer. Its meaning depends on the current `runner_mode`.\n",
    "|use_short_forces | True| Use forces for fitting the short-range NN weights."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "As of yet, this job does not have a training dataset. However, as you already saw for the EAM potential, adding a new training dataset to the job is as simple as calling `add_training_data`:"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "176"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# The if-condition prevents you from accidentally adding the same\n",
    "# structures twice to the training dataset.\n",
    "if len(mode1.training_data) == 0:\n",
    "    mode1.add_training_data(data_lithium)\n",
    "\n",
    "len(mode1.training_data)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "By calling `add_training_data` multiple times, it is very easy to combine several independent training datasets for one fit."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "#### Specification of RuNNer Parameters\n",
    "\n",
    "While many of the default parameters in `RunnerFit` are suited for a wide range of calculations, you still need to carefully check each of them before starting a fit. Special attention must be given to the atom-centered symmetry functions as they have to be tailored to the system under investigation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "`RunnerFit` builds on the RuNNer ASE interface which is publicly available as `runnerase`. For further information check out the [runnerase documentation](https://runner-suite.gitlab.io/runnerase).\n",
    "\n",
    "`runnerase` provides a `generate_symmetryfunctions` procedure, which makes it easy to tailor SFs to the corresponding training dataset. You can either generate radial SFs by setting `sftype=2`, or angular SFs with `sftype=3`. The algorithm will then automatically generate one set of SFs for each pair or each triplet of elements in the dataset. The number of SFs per set can be specified with the `amount` argument. Finally, `generate_symmetryfunctions` comes with several different `algorithm`s for choosing the parameters of the SFs:\n",
    "* **Radial SFs:** One can choose from `algorithm=turn` and `algorithm=half`. `turn` will choose the SF coefficients such that the turning points of all SFs are equally spaced between the cutoff radius and the minimum distance between any given element pair. `half` will make sure that the SFs are equally spaced at $f(G) = 0.5$.\n",
    "* **Angular SFs:** One can choose from `algorithm=turn`, `algorithm=half`, or `algorithm=literature`. While the first two algorithms behave similarly to the ones for radial SF, `literature` will return SF coefficients that have proven to be a reliable choice for most systems in previous publications.\n",
    "\n",
    "In most RuNNer-related publications, a combination of `algorithm=turn` for radial SFs and `algorithm=literature` for angular SFs has been used."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "However, `runnerase` does not operate on pyiron objects like the `TrainingContainer`. Therefore, we transform the `TrainingContainer` holding our lithium structures into a list of ASE `Atoms` objects. The `container_to_ase` function is defined in `pyiron_contrib`."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [],
   "source": [
    "dataset = container_to_ase(data_lithium)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "In the following, we have prepared two interactive plotting functions for you, so you can try out different parameters for the SFs."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "a16558439a0c451d8e6fea4a851c3607",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "interactive(children=(IntSlider(value=6, description='amount', max=12), Dropdown(description='algorithm', opti…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "@widgets.interact(amount=(0, 12, 1), algorithm=['turn', 'half'],\n",
    "                  cutoff_function=[True, False], show_legend=[True, False])\n",
    "def update(amount=6, algorithm='turn', cutoff_function=True, show_legend=False):\n",
    "    # Clear the plot.\n",
    "    plt.clf()\n",
    "    ax = plt.gca()\n",
    "    \n",
    "    # Generate the symmetry functions.\n",
    "    radials = generate_symmetryfunctions(dataset, sftype=2, algorithm=algorithm,\n",
    "                                         cutoff=12.0, amount=amount)\n",
    "    radials.plot.radial(cutoff_function=cutoff_function, show_legend=show_legend, axes=ax)\n",
    "    plt.show()"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "data": {
      "application/vnd.jupyter.widget-view+json": {
       "model_id": "10565efc82154d67bcdad4d31fc3ac29",
       "version_major": 2,
       "version_minor": 0
      },
      "text/plain": [
       "interactive(children=(IntSlider(value=4, description='amount', max=12), Dropdown(description='algorithm', inde…"
      ]
     },
     "metadata": {},
     "output_type": "display_data"
    }
   ],
   "source": [
    "@widgets.interact(amount=(0, 12, 1), algorithm=['half', 'turn', 'literature'],\n",
    "                  show_legend=[True, False])\n",
    "def update(amount=4, algorithm='literature', show_legend=False):\n",
    "    # Clear the plot.\n",
    "    plt.clf()\n",
    "    ax = plt.gca()\n",
    "\n",
    "    # Generate the angular symmetry functions and plot them.\n",
    "    angulars = generate_symmetryfunctions(dataset, sftype=3, amount=amount,\n",
    "                                          algorithm=algorithm, cutoff=12.0)\n",
    "    angulars.plot.angular(axes=ax, show_legend=show_legend)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Adding SFs to the Job"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "The parameter `mode1.parameters.symfunction_short` is, in fact, a so-called `SymmetryFunctionSet` object. It functions similarly to a folder: you can either store `SymmetryFunction` objects in it directly (= a file), or create further `SymmetryFunctionSet`s inside it (= another folder).\n",
    "\n",
    "When `generate_symmetryfunctions` is called, it returns a `SymmetryFunctionSet` itself. Two `SymmetryFunctionSet`s can easily be combined using the `+` operator. This way, we can add a collection of radial symmetry functions to our job."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "outputs": [],
   "source": [
    "# Reset the symmetry function container.\n",
    "mode1.parameters.symfunction_short.reset()\n",
    "\n",
    "# Generate radial symmetry functions.\n",
    "radials = generate_symmetryfunctions(dataset, sftype=2, algorithm='half',\n",
    "                                     cutoff=12.0)\n",
    "\n",
    "mode1.parameters.symfunction_short += radials"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "All SFs stored in `radials` can be accessed through its `storage` property. As you can see, `radials` essentially organizes a list of SFs in a convenient storage format."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[SymmetryFunction(sftype=2, cutoff=12.0, elements=['Li', 'Li'], coefficients=[0.0, 0.0]),\n",
       " SymmetryFunction(sftype=2, cutoff=12.0, elements=['Li', 'Li'], coefficients=[0.004675055980246072, 0.0]),\n",
       " SymmetryFunction(sftype=2, cutoff=12.0, elements=['Li', 'Li'], coefficients=[0.010843416275634649, 0.0]),\n",
       " SymmetryFunction(sftype=2, cutoff=12.0, elements=['Li', 'Li'], coefficients=[0.01939424193215976, 0.0]),\n",
       " SymmetryFunction(sftype=2, cutoff=12.0, elements=['Li', 'Li'], coefficients=[0.03192971575337408, 0.0]),\n",
       " SymmetryFunction(sftype=2, cutoff=12.0, elements=['Li', 'Li'], coefficients=[0.05159916711157465, 0.0])]"
      ]
     },
     "execution_count": 19,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "radials.storage"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Similarly, we can generate a set of angular symmetry functions and add them to the job as well."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "# Generate angular symmetry functions.\n",
    "angulars = generate_symmetryfunctions(dataset, sftype=3, amount=4,\n",
    "                                      algorithm='literature', cutoff=12.0)\n",
    "\n",
    "mode1.parameters.symfunction_short += angulars"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "outputs": [
    {
     "data": {
      "text/plain": [
       "[SymmetryFunction(sftype=3, cutoff=12.0, elements=['Li', 'Li', 'Li'], coefficients=[0.0, -1.0, 1]),\n",
       " SymmetryFunction(sftype=3, cutoff=12.0, elements=['Li', 'Li', 'Li'], coefficients=[0.0, -1.0, 2]),\n",
       " SymmetryFunction(sftype=3, cutoff=12.0, elements=['Li', 'Li', 'Li'], coefficients=[0.0, -1.0, 4]),\n",
       " SymmetryFunction(sftype=3, cutoff=12.0, elements=['Li', 'Li', 'Li'], coefficients=[0.0, -1.0, 8]),\n",
       " SymmetryFunction(sftype=3, cutoff=12.0, elements=['Li', 'Li', 'Li'], coefficients=[0.0, 1.0, 1]),\n",
       " SymmetryFunction(sftype=3, cutoff=12.0, elements=['Li', 'Li', 'Li'], coefficients=[0.0, 1.0, 2]),\n",
       " SymmetryFunction(sftype=3, cutoff=12.0, elements=['Li', 'Li', 'Li'], coefficients=[0.0, 1.0, 4]),\n",
       " SymmetryFunction(sftype=3, cutoff=12.0, elements=['Li', 'Li', 'Li'], coefficients=[0.0, 1.0, 8])]"
      ]
     },
     "execution_count": 21,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "angulars.storage"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "When you look at the `input` of the job again, you will find that all symmetry functions now appear in `symfunction_short`."
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [
    {
     "data": {
      "application/json": {
       "parameters": {
        "bond_threshold": "0.5",
        "center_symmetry_functions": "True",
        "cutoff_type": "1",
        "elements": "None",
        "epochs": "30",
        "global_activation_short": "[['t', 't', 'l']]",
        "global_hidden_layers_short": "2",
        "global_nodes_short": "[[15, 15]]",
        "kalman_lambda_short": "0.98",
        "kalman_nue_short": "0.9987",
        "mix_all_points": "True",
        "nguyen_widrow_weights_short": "True",
        "nn_type_short": "1",
        "number_of_elements": "0",
        "optmode_charge": "1",
        "optmode_short_energy": "1",
        "optmode_short_force": "1",
        "points_in_memory": "1000",
        "precondition_weights": "True",
        "repeated_energy_update": "True",
        "runner_mode": "1",
        "scale_symmetry_functions": "True",
        "short_energy_error_threshold": "0.1",
        "short_energy_fraction": "1.0",
        "short_force_error_threshold": "1.0",
        "short_force_fraction": "0.1",
        "symfunction_short": "[('Li', 2, 'Li', 0.0, 0.0, 12.0), ('Li', 2, 'Li', 0.004675055980246072, 0.0, 12.0), ('Li', 2, 'Li', 0.010843416275634649, 0.0, 12.0), ('Li', 2, 'Li', 0.01939424193215976, 0.0, 12.0), ('Li', 2, 'Li', 0.03192971575337408, 0.0, 12.0), ('Li', 2, 'Li', 0.05159916711157465, 0.0, 12.0), ('Li', 3, 'Li', 'Li', 0.0, -1.0, 1, 12.0), ('Li', 3, 'Li', 'Li', 0.0, -1.0, 2, 12.0), ('Li', 3, 'Li', 'Li', 0.0, -1.0, 4, 12.0), ('Li', 3, 'Li', 'Li', 0.0, -1.0, 8, 12.0), ('Li', 3, 'Li', 'Li', 0.0, 1.0, 1, 12.0), ('Li', 3, 'Li', 'Li', 0.0, 1.0, 2, 12.0), ('Li', 3, 'Li', 'Li', 0.0, 1.0, 4, 12.0), ('Li', 3, 'Li', 'Li', 0.0, 1.0, 8, 12.0)]",
        "test_fraction": "0.1",
        "use_old_weights_charge": "False",
        "use_old_weights_short": "False",
        "use_short_forces": "True",
        "use_short_nn": "True",
        "write_weights_epoch": "5"
       },
       "training_data": "<pyiron_contrib.atomistics.atomistics.job.trainingcontainer.TrainingStorage object at 0x7ff3c9504460>"
      },
      "text/html": [
       "<pre>DataContainer({\n",
       "  \"parameters\": {\n",
       "    \"runner_mode\": \"1\",\n",
       "    \"symfunction_short\": \"[('Li', 2, 'Li', 0.0, 0.0, 12.0), ('Li', 2, 'Li', 0.004675055980246072, 0.0, 12.0), ('Li', 2, 'Li', 0.010843416275634649, 0.0, 12.0), ('Li', 2, 'Li', 0.01939424193215976, 0.0, 12.0), ('Li', 2, 'Li', 0.03192971575337408, 0.0, 12.0), ('Li', 2, 'Li', 0.05159916711157465, 0.0, 12.0), ('Li', 3, 'Li', 'Li', 0.0, -1.0, 1, 12.0), ('Li', 3, 'Li', 'Li', 0.0, -1.0, 2, 12.0), ('Li', 3, 'Li', 'Li', 0.0, -1.0, 4, 12.0), ('Li', 3, 'Li', 'Li', 0.0, -1.0, 8, 12.0), ('Li', 3, 'Li', 'Li', 0.0, 1.0, 1, 12.0), ('Li', 3, 'Li', 'Li', 0.0, 1.0, 2, 12.0), ('Li', 3, 'Li', 'Li', 0.0, 1.0, 4, 12.0), ('Li', 3, 'Li', 'Li', 0.0, 1.0, 8, 12.0)]\",\n",
       "    \"elements\": \"None\",\n",
       "    \"number_of_elements\": \"0\",\n",
       "    \"bond_threshold\": \"0.5\",\n",
       "    \"nn_type_short\": \"1\",\n",
       "    \"use_short_nn\": \"True\",\n",
       "    \"optmode_charge\": \"1\",\n",
       "    \"optmode_short_energy\": \"1\",\n",
       "    \"optmode_short_force\": \"1\",\n",
       "    \"points_in_memory\": \"1000\",\n",