handson.ipynb

{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "![Logo](figures/GOE_Logo_Quer_IPC_Farbe_RGB.png)\n",
    "# High-Dimensional Neural Network Potentials\n",
    "**[Alexander L. M. Knoll](mailto:aknoll@chemie.uni-goettingen.de), and [Moritz R. Schäfer](mailto:moritzrichard.schaefer@uni-goettingen.de)** \n",
    "\n",
    "Behler Group, Theoretische Chemie, Institut für Physikalische Chemie\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "For this tutorial it is intended to use the RuNNer release version 1.2, available in conda-forge. \n",
    "The most recent version of RuNNer is [hosted on Gitlab](https://gitlab.com/TheochemGoettingen/RuNNer). For access please contact Prof. Dr. Jörg Behler (joerg.behler@uni-goettingen.de)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {
    "scrolled": true,
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "name": "stderr",
     "output_type": "stream",
     "text": [
      "/home/kalm/.pyenv/versions/pyiron/lib/python3.10/site-packages/paramiko/transport.py:236: CryptographyDeprecationWarning: Blowfish has been deprecated\n",
      "  \"class\": algorithms.Blowfish,\n"
     ]
    }
   ],
   "source": [
    "import numpy as np\n",
    "\n",
    "import matplotlib.pyplot as plt\n",
    "import ipympl\n",
    "import ipywidgets as widgets\n",
    "\n",
    "from pyiron_atomistics import Project\n",
    "from pyiron_contrib.atomistics.runner.job import RunnerFit\n",
    "from pyiron_contrib.atomistics.runner.utils import container_to_ase\n",
    "\n",
    "from ase.geometry import get_distances\n",
    "\n",
    "from runnerase import generate_symmetryfunctions\n",
    "\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Background\n",
    "### Architecture of an HDNNP"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "**RuNNer** is a stand-alone Fortran program for the construction of high-dimensional neural network potentials (HDNNPs), written mainly by Jörg Behler. The central assumption made in constructing a HDNNP is that the total energy of the system $E_{\\mathrm{tot}}$ [can be separated into atomic contributions $E_i$](https://www.doi.org/10.1103/PhysRevLett.98.146401). HDNNP relates the local environment of the atoms to their atomic energies $E_i$, which contribute to the sum of all $N$ atomic energies, resulting in the total energy of the system $E_\\mathrm{tot}$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "\\begin{align}\n",
    "E_\\mathrm{tot} = \\sum_{i}^{N}E_i\\notag\n",
    "\\end{align}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Every atomic energy is described by an atomic neural network (NN), which is element-specific. The entirety of all atomic NNs composes a HDNNP, whose general architecture is shown below for a binary system."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "<img src=\"figures/2g.png\" class=\"center\" width=\"500\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "As you can see, the atomic contributions in this model are predicted independently from each other. Therefore, the model can easily describe systems with differering numbers of atoms: adding or removing an atom corresponds to adding or removing a row in the figure shown above. This ability is what puts the \"high-dimensional\" into the name \"HDNNP\". \n",
    "\n",
    "Each atomic neural networks receives input information about the local atomic environment up to a certain cutoff radius $R_{\\mathrm{c}}$. This information is encoded based on the Cartesian coordinates in many-body descriptors, so-called [atom-centered symmetry functions (ACSF or just SF)](https://www.doi.org/10.1063/1.3553717). More details about this are shown below. For each atom, the values of multiple SFs compose a SF vector $G$ which is the input layer of the atomic NNs.\n",
    "\n",
    "Atomic NNs look like this:"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "<img src=\"figures/ann.png\" width=\"500\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Every value in the SF vector $G$ serves as one piece of input information to the atomic NN. We refer to the circles in the figure as _nodes_ (from graph theory) or _neurons_ (from neural science). The information from the input nodes flows through the atomic NN from left to right: the input layer is followed by a configurable number of hidden layers which consist, in turn, of an arbitrary number of _hidden nodes_. At the end, all information is collected in the output layer, which in our case is interpreted as the atomic energy contribution of the atom under consideration. The input nodes and the hidden nodes in the first layer are connected by weights. Moreover, the hidden and output nodes carry a bias value.\n",
    "\n",
    "During training, the weights and biases are optimized using backpropagation to represent best the data in the training data set."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "### Symmetry Functions (SFs)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "SFs provide the input for the NN and describe the local atomic environment of each atom. In principle, one could also use Cartesian coordinates to capture the atomic positions in a structure. As Cartesian coordinates describe the absolute positions of atoms the numerical input to the atomic NNs would change with translation or rotation of the system. However, these actions do not influence the energy of the system and different numerical inputs belonging to the same NN output lead to large training errors.\n",
    "\n",
    "In contrast, SFs describe the relative positions of the atoms to each other and are hence translationally and rotationally invariant. We differentiate two types of SFs: radial SF depend on the distance between atom pairs and serve as a measure for their bond order. Angular SFs additionally depend on the interatomic angles."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Cutoff Functions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "The cutoff function $f_{\\mathrm{c}}$ ensures that only the neighbors within one atomic environment counts towards the symmetry function values. The cutoff radius $R_\\mathrm{c}$ (usually $12\\,\\mathrm{bohr}$) defines how much of the local atomic environment is considered. All SFs and their derivatives will decrease to zero if the pairwise distance is larger than $R_\\mathrm{c}$. There are several cutoff funtions defined in **RuNNer** and we will use here"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "\\begin{align}\n",
    "    f_{c}(R_{ij}) = \n",
    "    \\begin{cases}\n",
    "    0.5 \\cdot [\\cos(\\pi x) + 1]& ~ \\text{for $R_{ij} \\leq R_\\mathrm{c}$},\\\\\n",
    "    0& ~ \\text{for $R_{ij} > R_\\mathrm{c}$}\n",
    "    \\end{cases}\n",
    "\\end{align}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "with the atomic distance $R_{ij}$, the cutoff radius $R_\\mathrm{c}$, and $x = \\frac{R_{ij}}{R_{\\mathrm{c}}}$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Take a look at the figure below for a graphical representation of the cutoff radius in a periodic system: the red atom is the central atom for which the SF values will be calculated. All yellow atoms lie within in the cutoff radius and will therefore contribute to the SF values."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "<img src=\"figures/Rc.png\" width=\"500\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Radial Symmetry Functions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "To define the parameters for the radial SFs, it is important to know the shortest bond distance for each element combination in your data set. Usually, 5-6 radial SF are used for any element pair, with different $\\eta$ values to increase the resolution for structure description. It is possible to shift the maximum of the radial SF $G^2$ by $R_{s}$."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "\\begin{align}\n",
    "    G_{i}^{2} = \\sum_{j}^{}e^{-\\eta (R_{ij} - R_{s})^2} \\cdot f_{c}(R_{ij}).\n",
    "\\end{align}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "In most applications, the Gaussian exponents $\\eta$ for the radial SFs are chosen such that the SF turning points are equally distributed between the cutoff radius and specific minimum pairwise distance in the training dataset (small eta $\\eta$ = max. contraction). In RuNNer, you can either define element pair specific SF or define global SF which are used for every element combination. It is also possible to define different cutoff radii for the SF, even though this is rarely helpful and therefore not recommended.\n",
    "\n",
    "Below, you can see a graphical representation of the radial symmetry functions including the cutoff function for a cutoff radius of 12 Bohr."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"figures/radials.png\" width=\"800\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Angular Symmetry Functions"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "The same rules apply to the angular SFs. Here, however, three atomic positions are included in the calculation."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "\\begin{align}\n",
    "    G_{i}^{3} = 2^{\\zeta - 1}\\sum_{j}^{} \\sum_{k}^{} \\left[( 1 + \\lambda \\cdot cos \\theta_{ijk})^{\\zeta} \\cdot e^{-\\eta (R_{ij}^2 + R_{ik}^2 + R_{jk}^2)} \\cdot f_{\\mathrm{c}}(R_{ij}) \\cdot f_{\\mathrm{c}}(R_{ik}) \\cdot f_{\\mathrm{c}}(R_{jk}) \\right]\n",
    "\\end{align}"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "The angle $\\theta_{ijk} = \\frac{\\mathbf{R}_{ij} \\cdot \\mathbf{R}_{ik}}{R_{ij} \\cdot R_{ik}}$ is centered at atom $i$. For most system, we use permutations of $\\zeta = \\{1, 2, 4, 16\\}$, $\\eta = 0$, and $\\lambda$ = $\\{+1, -1\\}$. If many atoms of each element are present, angular SFs are usually not critical and a default set of SFs can be used."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"figures/angulars.png\" width=\"800\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "###  3rd and 4th Generation HDNNPs"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "Due to time limitations, we will only focus on the model described above which is known as the second generation of HDNNPs (see [here](https://www.doi.org/10.1103/PhysRevLett.98.146401), and [here](https://www.doi.org/10.1002/anie.201703114), and [here](https://www.doi.org/10.1002/qua.24890)). However, in recent years third- and fourth-generation HDNNPs were developed by the Behler group."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "#### Repetition: Second Generation HDNNP"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<img src=\"figures/2g.png\" width=\"500\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    },
    "tags": []
   },
   "source": [
    "#### Third Generation HDNNP"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "In second-generation HDNNPs only atomic interactions inside the cutoff sphere are taken into account. The resulting short-ranged potentials are well-suited to describe local bonding even for complex atomic environments. However, it can be expected that for many systems long-range interactions, primarily electrostatics, will be important.\n",
    "\n",
    "To overcome those limitations, third-generation NNs (see [here](https://www.doi.org/10.1103/PhysRevB.83.153101), and [here](https://www.doi.org/10.1063/1.3682557)) define a second set of atomic neural networks to construct environment-dependent atomic charges. They can then be used to calculate the long-range electrostatic energy without truncation. The total energy of the system is then given by the sum of the short-range and the electrostatic energies."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "<img src=\"figures/3g.png\" width=\"800\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "source": [
    "#### Fourth Generation HDNNP"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "While the use of environment-dependent charges is a clear step forward, they are not sufficient if long-range charge transfer is present.\n",
    "\n",
    "When dealing with systems where long-range charge transer is present, the usage of fourth-generation NNs (see [here](https://www.doi.org/10.1038/s41467-020-20427-2), and [here](https://www.doi.org/10.1021/acs.accounts.0c00689)) is recommended. Here, environment-dependent electronegativities $\\chi$ are computed first. They will then be used in a charge equilibration scheme to determine the atomic charges $Q$. Again, we can compute the electrostatic energy from this. Moreover, the atomic charges serve as an additional input neuron to train the short-range energy and forces. As it was the case for the third-generation NNs the total energy is then given by the sum of the short-range and the electrostatic energies."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "-"
    }
   },
   "source": [
    "<img src=\"figures/4g.png\" width=\"800\"/>"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "slide"
    }
   },
   "source": [
    "## Using RuNNer via the pyiron Interface "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "In general, training a HDNNP with **RuNNer** can be separated into three different stages - so-called modes - in which different types of calculation are performed."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "- **Mode 1:** calculation of the SF values and separation of the dataset into a training and testing set.\n",
    "- **Mode 2:** training of the model to construct the HDNNP.\n",
    "- **Mode 3:** prediction of energy and forces (stress and charges can also be predicted)."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "All these steps are performed consecutively beginning with mode 1."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "fragment"
    }
   },
   "source": [
    "### Data Preparation"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "The creation of a meaningful neural network potential lives and dies with high quality training data. Therefore, we will begin by inspecting the full training dataset. \n",
    "The dataset has been stored prior to the workshop in form of a `TrainingContainer`. \n",
    "In pyiron, `TrainingContainer`s are jobs which take a set of structures and properties like energies, forces, ... and store them in HDF format.\n",
    "\n",
    "Go ahead and open up the project:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [],
   "source": [
    "pr = Project('../../introduction/training')"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "The project already contains several jobs:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {
    "slideshow": {
     "slide_type": "skip"
    }
   },
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>id</th>\n",
       "      <th>status</th>\n",
       "      <th>chemicalformula</th>\n",
       "      <th>job</th>\n",
       "      <th>subjob</th>\n",
       "      <th>projectpath</th>\n",
       "      <th>project</th>\n",
       "      <th>timestart</th>\n",
       "      <th>timestop</th>\n",
       "      <th>totalcputime</th>\n",
       "      <th>computer</th>\n",
       "      <th>hamilton</th>\n",
       "      <th>hamversion</th>\n",
       "      <th>parentid</th>\n",
       "      <th>masterid</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>212</td>\n",
       "      <td>finished</td>\n",
       "      <td>None</td>\n",
       "      <td>full</td>\n",
       "      <td>/full</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/</td>\n",
       "      <td>2022-06-03 00:02:34.816533</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>zora@cmti001#1</td>\n",
       "      <td>TrainingContainer</td>\n",
       "      <td>0.4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>213</td>\n",
       "      <td>finished</td>\n",
       "      <td>None</td>\n",
       "      <td>basic</td>\n",
       "      <td>/basic</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/</td>\n",
       "      <td>2022-06-03 00:02:47.281055</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>zora@cmti001#1</td>\n",
       "      <td>TrainingContainer</td>\n",
       "      <td>0.4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>214</td>\n",
       "      <td>finished</td>\n",
       "      <td>None</td>\n",
       "      <td>Al_basic_atomicrex</td>\n",
       "      <td>/Al_basic_atomicrex</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/</td>\n",
       "      <td>2022-06-03 02:00:15.887059</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>zora@cmti001#1</td>\n",
       "      <td>TrainingContainer</td>\n",
       "      <td>0.4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>226</td>\n",
       "      <td>finished</td>\n",
       "      <td>None</td>\n",
       "      <td>data_lithium</td>\n",
       "      <td>/data_lithium</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/</td>\n",
       "      <td>2022-07-21 08:18:19.193090</td>\n",
       "      <td>NaT</td>\n",
       "      <td>NaN</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>TrainingContainer</td>\n",
       "      <td>0.4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>227</td>\n",
       "      <td>finished</td>\n",
       "      <td>None</td>\n",
       "      <td>fit_mode1</td>\n",
       "      <td>/fit_mode1</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/fit_lithium/</td>\n",
       "      <td>2022-07-21 08:18:22.251691</td>\n",
       "      <td>2022-07-21 08:18:28.328624</td>\n",
       "      <td>6.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>RunnerFit</td>\n",
       "      <td>0.4</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>5</th>\n",
       "      <td>228</td>\n",
       "      <td>finished</td>\n",
       "      <td>None</td>\n",
       "      <td>fit_mode2</td>\n",
       "      <td>/fit_mode2</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/fit_lithium/</td>\n",
       "      <td>2022-07-21 08:18:30.158483</td>\n",
       "      <td>2022-07-21 08:22:02.862973</td>\n",
       "      <td>212.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>RunnerFit</td>\n",
       "      <td>0.4</td>\n",
       "      <td>227.0</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>6</th>\n",
       "      <td>229</td>\n",
       "      <td>finished</td>\n",
       "      <td>None</td>\n",
       "      <td>fit_mode3</td>\n",
       "      <td>/fit_mode3</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/fit_lithium/</td>\n",
       "      <td>2022-07-21 08:22:04.136475</td>\n",
       "      <td>2022-07-21 08:22:11.189763</td>\n",
       "      <td>7.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>RunnerFit</td>\n",
       "      <td>0.4</td>\n",
       "      <td>228.0</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>7</th>\n",
       "      <td>230</td>\n",
       "      <td>finished</td>\n",
       "      <td>Li</td>\n",
       "      <td>job_a_3_0</td>\n",
       "      <td>/job_a_3_0</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/</td>\n",
       "      <td>2022-07-21 08:22:12.207784</td>\n",
       "      <td>2022-07-21 08:22:15.498472</td>\n",
       "      <td>3.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>Lammps</td>\n",
       "      <td>0.1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>8</th>\n",
       "      <td>231</td>\n",
       "      <td>finished</td>\n",
       "      <td>Li</td>\n",
       "      <td>job_a_3_167</td>\n",
       "      <td>/job_a_3_167</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/</td>\n",
       "      <td>2022-07-21 08:22:15.639460</td>\n",
       "      <td>2022-07-21 08:22:15.965042</td>\n",
       "      <td>0.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>Lammps</td>\n",
       "      <td>0.1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>9</th>\n",
       "      <td>232</td>\n",
       "      <td>finished</td>\n",
       "      <td>Li</td>\n",
       "      <td>job_a_3_333</td>\n",
       "      <td>/job_a_3_333</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/</td>\n",
       "      <td>2022-07-21 08:22:16.102696</td>\n",
       "      <td>2022-07-21 08:22:16.395679</td>\n",
       "      <td>0.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>Lammps</td>\n",
       "      <td>0.1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>10</th>\n",
       "      <td>233</td>\n",
       "      <td>finished</td>\n",
       "      <td>Li</td>\n",
       "      <td>job_a_3_5</td>\n",
       "      <td>/job_a_3_5</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/</td>\n",
       "      <td>2022-07-21 08:22:16.572264</td>\n",
       "      <td>2022-07-21 08:22:16.831195</td>\n",
       "      <td>0.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>Lammps</td>\n",
       "      <td>0.1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>11</th>\n",
       "      <td>234</td>\n",
       "      <td>finished</td>\n",
       "      <td>Li</td>\n",
       "      <td>job_a_3_667</td>\n",
       "      <td>/job_a_3_667</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/</td>\n",
       "      <td>2022-07-21 08:22:16.975377</td>\n",
       "      <td>2022-07-21 08:22:17.282200</td>\n",
       "      <td>0.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>Lammps</td>\n",
       "      <td>0.1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>12</th>\n",
       "      <td>235</td>\n",
       "      <td>finished</td>\n",
       "      <td>Li</td>\n",
       "      <td>job_a_3_833</td>\n",
       "      <td>/job_a_3_833</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/</td>\n",
       "      <td>2022-07-21 08:22:17.434798</td>\n",
       "      <td>2022-07-21 08:22:17.694902</td>\n",
       "      <td>0.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>Lammps</td>\n",
       "      <td>0.1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>13</th>\n",
       "      <td>236</td>\n",
       "      <td>finished</td>\n",
       "      <td>Li</td>\n",
       "      <td>job_a_4_0</td>\n",
       "      <td>/job_a_4_0</td>\n",
       "      <td>None</td>\n",
       "      <td>/home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/</td>\n",
       "      <td>2022-07-21 08:22:17.828141</td>\n",
       "      <td>2022-07-21 08:22:18.089258</td>\n",
       "      <td>0.0</td>\n",
       "      <td>pyiron@lap2p#1</td>\n",
       "      <td>Lammps</td>\n",
       "      <td>0.1</td>\n",
       "      <td>NaN</td>\n",
       "      <td>None</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     id    status chemicalformula                 job               subjob  \\\n",
       "0   212  finished            None                full                /full   \n",
       "1   213  finished            None               basic               /basic   \n",
       "2   214  finished            None  Al_basic_atomicrex  /Al_basic_atomicrex   \n",
       "3   226  finished            None        data_lithium        /data_lithium   \n",
       "4   227  finished            None           fit_mode1           /fit_mode1   \n",
       "5   228  finished            None           fit_mode2           /fit_mode2   \n",
       "6   229  finished            None           fit_mode3           /fit_mode3   \n",
       "7   230  finished              Li           job_a_3_0           /job_a_3_0   \n",
       "8   231  finished              Li         job_a_3_167         /job_a_3_167   \n",
       "9   232  finished              Li         job_a_3_333         /job_a_3_333   \n",
       "10  233  finished              Li           job_a_3_5           /job_a_3_5   \n",
       "11  234  finished              Li         job_a_3_667         /job_a_3_667   \n",
       "12  235  finished              Li         job_a_3_833         /job_a_3_833   \n",
       "13  236  finished              Li           job_a_4_0           /job_a_4_0   \n",
       "\n",
       "   projectpath  \\\n",
       "0         None   \n",
       "1         None   \n",
       "2         None   \n",
       "3         None   \n",
       "4         None   \n",
       "5         None   \n",
       "6         None   \n",
       "7         None   \n",
       "8         None   \n",
       "9         None   \n",
       "10        None   \n",
       "11        None   \n",
       "12        None   \n",
       "13        None   \n",
       "\n",
       "                                                                                                                          project  \\\n",
       "0               /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/   \n",
       "1               /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/   \n",
       "2               /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/   \n",
       "3               /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/   \n",
       "4   /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/fit_lithium/   \n",
       "5   /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/fit_lithium/   \n",
       "6   /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/fit_lithium/   \n",
       "7     /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/   \n",
       "8     /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/   \n",
       "9     /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/   \n",
       "10    /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/   \n",
       "11    /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/   \n",
       "12    /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/   \n",
       "13    /home/kalm/dcmnts/uni/promotion/work/workshop_bochum/handson_notebook/workshop_preparation/introduction/training/E_V_curve/   \n",
       "\n",
       "                    timestart                   timestop  totalcputime  \\\n",
       "0  2022-06-03 00:02:34.816533                        NaT           NaN   \n",
       "1  2022-06-03 00:02:47.281055                        NaT           NaN   \n",
       "2  2022-06-03 02:00:15.887059                        NaT           NaN   \n",
       "3  2022-07-21 08:18:19.193090                        NaT           NaN   \n",
       "4  2022-07-21 08:18:22.251691 2022-07-21 08:18:28.328624           6.0   \n",
       "5  2022-07-21 08:18:30.158483 2022-07-21 08:22:02.862973         212.0   \n",
       "6  2022-07-21 08:22:04.136475 2022-07-21 08:22:11.189763           7.0   \n",
       "7  2022-07-21 08:22:12.207784 2022-07-21 08:22:15.498472           3.0   \n",
       "8  2022-07-21 08:22:15.639460 2022-07-21 08:22:15.965042           0.0   \n",
       "9  2022-07-21 08:22:16.102696 2022-07-21 08:22:16.395679           0.0   \n",
       "10 2022-07-21 08:22:16.572264 2022-07-21 08:22:16.831195           0.0   \n",
       "11 2022-07-21 08:22:16.975377 2022-07-21 08:22:17.282200           0.0   \n",
       "12 2022-07-21 08:22:17.434798 2022-07-21 08:22:17.694902           0.0   \n",
       "13 2022-07-21 08:22:17.828141 2022-07-21 08:22:18.089258           0.0   \n",
       "\n",
       "          computer           hamilton hamversion  parentid masterid  \n",
       "0   zora@cmti001#1  TrainingContainer        0.4       NaN     None  \n",
       "1   zora@cmti001#1  TrainingContainer        0.4       NaN     None  \n",
       "2   zora@cmti001#1  TrainingContainer        0.4       NaN     None  \n",
       "3   pyiron@lap2p#1  TrainingContainer        0.4       NaN     None  \n",
       "4   pyiron@lap2p#1          RunnerFit        0.4       NaN     None  \n",
       "5   pyiron@lap2p#1          RunnerFit        0.4     227.0     None  \n",
       "6   pyiron@lap2p#1          RunnerFit        0.4     228.0     None  \n",
       "7   pyiron@lap2p#1             Lammps        0.1       NaN     None  \n",
       "8   pyiron@lap2p#1             Lammps        0.1       NaN     None  \n",
       "9   pyiron@lap2p#1             Lammps        0.1       NaN     None  \n",
       "10  pyiron@lap2p#1             Lammps        0.1       NaN     None  \n",
       "11  pyiron@lap2p#1             Lammps        0.1       NaN     None  \n",
       "12  pyiron@lap2p#1             Lammps        0.1       NaN     None  \n",
       "13  pyiron@lap2p#1             Lammps        0.1       NaN     None  "
      ]
     },
     "execution_count": 3,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pr.job_table()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "The training data is stored in the project node `initial`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 4,
   "metadata": {
    "slideshow": {
     "slide_type": "subslide"
    }
   },
   "outputs": [],
   "source": [
    "data_full = pr['basic']"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "slideshow": {
     "slide_type": "notes"
    }
   },
   "source": [
    "In order to get a feeling for the data, we inspect its energy-volume curve:"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 5,
   "metadata": {
    "scrolled": false,
    "slideshow": {
     "slide_type": "fragment"
    }