DeepErwin Tutorial


DeepErwin is a Python 3 package and has been tested on Linux and macOS.

Installation from source

To get the most up-to-date version of the code, we recommend checking out our repository from GitHub.

To install deeperwin and all its dependencies, go to the downloaded directory and run

pip install -e .

This will install the repository “in-place”, so you can make changes to the source code without having to reinstall the package. If you need CUDA support to run the JAX code on GPUs (recommended), additionally install the prepackaged jax[cuda] wheel:

pip install --upgrade "jax[cuda]" -f

Installation using pip

DeepErwin is also available as a PyPI package. Note, however, that the latest version of our code may not always be on PyPI:

pip install deeperwin

To install from source and be able to modify the package, go to the repository root and install it in editable mode via:

pip install -e .

Note that you need Python >= 3.8. We recommend installing the package in a separate conda or virtual environment.

Running a simple calculation

To run a DeepErwin calculation, all configuration options must be specified in a YAML file, typically named config.yml. For all options that are not specified explicitly, sensible default values are used. The default values are defined in deeperwin.configuration, and a full_config.yml listing the full configuration is created for each calculation.

The absolute minimum that must be specified in a config-file is the physical system that one is interested in, i.e. the positions and charges of the nuclei.

    physical:
      R: [[0,0,0], [3.0,0,0]]
      Z: [3, 1]

By default, DeepErwin assumes a neutral, closed-shell calculation, i.e. the number of electrons equals the total charge of all nuclei, and the number of spin-up electrons equals the number of spin-down electrons. For a system with an odd number of electrons, one extra spin-up electron is assumed. To calculate charged or spin-polarized systems, state the total number of electrons and the number of spin-up electrons, e.g.

    physical:
      R: [[0,0,0], [3.0,0,0]]
      Z: [3, 1]
      n_electrons: 4
      n_up: 2
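The default rule described above can be written out as a short sketch. The helper name default_spin_settings is hypothetical and not part of DeepErwin; it only mirrors the rule stated in the text (neutral system, extra electron spin-up).

```python
def default_spin_settings(Z, n_electrons=None, n_up=None):
    """Hypothetical helper mirroring the defaults described in the text."""
    if n_electrons is None:
        # Neutral system: one electron per unit of nuclear charge
        n_electrons = sum(Z)
    if n_up is None:
        # Closed shell if even; one extra spin-up electron if odd
        n_up = (n_electrons + 1) // 2
    return n_electrons, n_up


print(default_spin_settings([3, 1]))                 # LiH, neutral
print(default_spin_settings([3]))                    # Li atom, odd electron count
print(default_spin_settings([3, 1], n_electrons=3))  # LiH cation
```

Passing n_electrons or n_up explicitly corresponds to overriding the defaults in the config file.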

Additionally, you might want to specify settings for the CASSCF baseline model: the number of active electrons and active orbitals.

    physical:
      R: [[0,0,0], [3.0,0,0]]
      Z: [3, 1]
      n_electrons: 4
      n_up: 2
      n_cas_electrons: 2
      n_cas_orbitals: 4

For several small molecules (e.g. H2, LiH, Ethene, first and second row elements) we have predefined their geometries and spin-settings. Instead of setting all these parameters manually, you can just specify them using the tag physical: name:

    physical:
      name: LiH

You can also partially overwrite settings, e.g. to calculate a modified geometry of a molecule. For example, to calculate a stretched LiH molecule with a bond length of 3.5 bohr, use this configuration:

    physical:
      name: LiH
      R: [[0,0,0],[3.5,0,0]]

To run an actual calculation, run the python package as an executable:

deeperwin run config.yml

This will combine your supplied configuration with default values for all other settings and dump it as full_config.yml. It will then run a calculation in the current directory, writing its output to the standard output and logfile.
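The idea of overlaying a small user config on a full tree of defaults can be illustrated with a recursive dictionary merge. This is only a sketch of the concept: the function merge_config and the default values below are placeholders, not DeepErwin's actual implementation or defaults.

```python
import copy


def merge_config(defaults, overrides):
    """Recursively overlay user-supplied values on a nested dict of defaults."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are sub-trees: merge them key by key
            merged[key] = merge_config(merged[key], value)
        else:
            # Leaf value (or new key): the user-supplied value wins
            merged[key] = copy.deepcopy(value)
    return merged


# Placeholder defaults for illustration only (not DeepErwin's real values)
defaults = {
    "physical": {"name": None, "R": None},
    "optimization": {"learning_rate": 1e-3, "n_epochs": 1000},
}
user = {"physical": {"name": "LiH"}, "optimization": {"learning_rate": 2e-3}}

full = merge_config(defaults, user)
print(full)
```

Options given by the user replace the corresponding defaults, while everything else keeps its default value, which is what the generated full_config.yml reflects.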

You can also set up factorial sweeps of config options by using `deeperwin setup` with the -p flag. The following call sets up 12 subdirectories (4 molecules x 3 learning rates) and starts calculations for all of them. If you run this on a SLURM cluster, the jobs are not executed directly; instead, SLURM jobs are submitted for parallel computation.

deeperwin setup -p experiment_name my_sweep -p physical.name B C N O -p optimization.learning_rate 1e-3 2e-3 5e-3 config.yml
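To see why this call yields 12 runs, the factorial combination can be reproduced with a Cartesian product. This is a sketch of the counting only, not of how `deeperwin setup` is implemented; the option name physical.name for the molecule list is an assumption based on the physical: name tag described earlier.

```python
from itertools import product

# Swept options and their values ("physical.name" is an assumed option name)
sweep = {
    "physical.name": ["B", "C", "N", "O"],
    "optimization.learning_rate": ["1e-3", "2e-3", "5e-3"],
}

keys = list(sweep)
# One dict per run: every molecule combined with every learning rate
runs = [dict(zip(keys, values)) for values in product(*sweep.values())]

print(len(runs))  # 12
for run in runs[:3]:
    print(run)
```

Each additional swept option multiplies the number of runs by its number of values.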

The code runs best on a GPU, but will in principle also work on a CPU. It generates several output files, in particular:

  • GPU.out containing a detailed debug log of all steps of the calculation

  • full_config.yml containing all configuration options used for this calculation: Your provided options, as well as all default options. Take a look at this file to see all the available config options for DeepErwin

  • checkpoint files containing a compressed, pickled representation of all data (including history and model weights)

Major configuration options

To see the structure of all possible configuration options, take a look at the class Configuration, which contains the full tree of config options. Alternatively, you can inspect the full_config.yml file that is generated for every run.

Here are some of the most important configuration options:

  • Name of the molecule to be calculated, e.g. N2, CO, etc. For several small molecules this automatically populates the geometry, nuclear charges, electron number and spin.

  • R, Z, n_electrons, n_up: Physical properties (e.g. geometry) of your system in atomic units (bohr).

  • Type of model to use, e.g. “dpe1” (arxiv:2105.08351), “dpe4” (arxiv:2205.09438), “ferminet”. This sets all model-related defaults and allows subsequent changes to be made from there.

  • Enable/disable a local coordinate system for each ion.

  • features.use_distance_features, features.use_el_el_differences, features.use_el_ion_differences: Choose input features to be fed into the embedding: distances (scalar) and/or differences (3D vectors).

  • Type of embedding to use, e.g. “dpe1”, “dpe4”, “ferminet”.

  • Number of embedding iterations (= embedding network depth).

  • embedding.n_hidden_one_el, embedding.n_hidden_two_el: For FermiNet, DeepErwin: number of hidden neurons in the one-electron and two-electron streams.

  • Number of determinants to use for building the wavefunction.

  • Config options related to FermiNet-like exponential envelope orbitals.

  • Config options related to PauliNet-like orbitals from a baseline calculation (e.g. Hartree-Fock).

  • Type of optimizer, e.g. “adam”, “rmsprop”, “kfac”, “kfac_adam”.

  • Initial learning rate during optimization. May be modified during optimization by the LR schedule (optimization.schedule).

  • Number of epochs to train the wavefunction model. In each epoch all n_walkers walkers are updated using MCMC and then optimized batch by batch.

  • mcmc.n_walkers, mcmc.n_inter_steps, mcmc. …: Settings for Markov chain Monte Carlo (MCMC) sampling during wavefunction optimization. Analogous settings can be found within evaluation and pre_training.

  • Number of evaluation steps after the wavefunction optimization.

  • Number of supervised pre-training steps to take before variational optimization.

  • wandb.entity, wandb.project: When set, this enables logging of the experiment to Weights & Biases. Set logging.wandb=None to disable W&B logging (default).

  • Number of GPUs to use for parallelization.

  • Abort computation when no GPU is found, instead of computing on CPUs.

  • Path to a directory containing a previously successfully finished wavefunction optimization to use as initializer for this experiment.

Optimization using weight-sharing

ATTENTION: The weight-sharing technique is currently not supported on the master branch. A fully functioning codebase for weight-sharing can be found under the “weight_sharing” branch.

When calculating wavefunctions for multiple related systems (e.g. different geometries of the same molecule), the naive approach is to conduct an independent wavefunction optimization for each one. To do this, you can set changes in the physical configuration, which launches multiple independent experiments with the same configuration but different physical systems.

    physical:
      name: LiH
      changes:
        - R: [[0,0,0],[3.0,0,0]]
          comment: "Equilibrium bond length"
        - R: [[0,0,0],[2.8,0,0]]
          comment: "Compressed molecule"
        - R: [[0,0,0],[3.2,0,0]]
          comment: "Stretched molecule"
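For a longer scan over bond lengths, writing one list entry per geometry by hand becomes tedious. The following sketch generates such entries as plain text for a diatomic molecule aligned along the x-axis. The helper geometry_entries is hypothetical and not part of DeepErwin; its output is meant to be pasted into the geometry list of a config file.

```python
def geometry_entries(bond_lengths, indent="  "):
    """Return YAML list entries (R plus comment) for a diatomic bond-length scan."""
    lines = []
    for d in bond_lengths:
        # First nucleus at the origin, second at distance d (in bohr) on the x-axis
        lines.append(f"{indent}- R: [[0,0,0],[{d},0,0]]")
        lines.append(f'{indent}  comment: "Bond length {d} bohr"')
    return "\n".join(lines)


print(geometry_entries([2.8, 3.0, 3.2]))
```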

As outlined in our arXiv publication, the optimization can be sped up significantly by sharing weights between geometries instead of optimizing them all independently. This interdependent, weight-sharing optimization can be enabled by setting optimization.shared_optimization.use = True. To disable weight-sharing, simply set optimization.shared_optimization = None (default).

    physical:
      name: LiH
      changes:
        - R: [[0,0,0],[3.0,0,0]]
          comment: "Equilibrium bond length"
        - R: [[0,0,0],[2.8,0,0]]
          comment: "Compressed molecule"
        - R: [[0,0,0],[3.2,0,0]]
          comment: "Stretched molecule"
    optimization:
      shared_optimization:
        use: True