Training a Predictor on hydrogen adsorption energies

In this tutorial we show how to use AutoCat's learning tools to train a regressor that predicts hydrogen adsorption energies on a set of single-atom alloys.

Creating a `DesignSpace`

Let's start by creating a DesignSpace. Normally each of these structures would be relaxed with DFT, but for demonstration purposes we'll use the generated structures directly. First we need to generate the single-atom alloys, which we can do with AutoCat's generate_saa_structures function.

>>> # Generate the clean single-atom alloy structures
>>> from autocat.saa import generate_saa_structures
>>> from autocat.utils import flatten_structures_dict
>>> saa_struct_dict = generate_saa_structures(
...     ["Fe", "Cu", "Au"],
...     ["Pt", "Pd", "Ni"],
...     facets={"Fe": ["110"], "Cu": ["111"], "Au": ["111"]},
...     n_fixed_layers=2,
... )
>>> saa_structs = flatten_structures_dict(saa_struct_dict)
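Three hosts and three dopants give 3 × 3 = 9 single-atom alloys, one per host/dopant pair on the requested facet. The sketch below illustrates the idea of flattening a nested structures dictionary into a flat list. It assumes, for illustration only, a nesting of host → dopant → facet → {"structure": ...} and uses strings as stand-ins for ase.Atoms objects; flatten_structures_sketch is a hypothetical helper, not AutoCat's implementation.

```python
from typing import Any, Dict, List


def flatten_structures_sketch(nested: Dict[str, Any]) -> List[Any]:
    """Recursively collect every value stored under a "structure" key.

    Hypothetical stand-in for autocat.utils.flatten_structures_dict.
    """
    flat = []
    for key, value in nested.items():
        if key == "structure":
            flat.append(value)
        elif isinstance(value, dict):
            flat.extend(flatten_structures_sketch(value))
    return flat


hosts = ["Fe", "Cu", "Au"]
dopants = ["Pt", "Pd", "Ni"]
facets = {"Fe": ["110"], "Cu": ["111"], "Au": ["111"]}

# Assumed nesting: host -> dopant -> facet -> {"structure": ...},
# with a descriptive string in place of an ase.Atoms object.
nested = {
    host: {
        dopant: {
            facet: {"structure": f"{dopant}1/{host}({facet})"}
            for facet in facets[host]
        }
        for dopant in dopants
    }
    for host in hosts
}

structures = flatten_structures_sketch(nested)
print(len(structures))  # 9
print(structures[0])  # Pt1/Fe(110)
```

The count of 9 matches the "total # of systems" reported once the DesignSpace is built.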

Now that we have the clean structures, let's adsorb hydrogen on each surface. For convenience, we place H at the origin rather than considering every symmetry site, using AutoCat's place_adsorbate function.

>>> # Adsorb hydrogen onto each of the generated SAA surfaces
>>> from autocat.adsorption import place_adsorbate
>>> from ase import Atoms
>>> ads_structs = []
>>> for clean_struct in saa_structs:
...     ads_struct = place_adsorbate(clean_struct, Atoms("H"))
...     ads_structs.append(ads_struct)
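Placing H "at the origin" means putting it above the surface at the in-plane coordinates (x, y) = (0, 0). The sketch below shows that geometry on a plain list of Cartesian positions. The helper place_at_origin and its 1.5 Å default height are hypothetical, chosen only for illustration; they are not how place_adsorbate determines the adsorbate height.

```python
from typing import List, Tuple

Position = Tuple[float, float, float]


def place_at_origin(
    slab_positions: List[Position], height: float = 1.5
) -> List[Position]:
    """Return the slab positions plus one adsorbate above (x, y) = (0, 0).

    The adsorbate z coordinate is the topmost slab z plus `height`
    (a hypothetical default, in Angstrom).
    """
    top_z = max(z for _, _, z in slab_positions)
    adsorbate = (0.0, 0.0, top_z + height)
    return slab_positions + [adsorbate]


# Toy two-layer "slab": one subsurface atom and one surface atom at the origin.
slab = [(1.4, 0.8, 0.0), (0.0, 0.0, 2.0)]
with_h = place_at_origin(slab)
print(with_h[-1])  # (0.0, 0.0, 3.5)
```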

This collects all of the single-atom alloys with hydrogen adsorbed into a single list of ase.Atoms objects, ads_structs. Ideally, at this stage we would have adsorption energies for each of the generated structures after relaxation. As a proxy in this demo we'll create random labels, but these should be adsorption energies if you want to train a meaningful Predictor!

>>> # Generate the labels for each structure
>>> import numpy as np
>>> labels = np.random.uniform(-1.5, 1.5, size=len(ads_structs))
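Because the labels are drawn at random, their values (and the maximum and minimum labels shown in the DesignSpace summary) change from run to run. If you want the demo to be reproducible, one option is to draw the labels from a seeded NumPy Generator, as in this sketch; the seed value 42 is arbitrary.

```python
import numpy as np

# In the tutorial this would be len(ads_structs); 9 = 3 hosts x 3 dopants.
n_structures = 9

# A seeded Generator makes the "random" labels repeatable across runs.
rng = np.random.default_rng(seed=42)
labels = rng.uniform(-1.5, 1.5, size=n_structures)

# Re-seeding with the same value reproduces exactly the same labels.
labels_again = np.random.default_rng(seed=42).uniform(-1.5, 1.5, size=n_structures)
print(np.array_equal(labels, labels_again))  # True
```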

Finally, using both our structures and labels, we can define a DesignSpace. In practice, if the label for a structure is unknown, it can be included as numpy.nan.

>>> from autocat.learning.sequential import DesignSpace
>>> design_space = DesignSpace(ads_structs, labels)
>>> design_space
+-------------------------+-------------------------------------------+
|                         |                DesignSpace                |
+-------------------------+-------------------------------------------+
|    total # of systems   |                     9                     |
| # of unlabelled systems |                     0                     |
|  unique species present | ['Fe', 'H', 'Pt', 'Pd', 'Ni', 'Cu', 'Au'] |
|      maximum label      |             1.0173326963281424            |
|      minimum label      |            -1.4789390894451206            |
+-------------------------+-------------------------------------------+
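The "# of unlabelled systems" row counts structures whose label is unknown. Assuming unknown labels are stored as numpy.nan, as described above, the same summary statistics can be computed with NaN-aware NumPy functions. This is a sketch with illustrative label values, not DesignSpace's internal code.

```python
import numpy as np

# Illustrative labels; np.nan marks systems without a computed energy.
labels = np.array([-0.42, 0.15, np.nan, -1.10, np.nan, 0.87])

n_total = labels.size
n_unlabelled = int(np.isnan(labels).sum())

# nanmax/nanmin skip the NaN entries; plain max/min would return nan here.
max_label = np.nanmax(labels)
min_label = np.nanmin(labels)

print(n_total, n_unlabelled)  # 6 2
print(max_label, min_label)  # 0.87 -1.1
```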

Setting up a `Predictor`

When setting up our Predictor we have two choices to make:

- The technique to be used for featurizing the systems
- The regression model to be used for training and predictions

Internally, the Predictor contains a user-supplied Featurizer object, which stores all of our choices for how to featurize the systems. The featurizer class and its associated kwargs are specified via the featurizer_class and kwargs arguments, respectively. When the design space structures are also provided, some featurization settings (e.g. the maximum structure size) can be obtained automatically.

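Let's featurize the hydrogen environment with dscribe's SOAP descriptor, using a 7 Å cutoff radius (rcut), nmax = 8 radial basis functions, and a maximum spherical-harmonics degree of lmax = 8.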

>>> from autocat.learning.featurizers import Featurizer
>>> from dscribe.descriptors.soap import SOAP
>>> featurizer = Featurizer(
...     featurizer_class=SOAP,
...     kwargs={"rcut": 7.0, "nmax": 8, "lmax": 8},
...     design_space_structures=design_space.design_space_structures
... )
>>> featurizer
+-----------------------------------+-------------------------------------------+
|                                   |                 Featurizer                |
+-----------------------------------+-------------------------------------------+
|               class               |       dscribe.descriptors.soap.SOAP       |
|               kwargs              |    {'rcut': 7.0, 'nmax': 8, 'lmax': 8}    |
|            species list           | ['Fe', 'Ni', 'Pt', 'Pd', 'Au', 'Cu', 'H'] |
|       maximum structure size      |                     37                    |
|               preset              |                    None                   |
| design space structures provided? |                    True                   |
+-----------------------------------+-------------------------------------------+
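The species list and maximum structure size in this summary are settings that can be read off the design space structures rather than typed in by hand. A rough sketch of that bookkeeping follows, with each structure represented as a plain list of chemical symbols instead of an ase.Atoms object. The helper species_and_max_size is hypothetical, and AutoCat's internal logic (including the species ordering) may differ.

```python
from typing import List, Tuple


def species_and_max_size(structures: List[List[str]]) -> Tuple[List[str], int]:
    """Collect the unique species (in first-seen order) and the largest
    number of atoms found in any single structure."""
    species: List[str] = []
    for symbols in structures:
        for symbol in symbols:
            if symbol not in species:
                species.append(symbol)
    max_size = max(len(symbols) for symbols in structures)
    return species, max_size


# Toy "structures" of different sizes: small doped slabs plus an H adsorbate.
structures = [
    ["Cu", "Cu", "Cu", "Pt", "H"],
    ["Au", "Au", "Au", "Ni", "H"],
    ["Fe", "Fe", "Pd", "H"],
]
species, max_size = species_and_max_size(structures)
print(species)  # ['Cu', 'Pt', 'H', 'Au', 'Ni', 'Fe', 'Pd']
print(max_size)  # 5
```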

Similarly, we need to specify the regressor. It should be an "sklearn-like" object that provides fit and predict methods.
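As an illustration of that interface (not a regressor that AutoCat provides), here is a hypothetical baseline class that implements fit(X, y) and predict(X) in the scikit-learn style and always predicts the mean of the training labels. A baseline like this can also serve as a sanity check when judging the error of a real model.

```python
import numpy as np


class MeanBaselineRegressor:
    """Hypothetical sklearn-style regressor: predicts the training-label mean."""

    def fit(self, X, y):
        # X is unused; a mean baseline depends only on the labels.
        self.mean_ = float(np.mean(y))
        return self  # returning self follows the scikit-learn convention

    def predict(self, X):
        # One prediction per row of the feature matrix X.
        return np.full(len(X), self.mean_)


X_train = np.zeros((4, 3))  # 4 samples, 3 dummy features
y_train = np.array([-1.0, 0.0, 0.5, 1.5])
baseline = MeanBaselineRegressor().fit(X_train, y_train)
print(baseline.predict(np.zeros((2, 3))))  # [0.25 0.25]
```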

Here we will use sklearn's GaussianProcessRegressor for regression.

>>> from sklearn.gaussian_process import GaussianProcessRegressor
>>> from sklearn.gaussian_process.kernels import RBF
>>> kernel = RBF(1.5)
>>> regressor = GaussianProcessRegressor(kernel=kernel)
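For reference, scikit-learn defines the RBF kernel with length scale $\ell$ (passed as 1.5 above) as

$$
k(\mathbf{x}_i, \mathbf{x}_j) = \exp\left(-\frac{\lVert \mathbf{x}_i - \mathbf{x}_j \rVert^2}{2\ell^2}\right),
$$

where $\mathbf{x}_i$ and $\mathbf{x}_j$ are the featurized representations of two systems. By default, GaussianProcessRegressor optimizes the kernel hyperparameters when fit is called, so 1.5 serves as the starting value of $\ell$ rather than a fixed setting.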

Now that we have both our Featurizer and regressor, we can construct a Predictor object.

>>> from autocat.learning.predictors import Predictor
>>> predictor = Predictor(
...     regressor=regressor,
...     featurizer=featurizer,
... )
>>> predictor
+-----------+------------------------------------------------------------------+
|           |                            Predictor                             |
+-----------+------------------------------------------------------------------+
| regressor | <class 'sklearn.gaussian_process._gpr.GaussianProcessRegressor'> |
|  is fit?  |                              False                               |
+-----------+------------------------------------------------------------------+
+-----------------------------------+-------------------------------------------+
|                                   |                 Featurizer                |
+-----------------------------------+-------------------------------------------+
|               class               |       dscribe.descriptors.soap.SOAP       |
|               kwargs              |    {'rcut': 7.0, 'nmax': 8, 'lmax': 8}    |
|            species list           | ['Fe', 'Ni', 'Pt', 'Pd', 'Au', 'Cu', 'H'] |
|       maximum structure size      |                     37                    |
|               preset              |                    None                   |
| design space structures provided? |                    True                   |
+-----------------------------------+-------------------------------------------+
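Conceptually, a Predictor chains its two components: structures are featurized into a matrix, and that matrix is passed to the regressor's fit or predict method. The sketch below mimics that flow with a hypothetical composition-count featurizer and an ordinary least-squares regressor built on numpy.linalg.lstsq. The names SketchPredictor, composition_features, and LeastSquaresRegressor are invented for illustration and are not AutoCat's API.

```python
from typing import List

import numpy as np

SPECIES = ["Cu", "Pt", "H"]  # hypothetical fixed species list


def composition_features(structures: List[List[str]]) -> np.ndarray:
    """Hypothetical featurizer: count each species in each structure."""
    return np.array(
        [[symbols.count(s) for s in SPECIES] for symbols in structures],
        dtype=float,
    )


class LeastSquaresRegressor:
    """Minimal sklearn-style linear regressor (no intercept term)."""

    def fit(self, X, y):
        self.coef_, *_ = np.linalg.lstsq(X, y, rcond=None)
        return self

    def predict(self, X):
        return X @ self.coef_


class SketchPredictor:
    """Featurize first, then delegate to the wrapped regressor."""

    def __init__(self, regressor, featurize):
        self.regressor = regressor
        self.featurize = featurize

    def fit(self, structures, labels):
        self.regressor.fit(self.featurize(structures), np.asarray(labels))
        return self

    def predict(self, structures):
        return self.regressor.predict(self.featurize(structures))


train = [["Cu", "Cu", "H"], ["Cu", "Pt", "H"], ["Pt", "Pt", "H"], ["Cu", "H"]]
# Toy labels constructed as -0.5 * n_Cu - 0.2 * n_Pt + 0.1 * n_H.
labels = [-0.9, -0.6, -0.3, -0.4]
model = SketchPredictor(LeastSquaresRegressor(), composition_features)
model.fit(train, labels)
print(np.round(model.predict([["Cu", "Cu", "Cu", "H"]]), 3))  # [-1.4]
```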

Training and making predictions

With our newly defined Predictor, we can train it on data from our DesignSpace using the fit method. Here we train on the first five structures and hold out the remaining four for testing. Again, note that we are using random labels here solely for demonstration purposes.

>>> train_structures = design_space.design_space_structures[:5]
>>> train_labels = design_space.design_space_labels[:5]
>>> predictor.fit(train_structures, train_labels)
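Slicing off the first five structures gives a split that depends on the order in which the structures were generated, so the test set may cover only some of the host metals. One alternative, sketched below with an arbitrary seed, is to shuffle the indices before splitting; the commented lines show how the indices could then select data, assuming design_space_labels is a NumPy array.

```python
import numpy as np

n_structures = 9  # len(design_space.design_space_structures) in the tutorial
n_train = 5

rng = np.random.default_rng(seed=0)
order = rng.permutation(n_structures)
train_idx, test_idx = order[:n_train], order[n_train:]

# Every index lands in exactly one of the two subsets.
all_idx = sorted(np.concatenate([train_idx, test_idx]).tolist())
print(all_idx == list(range(n_structures)))  # True

# In the tutorial, the indices would select the actual data, e.g.:
# train_structures = [design_space.design_space_structures[i] for i in train_idx]
# train_labels = design_space.design_space_labels[train_idx]
```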

Making predictions works the same way, using the predict method on the held-out structures.

>>> test_structures = design_space.design_space_structures[5:]
>>> predicted_labels = predictor.predict(test_structures)

Since we already know the labels of the test structures in this example, we can also use the score method to evaluate the predictions; here the result is stored as mae, the mean absolute error.

>>> test_labels = design_space.design_space_labels[5:]
>>> mae = predictor.score(test_structures, test_labels)
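Assuming score reports the mean absolute error (as the variable name mae suggests), the quantity computed is the average magnitude of the prediction errors. A minimal NumPy version with illustrative numbers:

```python
import numpy as np


def mean_absolute_error(y_true, y_pred) -> float:
    """Average of |y_true - y_pred| over all samples."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    return float(np.mean(np.abs(y_true - y_pred)))


y_true = [0.5, -1.0, 0.25, 1.0]
y_pred = [0.25, -0.5, 0.25, 1.5]
# Absolute errors are 0.25, 0.5, 0.0 and 0.5, so the mean is 1.25 / 4.
print(mean_absolute_error(y_true, y_pred))  # 0.3125
```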
