Predictors
To iterate a sequential learning pipeline,
a regressor is needed to select subsequent candidate systems.
The Predictor object class serves this purpose. It contains two key attributes:
- a regressor that can be fit to data and used for predictions (the class provided must have fit and predict methods)
- a Featurizer to be used for featurizing the structures. Two approaches are currently implemented: structure methods that featurize the entire structure (e.g. SineMatrix, ElementProperty) and adsorbate methods that featurize locally (e.g. SOAP).
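The requirement that the regressor class provide fit and predict methods can be illustrated with a minimal sketch. The MeanRegressor class and the has_regressor_interface helper below are hypothetical names made up for this illustration; they are not part of AutoCat or sklearn, and the check shown is only one plausible way to test for the interface.

```python
class MeanRegressor:
    """Hypothetical toy regressor: always predicts the mean of the training labels."""

    def fit(self, X, y):
        # Store the average label; X is ignored by this trivial model.
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        # One prediction per input row.
        return [self.mean_ for _ in X]


def has_regressor_interface(obj):
    """Hypothetical check: True if obj exposes callable fit and predict methods."""
    return all(callable(getattr(obj, name, None)) for name in ("fit", "predict"))


model = MeanRegressor().fit([[0.0], [1.0], [2.0]], [1.0, 2.0, 3.0])
print(has_regressor_interface(model))     # True
print(model.predict([[5.0], [6.0]]))      # [2.0, 2.0]
print(has_regressor_interface(object()))  # False
```

Any class with this shape, including the regressors shipped with sklearn, satisfies the stated requirement.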
Generally, this predictor object behaves similarly to regressors found in sklearn, with its own fit, predict, and score methods.
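To make the fit, predict, and score pattern concrete, here is a small pure-Python sketch of a wrapper that featurizes its inputs and then delegates to a regressor. ToyPredictor, LineRegressor, and the string-length featurizer are all hypothetical stand-ins chosen for illustration; this is not AutoCat's implementation. The score method here returns the coefficient of determination (R^2), which is what sklearn regressors return from score; the metric AutoCat itself uses may differ.

```python
class LineRegressor:
    """Hypothetical 1-D least-squares line fit, standing in for any fit/predict regressor."""

    def fit(self, X, y):
        xs = [row[0] for row in X]
        n = len(xs)
        x_mean = sum(xs) / n
        y_mean = sum(y) / n
        sxx = sum((x - x_mean) ** 2 for x in xs)
        sxy = sum((x - x_mean) * (t - y_mean) for x, t in zip(xs, y))
        self.slope_ = sxy / sxx
        self.intercept_ = y_mean - self.slope_ * x_mean
        return self

    def predict(self, X):
        return [self.slope_ * row[0] + self.intercept_ for row in X]


class ToyPredictor:
    """Hypothetical wrapper: featurize the raw inputs, then delegate to the regressor."""

    def __init__(self, regressor, featurize):
        self.regressor = regressor
        self.featurize = featurize
        self.is_fit = False

    def fit(self, items, labels):
        self.regressor.fit([self.featurize(i) for i in items], labels)
        self.is_fit = True

    def predict(self, items):
        return self.regressor.predict([self.featurize(i) for i in items])

    def score(self, items, labels):
        # Coefficient of determination (R^2) of the predictions.
        preds = self.predict(items)
        mean = sum(labels) / len(labels)
        ss_res = sum((t - p) ** 2 for t, p in zip(labels, preds))
        ss_tot = sum((t - mean) ** 2 for t in labels)
        return 1.0 - ss_res / ss_tot


# Toy "structures": formula strings, featurized by their length.
items = ["Pt", "PtNi", "PtNiCu", "PtNiCuAu"]
labels = [1.0, 2.0, 3.0, 4.0]

toy = ToyPredictor(LineRegressor(), featurize=lambda s: [float(len(s))])
toy.fit(items[:-1], labels[:-1])  # hold out the last item
pred = toy.predict(items[-1:])
r2 = toy.score(items, labels)
print(pred, r2)
```

Keeping featurization inside the wrapper means callers pass raw structures to every method, which mirrors how the examples in this section pass structures directly to fit and predict.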
As an example, let's train a random forest regressor on some single atom alloys.
>>> import numpy as np
>>> from dscribe.descriptors import SineMatrix
>>> from sklearn.ensemble import RandomForestRegressor
>>> from autocat.saa import generate_saa_structures
>>> from autocat.utils import flatten_structures_dict
>>> from autocat.learning.featurizers import Featurizer
>>> from autocat.learning.predictors import Predictor
>>> saa_dict = generate_saa_structures(["Cu", "Au", "Fe"], ["Pt", "Ru", "Ni"])
>>> saa_structs = flatten_structures_dict(saa_dict)
>>> # placeholder labels: randint(1, ...) draws from [0, 1), so every label is 0
>>> labels = np.random.randint(1, size=(len(saa_structs) - 1))
>>> featurizer = Featurizer(
...     featurizer_class=SineMatrix
... )
>>> regressor = RandomForestRegressor()
>>> acp = Predictor(
...     regressor=regressor,
...     featurizer=featurizer,
... )
>>> acp.fit(saa_structs[:-1], labels)
>>> acp
+-----------+----------------------------------------------------------+
|           | Predictor                                                |
+-----------+----------------------------------------------------------+
| regressor | <class 'sklearn.ensemble._forest.RandomForestRegressor'> |
| is fit?   | True                                                     |
+-----------+----------------------------------------------------------+
+-----------------------------------+----------------------------------------------------+
|                                   | Featurizer                                         |
+-----------------------------------+----------------------------------------------------+
| class                             | dscribe.descriptors.sinematrix.SineMatrix          |
| kwargs                            | None                                               |
| species list                      | ['Fe', 'Ni', 'Pt', 'Pd', 'Cu', 'C', 'N', 'O', 'H'] |
| maximum structure size            | 100                                                |
| preset                            | None                                               |
| design space structures provided? | False                                              |
+-----------------------------------+----------------------------------------------------+
>>> pred, _ = acp.predict([saa_structs[-1]])
>>> pred
array([0.])
Here, the structures were featurized using SineMatrix.
Note as well that the predict method will return uncertainty estimates
if available. To see this, let's train a Gaussian process regressor with an RBF
kernel. Let's also featurize using SOAP to see how featurization kwargs are passed.
>>> import numpy as np
>>> from ase import Atoms
>>> from dscribe.descriptors import SOAP
>>> from sklearn.gaussian_process import GaussianProcessRegressor
>>> from sklearn.gaussian_process.kernels import RBF
>>> from autocat.surface import generate_surface_structures
>>> from autocat.utils import flatten_structures_dict
>>> from autocat.adsorption import place_adsorbate
>>> from autocat.learning.featurizers import Featurizer
>>> from autocat.learning.predictors import Predictor
>>> subs = flatten_structures_dict(generate_surface_structures(["Pt", "Fe", "Ru"]))
>>> structs = [place_adsorbate(s, Atoms("OH")) for s in subs]
>>> # placeholder labels: randint(1, ...) draws from [0, 1), so every label is 0
>>> labels = np.random.randint(1, size=(len(structs) - 1))
>>> featurizer = Featurizer(
...     featurizer_class=SOAP,
...     design_space_structures=structs,
...     kwargs={"rcut": 6.0, "nmax": 6, "lmax": 6}
... )
>>> kernel = RBF()
>>> regressor = GaussianProcessRegressor(kernel=kernel)
>>> acp = Predictor(
...     featurizer=featurizer,
...     regressor=regressor
... )
>>> acp.fit(structs[:-1], labels)
>>> acp
+-----------+------------------------------------------------------------------+
|           | Predictor                                                        |
+-----------+------------------------------------------------------------------+
| regressor | <class 'sklearn.gaussian_process._gpr.GaussianProcessRegressor'> |
| is fit?   | True                                                             |
+-----------+------------------------------------------------------------------+
+-----------------------------------+-------------------------------------+
|                                   | Featurizer                          |
+-----------------------------------+-------------------------------------+
| class                             | dscribe.descriptors.soap.SOAP       |
| kwargs                            | {'rcut': 6.0, 'nmax': 6, 'lmax': 6} |
| species list                      | ['Fe', 'Ru', 'Pt', 'O', 'H']        |
| maximum structure size            | 38                                  |
| preset                            | None                                |
| design space structures provided? | True                                |
+-----------------------------------+-------------------------------------+
>>> pred, unc = acp.predict([structs[-1]])
>>> pred
array([0.])
>>> unc
array([1.])
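The idea that predict returns uncertainty estimates only when the regressor can supply them can be sketched in plain Python. The two toy regressors and the predict_with_uncertainty helper below are hypothetical and are not AutoCat's implementation; the return_std keyword mirrors the one accepted by sklearn's GaussianProcessRegressor.predict, and falling back on a TypeError is just one assumed way of detecting that it is unsupported.

```python
import math


class PointRegressor:
    """Hypothetical regressor whose predict returns point estimates only."""

    def fit(self, X, y):
        self.mean_ = sum(y) / len(y)
        return self

    def predict(self, X):
        return [self.mean_ for _ in X]


class SpreadRegressor(PointRegressor):
    """Hypothetical regressor that can also report a standard deviation."""

    def fit(self, X, y):
        super().fit(X, y)
        # Population standard deviation of the training labels.
        self.std_ = math.sqrt(sum((t - self.mean_) ** 2 for t in y) / len(y))
        return self

    def predict(self, X, return_std=False):
        preds = super().predict(X)
        if return_std:
            return preds, [self.std_ for _ in X]
        return preds


def predict_with_uncertainty(regressor, X):
    """Return (predictions, uncertainties); uncertainties is None if unsupported."""
    try:
        return regressor.predict(X, return_std=True)
    except TypeError:
        # predict does not accept return_std, so no uncertainty is available.
        return regressor.predict(X), None


X_train, y_train = [[0.0], [1.0]], [1.0, 3.0]
with_std = predict_with_uncertainty(SpreadRegressor().fit(X_train, y_train), [[2.0]])
without_std = predict_with_uncertainty(PointRegressor().fit(X_train, y_train), [[2.0]])
print(with_std)     # ([2.0], [1.0])
print(without_std)  # ([2.0], None)
```

Returning a two-element tuple in both cases lets callers always unpack the result as pred, unc, the same pattern used in the examples above.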