Gaussian Process Regressor

Gaussian Process Regressor#

The GPR model#

The infinities classifier#

Though in principle there is no lower limit to the log-posterior values that GPry can handle, there are reasons in practice for dealing with very small posterior values in a different way:

Large negative log-posteriors, especially those that are literally or effectively minus infinity, can create instabilities in the GP interpolation, even when regularised.
It is common that returning these values is the way likelihood implementations signify that somewhere along the computational pipeline a particular step failed, so it would not make sense to include it in the interpolation.
Especially for likelihoods of noisy data, very-low-likelihood values have numerical (deterministic) noise, which does not make sense to model with the GPR.

Since GPry is an inference code, aiming at modelling probability density functions around their modes, it makes sense to censor such values, and if possible to predict them before evaluation of the true likelihood to prevent wasting time exploring a very-low-probability region and making the GPR model heavier.

To do that, if the acount_for_inf option of the GPR is set to SVM (default), GPry uses a given /threshold/ (inf_threshold) to classify values of the true and simulated log-posterior as either finite or negative infinity, depending on whether the difference between their log-posterior and the maximum found so far is smaller or larger than the threshold. Only points classified as /finite/ will form part of the GPR training set, whereas both point types will be used to retrain an SVM classifier at each iteration. The SVM classifier partitions the parameter space into /finite/ and /minus infinity/ regions. Points in parameter space that fall in the /minus infinity/ region are automatically discarded during the acquisition phase before their true posterior evaluation.

The SVM classifier is stored in the infinities_classifier attribute of the GPR instance. Because it is defined in the transformed space [TODO: ADD REFERENCE], its methods should not be used directly. Instead, it can be called with un-transformed parameter input via the GPR methods is_finite(), that takes a log-posterior value and returns a boolean depending of whether it is above the threshold, and predict_is_finite(), which takes a point in parameter space and returns a boolean depending on whether a log-posterior evaluation is expected to be finite at that location. Both methods accept vector input. The current log-posterior threshold is returned by the property abs_finite_threshold().

class gpr.GaussianProcessRegressor(kernel='RBF', output_scale_prior=[0.01, 1000.0], length_scale_prior=[0.001, 10.0], noise_level=0.01, clip_factor=1.1, optimizer='fmin_l_bfgs_b', n_restarts_optimizer=0, preprocessing_X=None, preprocessing_y=None, account_for_inf='SVM', inf_threshold='20s', keep_min_finite=None, trust_region_factor=None, trust_region_nstd=None, bounds=None, random_state=None, verbose=1)[source]#

Bases: GaussianProcessRegressor, BaseEstimator

GaussianProcessRegressor (GPR) that allows dynamic expansion.

The implementation is based on Algorithm 2.1 of Gaussian Processes for Machine Learning (GPML) by Rasmussen and Williams.

In addition to standard scikit-learn estimator API, GaussianProcessRegressor:

allows prediction without prior fitting (based on the GP prior).

implements a pipeline for pretransforming data before fitting.

provides the method append_to_data() which allows to append additional data points to an already existing GPR. This is done either by refitting the hyperparameters (theta) or alternatively by using the Matrix inversion Lemma to keep the hyperparameters fixed.

overwrites the (hidden) native deepcopy function. This enables copying the GPR as well as the sampled points it contains.

Parameters:

kernel (kernel object, string, dict, optional (default: "RBF")) –
The kernel specifying the covariance function of the GP. If “RBF”/”Matern” is passed, the asymmetric kernel:
```
ConstantKernel() * RBF/Matern()
```
is used as default where n_dim is the number of dimensions of the training space. In this case you will have to provide the prior bounds as the bounds parameter to the GP. For the Matern kernel $\nu=3/2$. To pass different arguments to the kernel, e.g. nu=5/2 for Matern, pass a single-key dict as {"Matern": {"nu": 2.5}}". Note that the kernel’s hyperparameters are optimized during fitting.
output_scale_prior (tuple as (min, max), optional (default: [1e-2, 1e3])) – Prior for the (non-squared) scale parameter, in normalised logp units.
length_scale_prior (tuple as (min, max), optional (default: [1e-3, 1e1])) – Prior for the length parameters, as a fraction of the parameter priors sizes.
noise_level (float or array-like, optional (default: 1e-2)) – Square-root of the value added to the diagonal of the kernel matrix during fitting. Larger values correspond to increased noise level in the observations and reduce potential numerical issue during fitting. If an array is passed, it must have the same number of entries as the data used for fitting and is used as datapoint-dependent noise level. Note that this is equivalent to adding a WhiteKernel with c=noise_level.
clip_factor (float, optional (default: 1.1)) – Factor for upper clipping of the GPR predictions, to avoid overshoots.

optimizer (str or callable, optional (default: "fmin_l_bfgs_b")) –

Can either be one of the internally supported optimizers for optimizing the kernel’s parameters, specified by a string, or an externally defined optimizer passed as a callable. If a callable is passed, it must have the signature:

def optimizer(obj_func, initial_theta, bounds):
    # * 'obj_func' is the objective function to be maximized, which
    #   takes the hyperparameters theta as parameter and an
    #   optional flag eval_gradient, which determines if the
    #   gradient is returned additionally to the function value
    # * 'initial_theta': the initial value for theta, which can be
    #   used by local optimizers
    # * 'bounds': the bounds on the values of theta
    ....
    # Returned are the best found hyperparameters theta and
    # the corresponding value of the target function.
    return theta_opt, func_min

Per default, the ‘fmin_l_bfgs_b’ algorithm from scipy.optimize is used. If None is passed, the kernel’s parameters are kept fixed. Available internal optimizers are:

'fmin_l_bfgs_b'

n_restarts_optimizer (int, optional (default: 0)) – The number of restarts of the optimizer for finding the kernel’s parameters which maximize the log-marginal likelihood. The first run of the optimizer is performed from the kernel’s initial parameters, the remaining ones (if any) from thetas sampled log-uniform randomly from the space of allowed theta-values. If greater than 0, all bounds must be finite. Note that n_restarts_optimizer == 0 implies that one run is performed.
preprocessing_X (X-preprocessor, Pipeline_X, optional (default: None)) – Single preprocessor or pipeline of preprocessors for X. If None is passed the data is not preprocessed. The fit method of the preprocessor is only called when the GP’s hyperparameters are refit.
preprocessing_y (y-preprocessor or Pipeline_y, optional (default: None)) – Single preprocessor or pipeline of preprocessors for y. If None is passed the data is not preprocessed. The fit method of the preprocessor is only called when the GP’s hyperparameters are refit.
account_for_inf (SVM, None or "SVM" (default: "SVM")) – Uses a SVM (Support Vector Machine) to classify the data into finite and infinite values. This allows the GP to express values of -inf in the data (unphysical values). If all values are finite the SVM will just pass the data through itself and do nothing.
inf_threshold (None, float or str) – Threshold for the infinities classifier to consider a value finite, understood as a positive difference with respect to the current maximum y. It can be given as a string formed of a number and ending in “s”, meaning the distance to the mode which shall be considered finite in $\sigma$ using a $\chi^2$ distribution. Used only if account_for_inf is not None.
keep_min_finite (int, optional (default: None)) – Minimum number of points to be considered finite, and this part of the GPR training set. Useful e.g. if a point with a much larger y-value than the rest is suddenly found. If no infinities classifier selected, it has no effect. Otherwise, if None, is set to the dimensionality of the problem.
bounds (array-like, shape=(n_dims,2), optional) – Array of bounds of the prior [lower, upper] along each dimension. Has to be provided when the kernel shall be built automatically by the GP.
trust_region_factor (float, optional) – If defined as a positive float, it defines a trust region as a hypercube containing the GPR (finite) training set, enlarged by the given factor. The bounds of this trust region can be read from the trust_bounds attribute, and points outside these bounds will cause the predict method to return a negative infinity. Useful when the posterior can be expected to be much smaller than the prior (but in that case it would be preferable to simply reduce the prior bounds), noticeable e.g. by the acquisition module is taking too long to propose points.
trust_region_nstd (float, optional) – If defined as a positive float, the definition of the trust region only takes into account training points corresponding to a significance (assuming a Gaussian posterior) equivalent to this value in 1d standard deviations.
random_state (int or numpy.random.Generator, optional) – The generator used to perform random operations of the GPR. If an integer is given, it is used as a seed for the default global numpy random number generator.
verbose (1, 2, 3, optional (default: 1)) – Level of verbosity of the GP. 3 prints Infos, Warnings and Errors, 2 Warnings and Errors, and 1 only Errors. Should be set to 2 or 3 if problems arise.

Attributes:

n (int) – Number of points/features in the training set of the GPR.
d (int) – Dimensionality of the training data.
bounds (array) – The bounds with which the GPR was defined.
trust_bounds (array or None) – The bounds of the trust region if trust_region_factor was defined, or None otherwise.
X_train (array-like, shape = (n_samples, n_features)) – Original (untransformed) feature values in training data of the GPR. Intended to be used when one wants to access the training data for any purpose.
y_train (array-like, shape = (n_samples, [n_output_dims])) – Original (untransformed) target values in training data of the GPR. Intended to be used when one wants to access the training data for any purpose.
X_train_ (array-like, shape = (n_samples, n_features)) – (Possibly transformed) feature values in training data of the GPR (also required for prediction). Mostly intended for internal use.
y_train_ (array-like, shape = (n_samples, [n_output_dims])) – (Possibly transformed) target values in training data of the GPR (also required for prediction). Mostly intended for internal use.
n_total (int) – Number of points/features in the training set of the model, including points with target values classified as infinite.
X_train_all (array-like, shape = (n_samples, n_features)) – Original (untransformed) feature values in training data of the model, including points with target values classified as infinite, and thus not part of the training set of the GPR.
y_train_all (array-like, shape = (n_samples, [n_output_dims])) – Original (untransformed) target values in training data of the model, including values classified as infinite, and thus not part of the training set of the GPR.
X_train_all_ (array-like, shape = (n_samples, n_features)) – (Possibly transformed) feature values in training data of the model, including points with target values classified as infinite.
y_train_all_ (array-like, shape = (n_samples, [n_output_dims])) – (Possibly transformed) target values in training data of the GPR model, including points with target values classified as infinite.
noise_level (array-like, shape = (n_samples, [n_output_dims]) or scalar) – The noise level (square-root of the variance) of the uncorrelated training data. This is un-transformed.
noise_level_ (array-like, shape = (n_samples, [n_output_dims]) or scalar) – The transformed noise level (if y is preprocessed)
alpha (array-like, shape = (n_samples, [n_output_dims]) or scalar) – The value which is added to the diagonal of the kernel. This is the square of noise_level.
kernel_ (kernels object) – The kernel used for prediction. The structure of the kernel is the same as the one passed as parameter but with optimized hyperparameters.
alpha_ (array-like, shape = (n_samples, n_samples)) – Not to be confused with alpha! The inverse Kernel matrix of the training points multiplied with y_train_ (Dual coefficients of training data points in kernel space). Needed at prediction.
V_ (array-like, shape = (n_samples, n_samples)) – Lower-triangular Cholesky decomposition of the inverse kernel in X_train_

Warning

L_ is not recomputed when using the append_to_data method without refitting the hyperparameters. As only K_inv_ and alpha_ are used at prediction this is not neccessary.
log_marginal_likelihood_value_ (float) – The log-marginal-likelihood of self.kernel_.theta

Methods:

`append_to_data`(X, y[, noise_level, fit_gpr, ...])	Append newly acquired data to the GP Model and updates it.
`fit`(X, y)	Fit Gaussian process regression model.
`_update_model`()	Updates a preexisting model using a single matrix inversion.
`predict`(X[, return_std, return_cov, ...])	Predict output for X.

property d#: Dimension of the feature space.

property y_max#: The max. posterior value in the training set.

property n#

Number of points in the training set.

This excludes infinite points if the GPR was initialized to account for them.

To get the total number of points added to the model, both finite and infinite, use the property GaussianProcessRegressor.n_total.

property n_finite#: Number of points in the training set. Alias of GaussianProcessRegressor.n.

property n_total#

Returns the total number of points added to the model, both finite and infinite.

Infinite points, if accounted for, are not part of the training set of the GPR.

property X_train_infinite#: X of points in the training set which have been classified as infinite.

property y_train_infinite#: X of points in the training set which have been classified as infinite.

property fitted#: Whether the GPR hyperparameters have been fitted at least once.

property last_appended#: Returns a copy of the last appended training points (finite/accepted or not), as (X, y).

property last_appended_finite#: Returns a copy of the last appended GPR (finite) training points, as (X, y).

property scales#: Kernel scales as (output_scale, (length_scale_1, ...)) in non-transformed coordinates.

training_set_as_df()[source]#: Returns the training set as a pandas DataFrame (created on-the-fly and not saved).

property abs_finite_threshold#: Absolute threshold for y values to be considered finite.

is_finite(y)[source]#

Returns the classification of y (target) values as finite (True) or not, by comparing them with the current threshold.

Notes

Use this method instead of the equivalent one of the ‘infinities_classifier’ attribute, since the arguments of that one may need to be transformed first.

If calling with an argument which is not either the training set or a subset of it results may be inconsistent, since new values may modify the threshold.

predict_is_finite(X, validate=True)[source]#

Returns a prediction for the classification of the target value at some given parameters.

Notes

Use this method instead of the equivalent one of the ‘infinities_classifier’ attribute, since the arguments of that one may need to be transformed first.

set_random_state(random_state)[source]#: (Re)sets the random state, including the SVM, if present.

update_trust_region()[source]#

Adjusts the boundaries of the trust region.

Keeps at least d points, temporarily lowering the threshold if needed.

append_to_data(X, y, noise_level=None, fit_gpr=True, fit_classifier=True)[source]#

Append newly acquired data to the GP Model and updates it.

Here updating refers to the re-calculation of the the GPR inverse matrix $(K(X,X)+\sigma_n^2 I)^{-1}$ which is needed for predictions.

The highest cost incurred by this method is the refitting of the GPR kernel hyperparameters $\theta$. It can be useful to disable it (fit_gpr=False) in cases where it is worth saving the computational expense in exchange for a loss of information, such as when performing parallelized active sampling (NB: this is only possible when the GPR hyperparameters have been fit at least once).

An intermediate option is to perform a single GPR hyperparameter optimization run (instead of the default number of restarts) from the current hyperparameter values, using fit_gpr='simple'.

For an additional speed boost, the refitting of the infinities classifier (if present) can be disabled with fit_classifier=False (.if a GPR refit is requested this value is overridden).

If called with X=None, y=None, it re-fits the model without adding new points.

The following calls should then be equivalent:

fit_gpr_kwargs = {"n_restarts": 10}
# A
gpr.append_to_data(new_X, new_y, fit_gpr=fit_gpr_kwargs)
# B
gpr.append_to_data(new_X, new_y, fit_gpr=False)
gpr.fit_gpr_hyperparameters(**fit_gpr_kwargs)
# C
gpr.append_to_data(new_X, new_y, fit_gpr=False, fit_classifier=False)
gpr.append_to_data(None, None, fit_gpr=fit_gpr_kwargs)

Parameters:

X (array-like, shape = (n_samples, n_features), or None) – Training data to append to the model.
y (array-like, shape = (n_samples, [n_output_dims]), or None) – Target values to append to the data
noise_level (array-like, shape = (n_samples, [n_output_dims])) – Uncorrelated standard deviations to add to the diagonal part of the covariance matrix. Needs to have the same number of entries as y. If None, the noise_level set in the instance is used. If you pass a single number the noise level will be overwritten. In this case it is advisable to refit the hyperparameters of the kernel.
fit_gpr (Bool or 'simple', dict, optional (default: True)) – Whether the GPR $\theta$-parameters are optimised ('simple' for a single run from last optimum; True for a more thorough search with multiple restarts), or a simple kernel matrix inversion is performed (False) with constant hyperparameters. Can also be passed a dict with arguments to be passed to the fit_gpr_hyperparameters method.
fit_classifier (Bool, optional (default: True)) – Whether the infinities classifier is refit. Overridden to True if fit_gpr is not False.

Returns:

Returns an instance of self.

Return type:

self

_validate_noise_level(noise_level, n_train)[source]#

Checks for type and inconsistencies for the given noise level. Separated from the update method to be performed before updating class attributes.

Returns a validated value that can be used directly.

_update_noise_level(noise_level)[source]#

Updates the noise level of the training set with the new values (or the lack thereof).

Assumes possible inconsistencies dealt with by _validate_noise_level.

remove_from_data(position, fit=True)[source]#

WARNING This function is currently outdated and raises NotImplementedError.

Removes data points from the GP model. Works very similarly to the append_to_data() method with the only difference being that the position(s) of the training points to delete are given instead of values.

Parameters:

position (int or array-like, shape = (n_samples,)) – The position (index) at which to delete from the training data. If an array is given the data is deleted at multiple points.
fit (Bool, optional (default: True)) – Whether the model is refit to new $\theta$-parameters or just updated.

Returns:

Returns an instance of self.

Return type:

self

log_marginal_likelihood(*args, **kwargs)[source]#: Log-marginal likelihood of the kernel hyperparameters given the training data.

fit_gpr_hyperparameters(simple=False, start_from_current=True, n_restarts=None, hyperparameter_bounds=None)[source]#

Optimizes the hyperparameters $\theta$ for the current training data. The algorithm used to perform the optimization is very similar to the one provided by Scikit-learn. The only major difference is, that gradient information is used in addition to the values of the marginalized log-likelihood.

Parameters:

n_restarts (int, default None) – Number of restarts of the optimizer. If not defined, uses the one set at instantiation. 1 means a single optimizer run.
start_from_current (bool, default: True) – Starts the first optimization run from the current hyperparameters (ignored if not previously fitted).
simple (bool, default: False) – If True, runs the optimiser only from the last optimum of the hyperparameters, without restarts. Shorthand for start_from_current=True, n_restarts=1. (it overrides them if True).
hyperparameter_bounds (array-like, default: None) – Bounds for the hyperparameters, if different from those declared at init.

Return type:

self

_update_model()[source]#

Updates a preexisting model using a single matrix inversion.

This method is used when a refitting of the $\theta$-parameters is not needed. In this case only the Inverse of the Covariance matrix is updated. This method does not take X or y as inputs and should only be called from the append_to_data method.

The X and y values used for training are taken internally from the instance.

Return type:: self

predict(X, return_std=False, return_cov=False, return_mean_grad=False, return_std_grad=False, validate=True, ignore_trust_region=False)[source]#

Predict output for X.

In addition to the mean of the predictive distribution, also its standard deviation (return_std=True) or covariance (return_cov=True), the gradient of the mean and the standard-deviation with respect to X can be optionally provided.

Parameters:

X (array-like, shape = (n_samples, n_features)) – Query points where the GP is evaluated.
return_std (bool, default: False) – If True, the standard-deviation of the predictive distribution at the query points is returned along with the mean.
return_mean_grad (bool, default: False) – Whether or not to return the gradient of the mean. Only valid when X is a single point.
return_std_grad (bool, default: False) – Whether or not to return the gradient of the std. Only valid when X is a single point.
validate (bool, default: True) – If False, X is assumed to be correctly formatted (2-d float array, with points as rows and dimensions/features as columns, C-contiguous), and no checks are performed on it. Reduces overhead. Use only for repeated calls when the input is programmatically generated to be correct at each stage.
ignore_trust_region (bool (default: False)) – If True and trust-region definition was used (trust_region_factor defined at initialisation), it ignores the trust region and does not return a negative infinity outside the trust region.
note:: (..) – Note that in contrast to the sklearn GP Regressor our implementation cannot return the full covariance matrix. This is to save on some complexity and since the full covariance cannot be calculated if either values are infinite or a y-preprocessor is used.

Returns:

y_mean (array, shape = (n_samples, [n_output_dims])) – Mean of predictive distribution a query points
y_std (array, shape = (n_samples,), optional) – Standard deviation of predictive distribution at query points. Only returned when return_std is True.
y_cov (array, shape = (n_samples, n_samples), optional) – Covariance of joint predictive distribution a query points. Only returned when return_cov is True.
y_mean_grad (shape = (n_samples, n_features), optional) – The gradient of the predicted mean.
y_std_grad (shape = (n_samples, n_features), optional) – The gradient of the predicted std.

predict_std(X, validate=True)[source]#

Predict output standart deviation for X.

Parameters:

X (array-like, shape = (n_samples, n_features)) – Query points where the GP is evaluated.
validate (bool, default: True) – If False, X is assumed to be correctly formatted (2-d float array, with points as rows and dimensions/features as columns, C-contiguous), and no checks are performed on it. Reduces overhead. Use only for repeated calls when the input is programmatically generated to be correct at each stage.

Returns:

y_std – Standard deviation of predictive distribution at query points. Only returned when return_std is True.

Return type:

array, shape = (n_samples,), optional

_kernel_inverse(kernel)[source]#: Compute inverse of the kernel and store relevant quantities

Request metadata passed to the predict method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to predict if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to predict.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:

ignore_trust_region (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for ignore_trust_region parameter in predict.
return_cov (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for return_cov parameter in predict.
return_mean_grad (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for return_mean_grad parameter in predict.
return_std (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for return_std parameter in predict.
return_std_grad (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for return_std_grad parameter in predict.
validate (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for validate parameter in predict.

Returns:

self – The updated object.

Return type:

object

set_score_request(*, sample_weight: bool | None | str = '$UNCHANGED$') → GaussianProcessRegressor#

Request metadata passed to the score method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

True: metadata is requested, and passed to score if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to score.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

Added in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters:: sample_weight (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) – Metadata routing for sample_weight parameter in score.
Returns:: self – The updated object.
Return type:: object

static compute_threshold_given_sigma(n_sigma, n_dimensions)[source]#: Computes threshold value given a number of $\sigma$ away from the maximum, assuming a $\chi^2$ distribution.

static _diff_threshold_if_keep_n_finite(y, n, reference_diff_threshold, epsilon=1e-06)[source]#

Recalculation of the relative threshold when imposing that at least n points must be kept finite (i.e. in the training set). The “at least” imposes that the minimum value returned is the given threshold.

It requires sorting the y values, which can be costly in extreme cases.

Gaussian Process Regressor

Contents

Gaussian Process Regressor#

The GPR model#

The infinities classifier#