Gaussian Process Acquisition#

This module implements tools for active sampling with Gaussian Process surrogate models.

Acquisition#

The acquisition module contains both the acquisition function as well as the optimization procedure for it. It operates similarly to the GP regressor module.

Acquisition function#

The acquisition function is the centerpiece of the Bayesion optimization procedure and decides which point the algorithm samples next. The acquisition_functions module has multiple inbuilt acquisition functions as well as building blocks for custom acquistion functions which can be constructed using the + and * operators. Since it tends to perform best we will use the standard acquisition_functions.Log_exp acquisition function with a \(\zeta\) value of 0.05 to encourage exploration (as we know that the shape of the posterior distribution is not very gaussian):

from gpry.acquisition_functions import Log_exp
af = Log_exp(zeta=0.1)

Then it is time for the actual GP Acquisition. For this we need to build our instance of the gp_acquisition.GPAcquisition class which also takes the acquisition function. Furthermore it needs the prior bounds so it knows which volume to sample in. Furthermore like with the GP regressor it is usually a good idea to scale the prior bounds to a unit hypercube (assuming that the mode occupies roughly the same portion of the prior in each dimension) as the optimizer tends to struggle with very different scales across different dimensions:

from gpry.gp_acquisition import GPAcquisition
acq = GPAcquisition(
    prior_bounds,
    acq_func=af,
    n_restarts_optimizer=10,
    preprocessing_X=Normalize_bounds(prior_bounds)
    )

Acquisition Functions#

This module contains the implementation of an “Aquisition Function” class with a structure similar to the one provided by the Kernels module of sklearn. Additionally to some internally provided base AF’s this module also overwrites arithmetic operators for AF’s in order to enable the construction of composite AF’s.

Note

Bear in mind, that you first need to initialize an instance of an acquisition function before calling it. Composite acquisition functions are possible too, e.g.:

from acquisition_functions import ConstantAcqFunc, Mu, Sigma
af = ConstantAcqFunc(2) * Mu + (-3) * Sigma ** 2.5

Warning

Currently only the +, * and ** operator are supported but I am sure you can figure out how to work around using - and / on your own ;)

The NORA acquisition engine#

[TODO]

Base Class#

All acquisition functions are derived from this class. If you want to define your own acquisition functions it needs to inherit from this class. A tutorial on how to define such a class is given in AcquisitionFunction

AcquisitionFunction()

Base class for all Acquisition Functions (AF's).

Inbuilt Acquisition Functions#

All inbuilt Acquisition Functions can evaluate the gradient in addition to the value of the acquisition function at point X.

The recommended acquisition that we derived in order to efficiently sample the parameter space is called LogExp:

LogExp([zeta, sigma_n, fixed, dimension, ...])

Acquisition function which is designed to efficiently sample log-probability distributions.

Furthermore there are more inbuilt acquisition functions (building blocks) which should offer a great deal of flexibility. If you want to define your own acquisition function it needs to inherit from the parent class AcquisitionFunction.

ConstantAcqFunc([constant_value, fixed, ...])

Constant Acquisition function.

Mu([a, fixed, dimension])

\(\mu(X)\) of the surrogate model.

Std([a, fixed, dimension])

\(\sigma(X)\) of the surrogate model.

ExponentialMu([a, fixed, dimension])

\(\exp[\mu(X)]\) of the surrogate model.

ExponentialStd([a, fixed, dimension])

\(\exp[\sigma(X)]\) of the surrogate model.

ExpectedImprovement([xi, fixed, dimension])

Computes the (negative) Expected improvement function.

Additional things#

The things listed here are tools and similar things which in normal operation should not be needed.

is_acquisition_function(acq_func)

Determines if a given object is an acquisition function or not.

Hyperparameter(name, value_type[, ...])

An acquisition function hyperparameter's specification in form of a namedtuple.

AcquisitionFunctionOperator(k1, k2)

Base class for all AF operators.

Sum(k1, k2)

Overwrites the + operator for two or more AF's.

Product(k1, k2)

Overwrites the * operator for two or more AF's.

Exponentiation(acquisition_function, exponent)

Defines the expontentiation of an AF with a real number.

acquisition_functions.builtin_names()[source]

Lists all names of all built-in acquisition functions criteria.

class acquisition_functions.AcquisitionFunction[source]

Base class for all Acquisition Functions (AF’s). All acquisition functions are derived from this class.

Currently several acquisition functions are supported which should be versatile enough for most tasks. If however one wants to specify a custom acquisition function it should be a class which inherits from this abstract class. This class needs to be of the format:

from Acquisition_functions import Acquisition_function
Class custom_acq_func(Acquisition_Function):
    def __init__(self, param_1, ..., fixed=..., dimension=...):
        # * 'hyperparam_i': The hyperparameters of the custom
        #   acquisition function.
        # * 'fixed': whether the hyperparameters of the acquisition
        #   function are to be kept fixed.
        # * 'dimension': the dimensionality of the target function,
        #   which can be used to automatically adapt hyperparameters
        #   function are to be kept fixed.
        # * 'hasgradient': Whether the acquisition function can return
        #   a gradient. Furthermore the bool hasgradient needs to be
        #   specified to 'True' if the acquisition function can return
        #   gradient(s) or 'False' otherwise.
        self.param_1 = param_1
        ...
        self.fixed=fixed
        self.hasgradient = True/False

    @property
    def hyperparameter_param_1(self):
        # Returns the type of hyperparameter and whether it is
        # fixed or not. This method needs to exist for every hyper-
        # parameter.
        return Hyperparameter(
            "param_1", "numeric", fixed=self.fixed)

        return Hyperparameter(
            "param_1", "numeric", fixed=self.fixed)

    def __call__(self, X, gp, eval_gradient=False):
        # * 'X': The value(s) at which the acquisition function is
        #    evaluated
        # * 'GP': The surrogate GP model which shall be used.
        # * 'eval_gradient': Whether the gradient shall be given or
        #   not. Only required if 'self.hasgradient' is true.
        ....
        # Returned are the value(s) of the acquisition function at
        # point(s) X and optionally their gradient(s)

Once the Acquisition function is defined in this way it can be used with the same operators as the inbuilt acquisition functions.

Note

If one of the operands of a composite acquisiton function does not return a gradient the same applies for all operands. Furthermore some optimizers require gradients, which cannot be used in this case.

get_params(deep=True)[source]

Get hyperparameters of this Acquisition function.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

set_params(**params)[source]

Set the parameters of this acquisition function. The method works on simple AF’s as well as on nested AF’s. The latter have parameters of the form <component>__<parameter> so that it’s possible to update each component of a nested object.

Parameters:

**params (dict) – Any number of parameters which shall be set. Should be of the form {"parameter_1_name" : parameter_1_value, "parameter_2_name" : parameter_2_value, ...}

Return type:

self

clone_with_theta(theta)[source]

Returns a clone of self with given hyperparameters theta.

Parameters:

theta (ndarray of shape (n_dims,)) – The hyperparameters

check_X(X)[source]

Internal method to check the dimensionality of any input X provided to an AF when called. Checks the correct type and turns X into a 2d array if a 1d array is provided.

Warning

This method only checks for the correct type of an input, inappropriate values might still cause problems.

Parameters:

X (ndarray of shape (n_samples, n_dims) or (ndims,)) – The input array for any X value passed to an acquisition function

Returns:

X_new – The reshaped array of input data X

Return type:

ndarray of shape (n_samples, n_dims)

property n_dims

Returns the number of non-fixed hyperparameters of the acquisition function.

property hyperparameters

Returns a list of all hyperparameter specifications.

property theta

Returns the (flattened, log-transformed) non-fixed hyperparameters.

Note that theta are typically the log-transformed values of the acquisition function’s hyperparameters as this representation of the search space is more amenable for hyperparameter search, as hyperparameters likelength-scales naturally live on a log-scale.

Returns:

theta – The non-fixed, log-transformed hyperparameters of the acquisition function.

Return type:

ndarray of shape (n_dims,)

property hasgradient

Specifies whether a certain acquisition function can return gradients or not.

abstract __call__(X, gp, eval_gradient=False)[source]

Evaluate the acquisition function.

class acquisition_functions.ConstantAcqFunc(constant_value=1.0, fixed=False, dimension=None)[source]

Constant Acquisition function.

Can be used as part of a product-Composition where it scales the magnitude of the other factor or as part of a sum.

\[A_f(X) = constant\_value \;\forall\; X\]
Parameters:
  • constant_value (float, default=1.0) – The constant value : A_f(X) = constant_value

  • fixed (bool, default=False,) – whether the constant value shall be fixed or not.

__call__(X, gp, eval_gradient=False)[source]

Return the Value of the AF at x (A_f(X, gp)) and optionally its gradient.

Parameters:
  • X (array-like of shape (n_samples_X, n_features) or list of object) – X-Value at which the Acquisition function shall be evaluated

  • gp (SKLearn GaussianProcessRegressor) – The GPRegressor (surrogate model) from which to evaluate GP(X) and optionally X_train and Y_train.

  • eval_gradient (bool, default=False) – Determines whether the gradient with respect to X is calculated.

Returns:

  • A_f (array of shape (n_samples_X)) – The value of the acquisition function at point(s) X

  • A_f_gradient (array of shape (n_samples_X, n_dim)) – The gradient of the Acquisition function with respect to X. Only returned when eval_gradient is True.

class acquisition_functions.Mu(a=1.0, fixed=False, dimension=None)[source]

\(\mu(X)\) of the surrogate model.

\[A_f(X) = a\cdot\mu(X)\]
Parameters:
  • a (float, default=1.0) – The value with which \(\mu\) is multiplied .

  • fixed (bool, default=False,) – whether the constant value shall be fixed or not.

__call__(X, gp, eval_gradient=False)[source]

Return the Value of the AF at x (A_f(X, gp)) and optionally its gradient.

Parameters:
  • X (array-like of shape (n_samples_X, n_features) or list of object) – X-Value at which the Acquisition function shall be evaluated

  • gp (SKLearn GaussianProcessRegressor) – The GPRegressor (surrogate model) from which to evaluate GP(X) and optionally X_train and Y_train.

  • eval_gradient (bool, default=False) – Determines whether the gradient with respect to X is calculated.

Returns:

  • A_f (array of shape (n_samples_X)) – The value of the acquisition function at point(s) X

  • A_f_gradient (array of shape (n_samples_X, n_dim)) – The gradient of the Acquisition function with respect to X. Only returned when eval_gradient is True.

class acquisition_functions.ExponentialMu(a=1.0, fixed=False, dimension=None)[source]

\(\exp[\mu(X)]\) of the surrogate model.

\[A_f(X) = \exp(a\cdot\mu(X))\]
Parameters:
  • a (float, default=1.0) – The value with which \(\mu\) is multiplied before exponentiating.

  • fixed (bool, default=False,) – whether the constant value shall be fixed or not.

__call__(X, gp, eval_gradient=False)[source]

Return the Value of the AF at x (A_f(X, gp)) and optionally its gradient.

Parameters:
  • X (array-like of shape (n_samples_X, n_features) or list of object) – X-Value at which the Acquisition function shall be evaluated

  • gp (SKLearn GaussianProcessRegressor) – The GPRegressor (surrogate model) from which to evaluate GP(X) and optionally X_train and Y_train.

  • eval_gradient (bool, default=False) – Determines whether the gradient with respect to X is calculated.

Returns:

  • A_f (array of shape (n_samples_X)) – The value of the acquisition function at point(s) X

  • A_f_gradient (array of shape (n_samples_X, n_dim)) – The gradient of the Acquisition function with respect to X. Only returned when eval_gradient is True.

class acquisition_functions.Std(a=1.0, fixed=False, dimension=None)[source]

\(\sigma(X)\) of the surrogate model.

\[A_f(X) = a\cdot\sigma(X)\]
Parameters:
  • a (float, default=1.0) – The value with which \(\sigma\) is multiplied before exponentiating.

  • fixed (bool, default=False,) – whether the constant value shall be fixed or not.

__call__(X, gp, eval_gradient=False)[source]

Return the Value of the AF at x (A_f(X, gp)) and optionally its gradient.

Parameters:
  • X (array-like of shape (n_samples_X, n_features) or list of object) – X-Value at which the Acquisition function shall be evaluated

  • gp (SKLearn GaussianProcessRegressor) – The GPRegressor (surrogate model) from which to evaluate GP(X) and optionally X_train and Y_train.

  • eval_gradient (bool, default=False) – Determines whether the gradient with respect to X is calculated.

Returns:

  • A_f (array of shape (n_samples_X)) – The value of the acquisition function at point(s) X

  • A_f_gradient (array of shape (n_samples_X, n_dim)) – The gradient of the Acquisition function with respect to X. Only returned when eval_gradient is True.

class acquisition_functions.ExponentialStd(a=1.0, fixed=False, dimension=None)[source]

\(\exp[\sigma(X)]\) of the surrogate model.

\[A_f(X) = \exp(a\cdot\sigma(X))\]
Parameters:
  • a (float, default=1.0) – The value with which \(\sigma\) is multiplied before exponentiating.

  • fixed (bool, default=False,) – whether the constant value shall be fixed or not.

__call__(X, gp, eval_gradient=False)[source]

Return the Value of the AF at x (A_f(X, gp)) and optionally its gradient.

Parameters:
  • X (array-like of shape (n_samples_X, n_features) or list of object) – X-Value at which the Acquisition function shall be evaluated

  • gp (SKLearn GaussianProcessRegressor) – The GPRegressor (surrogate model) from which to evaluate GP(X) and optionally X_train and Y_train.

  • eval_gradient (bool, default=False) – Determines whether the gradient with respect to X is calculated.

Returns:

  • A_f (array of shape (n_samples_X)) – The value of the acquisition function at point(s) X

  • A_f_gradient (array of shape (n_samples_X, n_dim)) – The gradient of the Acquisition function with respect to X. Only returned when eval_gradient is True.

class acquisition_functions.ExpectedImprovement(xi=0.01, fixed=False, dimension=None)[source]

Computes the (negative) Expected improvement function.

The conditional probability P(y=f(x) | x) form a gaussian with a certain mean and standard deviation approximated by the model.

The EI condition is derived by computing \(E[u(f(x))]\) where \(u(f(x)) = 0\), if \(f(x) > y_{\mathrm{opt}}\) and \(u(f(x)) = y_{\mathrm{opt}} - f(x)\), if \(f(x) < y_{\mathrm{opt}}\).

This solves one of the issues of the PI condition by giving a reward proportional to the amount of improvement got.

Parameters:
  • xi (float, default=0.01) – Controls how much improvement one wants over the previous best values. Useful only when method is set to “EI”

  • fixed (bool, default=False,) – whether the constant value shall be fixed or not.

__call__(X, gp, eval_gradient=False)[source]

Return the Value of the AF at x (A_f(X, gp)) and optionally its gradient.

Parameters:
  • X (array-like of shape (n_samples_X, n_features) or list of object) – X-Value at which the Acquisition function shall be evaluated

  • gp (SKLearn GaussianProcessRegressor) – The GPRegressor (surrogate model) from which to evaluate GP(X) and optionally X_train and Y_train.

  • eval_gradient (bool, default=False) – Determines whether the gradient with respect to X is calculated.

Returns:

  • A_f (array of shape (n_samples_X)) – The value of the acquisition function at point(s) X

  • A_f_gradient (array of shape (n_samples_X, n_dim)) – The gradient of the Acquisition function with respect to X. Only returned when eval_gradient is True.

class acquisition_functions.BaseLogExp(zeta=None, sigma_n=None, fixed=False, dimension=None, zeta_scaling=0.85, linear=True)[source]

Acquisition function which is designed to efficiently sample log-probability distributions. This is achieved by transforming \(\tilde{\mu}\cdot\tilde{\sigma}\) (of the true, non-logarithmic probability distribution) to logarithmic space.

Parameters:
  • zeta (float, default=1) –

    Controls the exploration-exploitation tradeoff parameter. The value of \(\zeta\) should not exceed 1 under normal circumstances as a value <1 accounts for the fact that the GP’s estimate for \(\mu\) is not correct at the beginning. A good suggestion for setting zeta which is inspired by simulated annealing is

    \[\zeta = \exp(-N_0/N)\]

    where \(N_0\geq 0\) is a “decay constant” and \(N\) the number of training points in the GP.

  • sigma_n (float, default=None) – The (constant) noise level of the data. If set to None the square-root of alpha of the training data (or the square root of the mean of alpha if alpha is an array) will be used.

  • fixed (bool, default=False,) – whether zeta and sigma_n shall be fixed or not.

  • dimension (double, default=None) – the dimension of the parameter space used for auto-scaling the zeta

  • zeta_scaling (double, default=0.85) – the scaling power of the zeta with dimension, if auto-scaled

__call__(X, gp, eval_gradient=False)[source]

Return the Value of the AF at x (A_f(X, gp)) and optionally its gradient.

Parameters:
  • X (array-like of shape (n_samples_X, n_features) or list of object) – X-Value at which the Acquisition function shall be evaluated

  • gp (SKLearn GaussianProcessRegressor) – The GPRegressor (surrogate model) from which to evaluate GP(X) and optionally X_train and Y_train.

  • eval_gradient (bool, default=False) – Determines whether the gradient with respect to X is calculated.

Returns:

  • A_f (array of shape (n_samples_X)) – The value of the acquisition function at point(s) X

  • A_f_gradient (array of shape (n_samples_X, n_dim)) – The gradient of the Acquisition function with respect to X. Only returned when eval_gradient is True.

class acquisition_functions.LogExp(zeta=None, sigma_n=None, fixed=False, dimension=None, zeta_scaling=0.85, linear=True)[source]

Acquisition function which is designed to efficiently sample log-probability distributions. This is achieved by transforming \(\tilde{\mu}\cdot\tilde{\sigma}\) (of the true, non-logarithmic probability distribution) to logarithmic space which yields

\[A_{\mathrm{LE}}(X) = \exp(2\zeta\cdot\mu(X))\cdot (\sigma(X)-\sigma_n)\]

For numerical convenience we take the log of this expression which yields:

\[\log(A_{\mathrm{LE}})(X) = 2\zeta\cdot\mu(X) + \log(\sigma(X)-\sigma_n)\]

Note

\(\mu(x)\) and \(\sigma(X)\) are the mean and sigma of the GP regressor which follows the log-probability distribution.

Parameters:
  • zeta (float, default=1) –

    Controls the exploration-exploitation tradeoff parameter. The value of \(\zeta\) should not exceed 1 under normal circumstances as a value <1 accounts for the fact that the GP’s estimate for \(\mu\) is not correct at the beginning. A good suggestion for setting zeta which is inspired by simulated annealing is

    \[\zeta = \exp(-N_0/N)\]

    where \(N_0\geq 0\) is a “decay constant” and \(N\) the number of training points in the GP.

  • sigma_n (float, default=None) – The (constant) noise level of the data. If set to None the square-root of alpha of the training data (or the square root of the mean of alpha if alpha is an array) will be used.

  • fixed (bool, default=False,) – whether zeta and sigma_n shall be fixed or not.

  • dimension (double, default=None) – the dimension of the parameter space used for auto-scaling the zeta

  • zeta_scaling (double, default=0.85) – the scaling power of the zeta with dimension, if auto-scaled

static f(mu, std, baseline, noise_level, zeta)[source]

Linearized exponentiated log-error bar.

class acquisition_functions.NonlinearLogExp(zeta=None, sigma_n=None, fixed=False, dimension=None, zeta_scaling=0.85, linear=True)[source]

Warning

The gradients for this acquisition function are not yet implemented correctly. Use with caution!

An alternative approach which keeps both scales exponentiated:

\[A_{\mathrm{LE}}(X) = \exp(2\zeta\cdot\mu(X))\cdot \exp(\sigma(X)-\sigma_n)\]

Again we take the log of this.

Parameters:
  • zeta (float, default=1) –

    Controls the exploration-exploitation tradeoff parameter. The value of \(\zeta\) should not exceed 1 under normal circumstances as a value <1 accounts for the fact that the GP’s estimate for \(\mu\) is not correct at the beginning. A good suggestion for setting zeta which is inspired by simulated annealing is

    \[\zeta = \exp(-N_0/N)\]

    where \(N_0\geq 0\) is a “decay constant” and \(N\) the number of training points in the GP.

  • sigma_n (float, default=None) – The (constant) noise level of the data. If set to None the square-root of alpha of the training data (or the square root of the mean of alpha if alpha is an array) will be used.

  • fixed (bool, default=False,) – whether zeta and sigma_n shall be fixed or not.

  • dimension (double, default=None) – the dimension of the parameter space used for auto-scaling the zeta

  • zeta_scaling (double, default=0.85) – the scaling power of the zeta with dimension, if auto-scaled

static f(mu, std, baseline, noise_level, zeta)[source]

Exponentiated log-error bar

acquisition_functions.is_acquisition_function(acq_func)[source]

Determines if a given object is an acquisition function or not.

Parameters:

acq_func (Any) – The object which shall be examined

Returns:

is_acquisition_function – whether the specified object is an acquisition function.

Return type:

bool

class acquisition_functions.Hyperparameter(name, value_type, n_elements=1, fixed=False)[source]

An acquisition function hyperparameter’s specification in form of a namedtuple. This formalism is copied from the kernel module of Scikit-Learn.

Note

The current code does not support optimization of any hyperparameters of the acquisition functions. This might be added in the future (which is why the fixed parameter exists).

Attributes:
  • name (str) – The name of the hyperparameter. Unlike in the kernels for the GPRegressor the hyperparameters of the Acquisition functions do not have bounds.

  • value_type (str) – The type of the hyperparameter. Currently, only 'numeric' hyperparameters are supported.

  • n_elements (int, default=1) – The number of elements of the hyperparameter value. Defaults to 1, which corresponds to a scalar hyperparameter. n_elements > 1 corresponds to a hyperparameter which is vector-valued, such as, e.g., anisotropic length-scales.

  • fixed (bool, default=False) – Whether the value of this hyperparameter is fixed, i.e., cannot be changed during hyperparameter tuning.

class acquisition_functions.AcquisitionFunctionOperator(k1, k2)[source]

Base class for all AF operators.

get_params(deep=True)[source]

Get parameters of this acquisition function.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

property hyperparameters

Returns a list of all hyperparameter.

property theta

Returns the (flattened, log-transformed) non-fixed hyperparameters. Note that theta are typically the log-transformed values of the AF’s hyperparameters as this representation of the search space is more amenable for hyperparameter search, as hyperparameters like length-scales naturally live on a log-scale.

Returns:

theta – The non-fixed, log-transformed hyperparameters of the acquisition function

Return type:

ndarray of shape (n_dims,)

class acquisition_functions.Sum(k1, k2)[source]

Overwrites the + operator for two or more AF’s. Additionally gradients are computed and calculated together according to the rules of differentiation.

A sum of an AF can be either with another (composite) AF or a real number.

__call__(X, gp, eval_gradient=False)[source]

Evaluate the acquisition function.

class acquisition_functions.Product(k1, k2)[source]

Overwrites the * operator for two or more AF’s. Additionally gradients are computed and calculated together according to the rules of differentiation.

A product of an AF can be either with another (composite) AF or a real number.

__call__(X, gp, eval_gradient=False)[source]

Evaluate the acquisition function.

class acquisition_functions.Exponentiation(acquisition_function, exponent)[source]

Defines the expontentiation of an AF with a real number. Additionally gradients are computed and calculated together according to the rules of differentiation.

Warning

An AF can only be exponentiated with a number and not with another AF:

new_af = old_af ** number
get_params(deep=True)[source]

Get parameters of this Acquisition function.

Parameters:

deep (bool, default=True) – If True, will return the parameters for this estimator and contained subobjects that are estimators.

Returns:

params – Parameter names mapped to their values.

Return type:

dict

property hyperparameters

Returns a list of all hyperparameter.

property theta

Returns the (flattened, log-transformed) non-fixed hyperparameters. Note that theta are typically the log-transformed values of the acquisition function’s hyperparameters as this representation of the search space is more amenable for hyperparameter search, as hyperparameters like length-scales naturally live on a log-scale.

Returns:

theta – The non-fixed, log-transformed hyperparameters of the acquisition function

Return type:

ndarray of shape (n_dims,)

__call__(X, gp, eval_gradient=False)[source]

Return the Value of the AF at x (A_f(X, gp)) and optionally its gradient.

Parameters:
  • X (array-like of shape (n_samples_X, n_features) or list of object) – X-Value at which the Acquisition function shall be evaluated

  • gp (SKLearn GaussianProcessRegressor) – The GPRegressor (surrogate model) from which to evaluate GP(X) and optionally X_train and Y_train.

  • eval_gradient (bool, default=False) – Determines whether the gradient with respect to X is calculated.

Returns:

  • A_f (array of shape (n_samples_X)) – The value of the acquisition function at point(s) X

  • A_f_gradient (array of shape (n_samples_X, n_dim)) – The gradient of the Acquisition function with respect to X. Only returned when eval_gradient is True.

GPAcquisition classes, which take care of proposing new locations where to evaluate the true function.

gp_acquisition.builtin_names()[source]#

Lists all names of all built-in acquisition functions criteria.

class gp_acquisition.GenericGPAcquisition(bounds, preprocessing_X=None, verbose=1, acq_func='LogExp')[source]#

Bases: object

Generic class for acquisition objects.

multi_add(gpr, n_points=1, bounds=None, rng=None)[source]#

Method to query multiple points where the objective function shall be evaluated.

The strategy differs depending on the acquisition class.

When run in parallel (MPI), it must return the same values for all processes.

Parameters:
  • gpr (GaussianProcessRegressor) – The GP Regressor which is used as surrogate model.

  • n_points (int, optional (default=1)) – Number of points to be returned. A value large than 1 is useful if you can evaluate your objective in parallel, and thus obtain more objective function evaluations per unit of time.

  • bounds (np.array, optional) – Bounds inside which to look for the next proposals, e.g. the GPR trust region. If not defined, the prior bounds are used.

  • rng (int or numpy.random.Generator, optional) – The generator used to perform the acquisition process. If an integer is given, it is used as a seed for the default global numpy random number generator.

Returns:

  • X (numpy.ndarray, shape = (X_dim, n_points)) – The X values of the found optima

  • y_lies (numpy.ndarray, shape = (n_points,)) – The predicted values of the GP at the proposed sampling locations

  • fval (numpy.ndarray, shape = (n_points,)) – The values of the acquisition function at X_opt

class gp_acquisition.BatchOptimizer(bounds, preprocessing_X=None, verbose=1, acq_func='LogExp', proposer=None, acq_optimizer='fmin_l_bfgs_b', n_restarts_optimizer='5d', n_repeats_propose=10)[source]#

Bases: GenericGPAcquisition

Run Gaussian Process acquisition.

Works similarly to a GPRegressor but instead of optimizing the kernel’s hyperparameters it optimizes the Acquisition function in order to find one or multiple points at which the likelihood/posterior should be evaluated next.

Furthermore contains a framework for different lying strategies in order to improve the performance if multiple processors are available

Use this class directly if you want to control the iterations of your bayesian quadrature loop.

Parameters:
  • bounds (array) – Bounds in which to optimize the acquisition function, assumed to be of shape (d,2) for d dimensional prior

  • proposer (Proposer object, optional (default: "ParialProposer", producing a mixture) – of points drawn from an “UniformProposer” and from a “CentroidsProposer”) Proposes points from which the acquisition function should be optimized.

  • acq_func (GPry Acquisition Function, dict, optional (default: "LogExp")) – Acquisition function to maximize/minimize. If none is given the LogExp acquisition function will be used. Can also be a dictionary with the name of the acquisition function as the single key, and as value a dict of its arguments.

  • acq_optimizer (string or callable, optional (default: "auto")) –

    Can either be one of the internally supported optimizers for optimizing the acquisition function, specified by a string, or an externally defined optimizer passed as a callable. If a callable is passed, it must have the signature:

    def optimizer(obj_func, initial_guess, bounds):
        # * 'obj_func' is the objective function to be maximized, which
        #   takes the hyperparameters theta as parameter and an
        #   optional flag eval_gradient, which determines if the
        #   gradient is returned additionally to the function value
        # * 'initial_guess': the initial value for X, which can be
        #   used by local optimizers
        # * 'bounds': the bounds on the values of X
        ....
        # Returned are the best found X and
        # the corresponding value of the target function.
        return X_opt, func_min
    

    if set to ‘auto’ either the ‘fmin_l_bfgs_b’ or ‘sampling’ algorithm from scipy.optimize is used depending on whether gradient information is available or not.

    Note

    The default optimizers are designed to maximize the acquisition function.

  • preprocessing_X (X-preprocessor, Pipeline_X, optional (default: None)) – Single preprocessor or pipeline of preprocessors for X. Preprocessing makes sense if the scales along the different dimensions are vastly different which means that the optimizer struggles to find the maximum of the acquisition function. If None is passed the data is not preprocessed.

  • n_restarts_optimizer (int, default=0) –

    The number of restarts of the optimizer for finding the maximum of the acquisition function. The first run of the optimizer is performed from the last X fit to the model if available, otherwise it is drawn at random.

    The remaining ones (if any) from X’s sampled uniform randomly from the space of allowed X-values. Note that n_restarts_optimizer == 0 implies that one run is performed.

  • verbose (1, 2, 3, optional (default: 1)) – Level of verbosity. 3 prints Infos, Warnings and Errors, 2 Warnings and Errors, and 1 only Errors. Should be set to 2 or 3 if problems arise.

Attributes:

gpr_ (GaussianProcessRegressor) – The GP Regressor which is currently used for optimization.

optimize_acquisition_function(gpr, i, bounds=None, rng=None)[source]#

Exposes the optimization method for the acquisition function. When called it proposes a single point where for where to evaluate the true model next. It is internally called in the multi_add() method.

Parameters:
  • gpr (GaussianProcessRegressor) – The GP Regressor which is used as surrogate model.

  • i (int) – Internal counter which is used to enable MPI support. If you want to optimize from a single location and rerun the optimizer from multiple starting locations loop over this parameter.

  • rng (numpy.random.Generator, optional) – The generator used for the optimization process.

Returns:

  • X_opt (numpy.ndarray, shape = (X_dim,)) – The X value of the found optimum

  • func (float) – The value of the acquisition function at X_opt

multi_add(gpr, n_points=1, bounds=None, rng=None, force_resample=False)[source]#

Method to query multiple points where the objective function shall be evaluated. The strategy which is used to query multiple points is by using the \(f(x)\sim \mu(x)\) strategy and and not changing the hyperparameters of the model.

This is done to increase speed since then the blockwise matrix inversion lemma can be used to invert the K matrix. The optimization for a single point is done using the optimize_acquisition_func() method.

When run in parallel (MPI), returns the same values for all processes.

Parameters:
  • gpr (GaussianProcessRegressor) – The GP Regressor which is used as surrogate model.

  • n_points (int, optional (default=1)) – Number of points to be returned. A value large than 1 is useful if you can evaluate your objective in parallel, and thus obtain more objective function evaluations per unit of time.

  • bounds (np.array, optional) – Bounds inside which to look for the next proposals, e.g. the GPR trust region. If not defined, the prior bounds are used.

  • rng (int or numpy.random.Generator, optional) – The generator used to perform the acquisition process. If an integer is given, it is used as a seed for the default global numpy random number generator.

Returns:

  • X (numpy.ndarray, shape = (X_dim, n_points)) – The X values of the found optima

  • y_lies (numpy.ndarray, shape = (n_points,)) – The predicted values of the GP at the proposed sampling locations

  • fval (numpy.ndarray, shape = (n_points,)) – The values of the acquisition function at X_opt

class gp_acquisition.NORA(bounds, preprocessing_X=None, verbose=1, acq_func='LogExp', sampler=None, mc_every='1d', nlive_per_training=3, nlive_max='25d', nlive_per_dim_max=None, num_repeats='5d', num_repeats_per_dim=None, precision_criterion_target=0.01, nprior_per_nlive=10, max_ncalls=None, tmpdir=None)[source]#

Bases: GenericGPAcquisition

Run Gaussian Process acquisition with NORA (Nested sampling Optimization for Ranked Acquistion).

Uses kriging believer while it samples the acquisition function using nested sampling (with PolyChord or UltraNest).

Parameters:
  • bounds (array) – Bounds in which to optimize the acquisition function, assumed to be of shape (d,2) for d dimensional prior

  • acq_func (GPry Acquisition Function, dict, optional (default: "LogExp")) – Acquisition function to maximize/minimize. If none is given the LogExp acquisition function will be used. Can also be a dictionary with the name of the acquisition function as the single key, and as value a dict of its arguments.

  • mc_every (int) – If >1, only calls the MC sampler every mc_steps, and reuses previous X otherwise, recomputing y and sigma with the new GPR.

  • nlive_per_training (int) – live points per sample in the current training set. Not recommended to decrease it.

  • nlive_max (int) – live points max cap

  • num_repeats (int) – length of slice-chains

  • precision_criterion_target (float) – Cap on precision criterion of Nested Sampling

  • nprior_per_nlive (int) – Number of initial samples times dimension.

  • preprocessing_X (X-preprocessor, Pipeline_X, optional (default: None)) – Single preprocessor or pipeline of preprocessors for X. Preprocessing makes sense if the scales along the different dimensions are vastly different which means that the optimizer struggles to find the maximum of the acquisition function. If None is passed the data is not preprocessed.

  • verbose (1, 2, 3, optional (default: 1)) – Level of verbosity. 3 prints Infos, Warnings and Errors, 2 Warnings and Errors, and 1 only Errors. Should be set to 2 or 3 if problems arise.

Attributes:

gpr_ (GaussianProcessRegressor) – The GP Regressor which is currently used for optimization.

property pool_size#

Size of the pool of points.

update_NS_precision(gpr)[source]#

Updates NS precision parameters: - num_repeats: constant for now - nlive: nlive_per_training times the size of the training set, capped at

nlive_max (typically 25 * dimension).

  • precision_criterion: constant for now.

log(msg, level=None)[source]#

Print a message if its verbosity level is equal or lower than the given one (or always if level=None.

do_MC_sample(gpr, bounds, rng=None, sampler=None)[source]#
Returns:

May return None for any of y, sigma_y, weights

Return type:

X, y, sigma_y, weights

last_MC_sample(copy=False, warn_reweight=True)[source]#

Returns the last MC sample as (X, y, sigma_y, weights). y, sigma_y may be None if not computed while sampling. They can be generated with the gpr. If weights is None, all samples should be assumed to have equal weights.

Prints a warning if it is a reweighted sample.

last_MC_sample_getdist(params, warn_reweight=True)[source]#

Returns the last MC sample as a getdist.MCSamples instance.

Prints a warning if it is a reweighted sample.

multi_add(gpr, n_points=1, bounds=None, rng=None, force_resample=False)[source]#

Method to query multiple points where the objective function shall be evaluated.

The strategy which is used to query multiple points is by using the \(f(x)\sim \mu(x)\) strategy and and not changing the hyperparameters of the model.

It runs NS on the mean of the GP model, tracking the value of the acquisition function at every evaluation, and keeping a pool of candidates which is re-sorted whenever a new good candidate is found.

When run in parallel (MPI), returns the same values for all processes.

Parameters:
  • gpr (GaussianProcessRegressor) – The GP Regressor which is used as surrogate model.

  • n_points (int, optional (default=1)) – Number of points to be returned. A value large than 1 is useful if you can evaluate your objective in parallel, and thus obtain more objective function evaluations per unit of time.

  • bounds (np.array, optional) – Bounds inside which to look for the next proposals, e.g. the GPR trust region. If not defined, the prior bounds are used.

  • rng (int or numpy.random.Generator, optional) – The generator used to perform the acquisition process. If an integer is given, it is used as a seed for the default global numpy random number generator.

Returns:

  • X (numpy.ndarray, shape = (X_dim, n_points)) – The X values of the found optima

  • y_lies (numpy.ndarray, shape = (n_points,)) – The predicted values of the GP at the proposed sampling locations

  • fval (numpy.ndarray, shape = (n_points,)) – The values of the acquisition function at X_opt

class gp_acquisition.RankedPool(size, gpr, acq_func, verbose=1)[source]#

Bases: object

Keeps a ranked pool of sample proposal for Krigging-believer, given a GP regressor and an acquisition function.

Parameters:
  • size (int) – Number of points sampled proposals targeted.

  • gpr (GaussianProcessRegressor) – The GP Regressor which is used as surrogate model.

  • acq_func (callable) – Acquisition function used to rank the pool. Must be a function of (y, sigma) only, partially evaluated its hyperparameters, if necessary.

  • verbose (1, 2, 3, optional (default: 1)) – Level of verbosity. 3 prints Infos, Warnings and Errors, 2 Warnings and Errors, and 1 only Errors. Should be set to 2 or 3 if problems arise.

property min_acq#

Minimum acquisition function value in order for a point to be considered, i.e. the conditioned acquisition function value of the last element in the pool.

NB: while the pool is not yet fool, empty points are assigned minus infinity as acquisition function value, so one can still use the condition acq_value > RankedPool.min_acq in order to decide whether to add a point.

log(level=None, msg='')[source]#

Print a message if its verbosity level is equal or lower than the given one (or always if level=None.

str_point(X, y, sigma, acq, sigma_cond=None, acq_cond=None)[source]#

Retuns a standardised string to log a point.

str_pool(include_last=False, last_sorted=None, prefix=None, suffix_last=None)[source]#

Returns a string representation of the current pool.

log_pool(level=4, include_last=False, last_sorted=None, prefix=None, suffix_last=None)[source]#

Prints the current pool.

add(X, y=None, sigma=None, acq=None, method='single sort acq')[source]#

Adds points to the pool.

Parameters:
  • X (np.ndarray (1- or 2-dimensional)) – Position of the proposed sample.

  • y (np.ndarray (1 dimension fewer than X) or float, optional) – Predicted value under the GPR.

  • sigma (np.ndarray (1 dimension fewer than X) or float, optional) – Predicted standard deviation under the GPR. Will be computed if not passed.

  • acq (np.ndarray (1 dimension fewer than X) or float, optional) – Acquisition function values (unconditioned). Will be computed if not passed.

  • method ({"single", "single sort acq", "single sort y", "bulk"}) – Uses the one-by-one algorithm (“single”, with pre-sorting according to X if “single sort X”), or the bulk algorithm.

add_bulk(X, y, sigma, acq, i_start=0)[source]#

Tries to fill the pull using a batch of points at once:

  1. Compute their acquisition value conditioned to the position above (if any).

  2. Pick the best and delete infinities (acq cannot grow with more conditioning).

  3. Place it in the current position, and do a recursive call for the next one.

The advantage of this method with respect to add_one is that it can use vectorization to compute the std’s, but on the other hand it needs to compute many more of them, so it will be better only up to some dimension and some amount of training, and then add_one will take over.

add_one(X, y=None, sigma=None, acq=None, acq_nan_is_null=False)[source]#

Tries to add one point to the pool:

  1. Computes its acquisition function value. (If the pools is not full, just adds it and re-sorts.)

  2. Finds the provisional position of the point in the list, without KB. Notice that once KB is taken into account, the KB-ranked position can only be lower. (If the acq. fun. value is lower than the last point, it is discarded.)

  3. Updates its aquisition function value KB-informed by the points above, finds the new provisional position, and repeats until the position stabilises.

  4. Sorts the list of points below the new one, recursively applying KB.

Parameters:
  • X (np.ndarray with 1 dimension) – Position of the proposed sample.

  • y (float, optional) – Predicted value under the GPR.

  • sigma (float, optional) – Predicted standard deviation under the GPR.

  • acq (float, optional) – Value of the acquisition function.

  • acq_nan_is_null (bool, optional (default: False)) – Whether NaN’s in the acquisition function should be interpreted as null value.

Raises:

ValueError – if invalid acq. function value, unless acq_nan_is_null=True.:

cache_model(i)[source]#

Cache the GP model that contains the training set plus the pool points up to position i (0-based), with predicted dummy y, keeping the GPR hyperparameters unchanged.

Stores and returns the conditioned gpr (or the original one if i=-1).

reset_cache()[source]#

Deletes the cached GPR models when there are not needed any more.

copy(drop_empty=False)[source]#

Returns a copy of the pool, missing references to external objects.

If drop_empty=True (default: False), the returned copy has its size reduced to contain just the final set of finite conditioned acquisition points.

sort(i_start=0)[source]#

Sorts in descending order of acquisition function value, where the acq of the i-th element (0-based) is conditioned on the GPR model that includes the points with j<i with their predicted (mean) y.

If i_start!=0 is given, assumes the upper elements in the list are already sorted following this criterion.

This function assumes that the augmented model just abobe i_start has not already been cached, and starts by caching it and computing acquisition function values i_start down.