Source code for svm

r"""
This module uses a Support vector machine (SVM) with an RBF kernel to classify
regions which are "safe" to explore in contrast to regions which are "unsafe"
to explore since they are infinite. This is done in an attempt to hinder the
exploration of parts of the parameter space which have a :math:`-\infty` log-posterior
value. These values need to be filtered out since feeding them to the GP
Regressor will break it. Nevertheless this is important information that we
shouldn't throw away. We will also need the SVM later when doing the MCMC run
to tell the model which regions it shouldn't visit. In essence our process
shrinks the prior to a region where the model thinks that all values of the
log-posterior distribution are finite.
"""

import warnings
import numpy as np
from sklearn.svm import SVC
from gpry.tools import check_random_state


[docs] class SVM(SVC): r""" Wrapper for the sklearn `RBF kernel SVM <https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html>`_. Classifies points as finite of non-finite, in order to exclude the latter from the training set of a parent GPR. It keeps track of the full training set, including classified-infinite points. The classification is performed using a threshold understood as a positive difference against the current maximum ``y`` in the training set. The threshold is passed at fitting time and not at initialisation, in case the classifier is defined in a transformed coordinate space, with a transformation that changes through the training of the parent GPR. (NB: passing the threshold every time is a compromise that allows to keep the full training set stored in this object with non-reduced ``y`` values while avoiding preprocessing overhead.) Also in case there is a coordinate transformation, the training set of this object should not be obtained directly, but via properties of the parent GP instead that will undo the transformation. The same applying to calling any method directly. Parameters ---------- C : float, default=1e7 Regularization parameter. The strength of the regularization is inversely proportional to C. Must be strictly positive. The penalty is a squared l2 penalty. kernel : {'linear', 'poly', 'rbf', 'sigmoid', 'precomputed'}, default='rbf' Specifies the kernel type to be used in the algorithm. It must be one of 'linear', 'poly', 'rbf', 'sigmoid', 'precomputed' or a callable. If none is given, 'rbf' will be used. If a callable is given it is used to pre-compute the kernel matrix from data matrices; that matrix should be an array of shape ``(n_samples, n_samples)``. degree : int, default=3 Degree of the polynomial kernel function ('poly'). Ignored by all other kernels. gamma : {'scale', 'auto'} or float, default='scale' Kernel coefficient for 'rbf', 'poly' and 'sigmoid'. * if ``gamma='scale'`` (default) is passed then it uses 1 / (n_features * X.var()) as value of gamma, * if 'auto', uses 1 / n_features. coef0 : float, default=0.0 Independent term in kernel function. It is only significant in 'poly' and 'sigmoid'. shrinking : bool, default=True Whether to use the shrinking heuristic. probability : bool, default=False Whether to enable probability estimates. This must be enabled prior to calling `fit`, will slow down that method as it internally uses 5-fold cross-validation, and `predict_proba` may be inconsistent with `predict`. tol : float, default=1e-3 Tolerance for stopping criterion. cache_size : float, default=200 Specify the size of the kernel cache (in MB). class_weight : dict or 'balanced', default=None Set the parameter C of class i to class_weight[i]*C for SVC. If not given, all classes are supposed to have weight one. The "balanced" mode uses the values of y to automatically adjust weights inversely proportional to class frequencies in the input data as ``n_samples / (n_classes * np.bincount(y))``. verbose : bool, default=False Enable verbose output. Note that this setting takes advantage of a per-process runtime setting in libsvm that, if enabled, may not work properly in a multithreaded context. max_iter : int, default=-1 Hard limit on iterations within solver, or -1 for no limit. decision_function_shape : {'ovo', 'ovr'}, default='ovr' Whether to return a one-vs-rest ('ovr') decision function of shape (n_samples, n_classes) as all other classifiers, or the original one-vs-one ('ovo') decision function of libsvm which has shape (n_samples, n_classes * (n_classes - 1) / 2). However, one-vs-one ('ovo') is always used as multi-class strategy. The parameter is ignored for binary classification. break_ties : bool, default=False If true, ``decision_function_shape='ovr'``, and number of classes > 2, predict will break ties according to the confidence values of decision_function; otherwise the first class among the tied classes is returned. Please note that breaking ties comes at a relatively high computational cost compared to a simple predict. random_state : int, RandomState instance or None, default=None Controls the pseudo random number generation for shuffling the data for probability estimates. Ignored when `probability` is False. Pass an int for reproducible output across multiple function calls. Attributes ---------- all_finite : bool Is true when all posterior values which have been sampled are finite which removes the need for fitting the SVM. class_weight_ : ndarray of shape (n_classes,) Multipliers of parameter C for each class. Computed based on the ``class_weight`` parameter. classes_ : ndarray of shape (n_classes,) The classes labels. coef_ : ndarray of shape (n_classes * (n_classes - 1) / 2, n_features) Weights assigned to the features (coefficients in the primal problem). This is only available in the case of a linear kernel. `coef_` is a readonly property derived from `dual_coef_` and `support_vectors_`. dual_coef_ : ndarray of shape (n_classes -1, n_SV) Dual coefficients of the support vector in the decision function, multiplied by their targets. For multiclass, coefficient for all 1-vs-1 classifiers. The layout of the coefficients in the multiclass case is somewhat non-trivial. fit_status_ : int 0 if correctly fitted, 1 otherwise (will raise warning) intercept_ : ndarray of shape (n_classes * (n_classes - 1) / 2,) Constants in decision function. n_features_in_ : int Number of features seen during fit. feature_names_in_ : ndarray of shape (`n_features_in_`,) Names of features seen during fit. Defined only when `X` has feature names that are all strings. support_ : ndarray of shape (n_SV) Indices of support vectors. support_vectors_ : ndarray of shape (n_SV, n_features) Support vectors. n_support_ : ndarray of shape (n_classes,), dtype=int32 Number of support vectors for each class. probA_ : ndarray of shape (n_classes * (n_classes - 1) / 2) probB_ : ndarray of shape (n_classes * (n_classes - 1) / 2) If `probability=True`, it corresponds to the parameters learned in Platt scaling to produce probability estimates from decision values. If `probability=False`, it's an empty array. Platt scaling uses the logistic function ``1 / (1 + exp(decision_value * probA_ + probB_))`` where ``probA_`` and ``probB_`` are learned from the dataset.. shape_fit_ : tuple of int of shape (n_dimensions_of_X,) Array dimensions of training vector ``X``. """ def __init__( self, C=1e7, kernel="rbf", degree=3, gamma="scale", coef0=0.0, shrinking=True, probability=False, tol=0.001, cache_size=200, class_weight=None, verbose=False, max_iter=-1, decision_function_shape="ovr", break_ties=False, random_state=None, ): self.X_train = None self.y_train = None self.y_finite = None self.at_least_one_finite = False self.all_finite = False self.diff_threshold = None self._max_y = None # In the SVM, since we have not wrapper the calls to the RNG, # (as we have for the GPR), we need to repackage the new numpy Generator # as a RandomState, which is achieved by gpry.tools.check_random_state random_state = check_random_state(random_state, convert_to_random_state=True) super().__init__( C=C, kernel=kernel, degree=degree, gamma=gamma, coef0=coef0, shrinking=shrinking, probability=probability, tol=tol, cache_size=cache_size, class_weight=class_weight, verbose=verbose, max_iter=max_iter, decision_function_shape=decision_function_shape, break_ties=break_ties, random_state=random_state, ) @property def d(self): """Dimension of the feature space.""" if self.X_train is None: raise ValueError( "You need to add some data before determining its dimension." ) return self.X_train.shape[1] @property def abs_threshold(self): """ Current absolute threshold for y values, in the transformed space of the SVM. """ return self._max_y - self.diff_threshold @property def n(self): """Number of training points.""" if self.y_train is None: return 0 return len(self.y_train)
[docs] def fit(self, X, y, diff_threshold): r""" Fits the SVM with two categorial classes: * :math:`\tilde{y}=True` Finite points * :math:`\tilde{y}=False` Infinite points where :math:`\tilde{y}` is produced after checking the input ``y``'s against an internal threshold value, which may also be adjusted at this step. Parameters ---------- X : array-like, shape = (n_samples, n_features) Training data. y : array-like, shape = (n_samples, [n_output_dims]) Target values. Returns ------- y_finite : array-like bool Classification of current points. """ self.X_train = np.copy(X) self.y_train = np.copy(y) # Corner case: only -inf points being trained on: nothing to do. if np.all(self.y_train == -np.inf): self.at_least_one_finite = False self.y_finite = np.full(len(X), False) return self.y_finite self.at_least_one_finite = True # Update threshold value self.diff_threshold = diff_threshold self._max_y = max(self.y_train) # Turn into boolean categorial values self.y_finite = self._is_finite_raw( self.y_train, self.diff_threshold, max_y=self._max_y ) # If no value below the threshold, nothing to do. Save test for faster checks. if np.all(self.y_finite): self.all_finite = True return self.y_finite self.all_finite = False super().fit(self.X_train, self.y_finite) return self.y_finite
@staticmethod def _is_finite_raw(y, diff_threshold, max_y=None): """ Returns the indices of the finite points, depending on some delta-like threshold, in the same space (transformed or not) as the y's. This is a static method for an untrained SVM, meaning that the maximum of ``y`` to compare with must either be passed or it uses the maximum of the input. Notes ----- This is *not* a predictor method, but a simple threshold check, i.e. it does not predict whether the value at some particular location is expected to be finite. For that purpose, use the ``predict`` method. This may lead to inconsistencies, such as classifying as finite the ``y`` computed for an ``X`` for which an infinite value is predicted, or vice versa. """ if max_y is None: max_y = np.max(y) # There are two corner cases here: # - If y=inf and diff_threshold=inf --> True & False = False (needs the isfinite!) # - If y=np.nan --> False & False = False return np.greater_equal(y, max_y - diff_threshold) & np.isfinite(y)
[docs] def is_finite(self, y): """ Returns the indices of the finite points, depending on the current threshold and maximum ``y`` value in the training set (not the input). The ``y`` input must be passed in the space in which the SVM was defined. """ if self.y_train is None: raise ValueError("Cannot do anything: the SVM has not been trained yet!") return self._is_finite_raw(y, self.diff_threshold, self._max_y)
[docs] def predict(self, X, validate=True): """ Wrapper for the predict method of the SVM which does the preprocessing. Returns a boolean array which is true at locations where the SVM predicts a finite posterior distribution and False where it predicts infinite values. Parameters ---------- X : array-like, shape = (n_samples, n_features) Query points where SVM is evaluated. Returns ------- A boolean array which is True at locations predicted finite posterior and False at locations with predicted infinite posterior. Raises ------ ValueError: "ndarray is not C-contiguous" May be raised if ``validate`` is False. Call ``numpy.ascontiguousarray()`` on the input before the call. """ if self.y_train is None: raise ValueError("The SVM has not been trained yet.") if validate: X = np.atleast_2d(X) if self.all_finite: return np.full(len(X), True) if not self.at_least_one_finite: warnings.warn( "Only -inf points added to the classifier so far. " "Returning False unconditionally." ) return np.full(len(X), False) if validate: return super().predict(X) else: # valid for our use only (dense, 2 classes), when input is guaranteed valid y = self._dense_predict(X) return self.classes_.take(np.asarray(y, dtype=np.intp))