SparsePCA#
Warning
In versions of abess before 0.4.0, this model was named abess.pca.abessPCA.
Please note that the old name will be deprecated in version 0.6.0.
- class abess.decomposition.SparsePCA[source]#
Adaptive Best-Subset Selection (ABESS) algorithm for principal component analysis.
- Parameters
support_size (array-like, optional) -- default=range(min(n, int(n/(log(log(n))*log(p))))). An integer vector representing the candidate support sizes.
group (int, optional, default=np.ones(p)) -- The group index for each variable.
ic_type ({'aic', 'bic', 'gic', 'ebic', 'loss'}, optional, default='loss') -- The type of criterion for choosing the support size if cv=1.
ic_coef (float, optional, default=1.0) -- Constant that controls the regularization strength on chosen information criterion.
cv (int, optional, default=1) --
The number of folds used in cross-validation.
If cv=1, cross-validation is not used.
If cv>1, the support size is chosen by the cross-validation test loss instead of the information criterion.
cv_score ({'test_loss'}, optional, default='test_loss') -- The score used on test data for CV. Only 'test_loss' is supported for PCA now.
thread (int, optional, default=1) --
Maximum number of threads to use.
If thread=0, the maximum number of threads supported by the device will be used.
A_init (array-like, optional, default=None) -- Initial active set before the first splicing.
always_select (array-like, optional, default=None) -- An array containing the indexes of variables that should always be considered in the model.
max_iter (int, optional, default=20) -- Maximum number of iterations of the splicing algorithm. Because each splicing step must reduce the loss, the algorithm is guaranteed to converge; the iteration limit exists only to simplify the implementation.
is_warm_start (bool, optional, default=True) -- When tuning the optimal parameter combination, whether to use the last solution as a warm start to accelerate the iterative convergence of the splicing algorithm.
screening_size (int, optional, default=-1) --
The number of variables remaining after screening. It should be a non-negative number smaller than p, but larger than any value in support_size.
If screening_size=-1, screening will not be used.
If screening_size=0, screening_size will be set as \(\min(p, int(n/(\log(\log(n))\log(p))))\).
splicing_type ({0, 1}, optional, default=1) -- The type of splicing. "0" for decreasing by half, "1" for decreasing by one.
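The default upper bound on the support size, min(p, int(n/(log(log(n))*log(p)))), can be computed directly. A minimal sketch, assuming natural logarithms (the function name below is illustrative, not part of the abess API):

```python
import math

def default_max_support(n: int, p: int) -> int:
    # Default cap used by support_size (and by screening_size=0):
    # min(p, int(n / (log(log(n)) * log(p))))
    return min(p, int(n / (math.log(math.log(n)) * math.log(p))))

# For a 100 x 50 design matrix this cap works out to 16.
print(default_max_support(100, 50))
```

This grows slowly with the sample size n, so for small samples only a handful of support sizes are tried by default.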
- coef_#
The first \(k\) principal axes in feature space, which are sorted by decreasing explained variance.
- Type
array-like, shape(p_features, ) or (p_features, k)
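Each column of coef_ is a sparse principal axis; the variance it explains can be measured against the sample covariance. A numpy sketch with a hypothetical sparse axis standing in for a fitted coef_ column (this is illustrative, not the library's internal computation):

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.standard_normal((100, 5))
Sigma = np.cov(X.T)

# A hypothetical sparse principal axis: unit norm, nonzero on features 1 and 3.
v = np.zeros(5)
v[[1, 3]] = [0.6, 0.8]

# Fraction of total variance captured along direction v.
explained = v @ Sigma @ v / np.trace(Sigma)
print(0.0 < explained < 1.0)
```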
References
Junxian Zhu, Canhong Wen, Jin Zhu, Heping Zhang, and Xueqin Wang. A polynomial algorithm for best-subset selection problem. Proceedings of the National Academy of Sciences, 117(52):33117-33123, 2020.
Examples
Results may differ with different versions of numpy.
>>> ### Sparsity known
>>>
>>> from abess.decomposition import SparsePCA
>>> import numpy as np
>>> np.random.seed(12345)
>>> model = SparsePCA(support_size = 10)
>>>
>>> ### X known
>>> X = np.random.randn(100, 50)
>>> model.fit(X)
SparsePCA(support_size=10)
>>> print(np.nonzero(model.coef_)[0])
[10 26 31 33 35 36 38 42 43 49]
>>>
>>> ### X unknown, but Sigma known
>>> model.fit(Sigma = np.cov(X.T))
SparsePCA(support_size=10)
>>> print(np.nonzero(model.coef_)[0])
[10 26 31 33 35 36 38 42 43 49]
- __init__(support_size=None, group=None, ic_type='loss', ic_coef=1.0, cv=1, cv_score='test_loss', thread=1, A_init=None, always_select=None, max_iter=20, exchange_num=5, is_warm_start=True, splicing_type=1, screening_size=-1)[source]#
- transform(X)[source]#
For the PCA model, apply dimensionality reduction to the given data.
- Parameters
X (array-like, shape (n_samples, p_features)) -- Sample matrix to be transformed.
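Conceptually, transform projects the samples onto the selected sparse axes. A numpy sketch, assuming coef_ has shape (p_features, k); the matrix V below is a stand-in for a fitted model.coef_, not output from the library:

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((100, 50))

# Stand-in for a fitted model.coef_ with k=2 sparse, orthonormal columns.
V = np.zeros((50, 2))
V[[3, 7], 0] = [0.6, 0.8]
V[[12, 20], 1] = [0.8, -0.6]

Z = X @ V          # conceptually what transform(X) returns
print(Z.shape)     # (100, 2)
```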
- ratio(X)[source]#
Given new data, return the explained variance ratio.
- Parameters
X (array-like, shape (n_samples, n_features)) -- Sample matrix.
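The explained ratio compares the variance captured along the fitted axes with the total variance of the data. A numpy sketch of this quantity, using orthonormal stand-in axes in place of a fitted model.coef_ (illustrative only; the library's exact computation may differ):

```python
import numpy as np

rng = np.random.default_rng(2)
X = rng.standard_normal((200, 10))
Sigma = np.cov(X.T)

# Stand-in orthonormal axes (the role played by model.coef_).
V, _ = np.linalg.qr(rng.standard_normal((10, 3)))

# Variance captured in the 3-dimensional subspace, over total variance.
ratio = np.trace(V.T @ Sigma @ V) / np.trace(Sigma)
print(0.0 < ratio < 1.0)
```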
- fit(X=None, y=None, is_normal=False, Sigma=None, number=1, n=None, sparse_matrix=False)[source]#
Fit the model with the given data (or covariance matrix) and return the fitted estimator.
- Parameters
X (array-like, shape(n_samples, p_features)) -- Training data.
y (ignored) -- Not used; present for API consistency.
is_normal (bool, optional, default=False) -- Whether to normalize the variables before fitting.
weight (array-like, shape(n_samples,), optional, default=np.ones(n)) -- Individual weights for each sample. Only used when is_weight=True.
Sigma (array-like, shape(p_features, p_features), optional) -- default=np.cov(X.T). Sample covariance matrix. For PCA, it can be given as input instead of X; but if X is given, Sigma will be set to np.cov(X.T).
number (int, optional, default=1) -- Indicates the number of PCs returned.
n (int, optional, default=X.shape[0] or 1) --
Sample size.
If X is given, it defaults to X.shape[0];
if X is not given (only Sigma is given), it defaults to 1.
sparse_matrix (bool, optional, default=False) -- Set as True to treat X as sparse matrix during fitting. It would be automatically set as True when X has the sparse matrix type defined in scipy.sparse.
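Per the parameter description above, fit accepts either the data matrix X or a covariance matrix Sigma; when X is given, the covariance fit operates on is np.cov(X.T). A small numpy sketch of that quantity (assuming the documented convention that features are columns of X):

```python
import numpy as np

rng = np.random.default_rng(3)
X = rng.standard_normal((100, 50))

# When only X is supplied, fit works with Sigma = np.cov(X.T),
# a symmetric p_features x p_features matrix.
Sigma = np.cov(X.T)
print(Sigma.shape)                  # (50, 50)
print(np.allclose(Sigma, Sigma.T))  # True
```

This is why fitting on X and fitting on Sigma = np.cov(X.T) select the same support in the Examples section above.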
- fit_transform(X=None, y=None, is_normal=False, Sigma=None, number=1, n=None, sparse_matrix=False)[source]#
Fit the model and transform the sample matrix. Returns the transformed data in the expected dimension.
- Parameters
X (array-like, shape(n_samples, p_features)) -- Training data.
y (ignored) -- Not used; present for API consistency.
is_normal (bool, optional, default=False) -- Whether to normalize the variables before fitting.
weight (array-like, shape(n_samples,), optional, default=np.ones(n)) -- Individual weights for each sample. Only used when is_weight=True.
Sigma (array-like, shape(p_features, p_features), optional) -- default=np.cov(X.T). Sample covariance matrix. For PCA, it can be given as input instead of X; but if X is given, Sigma will be set to np.cov(X.T).
number (int, optional, default=1) -- Indicates the number of PCs returned.
n (int, optional, default=X.shape[0] or 1) --
Sample size.
If X is given, it defaults to X.shape[0];
if X is not given (only Sigma is given), it defaults to 1.
- set_fit_request(*, Sigma='$UNCHANGED$', is_normal='$UNCHANGED$', n='$UNCHANGED$', number='$UNCHANGED$', sparse_matrix='$UNCHANGED$')#
Request metadata passed to the fit method.
Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see the User Guide on how the routing mechanism works.
The options for each parameter are:
True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.
False: metadata is not requested and the meta-estimator will not pass it to fit.
None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.
str: metadata should be passed to the meta-estimator with this given alias instead of the original name.
The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.
New in version 1.3.
Note
This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.
- Parameters
Sigma (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for the Sigma parameter in fit.
is_normal (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for the is_normal parameter in fit.
n (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for the n parameter in fit.
number (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for the number parameter in fit.
sparse_matrix (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for the sparse_matrix parameter in fit.
self (SparsePCA) --
- Returns
self -- The updated object.
- Return type
SparsePCA