RobustPCA

Warning

In versions of abess before 0.4.0, this class was named abess.pca.abessRPCA. Please note that the old name will be deprecated in version 0.6.0.

class abess.decomposition.RobustPCA

Adaptive Best-Subset Selection (ABESS) algorithm for robust principal component analysis.

Parameters
  • support_size (array-like, optional, default=range(min(n, int(n/(log(log(n))log(p)))))) -- An integer vector of candidate support sizes.

  • ic_type ({'aic', 'bic', 'gic', 'ebic', 'loss'}, optional, default='gic') -- The type of criterion for choosing the support size.

  • ic_coef (float, optional, default=1.0) -- Constant that controls the regularization strength of the chosen information criterion.

  • thread (int, optional, default=1) --

    Maximum number of threads.

    • If thread = 0, the maximum number of threads supported by the device will be used.

  • A_init (array-like, optional, default=None) -- Initial active set before the first splicing.

  • always_select (array-like, optional, default=None) -- An array containing the indexes of variables that should always be included in the model.

  • max_iter (int, optional, default=20) -- Maximum number of iterations allowed for the splicing algorithm to converge. Because every splicing step must reduce the loss, the algorithm is guaranteed to converge; the iteration limit only simplifies the implementation.

  • is_warm_start (bool, optional, default=True) -- When tuning the optimal parameter combination, whether to use the previous solution as a warm start to speed up convergence of the splicing algorithm.

  • splicing_type ({0, 1}, optional, default=1) -- The type of splicing: "0" for decreasing by half, "1" for decreasing by one.
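As a rough illustration of the default support_size expression listed above, the following sketch evaluates it with the standard library only. The helper name default_support_sizes is hypothetical and not part of abess, and reading n as the number of samples and p as the number of features is an assumption made here.

```python
import math


def default_support_sizes(n, p):
    """Evaluate range(min(n, int(n / (log(log(n)) * log(p))))).

    Hypothetical helper, not part of abess. Assumes n is the number of
    samples and p the number of features, both large enough that
    log(log(n)) and log(p) are positive.
    """
    upper = min(n, int(n / (math.log(math.log(n)) * math.log(p))))
    return range(upper)


# Same shape as the matrix used in the Examples section (100 x 50).
sizes = default_support_sizes(100, 50)
print(list(sizes))
```

Passing an explicit support_size, as the Examples section does, bypasses this default entirely.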

coef_

The transformed sample matrix after robust PCA.

Type

array-like, shape(n_samples, p_features)

References

  • Junxian Zhu, Canhong Wen, Jin Zhu, Heping Zhang, and Xueqin Wang. A polynomial algorithm for best-subset selection problem. Proceedings of the National Academy of Sciences, 117(52):33117-33123, 2020.

Examples

Results may differ between versions of numpy.

>>> ### Sparsity known
>>>
>>> from abess.decomposition import RobustPCA
>>> import numpy as np
>>> np.random.seed(12345)
>>> model = RobustPCA(support_size=10)
>>>
>>> ### X known
>>> X = np.random.randn(100, 50)
>>> model.fit(X, r=10)
RobustPCA(support_size=10)
>>> print(np.vstack(np.nonzero(model.coef_)))
[[ 6 10 24 30 33 35 40 61 73 85]
 [43 21 23 30 44 32 49  8 48 19]]

__init__(support_size=None, ic_type='gic', ic_coef=1.0, thread=1, A_init=None, always_select=None, max_iter=20, exchange_num=5, is_warm_start=True, splicing_type=1)

fit(X, y=None, r=None, sparse_matrix=False)

Fit the model to the data and return the fitted estimator.

Parameters
  • X (array-like, shape(n_samples, p_features)) -- Training data.

  • y (ignored) -- Ignored.

  • r (int) -- Rank of the (recovered) information matrix L. It should be smaller than the rank of X (and at least smaller than X.shape[1]).

  • sparse_matrix (bool, optional, default=False) -- Set to True to treat X as a sparse matrix during fitting. It is automatically set to True when X has a sparse matrix type defined in scipy.sparse.
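The constraint on r can be illustrated with synthetic data. The sketch below does not call abess; it assumes the usual robust PCA setting in which X is the sum of a low-rank matrix L and a sparse matrix S, and all variable names are chosen here for illustration only.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, r, k = 30, 20, 3, 10

# Low-rank part L: the product of an (n x r) and an (r x p) factor
# has rank at most r.
L = rng.standard_normal((n, r)) @ rng.standard_normal((r, p))

# Sparse part S: exactly k nonzero entries at distinct random positions.
S = np.zeros((n, p))
flat_idx = rng.choice(n * p, size=k, replace=False)
S.flat[flat_idx] = 10.0

X = L + S

rank_L = np.linalg.matrix_rank(L)
rank_X = np.linalg.matrix_rank(X)

# A sensible r is below both the rank of X and X.shape[1].
print(rank_L, rank_X, X.shape[1])
```

In this setup a value such as r=3 together with support_size=10 would match how the data were generated, which is the situation the "Sparsity known" example above assumes.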

set_fit_request(*, r='$UNCHANGED$', sparse_matrix='$UNCHANGED$')

Request metadata passed to the fit method.

Note that this method is only relevant if enable_metadata_routing=True (see sklearn.set_config()). Please see User Guide on how the routing mechanism works.

The options for each parameter are:

  • True: metadata is requested, and passed to fit if provided. The request is ignored if metadata is not provided.

  • False: metadata is not requested and the meta-estimator will not pass it to fit.

  • None: metadata is not requested, and the meta-estimator will raise an error if the user provides it.

  • str: metadata should be passed to the meta-estimator with this given alias instead of the original name.

The default (sklearn.utils.metadata_routing.UNCHANGED) retains the existing request. This allows you to change the request for some parameters and not others.

New in version 1.3.

Note

This method is only relevant if this estimator is used as a sub-estimator of a meta-estimator, e.g. used inside a Pipeline. Otherwise it has no effect.

Parameters
  • r (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for r parameter in fit.

  • sparse_matrix (str, True, False, or None, default=sklearn.utils.metadata_routing.UNCHANGED) -- Metadata routing for sparse_matrix parameter in fit.

  • self (RobustPCA) --

Returns

self -- The updated object.

Return type

object