RobustPCA

Warning

In older versions of abess (before 0.4.0), this class was named abess.pca.abessRPCA. Please note that the old name will be deprecated in version 0.6.0.

class abess.decomposition.RobustPCA(support_size=None, ic_type='gic', ic_coef=1.0, thread=1, A_init=None, always_select=None, max_iter=20, exchange_num=5, is_warm_start=True, splicing_type=1)[source]

Adaptive Best-Subset Selection (ABESS) algorithm for robust principal component analysis.

Parameters
  • support_size (array-like, optional) -- default=range(min(n, int(n/(log(log(n)) * log(p))))). An integer vector of candidate support sizes.

  • ic_type ({'aic', 'bic', 'gic', 'ebic', 'loss'}, optional, default='gic') -- The type of criterion for choosing the support size.

  • ic_coef (float, optional, default=1.0) -- Constant that controls the regularization strength of the chosen information criterion.

  • thread (int, optional, default=1) --

    Maximum number of threads.

    • If thread = 0, the maximum number of threads supported by the device will be used.

  • A_init (array-like, optional, default=None) -- Initial active set before the first splicing.

  • always_select (array-like, optional, default=None) -- An array containing the indexes of the variables that should always be included in the model.

  • max_iter (int, optional, default=20) -- Maximum number of iterations allowed for the splicing algorithm to converge. Because the loss is reduced at each step, the splicing algorithm is bound to converge; the iteration limit exists only to simplify the implementation.

  • is_warm_start (bool, optional, default=True) -- When tuning the optimal parameter combination, whether to use the last solution as a warm start to accelerate the iterative convergence of the splicing algorithm.

  • splicing_type ({0, 1}, optional, default=1) -- The type of splicing. "0" for decreasing by half, "1" for decreasing by one.
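
As a rough illustration of the support_size default listed above, the sketch below transcribes the documented expression literally. It assumes that n and p are the number of rows and columns of X and that log is the natural logarithm; neither is stated here, and the helper name default_support_sizes is hypothetical, not part of abess.

```python
import math


def default_support_sizes(n: int, p: int) -> range:
    """Literal transcription of the documented default:
    range(min(n, int(n / (log(log(n)) * log(p))))).

    Assumptions (not confirmed by the documentation): n and p are the
    row and column counts of X, and log is the natural logarithm.
    """
    # Both logarithms must be positive for the bound to make sense.
    if n < 3 or p < 2:
        raise ValueError("need n >= 3 and p >= 2")
    bound = int(n / (math.log(math.log(n)) * math.log(p)))
    return range(min(n, bound))


# Candidate support sizes for a 100 x 50 matrix, as in the example below.
sizes = default_support_sizes(100, 50)
print(len(sizes), sizes[0], sizes[-1])
```

Passing an explicit support_size, as the example in this section does, bypasses this default entirely.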

coef_

The transformed sample matrix after robust PCA.

Type

array-like, shape(n_samples, p_features)

References

  • Junxian Zhu, Canhong Wen, Jin Zhu, Heping Zhang, and Xueqin Wang. A polynomial algorithm for best-subset selection problem. Proceedings of the National Academy of Sciences, 117(52):33117-33123, 2020.

Examples

>>> ### Sparsity known
>>>
>>> from abess.decomposition import RobustPCA
>>> import numpy as np
>>> np.random.seed(12345)
>>> model = RobustPCA(support_size = 10)
>>>
>>> ### X known
>>> X = np.random.randn(100, 50)
>>> model.fit(X, r = 10)
RobustPCA(always_select=[], support_size=10)
>>> print(model.coef_)
[[0.         0.         0.         ... 0.         3.71203604 0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]
 ...
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]
 [0.         0.         0.         ... 0.         0.         0.        ]]

fit(X, y=None, r=None, sparse_matrix=False)[source]

Fit the model to the data and return the fitted estimator.

Parameters
  • X (array-like, shape(n_samples, p_features)) -- Training data.

  • y (ignored) -- Not used.

  • r (int) -- Rank of the (recovered) information matrix L. It should be smaller than the rank of X (and in any case smaller than X.shape[1]).

  • sparse_matrix (bool, optional, default=False) -- Set to True to treat X as a sparse matrix during fitting. It is automatically set to True when X is of a sparse matrix type defined in scipy.sparse.
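
To make the roles of r and support_size more concrete, here is a minimal sketch that builds a matrix as a rank-r part L plus a sparse outlier part S, then passes it to fit. The generator make_low_rank_plus_sparse is a hypothetical helper written for this illustration. The sketch assumes that support_size corresponds to the number of nonzero entries in the sparse part and that coef_ holds that part. The mostly-zero output in the example above suggests this, but it is not stated explicitly here.

```python
import numpy as np


def make_low_rank_plus_sparse(n, p, rank, n_outliers, seed=0):
    """Return (X, L, S) with X = L + S, rank(L) == rank and
    exactly n_outliers nonzero entries in S. Hypothetical helper."""
    rng = np.random.default_rng(seed)
    # Low-rank part: product of two thin Gaussian factors.
    L = rng.standard_normal((n, rank)) @ rng.standard_normal((rank, p))
    # Sparse part: a few large entries at distinct random positions.
    S = np.zeros((n, p))
    flat_idx = rng.choice(n * p, size=n_outliers, replace=False)
    np.put(S, flat_idx, rng.choice([-10.0, 10.0], size=n_outliers))
    return L + S, L, S


X, L, S = make_low_rank_plus_sparse(n=30, p=20, rank=2, n_outliers=15)
print(np.linalg.matrix_rank(L), np.count_nonzero(S))

# The abess call is guarded so the sketch still runs without the package.
try:
    from abess.decomposition import RobustPCA
except ImportError:
    RobustPCA = None

if RobustPCA is not None:
    # Assumption: support_size counts nonzero entries of the sparse part.
    model = RobustPCA(support_size=15)
    model.fit(X, r=2)
    print(np.count_nonzero(model.coef_))
```

Here r=2 is well below X.shape[1], in line with the constraint on r described above.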