ABESS Algorithm: Details¶
Introduction¶
With the abess library, users can use the ABESS algorithm to efficiently solve many best subset selection problems.
The aim of this page is to provide complete and coherent documentation for the ABESS algorithm under the linear model, so that users can easily understand the algorithm and thereby make better use of the abess software.
Linear Regression¶
Sacrifices¶
Consider the \(\ell_{0}\)-constrained minimization problem
\[\min _{\boldsymbol\beta} \mathcal{L}_{n}(\boldsymbol{\beta}), \quad \text { s.t. }\|\boldsymbol{\beta}\|_{0} \leq \mathrm{s},\]
where \(\mathcal{L}_{n}(\boldsymbol \beta)=\frac{1}{2 n}\|y-X \beta\|_{2}^{2}\). Without loss of generality, we consider \(\|\boldsymbol{\beta}\|_{0}=\mathrm{s}\). Given any initial set \(\mathcal{A} \subset \mathcal{S}=\{1,2, \ldots, p\}\) with cardinality \(|\mathcal{A}|=s\), denote \(\mathcal{I}=\mathcal{A}^{\mathrm{c}}\) and compute
\[\hat{\boldsymbol{\beta}}=\arg \min _{\boldsymbol{\beta}_{\mathcal{I}}=0} \mathcal{L}_{n}(\boldsymbol{\beta}).\]
We call \(\mathcal{A}\) and \(\mathcal{I}\) the active set and the inactive set, respectively.
Given the active set \(\mathcal{A}\) and \(\hat{\boldsymbol{\beta}}\), we can define the following two types of sacrifices:
1. Backward sacrifice: For any \(j \in \mathcal{A}\), the magnitude of discarding variable \(j\) is
\[\xi_{j}=\mathcal{L}_{n}\left(\hat{\boldsymbol\beta}^{\mathcal{A} \backslash\{j\}}\right)-\mathcal{L}_{n}\left(\hat{\boldsymbol\beta}^{\mathcal{A}}\right)=\frac{X_{j}^{\top} X_{j}}{2 n}\left(\hat{\beta}_{j}\right)^{2}.\]
2. Forward sacrifice: For any \(j \in \mathcal{I}\), the magnitude of adding variable \(j\) is
\[\zeta_{j}=\mathcal{L}_{n}\left(\hat{\boldsymbol{\beta}}^{\mathcal{A}}\right)-\mathcal{L}_{n}\left(\hat{\boldsymbol{\beta}}^{\mathcal{A}}+\hat{t} \boldsymbol{e}_{j}\right)=\frac{X_{j}^{\top} X_{j}}{2 n}\left(\frac{\hat{d}_{j}}{X_{j}^{\top} X_{j} / n}\right)^{2},\]
where \(\hat{t}=\arg \min _{t} \mathcal{L}_{n}\left(\hat{\boldsymbol{\beta}}^{\mathcal{A}}+t \boldsymbol{e}_{j}\right)\) and \(\hat{d}_{j}=X_{j}^{\top}(y-X \hat{\boldsymbol{\beta}}) / n\).
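Under the linear model both sacrifices have these simple closed forms (the same \(\xi_j\) and \(\zeta_j\) expressions reappear in the initialization of Algorithm 2 below). The following is a minimal NumPy sketch for illustration only, not the abess library's C++ implementation; the function name `sacrifices` is chosen here:

```python
import numpy as np

def sacrifices(X, y, active):
    """Backward (xi) and forward (zeta) sacrifices for a given active set.

    Illustrative sketch of the closed-form expressions, not the abess
    library's implementation.
    """
    n, p = X.shape
    A = np.asarray(active)
    beta = np.zeros(p)
    # OLS fit restricted to the active set (beta_I = 0)
    beta[A] = np.linalg.lstsq(X[:, A], y, rcond=None)[0]
    # dual variables d_j = X_j^T (y - X beta) / n (zero on A up to rounding)
    d = X.T @ (y - X @ beta) / n
    col = np.sum(X * X, axis=0)                   # X_j^T X_j, column by column
    xi = col / (2 * n) * beta ** 2                # backward sacrifice (meaningful on A)
    zeta = col / (2 * n) * (d / (col / n)) ** 2   # forward sacrifice (meaningful on I)
    return xi, zeta
```

Note that \(\xi_j\) vanishes off the active set (there \(\hat\beta_j = 0\)) and \(\zeta_j\) vanishes on it (the residual is orthogonal to the active columns), so each ranking is informative only on its own index set.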
Algorithm¶
Best-Subset Selection with a Given Support Size¶
Unfortunately, these two sacrifices are incomparable because they are defined on support sets of different sizes. However, if we exchange some "irrelevant" variables in \(\mathcal{A}\) with some "important" variables in \(\mathcal{I}\), we may obtain a higher-quality solution. This intuition motivates the splicing method. Specifically, given any splicing size \(k \leq s\), define
\[\mathcal{A}_{k}=\left\{j \in \mathcal{A}: \sum_{i \in \mathcal{A}} \mathrm{I}\left(\xi_{j} \geq \xi_{i}\right) \leq k\right\}\]
to represent the \(k\) least relevant variables in \(\mathcal{A}\), and
\[\mathcal{I}_{k}=\left\{j \in \mathcal{I}: \sum_{i \in \mathcal{I}} \mathrm{I}\left(\zeta_{j} \leq \zeta_{i}\right) \leq k\right\}\]
to represent the \(k\) most relevant variables in \(\mathcal{I}\).
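Selecting these two sets amounts to ranking the sacrifices within the active and inactive sets. A small illustrative sketch (the function name `splice_sets` is ours, not part of the abess API):

```python
import numpy as np

def splice_sets(xi, zeta, active, inactive, k):
    """Pick the k least relevant active variables (smallest xi) and the
    k most relevant inactive variables (largest zeta).

    Illustrative sketch; names mirror the definitions in the text.
    """
    A = np.asarray(active)
    I = np.asarray(inactive)
    A_k = A[np.argsort(xi[A])[:k]]      # k smallest backward sacrifices in A
    I_k = I[np.argsort(-zeta[I])[:k]]   # k largest forward sacrifices in I
    return set(A_k), set(I_k)
```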
Algorithm 1: BESS.Fix(s): Best-Subset Selection with a given support size \(s\).¶
Input: \(X, y\), a positive integer \(k_{\max }\), and a threshold \(\tau_{s}\).
Initialize: \(\mathcal{A}^{0}=\left\{j: \sum_{i=1}^{p} \mathrm{I}\left(\left|\frac{X_{j}^{\top} y}{\sqrt{X_{j}^{\top} X_{j}}}\right| \leq\left|\frac{X_{i}^{\top} y}{\sqrt{X_{i}^{\top} X_{i}}}\right|\right) \leq \mathrm{s}\right\}, \mathcal{I}^{0}=\left(\mathcal{A}^{0}\right)^{c}\),
and \(\left(\boldsymbol\beta^{0}, d^{0}\right)\):
\[\begin{split}&\boldsymbol{\beta}_{\mathcal{I}^{0}}^{0}=0, \quad d_{\mathcal{A}^{0}}^{0}=0,\\ &\boldsymbol{\beta}_{\mathcal{A}^{0}}^{0}=\left(\boldsymbol{X}_{\mathcal{A}^{0}}^{\top} \boldsymbol{X}_{\mathcal{A}^{0}}\right)^{-1} \boldsymbol{X}_{\mathcal{A}^{0}}^{\top} y, \quad d_{\mathcal{I}^{0}}^{0}=X_{\mathcal{I}^{0}}^{\top}\left(y-X \boldsymbol{\beta}^{0}\right) / n.\end{split}\]
For \(m=0,1, \ldots\), do
\[\left(\boldsymbol{\beta}^{m+1}, \boldsymbol{d}^{m+1}, \mathcal{A}^{m+1}, \mathcal{I}^{m+1}\right)= \text{Splicing} \left(\boldsymbol{\beta}^{m}, \boldsymbol{d}^{m}, \mathcal{A}^{m}, \mathcal{I}^{m}, k_{\max }, \tau_{s}\right).\]If \(\left(\mathcal{A}^{m+1}, \mathcal{I}^{m+1}\right)=\left(\mathcal{A}^{m},\mathcal{I}^{m}\right)\), then stop.
End For
Output \((\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{d}}, \hat{\mathcal{A}}, \hat{\mathcal{I}})=\left(\boldsymbol{\beta}^{m+1}, \boldsymbol{d}^{m+1}, \mathcal{A}^{m+1}, \mathcal{I}^{m+1}\right).\)
Algorithm 2: Splicing \(\left(\boldsymbol\beta, d, \mathcal{A}, \mathcal{I}, k_{\max }, \tau_{s}\right)\)¶
Input: \(\boldsymbol{\beta}, \boldsymbol{d}, \mathcal{A}, \mathcal{I}, k_{\max }\), and \(\tau_{\mathrm{s}} .\)
Initialize: \(L_{0}=L=\frac{1}{2 n}\|y-X \beta\|_{2}^{2}\), and set
\[\xi_{j}=\frac{X_{j}^{\top} X_{j}}{2 n}\left(\beta_{j}\right)^{2}, \zeta_{j}=\frac{X_{j}^{\top} X_{j}}{2 n}\left(\frac{d_{j}}{X_{j}^{\top} X_{j} / n}\right)^{2}, j=1, \ldots, p.\]For \(k=1,2, \ldots, k_{\max }\), do
\[\begin{split}\mathcal{A}_{k}=\left\{j \in \mathcal{A}: \sum_{i \in \mathcal{A}} \mathrm{I}\left(\xi_{j} \geq \xi_{i}\right) \leq k\right\},\\ \mathcal{I}_{k}=\left\{j \in \mathcal{I}: \sum_{i \in \mathcal{I}} \mathrm{I}\left(\zeta_{j} \leq \zeta_{i}\right) \leq k\right\}.\end{split}\]Let \(\tilde{\mathcal{A}}_{k}=\left(\mathcal{A} \backslash \mathcal{A}_{k}\right) \cup \mathcal{I}_{k}, \tilde{\mathcal{I}}_{k}=\left(\mathcal{I} \backslash \mathcal{I}_{k}\right) \cup \mathcal{A}_{k}\) and solve:
\[\begin{split}\tilde{\boldsymbol{\beta}}_{\tilde{\mathcal{A}}_{k}}=\left(\boldsymbol{X}_{\tilde{\mathcal{A}}_{k}}^{\top} \boldsymbol{X}_{\tilde{\mathcal{A}}_{k}}\right)^{-1} \boldsymbol{X}_{\tilde{\mathcal{A}}_{k}}^{\top} y, \quad \tilde{\boldsymbol{\beta}}_{\tilde{\mathcal{I}}_{k}}=0,\\ \tilde{\boldsymbol d}_{\tilde{\mathcal{I}}_{k}}=X_{\tilde{\mathcal{I}}_{k}}^{\top}(y-X \tilde{\boldsymbol\beta}) / n, \quad \tilde{\boldsymbol d}_{\tilde{\mathcal{A}}_{k}}=0.\end{split}\]Compute \(\mathcal{L}_{n}(\tilde{\boldsymbol\beta})=\frac{1}{2 n}\|y-X \tilde{\boldsymbol\beta}\|_{2}^{2}\). If \(L>\mathcal{L}_{n}(\tilde{\boldsymbol\beta})\), then
\[\begin{split}(\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{d}}, \hat{\mathcal{A}}, \hat{\mathcal{I}})=\left(\tilde{\boldsymbol{\beta}}, \tilde{\boldsymbol{d}}, \tilde{\mathcal{A}}_{k}, \tilde{\mathcal{I}}_{k}\right)\\ L=\mathcal{L}_{n}(\tilde{\boldsymbol\beta}).\end{split}\]End for
If \(L_{0}-L<\tau_{s}\), then \((\hat{\boldsymbol\beta}, \hat{\boldsymbol d}, \hat{\mathcal{A}}, \hat{\mathcal{I}})=(\boldsymbol\beta, \boldsymbol d, \mathcal{A}, \mathcal{I}).\)
Output \((\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{d}}, \hat{\mathcal{A}}, \hat{\mathcal{I}})\).
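A direct NumPy translation of Algorithm 2 might look as follows. This is a sketch for exposition — the abess package implements the algorithm in C++ — and `ols_fit` is a helper name introduced here:

```python
import numpy as np

def ols_fit(X, y, A, p):
    # least-squares fit with support restricted to A (helper, assumed here)
    beta = np.zeros(p)
    if len(A):
        beta[A] = np.linalg.lstsq(X[:, A], y, rcond=None)[0]
    return beta

def splicing(X, y, A, I, k_max, tau):
    """One call of Algorithm 2: try splice sizes k = 1..k_max, keep the best."""
    n, p = X.shape
    A, I = np.asarray(A), np.asarray(I)
    beta = ols_fit(X, y, A, p)
    d = X.T @ (y - X @ beta) / n
    L0 = L = np.sum((y - X @ beta) ** 2) / (2 * n)
    col = np.sum(X * X, axis=0)
    xi = col / (2 * n) * beta ** 2                    # backward sacrifices
    zeta = col / (2 * n) * (d / (col / n)) ** 2       # forward sacrifices
    best = (beta, d, A, I)
    for k in range(1, min(k_max, len(A)) + 1):
        A_k = A[np.argsort(xi[A])[:k]]                # k least relevant in A
        I_k = I[np.argsort(-zeta[I])[:k]]             # k most relevant in I
        A_t = np.sort(np.concatenate([np.setdiff1d(A, A_k), I_k]))
        I_t = np.sort(np.concatenate([np.setdiff1d(I, I_k), A_k]))
        beta_t = ols_fit(X, y, A_t, p)
        L_t = np.sum((y - X @ beta_t) ** 2) / (2 * n)
        if L_t < L:                                   # keep the best splice so far
            d_t = X.T @ (y - X @ beta_t) / n
            best, L = (beta_t, d_t, A_t, I_t), L_t
    if L0 - L < tau:                                  # too little gain: no splice
        return beta, d, A, I
    return best
```

Iterating this function until the active set stops changing is exactly the loop of Algorithm 1.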
Determining the Best Support Size with SIC¶
In practice, the support size is usually unknown. We use a data-driven procedure to determine \(s\). For any active set \(\mathcal{A}\), define an \(\mathrm{SIC}\) as follows:
\[\operatorname{SIC}(\mathcal{A})=n \log \mathcal{L}_{\mathcal{A}}+|\mathcal{A}| \log p \log \log n,\]
where \(\mathcal{L}_{\mathcal{A}}=\min _{\boldsymbol\beta_{\mathcal{I}}=0} \mathcal{L}_{n}(\boldsymbol\beta), \mathcal{I}=(\mathcal{A})^{c}\). To identify the true model, the model complexity penalty is \(\log p\), and the slowly diverging rate \(\log \log n\) is set to prevent underfitting. Theorem 4 states that the following ABESS algorithm selects the true support size via SIC.
Let \(s_{\max }\) be the maximum support size. We suggest \(s_{\max }=o\left(\frac{n}{\log p}\right)\) as the maximum possible recovery size. Typically, we set \(s_{\max }=\left[\frac{n}{\log p \log \log n}\right]\) where \([x]\) denotes the integer part of \(x\).
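Both quantities are straightforward to compute; a short sketch (the function names here are ours, not part of the abess API):

```python
import numpy as np

def sic(n, p, loss, size):
    # SIC(A) = n log L_A + |A| log p log log n
    return n * np.log(loss) + size * np.log(p) * np.log(np.log(n))

def suggested_s_max(n, p):
    # s_max = [ n / (log p log log n) ], taking the integer part
    return int(n / (np.log(p) * np.log(np.log(n))))
```

For example, with \(n = 500\) and \(p = 100\) the suggested maximum support size is 59, far smaller than \(p\), which keeps the search over support sizes cheap.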
Algorithm 3: ABESS.¶
Input: \(X, y\), and the maximum support size \(s_{\max } .\)
For \(s=1,2, \ldots, s_{\max }\), do
\[\left(\hat{\boldsymbol{\beta}}_{s}, \hat{\boldsymbol{d}}_{s}, \hat{\mathcal{A}}_{s}, \hat{\mathcal{I}}_{s}\right)= \text{BESS.Fix}(s).\]End for
Compute the minimum of SIC:
\[s_{\min }=\arg \min _{s} \operatorname{SIC}\left(\hat{\mathcal{A}}_{s}\right).\]Output \(\left(\hat{\boldsymbol{\beta}}_{s_{\min}}, \hat{\boldsymbol{d}}_{s_{\min }}, \hat{\mathcal{A}}_{s_{\min }}, \hat{\mathcal{I}}_{s_{\min }}\right).\)
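Putting the pieces together, Algorithms 1–3 can be sketched end to end in NumPy. This is an illustrative reimplementation under the definitions above, not the abess package's optimized C++ solver; all function names are chosen here:

```python
import numpy as np

def ols_fit(X, y, A, p):
    # least-squares fit restricted to support A
    beta = np.zeros(p)
    if len(A):
        beta[A] = np.linalg.lstsq(X[:, A], y, rcond=None)[0]
    return beta

def bess_fix(X, y, s, k_max=2, max_iter=20):
    """BESS.Fix(s): splicing iterations at a fixed support size s."""
    n, p = X.shape
    col = np.sum(X * X, axis=0)
    # initialize A^0 with the s largest standardized marginal correlations
    A = np.sort(np.argsort(-np.abs(X.T @ y) / np.sqrt(col))[:s])
    L = np.inf
    for _ in range(max_iter):
        beta = ols_fit(X, y, A, p)
        d = X.T @ (y - X @ beta) / n
        L = np.sum((y - X @ beta) ** 2) / (2 * n)
        xi = col / (2 * n) * beta ** 2
        zeta = col / (2 * n) * (d / (col / n)) ** 2
        I = np.setdiff1d(np.arange(p), A)
        best_A, best_L = A, L
        for k in range(1, min(k_max, s) + 1):
            A_k = A[np.argsort(xi[A])[:k]]            # k least relevant in A
            I_k = I[np.argsort(-zeta[I])[:k]]         # k most relevant in I
            A_t = np.sort(np.concatenate([np.setdiff1d(A, A_k), I_k]))
            L_t = np.sum((y - X @ ols_fit(X, y, A_t, p)) ** 2) / (2 * n)
            if L_t < best_L:
                best_A, best_L = A_t, L_t
        if np.array_equal(best_A, A):                 # no splice improves: stop
            return A, L
        A, L = best_A, best_L
    return A, L

def abess(X, y, s_max):
    """Algorithm 3: pick the support size minimizing SIC."""
    n, p = X.shape
    best = None
    for s in range(1, s_max + 1):
        A, L = bess_fix(X, y, s)
        sic = n * np.log(L) + s * np.log(p) * np.log(np.log(n))
        if best is None or sic < best[0]:
            best = (sic, A)
    return best[1]
```

On a well-separated sparse problem this sketch recovers the true support, but it is meant only to make the pseudocode concrete; for real work, use the library call shown below.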
Now, enjoy your data analysis with the abess library:
import abess
# sphinx_gallery_thumbnail_path = 'Tutorial/figure/icon_noborder.png'