ABESS Algorithm: Details#
Introduction#
With the abess library, users can apply the ABESS algorithm to efficiently solve many best subset selection problems. The aim of this page is to provide complete and coherent documentation for the ABESS algorithm under the linear model, so that users can easily understand the algorithm and thereby make better use of the abess software.
Linear Regression#
Sacrifices#
Consider the \(\ell_{0}\) constraint minimization problem,

\[\min _{\boldsymbol\beta} \mathcal{L}_{n}(\boldsymbol\beta), \quad \text { s.t. }\|\boldsymbol{\beta}\|_{0} \leq \mathrm{s},\]

where \(\mathcal{L}_{n}(\boldsymbol \beta)=\frac{1}{2 n}\|y-X \boldsymbol\beta\|_{2}^{2} .\) Without loss of generality, we consider \(\|\boldsymbol{\beta}\|_{0}=\mathrm{s}\). Given any initial set \(\mathcal{A} \subset \mathcal{S}=\{1,2, \ldots, p\}\) with cardinality \(|\mathcal{A}|=s\), denote \(\mathcal{I}=\mathcal{A}^{\mathrm{c}}\) and compute:

\[\hat{\boldsymbol{\beta}}=\arg \min _{\boldsymbol{\beta}_{\mathcal{I}}=0} \mathcal{L}_{n}(\boldsymbol{\beta}).\]
We call \(\mathcal{A}\) and \(\mathcal{I}\) the active set and the inactive set, respectively.
Given the active set \(\mathcal{A}\) and \(\hat{\boldsymbol{\beta}}\), we can define the following two types of sacrifices:
1. Backward sacrifice: For any \(j \in \mathcal{A}\), the magnitude of discarding variable \(j\) is

\[\xi_{j}=\mathcal{L}_{n}\left(\hat{\boldsymbol\beta}^{\mathcal{A} \backslash\{j\}}\right)-\mathcal{L}_{n}\left(\hat{\boldsymbol\beta}^{\mathcal{A}}\right)=\frac{X_{j}^{\top} X_{j}}{2 n}\left(\hat{\beta}_{j}\right)^{2}.\]
2. Forward sacrifice: For any \(j \in \mathcal{I}\), the magnitude of adding variable \(j\) is

\[\zeta_{j}=\mathcal{L}_{n}\left(\hat{\boldsymbol{\beta}}^{\mathcal{A}}\right)-\mathcal{L}_{n}\left(\hat{\boldsymbol{\beta}}^{\mathcal{A}}+\hat{t}^{\{j\}}\right)=\frac{X_{j}^{\top} X_{j}}{2 n}\left(\frac{\hat{d}_{j}}{X_{j}^{\top} X_{j} / n}\right)^{2},\]

where \(\hat{d}_{j}=X_{j}^{\top}(y-X \hat{\boldsymbol{\beta}}) / n\). Intuitively, a large \(\xi_{j}\) (or \(\zeta_{j}\)) suggests the \(j\)-th variable is potentially important.
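The two sacrifices can be computed directly from a least-squares fit on the active set. Below is a minimal NumPy sketch; the toy data, variable names, and candidate active set are illustrative, not part of the library:

```python
import numpy as np

# Toy data with true support {0, 1, 2}.
rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
y = X[:, :3] @ np.array([3.0, -2.0, 1.5]) + 0.1 * rng.standard_normal(n)

active = np.array([0, 1, 2])                   # candidate active set A
inactive = np.setdiff1d(np.arange(p), active)  # inactive set I

beta_hat = np.zeros(p)                         # least squares on A, zero on I
beta_hat[active] = np.linalg.lstsq(X[:, active], y, rcond=None)[0]
d_hat = X.T @ (y - X @ beta_hat) / n           # residual correlations d_j

col_norm = np.sum(X * X, axis=0)               # X_j^T X_j for each column j
xi = col_norm / (2 * n) * beta_hat**2                      # backward sacrifice (j in A)
zeta = col_norm / (2 * n) * (d_hat / (col_norm / n))**2    # forward sacrifice (j in I)
```

Only `xi[active]` and `zeta[inactive]` are meaningful: the backward sacrifice is zero off the active set by construction, and the forward sacrifice is (numerically) zero on it because the least-squares residual is orthogonal to the active columns.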
Algorithm#
Best-Subset Selection with a Given Support Size#
Unfortunately, these two sacrifices are incomparable because they correspond to support sets of different sizes. However, if we exchange some "irrelevant" variables in \(\mathcal{A}\) for some "important" variables in \(\mathcal{I}\), we may obtain a higher-quality solution. This intuition motivates our splicing method. Specifically, given any splicing size \(k \leq s\), define

\[\mathcal{A}_{k}=\left\{j \in \mathcal{A}: \sum_{i \in \mathcal{A}} \mathrm{I}\left(\xi_{j} \geq \xi_{i}\right) \leq k\right\}\]

to represent the \(k\) least relevant variables in \(\mathcal{A}\), and

\[\mathcal{I}_{k}=\left\{j \in \mathcal{I}: \sum_{i \in \mathcal{I}} \mathrm{I}\left(\zeta_{j} \leq \zeta_{i}\right) \leq k\right\}\]

to represent the \(k\) most relevant variables in \(\mathcal{I}\).
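In code, selecting \(\mathcal{A}_k\) and \(\mathcal{I}_k\) amounts to sorting the two sacrifice vectors. A small sketch with made-up sacrifice values (the arrays below are illustrative only):

```python
import numpy as np

# Toy sacrifice values for A = {0, 1, 2} and I = {3, 4}.
active = np.array([0, 1, 2])
inactive = np.array([3, 4])
xi = np.array([0.9, 0.1, 0.5])   # backward sacrifices, aligned with `active`
zeta = np.array([0.3, 2.0])      # forward sacrifices, aligned with `inactive`
k = 1

A_k = active[np.argsort(xi)[:k]]        # k least relevant variables in A
I_k = inactive[np.argsort(zeta)[-k:]]   # k most relevant variables in I

# Splice: swap A_k out of the active set and I_k in.
new_active = np.union1d(np.setdiff1d(active, A_k), I_k)
```

Here variable 1 has the smallest backward sacrifice and variable 4 the largest forward sacrifice, so the spliced active set becomes {0, 2, 4}.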
Algorithm 1: BESS.Fix(s): Best-Subset Selection with a given support size \(s\).#
Input: \(X, y\), a positive integer \(k_{\max }\), and a threshold \(\tau_{s}\).
Initialize \(\mathcal{A}^{0}=\left\{j: \sum_{i=1}^{p} \mathrm{I}\left(\left|\frac{X_{j}^{\top} y}{\sqrt{X_{j}^{\top} X_{j}}}\right| \leq\left|\frac{X_{i}^{\top} y}{\sqrt{X_{i}^{\top} X_{i}}}\right|\right) \leq \mathrm{s}\right\}\), \(\mathcal{I}^{0}=\left(\mathcal{A}^{0}\right)^{\mathrm{c}}\), and \(\left(\boldsymbol\beta^{0}, d^{0}\right)\):

\[\begin{split}&\boldsymbol{\beta}_{\mathcal{I}^{0}}^{0}=0, \quad d_{\mathcal{A}^{0}}^{0}=0,\\ &\boldsymbol{\beta}_{\mathcal{A}^{0}}^{0}=\left(\boldsymbol{X}_{\mathcal{A}^{0}}^{\top} \boldsymbol{X}_{\mathcal{A}^{0}}\right)^{-1} \boldsymbol{X}_{\mathcal{A}^{0}}^{\top} y, \quad d_{\mathcal{I}^{0}}^{0}=X_{\mathcal{I}^{0}}^{\top}\left(y-X \boldsymbol{\beta}^{0}\right) / n.\end{split}\]
For \(m=0,1, \ldots\), do
\[\left(\boldsymbol{\beta}^{m+1}, \boldsymbol{d}^{m+1}, \mathcal{A}^{m+1}, \mathcal{I}^{m+1}\right)= \text{Splicing} \left(\boldsymbol{\beta}^{m}, \boldsymbol{d}^{m}, \mathcal{A}^{m}, \mathcal{I}^{m}, k_{\max }, \tau_{s}\right).\]If \(\left(\mathcal{A}^{m+1}, \mathcal{I}^{m+1}\right)=\left(\mathcal{A}^{m},\mathcal{I}^{m}\right)\), then stop.
End For
Output \((\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{d}}, \hat{\mathcal{A}}, \hat{\mathcal{I}})=\left(\boldsymbol{\beta}^{m+1}, \boldsymbol{d}^{m+1}, \mathcal{A}^{m+1}, \mathcal{I}^{m+1}\right).\)
Algorithm 2: Splicing \(\left(\boldsymbol\beta, d, \mathcal{A}, \mathcal{I}, k_{\max }, \tau_{s}\right)\)#
Input: \(\boldsymbol{\beta}, \boldsymbol{d}, \mathcal{A}, \mathcal{I}, k_{\max }\), and \(\tau_{\mathrm{s}} .\)
Initialize: \(L_{0}=L=\frac{1}{2 n}\|y-X \beta\|_{2}^{2}\), and set
\[\xi_{j}=\frac{X_{j}^{\top} X_{j}}{2 n}\left(\beta_{j}\right)^{2}, \zeta_{j}=\frac{X_{j}^{\top} X_{j}}{2 n}\left(\frac{d_{j}}{X_{j}^{\top} X_{j} / n}\right)^{2}, j=1, \ldots, p.\]For \(k=1,2, \ldots, k_{\max }\), do
\[\begin{split}\mathcal{A}_{k}=\left\{j \in \mathcal{A}: \sum_{i \in \mathcal{A}} \mathrm{I}\left(\xi_{j} \geq \xi_{i}\right) \leq k\right\},\\ \mathcal{I}_{k}=\left\{j \in \mathcal{I}: \sum_{i \in \mathcal{I}} \mathrm{I}\left(\zeta_{j} \leq \zeta_{i}\right) \leq k\right\}.\end{split}\]Let \(\tilde{\mathcal{A}}_{k}=\left(\mathcal{A} \backslash \mathcal{A}_{k}\right) \cup \mathcal{I}_{k}, \tilde{\mathcal{I}}_{k}=\left(\mathcal{I} \backslash \mathcal{I}_{k}\right) \cup \mathcal{A}_{k}\) and solve:
\[\begin{split}\tilde{\boldsymbol{\beta}}_{\tilde{\mathcal{A}}_{k}}=\left(\boldsymbol{X}_{\tilde{\mathcal{A}}_{k}}^{\top} \boldsymbol{X}_{\tilde{\mathcal{A}}_{k}}\right)^{-1} \boldsymbol{X}_{\tilde{\mathcal{A}}_{k}}^{\top} y, \quad \tilde{\boldsymbol{\beta}}_{\tilde{\mathcal{I}}_{k}}=0,\\ \tilde{\boldsymbol d}_{\tilde{\mathcal{I}}_{k}}=X_{\tilde{\mathcal{I}}_{k}}^{\top}(y-X \tilde{\boldsymbol\beta}) / n, \quad \tilde{\boldsymbol d}_{\tilde{\mathcal{A}}_{k}}=0.\end{split}\]Compute: \(\mathcal{L}_{n}(\tilde{\boldsymbol\beta})=\frac{1}{2 n}\|y-X \tilde{\boldsymbol\beta}\|_{2}^{2}.\) If \(L>\mathcal{L}_{n}(\tilde{\boldsymbol\beta})\), then
\[\begin{split}(\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{d}}, \hat{\mathcal{A}}, \hat{\mathcal{I}})=\left(\tilde{\boldsymbol{\beta}}, \tilde{\boldsymbol{d}}, \tilde{\mathcal{A}}_{k}, \tilde{\mathcal{I}}_{k}\right)\\ L=\mathcal{L}_{n}(\tilde{\boldsymbol\beta}).\end{split}\]End for
If \(L_{0}-L<\tau_{s}\), then \((\hat{\boldsymbol\beta}, \hat{\boldsymbol d}, \hat{\mathcal{A}}, \hat{\mathcal{I}})=(\boldsymbol\beta, \boldsymbol d, \mathcal{A}, \mathcal{I}).\)
Output \((\hat{\boldsymbol{\beta}}, \hat{\boldsymbol{d}}, \hat{\mathcal{A}}, \hat{\mathcal{I}})\).
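The splicing step can be sketched in NumPy as follows. This is a minimal illustration of Algorithm 2 under our own naming, not the library's optimized C++ implementation:

```python
import numpy as np

def splicing(X, y, active, k_max, tau):
    """One pass of the splicing step: try exchanging k variables, keep the best swap."""
    n, p = X.shape

    def fit(A):
        # Least squares on A, zero elsewhere; dual variable d on the complement.
        beta = np.zeros(p)
        beta[A] = np.linalg.lstsq(X[:, A], y, rcond=None)[0]
        d = X.T @ (y - X @ beta) / n
        d[A] = 0.0
        loss = np.sum((y - X @ beta) ** 2) / (2 * n)
        return beta, d, loss

    beta, d, L0 = fit(active)
    best_active, best_loss = active, L0
    inactive = np.setdiff1d(np.arange(p), active)
    col_norm = np.sum(X * X, axis=0)
    xi = col_norm[active] / (2 * n) * beta[active] ** 2
    zeta = col_norm[inactive] / (2 * n) * (d[inactive] / (col_norm[inactive] / n)) ** 2

    for k in range(1, k_max + 1):
        A_k = active[np.argsort(xi)[:k]]       # k least relevant variables in A
        I_k = inactive[np.argsort(zeta)[-k:]]  # k most relevant variables in I
        cand = np.union1d(np.setdiff1d(active, A_k), I_k)
        _, _, loss = fit(cand)
        if loss < best_loss:
            best_active, best_loss = cand, loss
    if L0 - best_loss < tau:                   # improvement below threshold: keep A
        best_active, best_loss = active, L0
    return best_active, best_loss

# Toy data with true support {0, 1, 2}; start from a partly wrong active set.
rng = np.random.default_rng(0)
n, p = 100, 10
X = rng.standard_normal((n, p))
y = X[:, :3] @ np.array([3.0, -2.0, 1.5]) + 0.1 * rng.standard_normal(n)
new_active, new_loss = splicing(X, y, np.array([0, 1, 3]), k_max=2, tau=0.0)
```

With a strong signal, a single splice swaps the spurious variable 3 for the missed variable 2, recovering the true support.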
Determining the Best Support Size with SIC#
In practice, the support size is usually unknown. We use a data-driven procedure to determine \(s\). For any active set \(\mathcal{A}\), define an \(\mathrm{SIC}\) as follows:

\[\operatorname{SIC}(\mathcal{A})=n \log \mathcal{L}_{\mathcal{A}}+|\mathcal{A}| \log p \log \log n,\]

where \(\mathcal{L}_{\mathcal{A}}=\min _{\boldsymbol\beta_{\mathcal{I}}=0} \mathcal{L}_{n}(\boldsymbol\beta), \mathcal{I}=(\mathcal{A})^{c}\). To identify the true model, the model complexity penalty is \(\log p\) and the slowly diverging rate \(\log \log n\) is set to prevent underfitting. Theorem 4 states that the following ABESS algorithm selects the true support size via SIC.
Let \(s_{\max }\) be the maximum support size. We suggest \(s_{\max }=o\left(\frac{n}{\log p}\right)\) as the maximum possible recovery size. Typically, we set \(s_{\max }=\left[\frac{n}{\log p \log \log n}\right]\) where \([x]\) denotes the integer part of \(x\).
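A small helper for this default (the function name is ours, chosen for illustration):

```python
import math

def default_s_max(n, p):
    # Suggested default s_max = [ n / (log p * log log n) ], integer part.
    return int(n / (math.log(p) * math.log(math.log(n))))
```

For example, with \(n = 500\) and \(p = 100\) this gives \(s_{\max} = 59\).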
Algorithm 3: ABESS.#
Input: \(X, y\), and the maximum support size \(s_{\max } .\)
For \(s=1,2, \ldots, s_{\max }\), do
\[\left(\hat{\boldsymbol{\beta}}_{s}, \hat{\boldsymbol{d}}_{s}, \hat{\mathcal{A}}_{s}, \hat{\mathcal{I}}_{s}\right)= \text{BESS.Fix}(s).\]End for
Compute the minimum of SIC:
\[s_{\min }=\arg \min _{s} \operatorname{SIC}\left(\hat{\mathcal{A}}_{s}\right).\]Output \(\left(\hat{\boldsymbol{\beta}}_{s_{\min}}, \hat{\boldsymbol{d}}_{s_{\min }}, \hat{\mathcal{A}}_{s_{\min }}, \hat{\mathcal{I}}_{s_{\min }}\right).\)
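The SIC comparison across support sizes can be sketched as follows. For brevity we hand the procedure pre-computed active sets for each \(s\) (here the oracle nested supports on simulated data) rather than re-running BESS.Fix; the names and data are illustrative:

```python
import math
import numpy as np

def sic(X, y, active):
    # SIC(A) = n * log(L_A) + |A| * log(p) * log(log(n))
    n, p = X.shape
    beta = np.linalg.lstsq(X[:, active], y, rcond=None)[0]
    loss = np.sum((y - X[:, active] @ beta) ** 2) / (2 * n)
    return n * math.log(loss) + len(active) * math.log(p) * math.log(math.log(n))

# Simulated data with true support {0, 1, 2}.
rng = np.random.default_rng(1)
n, p = 200, 20
X = rng.standard_normal((n, p))
y = X[:, :3] @ np.array([3.0, -2.0, 1.5]) + 0.5 * rng.standard_normal(n)

# Stand-ins for the active sets that BESS.Fix(s) would return.
candidates = {1: [0], 2: [0, 1], 3: [0, 1, 2], 4: [0, 1, 2, 3]}
s_min = min(candidates, key=lambda s: sic(X, y, candidates[s]))
```

Underfitting supports pay a large \(n \log \mathcal{L}_{\mathcal{A}}\) term, while overfitting ones pay the \(\log p \log \log n\) penalty per extra variable, so the minimum lands at the true size.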
Now, enjoy the data analysis with abess
library:
import abess
# sphinx_gallery_thumbnail_path = 'Tutorial/figure/icon_noborder.png'