Initial Active Set
User-specified initial active set
We believe it is worth allowing a user-specified initial active set, so that the splicing process starts from this set for each sparsity level. Such a set might come from a prior analysis whose result is not very precise but is still better than random selection, so the algorithm can run more efficiently. Alternatively, you may simply want to supply different initial sets to test the stability of the algorithm.
Note that this is NOT equivalent to always_select, since the variables in the initial active set can be exchanged into the inactive set during splicing.
To specify an initial active set, an additional argument A_init should be given. In the example below it is passed when constructing the model, before calling fit().
import numpy as np
from abess.datasets import make_glm_data
from abess.linear import LinearRegression

# Simulate a Gaussian linear model: n samples, p variables, k of them nonzero
n = 100
p = 10
k = 3
np.random.seed(2)
data = make_glm_data(n=n, p=p, k=k, family='gaussian')

# Start the splicing from variables 0, 1 and 2
model = LinearRegression(support_size=range(0, 5), A_init=[0, 1, 2])
model.fit(data.x, data.y)
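The difference between an initial active set and always_select can be illustrated with a toy example. The sketch below is not the splicing algorithm used by abess: it is a simplified best-single-swap search on least squares, written only with NumPy. The names rss, toy_swap_search, start and protected are hypothetical and exist only for this illustration. Here start plays the role of A_init (a starting point that may be exchanged away) and protected plays the role of always_select (never exchanged out).

```python
import numpy as np


def rss(X, y, active):
    """Residual sum of squares of the least-squares fit on the given columns."""
    cols = sorted(active)
    if not cols:
        return float(y @ y)
    coef, *_ = np.linalg.lstsq(X[:, cols], y, rcond=None)
    resid = y - X[:, cols] @ coef
    return float(resid @ resid)


def toy_swap_search(X, y, start, protected=()):
    """Repeatedly apply the best single swap (one variable out, one in).

    Toy stand-in for splicing: `start` is only a starting point, while
    variables in `protected` are never swapped out.
    """
    active = set(start) | set(protected)
    best = rss(X, y, active)
    while True:
        candidates = []
        for out in active - set(protected):
            for into in set(range(X.shape[1])) - active:
                trial = (active - {out}) | {into}
                candidates.append((rss(X, y, trial), sorted(trial)))
        if not candidates:
            break
        trial_rss, trial = min(candidates)
        if trial_rss >= best:
            break  # no single swap lowers the loss any further
        active, best = set(trial), trial_rss
    return sorted(active)


# Simulated data: only variables 0, 1 and 2 carry signal.
rng = np.random.default_rng(0)
X = rng.standard_normal((200, 8))
beta = np.zeros(8)
beta[[0, 1, 2]] = [3.0, -2.0, 2.5]
y = X @ beta + 0.1 * rng.standard_normal(200)

# Starting from three noise variables: each of them can be exchanged away.
from_init = toy_swap_search(X, y, start=[5, 6, 7])
# Same start, but variable 7 is protected and therefore stays in the model.
with_protected = toy_swap_search(X, y, start=[5, 6, 7], protected=[7])
print(from_init, with_protected)
```

In this toy setting a poor starting set only costs extra swaps, whereas a protected variable stays in the final model even when it carries no signal.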
Some strategies for the initial active set are:
- If sparsity = len(A_init), the splicing process starts from A_init.
- If sparsity > len(A_init), the initial set includes A_init plus other variables chosen for their larger forward sacrifices.
- If sparsity < len(A_init), the initial set includes only part of A_init.
- If both A_init and always_select are given, always_select takes priority.
- For warm-start, A_init only affects splicing under the first sparsity in support_size.
- For CV, A_init affects each fold but not the re-fitting on the full data.
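The first four rules above can be sketched in plain Python. This is a minimal illustration, not the abess implementation: compose_initial_set is a hypothetical helper, the sacrifice argument is assumed to be a precomputed score per variable (larger meaning a stronger candidate), and keeping the leading entries of A_init when sparsity < len(A_init) is an assumption, since the text only says that part of A_init is kept.

```python
def compose_initial_set(sparsity, A_init, sacrifice, always_select=()):
    """Hypothetical helper mirroring the rules listed above.

    sacrifice: one score per variable; larger means a stronger candidate.
    This sketch assumes sparsity >= len(always_select).
    """
    # Rank all variables by score, strongest first, to fill any remaining slots.
    ranked = sorted(range(len(sacrifice)), key=lambda j: sacrifice[j], reverse=True)
    chosen = []
    # Priority order: always_select, then A_init, then the ranked remainder.
    for j in [*always_select, *A_init, *ranked]:
        if len(chosen) == sparsity:
            break
        if j not in chosen:
            chosen.append(j)
    return sorted(chosen)


scores = [0.1, 0.9, 0.3, 0.8, 0.2, 0.7]  # made-up scores for six variables
A_init = [0, 1, 2]

same_size = compose_initial_set(3, A_init, scores)   # sparsity = len(A_init)
larger = compose_initial_set(5, A_init, scores)      # sparsity > len(A_init)
smaller = compose_initial_set(2, A_init, scores)     # sparsity < len(A_init)
with_always = compose_initial_set(3, A_init, scores, always_select=[4])
print(same_size, larger, smaller, with_always)
```

With these made-up scores, the larger set is filled with variables 3 and 5, which have the highest scores outside A_init, and supplying always_select=[4] displaces the last entry of A_init.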
The abess R package also supports a user-defined initial active set. For the R tutorial, please see https://abess-team.github.io/abess/articles/v07-advancedFeatures.html.