make_multivariate_glm_data#
- class abess.datasets.make_multivariate_glm_data[source]#
Generate a dataset with multi-responses.
- Parameters
n (int, optional, default=100) -- The number of observations.
p (int, optional, default=100) -- The number of predictors of interest.
family ({multigaussian, multinomial, poisson}, optional, default="multigaussian") -- The distribution of the simulated multi-response. "multigaussian" for multivariate quantitative responses, "multinomial" for multi-class classification responses, "poisson" for count responses.
k (int, optional, default=10) -- The number of nonzero coefficients in the underlying regression model.
M (int, optional, default=1) -- The number of responses.
rho (float, optional, default=0.5) -- A parameter used to characterize the pairwise correlation in predictors.
corr_type (string, optional, default="const") -- The structure of correlation matrix. "const" for constant pairwise correlation, "exp" for pairwise correlation with exponential decay.
coef_ (array_like, optional, default=None) -- The coefficient values in the underlying regression model.
sparse_ratio (float, optional, default=None) -- The sparse ratio of predictor matrix (x).
- x#
Design matrix of predictors.
- Type
array-like, shape(n, p)
- y#
Response variable.
- Type
array-like, shape(n, M)
- coef_#
The coefficients used in the underlying regression model. It is rowwise sparse, with k nonzero rows.
- Type
array-like, shape(p, M)
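As a rough illustration of what the rho and corr_type parameters describe, the sketch below builds the two correlation structures in plain Python. The helper name correlation_matrix is hypothetical, and the form rho ** |i - j| for "exp" is an assumption about what "exponential decay" means here; this is not the library's implementation.

```python
def correlation_matrix(p, rho=0.5, corr_type="const"):
    """Return a p x p correlation matrix as nested lists (illustrative only)."""
    if corr_type == "const":
        # Every pair of distinct predictors shares the same correlation rho.
        return [[1.0 if i == j else rho for j in range(p)] for i in range(p)]
    if corr_type == "exp":
        # Assumed decay form: correlation shrinks as rho ** |i - j|.
        return [[rho ** abs(i - j) for j in range(p)] for i in range(p)]
    raise ValueError("corr_type must be 'const' or 'exp'")


const_corr = correlation_matrix(4, rho=0.5, corr_type="const")
exp_corr = correlation_matrix(4, rho=0.5, corr_type="exp")
print(const_corr[0])  # [1.0, 0.5, 0.5, 0.5]
print(exp_corr[0])    # [1.0, 0.5, 0.25, 0.125]
```

Under "const" every off-diagonal entry equals rho, while under the assumed "exp" form predictors that are far apart in index are nearly uncorrelated.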
Notes
The output, whose type is named data, contains three elements: x, y and coef_, which correspond to the variables, responses and coefficients, respectively.
Note that y and coef_ here are both matrices:
- each row of x and y indicates a sample;
- each column of coef_ corresponds to the effect on one response. It is rowwise sparse. Under this setting, a "useful" variable is relevant to all responses.
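To make the rowwise sparsity concrete, here is a minimal sketch that builds a p x M coefficient matrix with exactly k nonzero rows. The helper row_sparse_coef and the N(0, 1) draws are hypothetical choices for illustration only, not how the library fills coef_.

```python
import random


def row_sparse_coef(p, M, k, seed=0):
    """Build a p x M coefficient matrix (nested lists) with k nonzero rows."""
    rng = random.Random(seed)
    # Pick which k predictors are "useful".
    active_rows = sorted(rng.sample(range(p), k))
    coef = [[0.0] * M for _ in range(p)]
    for i in active_rows:
        # A useful predictor has a nonzero effect on every response.
        coef[i] = [rng.gauss(0.0, 1.0) for _ in range(M)]
    return coef, active_rows


coef, active_rows = row_sparse_coef(p=8, M=3, k=2)
nonzero_rows = [i for i, row in enumerate(coef) if any(v != 0.0 for v in row)]
print(nonzero_rows == active_rows)
```

Each row of the matrix belongs to one predictor and each column to one response, so a zero row means that predictor affects none of the responses.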
We write \(x, y, \beta\) for the predictors, the responses and the coefficients of one sample in the math formulas below.
Multitask Regression
Usage:
family='multigaussian'
Model: \(y \sim MVN(\mu, \Sigma),\ \mu^T=x^T \beta\).
the covariance \(\Sigma = \text{diag}(1, 1, \cdots, 1)\), i.e. the identity matrix;
the coefficient \(\beta\) contains 30% "strong" values, 40% "moderate" values and the rest are "weak". They come from \(N(0, 10)\), \(N(0, 5)\) and \(N(0, 2)\), respectively.
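The multigaussian model can be sketched for a single sample in plain Python. Because \(\Sigma\) is the identity, drawing from \(MVN(\mu, \Sigma)\) reduces to adding independent standard normal noise to each entry of \(\mu\). The function name sample_multigaussian_response and the toy values of x and beta are hypothetical.

```python
import random


def sample_multigaussian_response(x, beta, rng):
    """Draw one y ~ MVN(mu, I) with mu^T = x^T beta; also return mu."""
    M = len(beta[0])
    # mu[m] is the inner product of x with column m of beta.
    mu = [sum(x[i] * beta[i][m] for i in range(len(x))) for m in range(M)]
    # Identity covariance: independent unit-variance noise per response.
    y = [mu_m + rng.gauss(0.0, 1.0) for mu_m in mu]
    return y, mu


rng = random.Random(0)
x = [1.0, 2.0]                    # one sample with p = 2 predictors
beta = [[1.0, 0.0], [0.5, -1.0]]  # p x M coefficients with M = 2
y, mu = sample_multigaussian_response(x, beta, rng)
print(mu)  # [2.0, -2.0]
```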
Multinomial Regression
Usage:
family='multinomial'
Model: \(y\) is a "0-1" array with only one "1". Its index is chosen with probabilities proportional to \(\pi = \exp(x^T \beta)\).
the coefficient \(\beta\) contains 30% "strong" values, 40% "moderate" values and the rest are "weak". They come from \(N(0, 10)\), \(N(0, 5)\) and \(N(0, 2)\), respectively.
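The multinomial model can be sketched for one sample as follows, assuming the weights \(\exp(x^T \beta)\) are normalized to sum to one before the class index is drawn. The function name sample_multinomial_response and the toy inputs are hypothetical, and this is not the library's code.

```python
import math
import random


def sample_multinomial_response(x, beta, rng):
    """Draw a one-hot response with class probabilities from exp(x^T beta)."""
    M = len(beta[0])
    eta = [sum(x[i] * beta[i][m] for i in range(len(x))) for m in range(M)]
    # Subtracting the maximum avoids overflow and leaves the probabilities unchanged.
    shift = max(eta)
    weights = [math.exp(e - shift) for e in eta]
    total = sum(weights)
    probs = [w / total for w in weights]
    # Pick one class index according to probs and encode it as a 0-1 array.
    label = rng.choices(range(M), weights=probs, k=1)[0]
    y = [0] * M
    y[label] = 1
    return y, probs


rng = random.Random(0)
x = [1.0, -1.0]                              # one sample with p = 2 predictors
beta = [[0.0, 1.0, 2.0], [0.0, 0.0, 0.0]]    # p x M coefficients with M = 3
y, probs = sample_multinomial_response(x, beta, rng)
print(y, probs)
```

The returned y has exactly one entry equal to 1, matching the "0-1" array described above.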
- __init__(n=100, p=100, k=10, family='multigaussian', rho=0.5, corr_type='const', coef_=None, M=1, sparse_ratio=None)[source]#
- __new__(*args, **kwargs)#