make_multivariate_glm_data

class abess.datasets.make_multivariate_glm_data

Generate a dataset with multiple responses.

Parameters

n (int, optional, default=100) -- The number of observations.
p (int, optional, default=100) -- The number of predictors of interest.
family ({multigaussian, multinomial, poisson}, optional, default="multigaussian") -- The distribution of the simulated multi-response. "multigaussian" for multivariate quantitative responses, "multinomial" for multi-class classification responses, "poisson" for count responses.
k (int, optional, default=10) -- The number of nonzero coefficients in the underlying regression model.
M (int, optional, default=1) -- The number of responses.
rho (float, optional, default=0.5) -- A parameter used to characterize the pairwise correlation in predictors.
corr_type (string, optional, default="const") -- The structure of the correlation matrix. "const" for constant pairwise correlation, "exp" for pairwise correlation with exponential decay.
coef_ (array_like, optional, default=None) -- The coefficient values in the underlying regression model.
sparse_ratio (float, optional, default=None) -- The sparse ratio of the predictor matrix (x).
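
To make the two corr_type options concrete, the following NumPy sketch builds both kinds of correlation matrix. The helper name correlation_matrix is hypothetical, and the form rho ** |i - j| for "exp" is an assumption about what exponential decay means here; it is not taken from the abess source.

```python
import numpy as np


def correlation_matrix(p, rho=0.5, corr_type="const"):
    """Hypothetical helper: build a p x p predictor correlation matrix."""
    if corr_type == "const":
        # Every pair of distinct predictors shares the same correlation rho.
        sigma = np.full((p, p), rho, dtype=float)
        np.fill_diagonal(sigma, 1.0)
    elif corr_type == "exp":
        # Assumed form: correlation decays as rho ** |i - j|.
        idx = np.arange(p)
        sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    else:
        raise ValueError(f"unknown corr_type: {corr_type!r}")
    return sigma


const_sigma = correlation_matrix(4, rho=0.5, corr_type="const")
exp_sigma = correlation_matrix(4, rho=0.5, corr_type="exp")
```

Under this assumption, "const" keeps distant predictors as correlated as adjacent ones, while "exp" makes the correlation shrink with the distance between column indices.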

x

Design matrix of predictors.

Type: array-like, shape(n, p)

y

Response variable.

Type: array-like, shape(n, M)

coef_

The coefficients used in the underlying regression model. It is rowwise sparse, with k nonzero rows.

Type: array-like, shape(p, M)

Notes

The output, whose type is named data, contains three elements: x, y and coef_, which correspond to the predictors, responses and coefficients, respectively.

Note that y and coef_ are both matrices:

each row of x and y corresponds to one sample;
each column of coef_ holds the effects on one response. coef_ is rowwise sparse: under this setting, a "useful" variable is relevant to all responses.
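
As a small illustration of rowwise sparsity, the sketch below uses a hand-made toy matrix (not output from abess) and recovers the indices of the "useful" predictors, i.e. the rows with at least one nonzero entry.

```python
import numpy as np

# Toy coefficient matrix with p = 6 predictors and M = 2 responses.
coef = np.zeros((6, 2))
coef[[1, 4], :] = [[2.0, -1.5], [0.5, 3.0]]

# A predictor is "useful" when its row has any nonzero entry.
support = np.flatnonzero(np.any(coef != 0, axis=1))
print(support)  # [1 4]
```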

We write \(x, y, \beta\) for one sample in the math formulas below.

Multitask Regression
- Usage: family='multigaussian'
- Model: \(y \sim MVN(\mu, \Sigma),\ \mu^T=x^T \beta\), where
  the covariance matrix is \(\Sigma = \text{diag}(1, 1, \cdots, 1)\);
  
  the coefficient matrix \(\beta\) contains 30% "strong" values, 40% "moderate" values, and the rest are "weak". They come from \(N(0, 10)\), \(N(0, 5)\) and \(N(0, 2)\), respectively.
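
The multitask model can be sketched directly in NumPy. This is an illustration under simplifying assumptions, not the abess implementation: predictors are drawn independently, and the nonzero coefficients use a single arbitrary scale instead of the strong/moderate/weak mix.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k, M = 200, 30, 6, 3  # illustrative sizes

# Simplification: independent standard-normal predictors.
x = rng.standard_normal((n, p))

# Rowwise-sparse coefficients: k randomly chosen rows are nonzero.
beta = np.zeros((p, M))
support = rng.choice(p, size=k, replace=False)
# Simplification: one scale for all nonzero entries, not the 30/40/30 mix.
beta[support, :] = rng.normal(0.0, 5.0, size=(k, M))

# Row i of mu is x_i^T beta; the noise has identity covariance.
mu = x @ beta
y = mu + rng.standard_normal((n, M))
```

Because \(\Sigma\) is the identity, the M responses are conditionally independent given x; they are linked only through the shared support of \(\beta\).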
Multinomial Regression
- Usage: family='multinomial'
- Model: \(y\) is a "0-1" array with exactly one "1". Its index is chosen with probabilities proportional to \(\pi = \exp(x^T \beta)\).
  The coefficient matrix \(\beta\) contains 30% "strong" values, 40% "moderate" values, and the rest are "weak". They come from \(N(0, 10)\), \(N(0, 5)\) and \(N(0, 2)\), respectively.
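
The multinomial model can be sketched in the same way. The code below assumes that the class probabilities are \(\exp(x^T \beta)\) normalised to sum to one for each sample (a softmax). As before, the predictors and coefficient scale are simplified and this is not the abess implementation.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k, M = 200, 30, 6, 4  # illustrative sizes

x = rng.standard_normal((n, p))
beta = np.zeros((p, M))
support = rng.choice(p, size=k, replace=False)
beta[support, :] = rng.normal(0.0, 1.0, size=(k, M))

# Softmax over the M classes; subtracting the row maximum avoids overflow
# and does not change the normalised probabilities.
eta = x @ beta
eta -= eta.max(axis=1, keepdims=True)
prob = np.exp(eta)
prob /= prob.sum(axis=1, keepdims=True)

# Draw one class per sample and one-hot encode it.
labels = np.array([rng.choice(M, p=row) for row in prob])
y = np.zeros((n, M), dtype=int)
y[np.arange(n), labels] = 1
```

Each row of y then contains a single 1 marking the sampled class, matching the "0-1" array described above.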

__init__(n=100, p=100, k=10, family='multigaussian', rho=0.5, corr_type='const', coef_=None, M=1, sparse_ratio=None)

__new__(*args, **kwargs)