make_multivariate_glm_data

class abess.datasets.make_multivariate_glm_data

Generate a dataset with multiple responses.

Parameters
  • n (int, optional, default=100) -- The number of observations.

  • p (int, optional, default=100) -- The number of predictors of interest.

  • family ({"multigaussian", "multinomial", "poisson"}, optional, default="multigaussian") -- The distribution of the simulated multi-response. "multigaussian" for multivariate quantitative responses, "multinomial" for multi-class classification responses, "poisson" for count responses.

  • k (int, optional, default=10) -- The number of nonzero rows of the coefficient matrix in the underlying regression model, i.e. the number of predictors that affect the responses.

  • M (int, optional, default=1) -- The number of responses.

  • rho (float, optional, default=0.5) -- A parameter controlling the pairwise correlation between predictors.

  • corr_type (string, optional, default="const") -- The structure of the correlation matrix. "const" for constant pairwise correlation, "exp" for pairwise correlation with exponential decay.

  • coef_ (array_like, optional, default=None) -- The coefficient values in the underlying regression model.

  • sparse_ratio (float, optional, default=None) -- The sparse ratio of the predictor matrix x.
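To illustrate how rho and corr_type might shape the predictors, here is a minimal NumPy sketch that builds a correlation matrix and draws the rows of x from a zero-mean multivariate normal. It is not abess's implementation: the helpers correlation_matrix and sketch_design_matrix are hypothetical, and the Gaussian distribution of x and the exact forms of the two structures (every off-diagonal entry equal to rho for "const", rho**|i - j| for "exp") are assumptions. sparse_ratio is not modeled here.

```python
import numpy as np


def correlation_matrix(p, rho, corr_type="const"):
    """Hypothetical helper: p x p correlation matrix under assumed forms.

    "const": every off-diagonal entry equals rho (assumption).
    "exp":   entry (i, j) equals rho ** |i - j| (assumption).
    """
    if corr_type == "const":
        sigma = np.full((p, p), rho, dtype=float)
        np.fill_diagonal(sigma, 1.0)
    elif corr_type == "exp":
        idx = np.arange(p)
        sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    else:
        raise ValueError("corr_type must be 'const' or 'exp'")
    return sigma


def sketch_design_matrix(n=100, p=100, rho=0.5, corr_type="const", seed=0):
    """Hypothetical helper: draw n independent rows of x from N(0, Sigma)."""
    rng = np.random.default_rng(seed)
    sigma = correlation_matrix(p, rho, corr_type)
    return rng.multivariate_normal(np.zeros(p), sigma, size=n)


x = sketch_design_matrix(n=200, p=5, rho=0.5, corr_type="exp")
print(x.shape)  # (200, 5)
```

Under the "exp" form, predictors far apart in index order are nearly uncorrelated, while "const" gives every pair the same correlation rho.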

x

Design matrix of predictors.

Type

array-like, shape(n, p)

y

Response matrix.

Type

array-like, shape(n, M)

coef_

The coefficients used in the underlying regression model. It is rowwise sparse, with k nonzero rows.

Type

array-like, shape(p, M)

Notes

The output is an object of type data with three elements, x, y and coef_, which correspond to the predictors, responses and coefficients, respectively.

Note that y and coef_ here are both matrices:

  1. each row of x and y represents a sample;

  2. each column of coef_ corresponds to the effect on one response. coef_ is row-wise sparse: under this setting, a "useful" variable (a nonzero row of coef_) is relevant to all responses.
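Row-wise sparsity can be illustrated with a small NumPy sketch: pick k rows of a p x M matrix and make every entry in those rows nonzero, so each selected predictor affects all M responses. The helper sketch_row_sparse_coef is hypothetical, and the standard normal values are placeholders rather than the coefficient distributions listed for each model.

```python
import numpy as np


def sketch_row_sparse_coef(p=100, k=10, M=3, seed=0):
    """Hypothetical helper: (p, M) coefficient matrix with exactly k nonzero rows.

    Every selected row is filled in all M columns, so a relevant predictor
    affects every response (row-wise sparsity).
    """
    rng = np.random.default_rng(seed)
    coef = np.zeros((p, M))
    support = np.sort(rng.choice(p, size=k, replace=False))
    # Placeholder values: continuous draws are nonzero with probability one
    coef[support, :] = rng.normal(0.0, 1.0, size=(k, M))
    return coef, support


coef, support = sketch_row_sparse_coef(p=20, k=4, M=3)
# Rows with at least one nonzero entry are exactly the selected support
nonzero_rows = np.flatnonzero(np.any(coef != 0, axis=1))
print(nonzero_rows.tolist() == support.tolist())  # True
```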

In the formulas below, \(x\) and \(y\) denote a single sample (one row of x and y, respectively) and \(\beta\) denotes the coefficient matrix coef_.

  • Multitask Regression

    • Usage: family='multigaussian'

    • Model: \(y \sim MVN(\mu, \Sigma),\ \mu^T=x^T \beta\).

      • the covariance matrix \(\Sigma = \text{diag}(1, 1, \cdots, 1)\), i.e. the \(M \times M\) identity matrix;

      • the coefficient \(\beta\) contains 30% "strong" values, 40% "moderate" values and the rest are "weak". They come from \(N(0, 10)\), \(N(0, 5)\) and \(N(0, 2)\), respectively.

  • Multinomial Regression

    • Usage: family='multinomial'

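    • Model: \(y\) is a "0-1" array with exactly one "1". The index of the "1" is chosen with probabilities \(\pi = \exp(x^T \beta) / \sum_{j=1}^{M} \exp(x^T \beta_j)\), where \(\beta_j\) is the \(j\)-th column of \(\beta\).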

      • the coefficient \(\beta\) contains 30% "strong" values, 40% "moderate" values and the rest are "weak". They come from \(N(0, 10)\), \(N(0, 5)\) and \(N(0, 2)\), respectively.
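The two models above can be sketched in NumPy as follows. This illustrates the formulas only and is not abess's code: the helpers sketch_coef_values and sketch_responses are hypothetical, the 30/40/30 split is assumed to apply over the k nonzero rows, the second argument of \(N(0, \cdot)\) is assumed to be a variance, and the predictors are drawn independently for brevity. For "multinomial", exp(x^T β) is normalized to sum to one (a softmax) before one category is drawn per sample.

```python
import numpy as np


def sketch_coef_values(k, M, rng):
    """Hypothetical helper: draw the k nonzero rows of beta.

    Assumptions: about 30% of the rows are "strong", 40% "moderate" and the
    rest "weak", from N(0, 10), N(0, 5) and N(0, 2) read as variances.
    """
    n_strong = round(0.3 * k)
    n_moderate = round(0.4 * k)
    n_weak = k - n_strong - n_moderate
    scales = np.repeat(np.sqrt([10.0, 5.0, 2.0]), [n_strong, n_moderate, n_weak])
    return rng.normal(size=(k, M)) * scales[:, None]


def sketch_responses(x, coef, family, rng):
    """Hypothetical helper: simulate y (n, M) from x (n, p) and beta (p, M)."""
    M = coef.shape[1]
    eta = x @ coef  # row i equals x_i^T beta
    if family == "multigaussian":
        # y_i ~ MVN(x_i^T beta, I_M): independent N(0, 1) noise per entry
        return eta + rng.normal(size=eta.shape)
    if family == "multinomial":
        # Softmax across the M columns, shifted by the row max for stability
        prob = np.exp(eta - eta.max(axis=1, keepdims=True))
        prob /= prob.sum(axis=1, keepdims=True)
        # Draw one category per sample and one-hot encode it
        labels = np.array([rng.choice(M, p=row) for row in prob])
        return np.eye(M)[labels]
    raise ValueError("only 'multigaussian' and 'multinomial' are sketched")


rng = np.random.default_rng(0)
n, p, k, M = 50, 20, 10, 3
x = rng.normal(size=(n, p))  # independent predictors, for brevity
coef = np.zeros((p, M))
coef[rng.choice(p, size=k, replace=False)] = sketch_coef_values(k, M, rng)
y_gauss = sketch_responses(x, coef, "multigaussian", rng)
y_multi = sketch_responses(x, coef, "multinomial", rng)
print(y_gauss.shape, y_multi.shape)  # (50, 3) (50, 3)
```

Subtracting the row maximum before exponentiating leaves the softmax probabilities unchanged while avoiding overflow for large linear predictors.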

__init__(n=100, p=100, k=10, family='multigaussian', rho=0.5, corr_type='const', coef_=None, M=1, sparse_ratio=None)
__new__(*args, **kwargs)