# make_multivariate_glm_data

class abess.datasets.make_multivariate_glm_data(n=100, p=100, k=10, family='multigaussian', rho=0.5, corr_type='const', coef_=None, M=1, sparse_ratio=None)

Generate a dataset with multi-responses.

Parameters
• n (int, optional, default=100) -- The number of observations.

• p (int, optional, default=100) -- The number of predictors of interest.

• family ({"multigaussian", "multinomial", "poisson"}, optional, default="multigaussian") -- The distribution of the simulated multi-response. "multigaussian" for multivariate quantitative responses, "multinomial" for multi-class classification responses, "poisson" for count responses.

• k (int, optional, default=10) -- The number of nonzero coefficients in the underlying regression model.

• M (int, optional, default=1) -- The number of responses.

• rho (float, optional, default=0.5) -- A parameter used to characterize the pairwise correlation in predictors.

• corr_type (string, optional, default="const") -- The structure of the correlation matrix of the predictors. "const" for constant pairwise correlation, "exp" for pairwise correlation with exponential decay.

• coef_ (array_like, optional, default=None) -- The coefficient values in the underlying regression model.

• sparse_ratio (float, optional, default=None) -- The sparse ratio of predictor matrix (x).
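The two `corr_type` options can be illustrated with a small NumPy sketch. This is not the library's internal code: `corr_matrix` is a hypothetical helper, and reading "exponential decay" as `rho ** |i - j|` is an assumption based on the usual convention.

```python
import numpy as np


def corr_matrix(p, rho=0.5, corr_type="const"):
    """Hypothetical helper: build a p x p correlation matrix for the predictors."""
    if corr_type == "const":
        # Every pair of distinct predictors has correlation rho.
        sigma = np.full((p, p), rho, dtype=float)
        np.fill_diagonal(sigma, 1.0)
    elif corr_type == "exp":
        # Assumption: correlation decays as rho ** |i - j|.
        idx = np.arange(p)
        sigma = rho ** np.abs(idx[:, None] - idx[None, :])
    else:
        raise ValueError("corr_type should be 'const' or 'exp'")
    return sigma


rng = np.random.default_rng(0)
sigma_const = corr_matrix(4, rho=0.5, corr_type="const")
sigma_exp = corr_matrix(4, rho=0.5, corr_type="exp")

# Draw 100 observations of 4 predictors with the "exp" structure.
x = rng.multivariate_normal(np.zeros(4), sigma_exp, size=100)
print(sigma_const)
print(sigma_exp)
print(x.shape)
```

With "const", predictors far apart in index are as correlated as neighbours; with "exp", only nearby predictors are strongly correlated.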

Attributes

x

Design matrix of predictors.

Type

array-like, shape(n, p)

y

Response variable.

Type

array-like, shape(n, M)

coef_

The coefficients used in the underlying regression model. It is rowwise sparse, with k nonzero rows.

Type

array-like, shape(p, M)

Notes

The output, whose type is named data, contains three elements: x, y and coef_, which correspond to the predictors, responses and coefficients, respectively.

Note that y and coef_ here are both matrices:

1. each row of x and y indicates a sample;

2. each column of coef_ corresponds to the effect on one response. coef_ is rowwise sparse: under this setting, a "useful" variable is relevant to all responses.
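A minimal usage sketch of the shapes described above, assuming the `abess` package is installed. The import guard and the chosen argument values are only there so the snippet degrades gracefully and stays small; they are not part of the documented API.

```python
import numpy as np

try:
    from abess.datasets import make_multivariate_glm_data
except ImportError:
    # abess is not available in this environment; the sketch is skipped.
    make_multivariate_glm_data = None

shapes = None
if make_multivariate_glm_data is not None:
    data = make_multivariate_glm_data(
        n=50, p=20, k=5, M=3, family="multigaussian"
    )
    # x is (n, p), y is (n, M), coef_ is (p, M).
    shapes = (data.x.shape, data.y.shape, data.coef_.shape)
    print(shapes)

    # Rows of coef_ with at least one nonzero entry are the "useful" variables.
    useful = np.flatnonzero(np.any(data.coef_ != 0, axis=1))
    print(useful)
```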

In the math formulas below, $$x$$ and $$y$$ denote the predictors and responses of a single sample, and $$\beta$$ denotes the coefficients.

• Multivariate Gaussian Regression

• Usage: family='multigaussian'

• Model: $$y \sim MVN(\mu, \Sigma),\ \mu^T=x^T \beta$$.

• the variance $$\Sigma = \text{diag}(1, 1, \cdots, 1)$$;

• the coefficient $$\beta$$ contains 30% "strong" values, 40% "moderate" values and the rest are "weak". They come from $$N(0, 10)$$, $$N(0, 5)$$ and $$N(0, 2)$$, respectively.
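The multigaussian model above can be sketched in plain NumPy as follows. This is an illustration under stated assumptions, not the library's implementation: the second argument of $$N(0, \cdot)$$ is read as a standard deviation, the strong/moderate/weak split is applied per nonzero row, and the predictors are drawn independently for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, k, M = 100, 20, 10, 3

# Predictors: independent standard normal columns (correlation omitted here).
x = rng.standard_normal((n, p))

# Choose k relevant predictors; all other rows of beta stay zero.
support = rng.choice(p, size=k, replace=False)

# Split the k rows into 30% strong, 40% moderate and the rest weak.
n_strong = int(round(0.3 * k))
n_moderate = int(round(0.4 * k))
n_weak = k - n_strong - n_moderate
# Assumption: 10, 5 and 2 are standard deviations.
scales = np.repeat([10.0, 5.0, 2.0], [n_strong, n_moderate, n_weak])

beta = np.zeros((p, M))
beta[support] = rng.standard_normal((k, M)) * scales[:, None]

# y ~ MVN(x^T beta, I): add independent unit-variance noise to each response.
y = x @ beta + rng.standard_normal((n, M))
print(beta.shape, y.shape)
```

Because $$\Sigma$$ is the identity, the $$M$$ responses are conditionally independent given $$x$$; they are linked only through the shared support of $$\beta$$.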

• Multinomial Regression

• Usage: family='multinomial'

• Model: $$y$$ is a "0-1" array with only one "1". Its index is chosen with probabilities $$\pi \propto \exp(x^T \beta)$$, normalized to sum to one.

• the coefficient $$\beta$$ contains 30% "strong" values, 40% "moderate" values and the rest are "weak". They come from $$N(0, 10)$$, $$N(0, 5)$$ and $$N(0, 2)$$, respectively.
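The multinomial response can be sketched the same way. The snippet below assumes the class probabilities are the normalized values of $$\exp(x^T \beta)$$ (a softmax) and uses small, arbitrary dimensions; it is an illustration rather than the library's code.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, M = 5, 4, 3

x = rng.standard_normal((n, p))
beta = rng.standard_normal((p, M))

# Linear predictor, one column per class.
eta = x @ beta
# Subtract the row maximum before exponentiating for numerical stability.
eta -= eta.max(axis=1, keepdims=True)
prob = np.exp(eta)
prob /= prob.sum(axis=1, keepdims=True)

# Sample one class index per row, then one-hot encode it.
labels = np.array([rng.choice(M, p=row) for row in prob])
y = np.zeros((n, M), dtype=int)
y[np.arange(n), labels] = 1
print(y)
```

Each row of `y` then contains exactly one "1", in the column of the sampled class.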