adelie.data.dense

adelie.data.dense(n: int, p: int, G: int, *, K: int = 1, glm: str = 'gaussian', equal_groups=False, rho: float = 0, sparsity: float = 0.95, zero_penalty: float = 0, snr: float = 1, seed: int = 0)

Creates a dense dataset.

  • The groups and group sizes are generated randomly such that G groups are created and the sum of the group sizes is p.

  • The data matrix X is generated from a normal distribution in which every pair of distinct features has correlation rho.

  • The true coefficients \(\beta\) are generated so that a proportion sparsity of their entries is set to \(0\).

  • The response y is generated from the GLM specified by glm.

  • The penalty factors default to np.sqrt(group_sizes). If zero_penalty > 0, a random subset of the penalty factors is set to zero, and penalty is then rescaled so that its squared \(\ell_2\) norm equals p.
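The generation steps above can be sketched with NumPy alone. This is an illustrative sketch, not adelie's actual implementation: the cut-point scheme for the group sizes, the shared-factor construction for equicorrelated features (which requires 0 <= rho < 1), and the example values of n, p, G, rho, and zero_penalty are all assumptions chosen for the demonstration.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, G, rho, zero_penalty = 200, 10, 4, 0.3, 0.5  # illustrative values

# Random group sizes (assumed scheme): pick G-1 distinct cut points in
# 1..p-1, so every group is non-empty and the sizes sum to p.
cuts = np.sort(rng.choice(np.arange(1, p), size=G - 1, replace=False))
groups = np.concatenate([[0], cuts])                 # starting column per group
group_sizes = np.diff(np.concatenate([groups, [p]]))

# Equicorrelated features via a shared factor (assumed construction):
# X_j = sqrt(rho) * z + sqrt(1 - rho) * e_j has unit variance and
# correlation rho between any two distinct columns.
z = rng.standard_normal((n, 1))
e = rng.standard_normal((n, p))
X = np.sqrt(rho) * z + np.sqrt(1 - rho) * e

# Default penalty: sqrt(group size), so sum(penalty**2) already equals p.
penalty = np.sqrt(group_sizes)

# Zero a random subset, then rescale so the squared l2 norm is p again.
# (This sketch assumes at least one penalty stays non-zero.)
n_zero = int(zero_penalty * G)
penalty[rng.choice(G, size=n_zero, replace=False)] = 0.0
penalty *= np.sqrt(p) / np.linalg.norm(penalty)
```

Because the squared entries of np.sqrt(group_sizes) sum to p, the rescaling step only matters once some penalties have been zeroed: it restores the squared norm that the default penalty has without any rescaling.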

Parameters:
n : int

Number of data points.

p : int

Number of features.

G : int

Number of groups.

K : int, optional

Number of classes for multi-response GLMs. Default is 1.

glm : str, optional

GLM name. It must be one of the following:

  • "binomial"

  • "cox"

  • "gaussian"

  • "multigaussian"

  • "multinomial"

  • "poisson"

Default is "gaussian".

equal_groups : bool, optional

If True, group sizes are made as equal as possible. Default is False.

rho : float, optional

Feature (equi)-correlation. Default is 0 so that the features are independent.

sparsity : float, optional

Proportion of \(\beta\) entries to be zeroed out. Default is 0.95.

zero_penalty : float, optional

Proportion of penalty entries to be zeroed out. Default is 0.

snr : float, optional

Signal-to-noise ratio. Default is 1.

seed : int, optional

Random seed. Default is 0.

Returns:
data : dict

A dictionary containing the generated data:

  • "X": feature matrix.

  • "y": response vector.

  • "groups": mapping of group index to the starting column index of X.

  • "group_sizes": mapping of group index to the group size.

  • "penalty": penalty factor for each group index.
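As a rough illustration of how the "groups" and "group_sizes" entries relate to the columns of "X", the sketch below slices one column block per group. The dictionary here is a small hand-built stand-in with the documented keys, not real output of adelie.data.dense, and it assumes each group occupies a contiguous run of columns beginning at its starting index.

```python
import numpy as np

# Hand-built stand-in for the returned dictionary (hypothetical values).
data = {
    "X": np.arange(12.0).reshape(2, 6),
    "y": np.zeros(2),
    "groups": np.array([0, 2, 5]),
    "group_sizes": np.array([2, 3, 1]),
    "penalty": np.sqrt(np.array([2, 3, 1])),
}

# Group g is taken to cover columns groups[g] up to (but excluding)
# groups[g] + group_sizes[g].
blocks = [
    data["X"][:, start:start + size]
    for start, size in zip(data["groups"], data["group_sizes"])
]
shapes = [block.shape for block in blocks]
```

With adelie installed, the same slicing should apply to the dictionary returned by a call such as adelie.data.dense(n, p, G), under the same contiguity assumption.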