adelie.data.dense

adelie.data.dense(n: int, p: int, G: int, *, K: int = 1, glm: str = 'gaussian', equal_groups=False, rho: float = 0, sparsity: float = 0.95, zero_penalty: float = 0, snr: float = 1, seed: int = 0)

Creates a dense dataset.

The groups and group sizes are generated randomly such that G groups are created and the sum of the group sizes is p.
The data matrix X is drawn from a normal distribution in which every pair of distinct features has correlation rho.
The true coefficients \(\beta\) are generated such that a proportion sparsity of the entries is set to \(0\).
The response y is generated from the GLM specified by glm.
By default, the penalty factors are set to np.sqrt(group_sizes). If zero_penalty > 0, a random subset of the penalties is set to zero, and penalty is then rescaled so that its squared \(\ell_2\) norm equals p.
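The group and penalty construction described above can be sketched with the standard library alone. This is an illustration under stated assumptions, not the library's implementation: the helper names random_group_sizes and penalty_factors are hypothetical, and the way cut points are drawn and the number of zeroed penalties is rounded are guesses. Because the default factors sqrt(group_sizes) already have squared \(\ell_2\) norm p, the rescaling step restores that norm after some factors are zeroed.

```python
import math
import random


def random_group_sizes(p, G, rng):
    """Split p features into G contiguous, non-empty groups at random."""
    # Choose G-1 distinct cut points strictly inside (0, p).
    cuts = sorted(rng.sample(range(1, p), G - 1))
    bounds = [0] + cuts + [p]
    return [hi - lo for lo, hi in zip(bounds[:-1], bounds[1:])]


def penalty_factors(group_sizes, zero_penalty, rng):
    """Return sqrt(group size) penalties, optionally zeroed and rescaled."""
    penalty = [math.sqrt(s) for s in group_sizes]
    if zero_penalty > 0:
        p = sum(group_sizes)
        # Assumption of this sketch: round the count and keep at least one
        # non-zero penalty so that the rescaling below is well defined.
        n_zero = min(round(zero_penalty * len(penalty)), len(penalty) - 1)
        for i in rng.sample(range(len(penalty)), n_zero):
            penalty[i] = 0.0
        # Rescale so that the squared l2 norm equals p.
        sq_norm = sum(w * w for w in penalty)
        scale = math.sqrt(p / sq_norm)
        penalty = [w * scale for w in penalty]
    return penalty


rng = random.Random(0)
group_sizes = random_group_sizes(p=20, G=5, rng=rng)
# Starting column index of each group (running sum of the sizes).
groups = [sum(group_sizes[:i]) for i in range(len(group_sizes))]
penalty = penalty_factors(group_sizes, zero_penalty=0.4, rng=rng)
```

With these inputs, two of the five penalties are zero and the remaining three are scaled up so that the squared norm is again 20.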

Parameters:

n : int

Number of data points.

p : int

Number of features.

G : int

Number of groups.

K : int, optional

Number of classes for multi-response GLMs. Default is 1.

glm : str, optional

GLM name. It must be one of the following:

"binomial"

"cox"

"gaussian"

"multigaussian"

"multinomial"

"poisson"

Default is "gaussian".

equal_groups : bool, optional

If True, group sizes are made as equal as possible. Default is False.

rho : float, optional

Feature (equi)-correlation. Default is 0 so that the features are independent.

sparsity : float, optional

Proportion of \(\beta\) entries to be zeroed out. Default is 0.95.

zero_penalty : float, optional

Proportion of penalty entries to be zeroed out. Default is 0.

snr : float, optional

Signal-to-noise ratio. Default is 1.

seed : int, optional

Random seed. Default is 0.
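To make the roles of rho and sparsity concrete, the following minimal sketch draws equicorrelated features and a sparse coefficient vector. It is not taken from the library: the helper names are hypothetical, the one-factor construction is just one standard way to obtain pairwise correlation rho (valid for 0 <= rho <= 1), and drawing the non-zero coefficients from a standard normal is an assumption.

```python
import math
import random


def equicorrelated_rows(n, p, rho, rng):
    """Draw n rows of p unit-variance normal features with pairwise correlation rho."""
    rows = []
    for _ in range(n):
        # One common factor per row induces the same correlation between all features.
        shared = rng.gauss(0.0, 1.0)
        rows.append([
            math.sqrt(rho) * shared + math.sqrt(1.0 - rho) * rng.gauss(0.0, 1.0)
            for _ in range(p)
        ])
    return rows


def sparse_coefficients(p, sparsity, rng):
    """Draw p coefficients and set a proportion `sparsity` of them to zero."""
    beta = [rng.gauss(0.0, 1.0) for _ in range(p)]
    n_zero = round(sparsity * p)
    for j in rng.sample(range(p), n_zero):
        beta[j] = 0.0
    return beta


rng = random.Random(0)
X = equicorrelated_rows(n=5000, p=4, rho=0.5, rng=rng)
beta = sparse_coefficients(p=20, sparsity=0.95, rng=rng)
```

Each feature has variance rho + (1 - rho) = 1 and each pair of distinct features has covariance rho, so rho = 0 reduces to independent features. With p = 20 and the default sparsity of 0.95, a single coefficient stays non-zero.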

Returns:

data : dict

A dictionary containing the generated data:

"X": feature matrix.

"y": response vector.

"groups": mapping of group index to the starting column index of X.

"group_sizes": mapping of group index to the group size.

"penalty": penalty factor for each group index.