adelie.data.dense
- adelie.data.dense(n: int, p: int, G: int, *, K: int = 1, glm: str = 'gaussian', equal_groups=False, rho: float = 0, sparsity: float = 0.95, zero_penalty: float = 0, snr: float = 1, seed: int = 0)
Creates a dense dataset.
The groups and group sizes are generated randomly such that G groups are created and the sum of the group sizes is p.
The data matrix X is generated from a normal distribution where each feature is equicorrelated with the other features with correlation rho.
The true coefficients \(\beta\) are such that a sparsity proportion of the entries is set to \(0\).
The response y is generated from the GLM specified by glm.
The penalty factors are set to np.sqrt(group_sizes) by default. However, if zero_penalty > 0, a random set of penalties is set to zero, in which case penalty is rescaled such that its squared \(\ell_2\) norm equals p.
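The generation steps described above can be sketched in plain NumPy. This is a minimal illustration under stated assumptions, not adelie's actual implementation: the helper name sketch_dense, the way group boundaries are drawn, and the shared-factor construction for equicorrelation are all hypothetical choices. Only the Gaussian response is covered, the snr scaling is omitted, and the sketch assumes G <= p and 0 <= rho < 1.

```python
import numpy as np


def sketch_dense(n, p, G, rho=0.0, sparsity=0.95, zero_penalty=0.0, seed=0):
    """Hypothetical sketch of the steps described above (Gaussian case only)."""
    rng = np.random.default_rng(seed)

    # Random group boundaries: G distinct start columns, always including 0,
    # so that the G group sizes are positive and sum to p.
    extra_starts = rng.choice(np.arange(1, p), size=G - 1, replace=False)
    groups = np.sort(np.concatenate([[0], extra_starts]))
    group_sizes = np.diff(np.append(groups, p))

    # Equicorrelated features: a shared factor plus independent noise gives
    # unit variance and pairwise correlation rho (assumes 0 <= rho < 1).
    shared = rng.standard_normal((n, 1))
    noise = rng.standard_normal((n, p))
    X = np.sqrt(rho) * shared + np.sqrt(1 - rho) * noise

    # Sparse coefficients: zero out a `sparsity` proportion of the entries.
    beta = rng.standard_normal(p)
    n_zero = int(sparsity * p)
    beta[rng.choice(p, size=n_zero, replace=False)] = 0.0

    # Gaussian response with unit-variance noise (snr scaling omitted here).
    y = X @ beta + rng.standard_normal(n)

    # Penalty factors: sqrt(group_sizes) by default. If some are zeroed out,
    # rescale so that the squared l2 norm equals p again.
    penalty = np.sqrt(group_sizes)
    if zero_penalty > 0:
        n_pen_zero = int(zero_penalty * G)
        penalty[rng.choice(G, size=n_pen_zero, replace=False)] = 0.0
        norm = np.linalg.norm(penalty)
        if norm > 0:
            penalty *= np.sqrt(p) / norm

    return {
        "X": X,
        "y": y,
        "groups": groups,
        "group_sizes": group_sizes,
        "penalty": penalty,
    }


d = sketch_dense(50, 20, 4, rho=0.3, zero_penalty=0.5, seed=1)
```

Note that with the default penalty np.sqrt(group_sizes), the squared \(\ell_2\) norm is already the sum of the group sizes, which is p. The rescaling only matters once some entries have been zeroed.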
- Parameters:
- n : int
Number of data points.
- p : int
Number of features.
- G : int
Number of groups.
- K : int, optional
Number of classes for multi-response GLMs. Default is 1.
- glm : str, optional
GLM name. It must be one of the following: "binomial", "cox", "gaussian", "multigaussian", "multinomial", "poisson". Default is "gaussian".
- equal_groups : bool, optional
If True, group sizes are made as equal as possible. Default is False.
- rho : float, optional
Feature (equi)-correlation. Default is 0 so that the features are independent.
- sparsity : float, optional
Proportion of \(\beta\) entries to be zeroed out. Default is 0.95.
- zero_penalty : float, optional
Proportion of penalty entries to be zeroed out. Default is 0.
- snr : float, optional
Signal-to-noise ratio. Default is 1.
- seed : int, optional
Random seed. Default is 0.
- Returns:
- data : dict
A dictionary containing the generated data:
  - "X": feature matrix.
  - "y": response vector.
  - "groups": mapping of group index to the starting column index of X.
  - "group_sizes": mapping of group index to the group size.
  - "penalty": penalty factor for each group index.