adelie.data.snp_unphased
- adelie.data.snp_unphased(n: int, p: int, *, K: int = 1, glm: str = 'gaussian', sparsity: float = 0.95, missing_ratio: float = 0.1, one_ratio: float = 0.25, two_ratio: float = 0.05, zero_penalty: float = 0, snr: float = 1, seed: int = 0)
Creates an unphased SNP dataset.

This dataset is only used for the lasso, so `groups` is simply each individual feature and `group_sizes` is a vector of ones.

The calldata matrix `X` has sparsity ratio `1 - one_ratio - two_ratio`, where a proportion `one_ratio` of the entries is randomly set to `1` and a proportion `two_ratio` is randomly set to `2`. The user only sees a masked version of `X` in which a proportion `missing_ratio` of the entries is set to `-9`.

The true coefficients \(\beta\) are such that a proportion `sparsity` of the entries is set to \(0\).

The response `y` is generated from the GLM specified by `glm`.

By default, the penalty factors are set to `np.sqrt(group_sizes)`. However, if `zero_penalty > 0`, a random subset of the penalties is set to zero, and `penalty` is then rescaled so that its squared \(\ell_2\) norm equals `p`.
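The calldata construction and masking described above can be illustrated with a short NumPy sketch. This is not adelie's implementation: the helper `simulate_calldata` is hypothetical, and it assumes that the 1s, 2s, and missing entries are placed column by column with exact counts `int(ratio * n)`, which may differ from how the library actually samples them.

```python
import numpy as np


def simulate_calldata(n, p, missing_ratio=0.1, one_ratio=0.25, two_ratio=0.05, seed=0):
    """Hypothetical sketch of an unphased SNP calldata matrix (not adelie's code).

    Returns (X, X_masked): X has entries in {0, 1, 2}; X_masked is a copy of X
    in which a random subset of entries per column is replaced by -9 (missing).
    """
    rng = np.random.default_rng(seed)
    n_one = int(one_ratio * n)       # rows per column set to 1 (assumed exact count)
    n_two = int(two_ratio * n)       # rows per column set to 2 (assumed exact count)
    n_miss = int(missing_ratio * n)  # rows per column marked missing

    X = np.zeros((n, p), dtype=np.int8)
    X_masked = np.empty_like(X)
    for j in range(p):
        # Pick disjoint rows for the 1s and 2s; the remaining rows stay 0, so the
        # zero proportion is 1 - one_ratio - two_ratio (up to rounding).
        rows = rng.permutation(n)
        X[rows[:n_one], j] = 1
        X[rows[n_one:n_one + n_two], j] = 2
        # The user-visible copy hides a random subset of rows as missing (-9).
        X_masked[:, j] = X[:, j]
        X_masked[rng.choice(n, size=n_miss, replace=False), j] = -9
    return X, X_masked
```

For example, `simulate_calldata(100, 5)` yields, in every column, 25 ones and 5 twos in `X`, and exactly 10 entries equal to `-9` in the masked copy.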
- Parameters:
  - n : int
    Number of data points.
  - p : int
    Number of SNPs.
  - K : int, optional
    Number of classes for multi-response GLMs. Default is `1`.
  - glm : str, optional
    GLM name. It must be one of the following:
      - "binomial"
      - "cox"
      - "gaussian"
      - "multigaussian"
      - "multinomial"
      - "poisson"
    Default is "gaussian".
  - sparsity : float, optional
    Proportion of \(\beta\) entries to be zeroed out. Default is `0.95`.
  - missing_ratio : float, optional
    Proportion of the entries of `X` that are set to `-9` (missing). Default is `0.1`.
  - one_ratio : float, optional
    Proportion of the entries of `X` that are set to `1`. Default is `0.25`.
  - two_ratio : float, optional
    Proportion of the entries of `X` that are set to `2`. Default is `0.05`.
  - zero_penalty : float, optional
    Proportion of `penalty` entries to be zeroed out. Default is `0`.
  - snr : float, optional
    Signal-to-noise ratio. Default is `1`.
  - seed : int, optional
    Random seed. Default is `0`.
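The penalty rule from the overview, combined with `zero_penalty`, can be sketched as follows. `make_penalty` is a hypothetical helper, not part of adelie; it assumes that `int(zero_penalty * G)` of the `G` groups are zeroed, chosen uniformly at random without replacement. With unit group sizes the default penalty is a vector of ones, whose squared norm is already `p`, so rescaling only matters once some entries are zeroed.

```python
import numpy as np


def make_penalty(group_sizes, zero_penalty=0.0, seed=0):
    """Hypothetical sketch of the documented penalty rule (not adelie's code)."""
    group_sizes = np.asarray(group_sizes, dtype=float)
    penalty = np.sqrt(group_sizes)  # default: square root of each group size
    if zero_penalty > 0:
        rng = np.random.default_rng(seed)
        G = penalty.shape[0]
        # Zero out a random subset of int(zero_penalty * G) groups (assumption).
        zero_idx = rng.choice(G, size=int(zero_penalty * G), replace=False)
        penalty[zero_idx] = 0.0
        # Rescale so that the squared l2 norm equals p, the total number of features.
        p = group_sizes.sum()
        norm_sq = np.sum(penalty ** 2)
        if norm_sq > 0:  # guard: every penalty may have been zeroed
            penalty *= np.sqrt(p / norm_sq)
    return penalty
```

For instance, `make_penalty(np.ones(10), zero_penalty=0.5)` zeroes five entries and sets the other five to \(\sqrt{2}\), so the squared norm is \(5 \cdot 2 = 10 = p\).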
- Returns:
  - data : dict
    A dictionary containing the generated data:
      - "X": feature matrix.
      - "y": response vector.
      - "groups": mapping of group index to the starting column index of `X`.
      - "group_sizes": mapping of group index to the group size.
      - "penalty": penalty factor for each group index.