adelie.data.snp_unphased
- adelie.data.snp_unphased(n: int, p: int, *, K: int = 1, glm: str = 'gaussian', sparsity: float = 0.95, missing_ratio: float = 0.1, one_ratio: float = 0.25, two_ratio: float = 0.05, zero_penalty: float = 0, snr: float = 1, seed: int = 0)
Creates a SNP unphased dataset.
This dataset is only used for lasso, so groups is simply each individual feature and group_sizes is a vector of ones.
The calldata matrix X has sparsity ratio 1 - one_ratio - two_ratio, where one_ratio of the entries are randomly set to 1 and two_ratio are randomly set to 2. The user only sees a masked version of X where missing_ratio of the entries are set to -9.
The true coefficients \(\beta\) are such that sparsity proportion of the entries are set to \(0\).
The response y is generated from the GLM specified by glm.
The penalty factors are by default set to np.sqrt(group_sizes); however, if zero_penalty > 0, a random set of penalties will be set to zero, in which case penalty is rescaled such that the \(\ell_2\) norm squared is p.
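The construction of the calldata matrix described above can be sketched as follows. This is a minimal illustration written with the Python standard library only, not adelie's implementation: the helper name sketch_calldata is hypothetical, and it assumes that the missing entries are drawn uniformly from all entries of X.

```python
import random


def sketch_calldata(n, p, one_ratio=0.25, two_ratio=0.05, missing_ratio=0.1, seed=0):
    """Hypothetical helper: returns (X_true, X_masked) as lists of rows."""
    rng = random.Random(seed)
    total = n * p
    n_one = round(one_ratio * total)
    n_two = round(two_ratio * total)

    # Pick distinct flat positions for all nonzero entries, then split
    # them into the entries set to 1 and the entries set to 2.
    nonzero = rng.sample(range(total), n_one + n_two)
    flat = [0] * total
    for idx in nonzero[:n_one]:
        flat[idx] = 1
    for idx in nonzero[n_one:]:
        flat[idx] = 2

    # Assumption: the mask is drawn from all entries, zero or not.
    masked = list(flat)
    for idx in rng.sample(range(total), round(missing_ratio * total)):
        masked[idx] = -9

    def to_rows(values):
        # Reshape a flat list of length n * p into n rows of length p.
        return [values[i * p:(i + 1) * p] for i in range(n)]

    return to_rows(flat), to_rows(masked)


X_true, X_masked = sketch_calldata(n=20, p=10)
counts = {v: sum(row.count(v) for row in X_true) for v in (0, 1, 2)}
n_missing = sum(row.count(-9) for row in X_masked)
print(counts, n_missing)
```

With the default ratios, the unmasked matrix has a fraction 1 - 0.25 - 0.05 = 0.7 of zeros, and the masked copy is what the user would see.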
- Parameters:
- n : int
Number of data points.
- p : int
Number of SNPs.
- K : int, optional
Number of classes for multi-response GLMs. Default is 1.
- glm : str, optional
GLM name. It must be one of the following:
"binomial"
"cox"
"gaussian"
"multigaussian"
"multinomial"
"poisson"
Default is "gaussian".
- sparsity : float, optional
Proportion of \(\beta\) entries to be zeroed out. Default is 0.95.
- missing_ratio : float, optional
Proportion of the entries of X that is set to -9 (missing). Default is 0.1.
- one_ratio : float, optional
Proportion of the entries of X that is set to 1. Default is 0.25.
- two_ratio : float, optional
Proportion of the entries of X that is set to 2. Default is 0.05.
- zero_penalty : float, optional
Proportion of penalty entries to be zeroed out. Default is 0.
- snr : float, optional
Signal-to-noise ratio. Default is 1.
- seed : int, optional
Random seed. Default is 0.
- Returns:
- data : dict
A dictionary containing the generated data:
"X": feature matrix.
"y": response vector.
"groups": mapping of group index to the starting column index of X.
"group_sizes": mapping of group index to the group size.
"penalty": penalty factor for each group index.
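To make the "groups", "group_sizes", and "penalty" entries concrete, the sketch below reproduces their described layout for the lasso case and the rescaling applied when zero_penalty > 0. It uses only the Python standard library; the helper name sketch_penalty is hypothetical, and how adelie itself selects the zeroed penalties is an assumption here (a uniform random subset).

```python
import math
import random


def sketch_penalty(p, zero_penalty=0.0, seed=0):
    """Hypothetical helper mirroring the description, not adelie's code."""
    rng = random.Random(seed)
    groups = list(range(p))   # group i starts at column i
    group_sizes = [1] * p     # each group holds a single feature
    # Default penalty factors: sqrt(group_sizes), i.e. all ones here.
    penalty = [math.sqrt(size) for size in group_sizes]

    if zero_penalty > 0:
        # Assumption: a uniformly random subset of penalties is zeroed.
        for i in rng.sample(range(p), round(zero_penalty * p)):
            penalty[i] = 0.0
        # Rescale so that the squared l2 norm of the penalty equals p.
        sq_norm = sum(w * w for w in penalty)
        if sq_norm > 0:
            scale = math.sqrt(p / sq_norm)
            penalty = [w * scale for w in penalty]

    return groups, group_sizes, penalty


groups, group_sizes, penalty = sketch_penalty(p=10, zero_penalty=0.5)
print(penalty)
print(sum(w * w for w in penalty))
```

Because every group has size one, the default penalty is a vector of ones whose squared norm is already p; the rescaling only matters once some entries have been zeroed.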