adelie.data.snp_phased_ancestry
- adelie.data.snp_phased_ancestry(n: int, s: int, A: int, *, K: int = 1, glm: str = 'gaussian', sparsity: float = 0.95, one_ratio: float = 0.25, two_ratio: float = 0.05, zero_penalty: float = 0, snr: float = 1, seed: int = 0)
Creates a phased SNP dataset with ancestry labels.
- The groups and group sizes are generated randomly such that G groups are created and the sum of the group sizes is p.
- The calldata matrix X is a phased version of a matrix with sparsity ratio 1 - one_ratio - two_ratio, where a proportion one_ratio of the entries are randomly set to 1 and a proportion two_ratio are randomly set to 2.
- The ancestry matrix consists of randomly generated integers in the range [0, A).
- The true coefficient vector \(\beta\) is generated such that a proportion sparsity of its entries are set to \(0\).
- The response y is generated from the GLM specified by glm.
- The penalty factors are set to np.sqrt(group_sizes) by default. However, if zero_penalty > 0, a random subset of the penalties is set to zero, in which case penalty is rescaled such that its squared \(\ell_2\) norm is p.
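The penalty rule described above can be sketched in plain NumPy. This is a minimal illustration, not adelie's implementation: the helper name sketch_penalty is hypothetical, and both the rounding of the number of zeroed groups and the use of NumPy's default generator are assumptions.

```python
import numpy as np

def sketch_penalty(group_sizes, zero_penalty=0.0, seed=0):
    """Hypothetical helper illustrating the documented penalty rule."""
    rng = np.random.default_rng(seed)
    group_sizes = np.asarray(group_sizes, dtype=float)
    G = group_sizes.size      # number of groups
    p = group_sizes.sum()     # total number of features
    # Default: penalty factor is the square root of each group size,
    # so the squared l2 norm already equals p.
    penalty = np.sqrt(group_sizes)
    if zero_penalty > 0:
        # Assumption: the number of zeroed groups is rounded down.
        n_zero = int(zero_penalty * G)
        zero_idx = rng.choice(G, size=n_zero, replace=False)
        penalty[zero_idx] = 0.0
        norm_sq = np.sum(penalty ** 2)
        if norm_sq > 0:
            # Rescale so that the squared l2 norm is p again.
            penalty *= np.sqrt(p / norm_sq)
    return penalty

group_sizes = np.array([3, 3, 3, 3])
default_penalty = sketch_penalty(group_sizes)
sparse_penalty = sketch_penalty(group_sizes, zero_penalty=0.5)
print(np.sum(default_penalty ** 2), np.sum(sparse_penalty ** 2))
```

In both cases the squared norm of the penalty vector equals the sum of the group sizes, so zeroing some penalties shifts weight onto the remaining groups rather than reducing the overall penalty scale.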
- Parameters:
- n : int
Number of data points.
- s : int
Number of SNPs.
- A : int
Number of ancestries.
- K : int, optional
Number of classes for multi-response GLMs. Default is 1.
- glm : str, optional
GLM name. It must be one of the following:
"binomial", "cox", "gaussian", "multigaussian", "multinomial", "poisson".
Default is "gaussian".
- sparsity : float, optional
Proportion of \(\beta\) entries to be zeroed out. Default is 0.95.
- one_ratio : float, optional
Proportion of the entries of X that are set to 1. Default is 0.25.
- two_ratio : float, optional
Proportion of the entries of X that are set to 2. Default is 0.05.
- zero_penalty : float, optional
Proportion of penalty entries to be zeroed out. Default is 0.
- snr : float, optional
Signal-to-noise ratio. Default is 1.
- seed : int, optional
Random seed. Default is 0.
- Returns:
- data : dict
A dictionary containing the generated data:
  - "X": feature matrix.
  - "ancestries": ancestry labels, of the same shape as X.
  - "y": response vector.
  - "groups": mapping of group index to the starting column index of X.
  - "group_sizes": mapping of group index to the group size.
  - "penalty": penalty factor for each group index.