adelie.io.snp_unphased
- class adelie.io.snp_unphased(filename: str, read_mode: str = 'file')
IO handler for SNP unphased matrix.
A SNP unphased matrix is a matrix whose entries take values in the set {0, 1, 2, NA}, where NA indicates a missing value. Typically, NA is encoded as -9, but for generality we treat any negative value as NA.
- Parameters:
  - filename : str
    File name to either read or write the SNP unphased matrix in .snpdat format.
  - read_mode : str, optional
    Reading mode of the file filename. It must be one of the following:
    - "file": reads the file using standard file IO. This is the most general and portable method, but it is the slowest for large files.
    - "mmap": reads the file using mmap. This method is only supported on Linux and macOS. It is the most efficient way to read large files.
    Default is "file".
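The value set above can be checked before writing a matrix. The sketch below is a hypothetical helper (`is_valid_unphased` is not part of adelie) that accepts 0, 1, 2, or any negative value, mirroring the rule that every negative entry is treated as NA.

```python
import numpy as np


def is_valid_unphased(calldata: np.ndarray) -> bool:
    """Return True if every entry is 0, 1, 2, or negative (treated as NA).

    Hypothetical helper mirroring the documented value set; not part of adelie.
    """
    # 0, 1, 2 are valid observed values; any negative value means NA.
    return bool(np.all((calldata < 0) | np.isin(calldata, (0, 1, 2))))


calldata = np.array(
    [[0, 1, -9],
     [2, -1, 1],
     [1, 0, 2]],
    dtype=np.int8,
)
print(is_valid_unphased(calldata))      # True
print(is_valid_unphased(calldata + 1))  # False: the shifted matrix contains 3
```

Note that -9 and -1 both pass the check, since both are negative and therefore count as missing.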
Methods
- __init__(self, filename, read_mode)
- read(self): Reads and loads the matrix from file.
- to_dense(self[, n_threads]): Creates a dense SNP unphased matrix from the file.
- write(calldata[, impute_method, n_threads]): Writes a dense SNP unphased matrix to the file in .snpdat format.
Attributes
- cols: Number of columns.
- endian: Endianness used in the file.
- impute: Imputed value for each column.
- is_read: True if the IO handler has read the file content, otherwise False.
- nnm: Number of non-missing entries for each column.
- nnz: Number of non-zero entries for each column.
- rows: Number of rows.
- snps: Number of SNPs.
- read(self: adelie.adelie_core.io.IOSNPBase) -> int
Reads and loads the matrix from file.
- Returns:
  - total_bytes : int
    Number of bytes read.
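read() reports the number of bytes it loaded, and the read_mode chosen at construction decides whether that happens through standard file IO or mmap. The sketch below illustrates the two strategies with Python's standard library only; `read_bytes` is a hypothetical function and is not how adelie reads .snpdat files internally.

```python
import mmap
import os
import tempfile


def read_bytes(filename: str, read_mode: str = "file") -> int:
    """Return the number of bytes loaded using the given strategy.

    Hypothetical illustration of the "file" and "mmap" read modes.
    """
    if read_mode == "file":
        # Standard file IO: copy the whole file into memory.
        with open(filename, "rb") as f:
            data = f.read()
        return len(data)
    if read_mode == "mmap":
        # Memory-map the file read-only; pages are loaded lazily by the OS.
        with open(filename, "rb") as f, mmap.mmap(
            f.fileno(), 0, access=mmap.ACCESS_READ
        ) as m:
            return len(m)
    raise ValueError(f"unknown read_mode: {read_mode!r}")


with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"\x00" * 16)
    path = tmp.name
try:
    print(read_bytes(path, "file"), read_bytes(path, "mmap"))  # 16 16
finally:
    os.remove(path)
```

Both modes see the same bytes; the difference is that mmap avoids an up-front copy, which is why it scales better to large files.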
- to_dense(self: adelie.adelie_core.io.IOSNPUnphased, n_threads: int = 1) -> numpy.ndarray[numpy.int8[m, n]]
Creates a dense SNP unphased matrix from the file.
Note
Missing values are always encoded as -9, even if they were stored as a different negative value when the matrix was written to the file.
- Parameters:
  - n_threads : int, optional
    Number of threads. Default is 1.
- Returns:
  - dense : (n, p) ndarray
    Dense SNP unphased matrix.
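The note on to_dense() means a write/read round trip does not preserve the particular negative code used for a missing entry. A minimal NumPy sketch of that normalization, assuming only the documented rule (any negative value is NA, read back as -9); `normalize_missing` is a hypothetical helper, not adelie's implementation:

```python
import numpy as np


def normalize_missing(calldata: np.ndarray, na_code: int = -9) -> np.ndarray:
    """Map every negative (missing) entry to a single NA code.

    Hypothetical illustration of the to_dense() note, not adelie code.
    """
    out = calldata.astype(np.int8, copy=True)
    out[out < 0] = na_code  # every negative value is treated as NA
    return out


calldata = np.array([[0, -1], [-128, 2]], dtype=np.int8)
print(normalize_missing(calldata).tolist())  # [[0, -9], [-9, 2]]
```

When comparing a matrix against the output of to_dense(), applying this kind of normalization to the original first avoids spurious mismatches on missing entries.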
- write(calldata: ndarray, impute_method: str | ndarray = 'mean', n_threads: int = 1)
Writes a dense SNP unphased matrix to the file in .snpdat format.
- Parameters:
  - calldata : (n, p) ndarray
    SNP unphased matrix in dense format.
  - impute_method : Union[str, ndarray], optional
    Impute method for missing values. It must be one of the following:
    - "mean": mean imputation. Missing values in column j of calldata are replaced with the mean of column j, computed over the non-missing values only. If every value in the column is missing, the imputed value is 0.
    - numpy.ndarray: a user-specified vector of imputed values, one per column of calldata.
    Default is "mean".
  - n_threads : int, optional
    Number of threads. Default is 1.
- Returns:
  - total_bytes : int
    Number of bytes written.
  - benchmark : dict
    Dictionary of benchmark timings for each step of the serializer.
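The "mean" rule above can be written out in a few lines of NumPy. This is a sketch of the documented semantics, not adelie's serializer: `mean_impute_values` and `fill_missing` are hypothetical names, and missing entries are identified as any negative value.

```python
import numpy as np


def mean_impute_values(calldata: np.ndarray) -> np.ndarray:
    """Per-column imputed values under the documented "mean" rule.

    Missing entries (negative values) are ignored; an all-missing column
    gets 0. Hypothetical helper, not part of adelie.
    """
    observed = calldata >= 0
    counts = observed.sum(axis=0)
    sums = np.where(observed, calldata, 0).sum(axis=0, dtype=np.float64)
    # Start from zeros so all-missing columns keep the value 0.
    means = np.zeros(calldata.shape[1], dtype=np.float64)
    np.divide(sums, counts, out=means, where=counts > 0)
    return means


def fill_missing(calldata: np.ndarray, values: np.ndarray) -> np.ndarray:
    """Return a float copy with each missing entry replaced by its column's value."""
    out = calldata.astype(np.float64)
    rows, cols = np.nonzero(calldata < 0)
    out[rows, cols] = values[cols]
    return out


calldata = np.array([[0, 2, -9], [2, -9, -9], [1, 1, -9]], dtype=np.int8)
values = mean_impute_values(calldata)
print(values.tolist())  # [1.0, 1.5, 0.0]
print(fill_missing(calldata, values).tolist())
# [[0.0, 2.0, 0.0], [2.0, 1.5, 0.0], [1.0, 1.0, 0.0]]
```

Passing your own vector as values to fill_missing corresponds to the numpy.ndarray option of impute_method.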
- cols
Number of columns.
- endian
Endianness used in the file. It is "big" if the system is big-endian, otherwise "little".
Note
We recommend that users read from and write to the file on the same machine. The .snpdat format depends on the endianness of the machine, so reading a file that was generated on a machine with a different endianness is undefined behavior.
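Python exposes the machine's byte order as `sys.byteorder`, which takes the same two values ("big" or "little") described for endian. The guard below is a hypothetical sketch, assuming you record the endianness of the machine that wrote the file (for example, the endian value at write time) alongside it; adelie does not necessarily provide such a check.

```python
import sys


def native_endian() -> str:
    """Return "big" or "little" for the current machine."""
    return sys.byteorder


def check_same_endian(file_endian: str) -> None:
    """Raise if the recorded endianness differs from this machine's.

    Hypothetical guard; file_endian is assumed to be recorded by the user.
    """
    if file_endian != native_endian():
        raise ValueError(
            f"file was written on a {file_endian}-endian machine but this "
            f"machine is {native_endian()}-endian; reading it is undefined behavior"
        )


check_same_endian(sys.byteorder)  # passes on the machine that wrote the file
```

Failing loudly before reading is preferable to silently loading byte-swapped data.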
- impute
Imputed value for each column.
- is_read
True if the IO handler has read the file content, otherwise False.
- nnm
Number of non-missing entries for each column.
Note
Missing values are still recognized as missing, and excluded from this count, even if the matrix was written with an imputation that fills them in (for example, imputing them with zero).
- nnz
Number of non-zero entries for each column.
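Both per-column counts can be computed from the raw calldata before any imputation. The sketch below is illustrative only: `column_counts` is a hypothetical helper, and it assumes that nnz counts observed entries equal to 1 or 2 (that is, missing entries are not counted as non-zero), which the attribute description does not state explicitly.

```python
import numpy as np


def column_counts(calldata: np.ndarray):
    """Per-column counts of non-missing (nnm) and non-zero (nnz) entries.

    Assumption: missing (negative) entries are excluded from nnz as well.
    Hypothetical helper, not part of adelie.
    """
    observed = calldata >= 0
    nnm = observed.sum(axis=0)
    nnz = (observed & (calldata != 0)).sum(axis=0)
    return nnm, nnz


calldata = np.array([[0, 2, -9], [2, -9, -9], [1, 1, 0]], dtype=np.int8)
nnm, nnz = column_counts(calldata)
print(nnm.tolist(), nnz.tolist())  # [3, 2, 1] [2, 2, 0]
```

Because the counts are taken on the raw values, imputation chosen at write time does not change nnm, consistent with the note on that attribute.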
- rows
Number of rows.
- snps
Number of SNPs.