adelie.io.snp_unphased

class adelie.io.snp_unphased(filename: str, read_mode: str = 'file')

IO handler for SNP unphased matrix.

A SNP unphased matrix is a matrix whose entries take values in the set {0, 1, 2, NA}, where NA indicates a missing value. NA is typically encoded as -9, but for generality any negative value is treated as NA.

Parameters:
filename : str

File name to either read or write the SNP unphased matrix in .snpdat format.

read_mode : str, optional

Reading mode of the file filename. It must be one of the following:

  • "file": reads the file using standard file IO. This is the most general and portable method, but it is also the slowest one for large files.

  • "mmap": reads the file using mmap. This method is only supported on Linux and macOS. It is the most efficient way to read large files.

Default is "file".
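The difference between the two reading modes can be illustrated with Python's standard library alone. The sketch below is not adelie's reader: the temporary file is a hypothetical stand-in for a .snpdat file, and the names payload, via_file and via_mmap are chosen only for illustration.

```python
import mmap
import os
import tempfile

# Hypothetical stand-in for a .snpdat file: a few raw bytes on disk.
payload = bytes(range(16))
fd, path = tempfile.mkstemp(suffix=".bin")
with os.fdopen(fd, "wb") as f:
    f.write(payload)

# "file"-style access: copy the whole file into a bytes object up front.
with open(path, "rb") as f:
    via_file = f.read()

# "mmap"-style access: map the file into memory; pages are loaded on demand.
with open(path, "rb") as f:
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        via_mmap = bytes(mapped[:])

os.remove(path)
```

Both paths yield the same bytes. With a mapping, the operating system pages data in only as it is touched, which is why it tends to suit large files.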

Methods

__init__(self, filename, read_mode)

read(self)

Reads and loads the matrix from file.

to_dense(self[, n_threads])

Creates a dense SNP unphased matrix from the file.

write(calldata[, impute_method, n_threads])

Writes a dense SNP unphased matrix to the file in .snpdat format.

Attributes

cols

Number of columns.

endian

Endianness used in the file.

impute

Imputed value for each column.

is_read

True if the IO handler has read the file content, False otherwise.

nnm

Number of non-missing entries for each column.

nnz

Number of non-zero entries for each column.

rows

Number of rows.

snps

Number of SNPs.

__init__(self: adelie.adelie_core.io.IOSNPUnphased, filename: str, read_mode: str) -> None

read(self: adelie.adelie_core.io.IOSNPBase) -> int

Reads and loads the matrix from file.

Returns:
total_bytes : int

Number of bytes read.

to_dense(self: adelie.adelie_core.io.IOSNPUnphased, n_threads: int = 1) -> numpy.ndarray[numpy.int8[m, n]]

Creates a dense SNP unphased matrix from the file.

Note

The missing values are always encoded as -9 even if they were different (negative) values when writing to the file.
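The effect described in this note can be mimicked in plain Python. This is only an illustration of the convention, not the library's code; the helper normalize_missing and the constant MISSING are hypothetical names.

```python
MISSING = -9  # encoding used for missing values in the dense output


def normalize_missing(column):
    """Return a copy of `column` with every negative entry replaced by -9."""
    return [MISSING if value < 0 else value for value in column]


# Different negative codes all collapse to the same missing marker.
raw = [0, 2, -1, 1, -9, -128]
normalized = normalize_missing(raw)
```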

Parameters:
n_threads : int, optional

Number of threads. Default is 1.

Returns:
dense : (n, p) ndarray

Dense SNP unphased matrix.

write(calldata: ndarray, impute_method: str | ndarray = 'mean', n_threads: int = 1)

Writes a dense SNP unphased matrix to the file in .snpdat format.

Parameters:
calldata : (n, p) ndarray

SNP unphased matrix in dense format.

impute_method : Union[str, ndarray], optional

Impute method for missing values. It must be one of the following:

  • "mean": mean-imputation. Missing values in column j of calldata are replaced with the mean of the non-missing values in column j. If every value in the column is missing, it is imputed with 0.

  • numpy.ndarray: user-specified vector of imputed values for each column of calldata.

Default is "mean".
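As a rough illustration of the "mean" rule, the sketch below computes one imputed value per column from a small matrix stored as a list of rows. It is an assumption-laden restatement of the description above, not adelie's implementation; column_mean_impute is a hypothetical helper, and any negative entry is treated as missing.

```python
def column_mean_impute(calldata):
    """Return one imputed value per column: the mean of the non-missing
    (non-negative) entries, or 0.0 when the whole column is missing."""
    n_cols = len(calldata[0]) if calldata else 0
    imputes = []
    for j in range(n_cols):
        observed = [row[j] for row in calldata if row[j] >= 0]
        imputes.append(sum(observed) / len(observed) if observed else 0.0)
    return imputes


# Rows are samples, columns are SNPs; -9 marks a missing value.
calldata = [
    [0, -9, -9],
    [2, 1, -9],
    [1, -9, -9],
    [-9, 2, -9],
]
imputes = column_mean_impute(calldata)
```

A vector shaped like imputes, with one value per column, is the kind of input the numpy.ndarray option describes.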

n_threads : int, optional

Number of threads. Default is 1.

Returns:
total_bytes : int

Number of bytes written.

benchmark : dict

Dictionary of benchmark timings for each step of the serializer.

cols

Number of columns.

endian

Endianness used in the file. It is "big" if the system is big-endian, and "little" otherwise.

Note

We recommend reading and writing the file on the same machine. The .snpdat format depends on the endianness of the machine, so reading a file that was generated on a machine with a different endianness is undefined behavior.
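To see why endianness matters for a binary format, the following standard-library sketch packs the same integer in both byte orders and then decodes the little-endian bytes as if they were big-endian. It says nothing about the actual .snpdat layout; the integer and variable names are arbitrary examples.

```python
import struct
import sys

# Byte order of the machine running this code: "little" or "big".
native = sys.byteorder

value = 258  # 0x00000102
little = struct.pack("<i", value)  # b"\x02\x01\x00\x00"
big = struct.pack(">i", value)     # b"\x00\x00\x01\x02"

# Decoding little-endian bytes with the wrong byte order gives a different number.
misread = struct.unpack(">i", little)[0]
```

Comparing sys.byteorder on the writing and reading machines is one simple way to check that a file can be moved between them safely.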

impute

Imputed value for each column.

is_read

True if the IO handler has read the file content, False otherwise.

nnm

Number of non-missing entries for each column.

Note

Entries that were missing when the matrix was written are still counted as missing, even if the matrix was written with imputation method "zero".
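A per-column non-missing count can be sketched in plain Python as follows. This only illustrates the definition, assuming any negative entry is missing; count_non_missing is a hypothetical helper and not part of adelie.

```python
def count_non_missing(calldata):
    """Return, for each column, the number of entries that are not missing
    (missing entries are the negative ones)."""
    n_cols = len(calldata[0]) if calldata else 0
    return [sum(1 for row in calldata if row[j] >= 0) for j in range(n_cols)]


# Rows are samples, columns are SNPs; -9 marks a missing value.
calldata = [
    [0, -9, -9],
    [2, 1, -9],
    [1, -9, -9],
    [-9, 2, -9],
]
nnm = count_non_missing(calldata)
```

The count is taken on the original entries, so it does not change with the value later used to fill in the missing ones.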

nnz

Number of non-zero entries for each column.

rows

Number of rows.

snps

Number of SNPs.