adelie.matrix.one_hot

adelie.matrix.one_hot(mat: ndarray, levels: ndarray | None = None, *, copy: bool = False, n_threads: int = 1)

Creates a one-hot encoded matrix.

This matrix \(X \in \mathbb{R}^{n \times p}\) represents a one-hot encoding of a given base matrix \(Z \in \mathbb{R}^{n \times d}\). In general, \(Z\) contains a mix of continuous and discrete features (as columns). Let \(L : \{1, \ldots, d\} \to \mathbb{N}\) map each feature index of \(Z\) to that feature's number of levels: \(L(j) = 0\) means feature \(j\) is continuous, and \(L(j) > 0\) means it is discrete with \(L(j)\) levels (or categories). For each column \(Z_j\) of \(Z\), define the possibly one-hot encoded version \(\tilde{Z}_j\) as

\[\begin{split}\begin{align*} \tilde{Z}_j &:= \begin{cases} Z_j ,& L(j) = 0 \\ I_{Z_j} ,& L(j) > 0 \end{cases} \end{align*}\end{split}\]

Here, \(I_{v}\) is the indicator matrix, or one-hot encoding, of \(v\): its \(i\) th row has a single \(1\) in the column indexed by \(v_i\) and zeros elsewhere.

Then, \(X\) is defined as the column-wise concatenation of \(\tilde{Z}_j\) in order of \(j\).
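The definition above can be illustrated with a small dense NumPy sketch. The helper `dense_one_hot` is hypothetical and is not part of adelie; it only materializes the matrix \(X\) that the definition describes, assuming discrete columns already hold integer codes starting at 0.

```python
import numpy as np

def dense_one_hot(Z, levels=None):
    """Hypothetical helper (not adelie API): build the dense X from Z and levels."""
    Z = np.asarray(Z, dtype=float)
    n, d = Z.shape
    if levels is None:
        # Mirrors the documented default: every column is continuous.
        levels = np.zeros(d, dtype=int)
    blocks = []
    for j in range(d):
        n_levels = int(levels[j])
        if n_levels <= 0:
            # Continuous feature: keep the column unchanged.
            blocks.append(Z[:, [j]])
        else:
            # Discrete feature: row i gets a single 1 in column Z[i, j].
            codes = Z[:, j].astype(int)
            blocks.append(np.eye(n_levels)[codes])
    # Column-wise concatenation in order of j.
    return np.hstack(blocks)

Z = np.array([[0.5, 2.0],
              [1.5, 0.0],
              [-1.0, 1.0]])
levels = np.array([0, 3])  # column 0 continuous, column 1 has 3 levels
X = dense_one_hot(Z, levels)
print(X)
```

Here \(d = 2\) and \(p = 1 + 3 = 4\): the continuous column passes through and the discrete column expands into three indicator columns.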

Note

Every discrete feature of \(Z\) must take on values in the set \(\{0, \ldots, \ell-1\}\) where \(\ell\) is the number of levels for that feature.
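One way to check this requirement before constructing the matrix is sketched below. The function `invalid_discrete_columns` is a hypothetical helper, not something adelie provides, and it assumes that "values in the set" means integer-valued entries.

```python
import numpy as np

def invalid_discrete_columns(Z, levels):
    """Hypothetical check: indices of discrete columns with out-of-range values."""
    Z = np.asarray(Z, dtype=float)
    bad = []
    for j, n_levels in enumerate(levels):
        if n_levels <= 0:
            continue  # continuous columns are unrestricted
        col = Z[:, j]
        is_integer = np.all(col == np.round(col))
        in_range = col.min() >= 0 and col.max() <= n_levels - 1
        if not (is_integer and in_range):
            bad.append(j)
    return bad

Z_check = np.array([[0.3, 0.0, 1.0],
                    [1.7, 2.0, 3.0],
                    [2.2, 1.0, 0.0]])
levels_check = np.array([0, 3, 3])
bad_columns = invalid_discrete_columns(Z_check, levels_check)
print(bad_columns)
```

In this example column 2 is flagged because it contains the value 3 while only the codes 0, 1, and 2 are allowed for a 3-level feature.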

Note

This matrix only works with the naive method.

Parameters:
mat : (n, d) ndarray

The dense matrix \(Z\) from which to construct one-hot encodings.

levels : (d,) ndarray, optional

Number of levels for each column in mat. A non-positive value indicates that the column is a continuous variable, whereas a positive value indicates that it is a discrete variable with that many levels (or categories). If None, it is initialized to np.zeros(d) so that every column is a continuous variable. Default is None.

copy : bool, optional

If True, a copy of mat is stored internally. Otherwise, a reference is stored instead. Default is False.

n_threads : int, optional

Number of threads. Default is 1.

Returns:
wrap

Wrapper matrix object.
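Assuming the wrapper represents \(X\) exactly as defined above, its column count \(p\) and the starting column of each feature's block follow directly from levels. The sketch below uses only NumPy; the variable names are illustrative and not part of adelie.

```python
import numpy as np

# Illustrative levels: non-positive entries mark continuous columns.
levels = np.array([0, 3, -1, 2])

# A continuous feature contributes 1 column; a discrete one contributes its level count.
widths = np.where(levels > 0, levels, 1)
p = int(widths.sum())

# Starting column in X of the block belonging to each feature j.
offsets = np.concatenate(([0], np.cumsum(widths)[:-1]))
print(p, offsets)
```

For these levels the block widths are 1, 3, 1, and 2, so \(p = 7\) and the blocks start at columns 0, 1, 4, and 5.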