adelie.matrix.one_hot

adelie.matrix.one_hot(mat: ndarray, levels: ndarray | None = None, *, copy: bool = False, n_threads: int = 1)

Creates a one-hot encoded matrix.

This matrix \(X \in \mathbb{R}^{n \times p}\) represents a one-hot encoding of a given base matrix \(Z \in \mathbb{R}^{n \times d}\). In general, \(Z\) may contain a mix of continuous and discrete features (as columns). Let \(L : \{1, \ldots, d\} \to \mathbb{N}\) (with \(0 \in \mathbb{N}\)) map each feature index of \(Z\) to the number of levels of that feature: \(L(j) = 0\) means the \(j\)th feature is continuous, and \(L(j) > 0\) means it is discrete with \(L(j)\) levels (categories). For each column \(Z_j\) of \(Z\), define its possibly one-hot encoded version \(\tilde{Z}_j\) as

\[\begin{split}\begin{align*} \tilde{Z}_j &:= \begin{cases} Z_j ,& L(j) = 0 \\ I_{Z_j} ,& L(j) > 0 \end{cases} \end{align*}\end{split}\]

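Here, \(I_{v}\) is the indicator matrix, or one-hot encoding, of \(v\): for \(v \in \{0, \ldots, \ell-1\}^n\), \(I_v \in \{0, 1\}^{n \times \ell}\) has entries \((I_v)_{ik} = 1\) if \(v_i = k\) and \(0\) otherwise, where rows are indexed by \(i = 1, \ldots, n\) and columns by the levels \(k = 0, \ldots, \ell-1\).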
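To make \(I_v\) concrete, here is a minimal NumPy sketch. The helper name `indicator` is hypothetical and not part of adelie; it assumes `v` holds integer-valued level codes in \(\{0, \ldots, \ell-1\}\), possibly stored as floats as they would be inside \(Z\).

```python
import numpy as np


def indicator(v: np.ndarray, n_levels: int) -> np.ndarray:
    """Hypothetical helper: dense one-hot encoding I_v of a code vector v."""
    codes = np.asarray(v).astype(int)  # codes may be stored as floats in Z
    out = np.zeros((codes.shape[0], n_levels))
    # Row i gets a single 1 in the column whose index equals its level v_i.
    out[np.arange(codes.shape[0]), codes] = 1.0
    return out


v = np.array([2.0, 0.0, 1.0, 2.0])
I_v = indicator(v, 3)
print(I_v)
```

Each row of the result contains exactly one 1, placed in the column matching that row's level.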

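Then, \(X\) is defined as the column-wise concatenation of \(\tilde{Z}_1, \ldots, \tilde{Z}_d\) in order of \(j\). Since a continuous feature contributes one column and a discrete feature contributes \(L(j)\) columns, \(p = \sum_{j=1}^{d} \max(L(j), 1)\).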
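For intuition or testing, the definition of \(X\) can be materialized densely. The sketch below is a plain NumPy reference for the definition above, not adelie's internal implementation; the function name `one_hot_dense` is hypothetical. It treats any non-positive level count as continuous, matching the description of the `levels` parameter.

```python
import numpy as np


def one_hot_dense(Z: np.ndarray, levels: np.ndarray) -> np.ndarray:
    """Hypothetical reference: build X densely from Z and the level counts L(j)."""
    n, d = Z.shape
    blocks = []
    for j in range(d):
        if levels[j] <= 0:
            # Continuous feature: keep column Z_j as an (n, 1) block.
            blocks.append(Z[:, j:j + 1])
        else:
            # Discrete feature: replace Z_j with its indicator matrix I_{Z_j}.
            ell = int(levels[j])
            block = np.zeros((n, ell))
            block[np.arange(n), Z[:, j].astype(int)] = 1.0
            blocks.append(block)
    # Column-wise concatenation of the blocks in order of j.
    return np.concatenate(blocks, axis=1)


Z = np.array([[0.5, 1.0],
              [-1.2, 0.0],
              [3.0, 2.0]])
levels = np.array([0, 3])
X = one_hot_dense(Z, levels)
print(X.shape)  # (3, 4): p = 1 + 3
```

The first column of `X` is the continuous feature copied as-is; the remaining three columns are the indicator matrix of the discrete second feature.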

Note

Every discrete feature of \(Z\) must take on values in the set \(\{0, \ldots, \ell-1\}\), where \(\ell\) is the number of levels for that feature.
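A code outside \(\{0, \ldots, \ell-1\}\), or a non-integer code, has no corresponding column in \(I_{Z_j}\). The sketch below is a hypothetical pre-check one might run on \(Z\) before constructing the matrix; `check_discrete_columns` is not an adelie function, and whether adelie itself validates its input is not stated here.

```python
import numpy as np


def check_discrete_columns(Z: np.ndarray, levels: np.ndarray) -> None:
    """Hypothetical check: raise if a discrete column holds an invalid level code."""
    for j, ell in enumerate(levels):
        if ell <= 0:
            continue  # continuous column: any real value is allowed
        col = Z[:, j]
        is_int = np.all(col == np.round(col))
        in_range = np.all((col >= 0) & (col <= ell - 1))
        if not (is_int and in_range):
            raise ValueError(
                f"column {j} must take values in {{0, ..., {int(ell) - 1}}}"
            )


Z = np.array([[0.0, 2.0],
              [1.5, 3.0]])
check_discrete_columns(Z, np.array([0, 4]))  # passes: column 1 uses codes 2 and 3, both below 4
```

Declaring the second column with only 3 levels instead would raise, because the code 3 falls outside \(\{0, 1, 2\}\).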

Note

This matrix is only supported by the naive method.

Parameters:

mat ((n, d) ndarray): The base matrix \(Z\) from which to construct the one-hot encodings.
levels ((d,) ndarray, optional): Number of levels for each column of mat. A non-positive value indicates that the column is a continuous variable, whereas a positive value indicates that it is a discrete variable with that many levels (or categories). If None, it defaults to np.zeros(d), so every column is treated as continuous. Default is None.
copy (bool, optional): If True, a copy of mat is stored internally; otherwise, a reference to mat is stored. Default is False.
n_threads (int, optional): Number of threads. Default is 1.

Returns:

wrap: Wrapper matrix object.
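The `levels` argument determines the width \(p\) of \(X\) and where each feature's block of columns starts. The following sketch computes that layout in NumPy, applying the documented default (`None` becomes `np.zeros(d)`) and treating non-positive entries as continuous. The helper `column_layout` is hypothetical, not part of adelie.

```python
import numpy as np


def column_layout(d: int, levels=None):
    """Hypothetical helper: starting column and width of each feature's block in X."""
    if levels is None:
        levels = np.zeros(d)  # documented default: every column is continuous
    levels = np.asarray(levels)
    # A continuous feature occupies 1 column; a discrete one occupies L(j) columns.
    widths = np.where(levels <= 0, 1, levels).astype(int)
    starts = np.concatenate(([0], np.cumsum(widths)[:-1]))
    return starts, widths, int(widths.sum())


starts, widths, p = column_layout(4, np.array([0, 3, -1, 2]))
print(starts.tolist(), widths.tolist(), p)  # [0, 1, 4, 5] [1, 3, 1, 2] 7
```

In this example the discrete second feature occupies columns 1 through 3 of \(X\), and the negative entry for the third feature is treated the same as 0.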