adelie.matrix.one_hot
- adelie.matrix.one_hot(mat: ndarray, levels: ndarray | None = None, *, copy: bool = False, n_threads: int = 1)
Creates a one-hot encoded matrix.
This matrix \(X \in \mathbb{R}^{n \times p}\) represents a one-hot encoding of a given base matrix \(Z \in \mathbb{R}^{n \times d}\). In general, the columns of \(Z\) are a mix of continuous and discrete features. Let \(L : \{1, \ldots, d\} \to \mathbb{N}\) map each feature index of \(Z\) to its number of levels: \(L(j) = 0\) means feature \(j\) is continuous, and \(L(j) > 0\) means it is discrete with \(L(j)\) levels (or categories). For the \(j\)-th column of \(Z\), define the possibly one-hot encoded version \(\tilde{Z}_j\) as
\[\tilde{Z}_j := \begin{cases} Z_j ,& L(j) = 0 \\ I_{Z_j} ,& L(j) > 0 \end{cases}\]
Here, \(I_{v}\) is the indicator matrix, or one-hot encoding, of \(v\).
Then, \(X\) is defined as the column-wise concatenation of \(\tilde{Z}_j\) in order of \(j\).
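The construction above can be sketched densely in plain NumPy. This is only an illustrative reference, not how adelie implements the matrix; the helper name one_hot_dense and the sample data are hypothetical.

```python
import numpy as np

def one_hot_dense(Z, levels=None):
    """Hypothetical dense reference for the encoding described above."""
    Z = np.asarray(Z, dtype=float)
    n, d = Z.shape
    if levels is None:
        # Mirrors the documented default: every column is continuous.
        levels = np.zeros(d, dtype=int)
    blocks = []
    for j in range(d):
        l_j = int(levels[j])
        if l_j <= 0:
            # Continuous feature: keep the column as-is.
            blocks.append(Z[:, [j]])
        else:
            # Discrete feature: build the n x l_j indicator matrix.
            codes = Z[:, j].astype(int)
            indicator = np.zeros((n, l_j))
            indicator[np.arange(n), codes] = 1.0
            blocks.append(indicator)
    # Column-wise concatenation in order of j.
    return np.hstack(blocks)

# Column 0 is continuous; column 1 is discrete with 3 levels.
Z = np.array([[0.5, 2.0], [1.5, 0.0], [-0.3, 1.0]])
levels = np.array([0, 3])
X = one_hot_dense(Z, levels)
```

In this sketch, \(X\) has \(p = 1 + 3 = 4\) columns: the continuous column followed by the three indicator columns of the discrete feature.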
Note
Every discrete feature of \(Z\) must take values in the set \(\{0, \ldots, \ell-1\}\), where \(\ell\) is the number of levels for that feature.
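If a discrete column holds arbitrary labels, one way to satisfy this requirement is to recode it before building the matrix. The following is a minimal sketch using np.unique; the variable names and sample labels are assumptions for illustration.

```python
import numpy as np

# Hypothetical raw labels that are not in {0, ..., l-1}.
raw = np.array([10, 30, 10, 20, 30])

# return_inverse gives, for each entry, its index among the sorted unique labels.
uniques, codes = np.unique(raw, return_inverse=True)
n_levels = uniques.size

# codes now lies in {0, ..., n_levels - 1} and can be stored as a column of Z,
# with n_levels as the corresponding entry of levels.
z_col = codes.astype(float)
```

The sorted unique labels in uniques can be kept to map the encoded columns back to the original categories.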
Note
This matrix only works with the naive method.
- Parameters:
- mat : (n, d) ndarray
The base matrix \(Z\) from which to construct one-hot encodings.
- levels : (d,) ndarray, optional
Number of levels for each column in mat. A non-positive value indicates that the column is a continuous variable, whereas a positive value indicates that it is a discrete variable with that many levels (or categories). If None, it is initialized to be np.zeros(d) so that every column is a continuous variable. Default is None.
- copy : bool, optional
If True, a copy of mat is stored internally. Otherwise, a reference is stored instead. Default is False.
- n_threads : int, optional
Number of threads. Default is 1.
- Returns:
- wrap
Wrapper matrix object.