adelie.solver.css_cov#

adelie.solver.css_cov(S: ndarray, subset_size: int | None = None, *, subset: ndarray | None = None, method: str = 'swapping', loss: str = 'least_squares', max_iters: int = 1000, n_threads: int = 1)[source]#

Solves column subset selection via covariance method.

Column subset selection via covariance method solves

\[\begin{align*} \mathrm{minimize}_{T \subseteq [p] : |T|=k} &\quad \ell\left( \Sigma, T \right) \end{align*}\]

for a variety of loss functions \(\ell\) where \(\Sigma \in \mathbb{R}^{p \times p}\) is a positive semi-definite matrix and \(T\) is an index set of size \(k\).

The least squares loss is given by

\[\begin{align*} \ell(\Sigma, T) = \mathrm{Tr}\left( \Sigma - \Sigma_{\cdot T} \Sigma_{T,T}^\dagger \Sigma_{T \cdot} \right) \end{align*}\]

The subset factor loss is given by

\[\begin{align*} \ell(\Sigma, T) = \log|\Sigma_T| + \log(|\mathrm{diag}( \Sigma_{-T,-T} - \Sigma_{-T,T} \Sigma_{T,T}^\dagger \Sigma_{T,-T} )|) \end{align*}\]

The minimum determinant loss is given by

\[\begin{align*} \ell(\Sigma, T) = \left|\Sigma_T\right| \end{align*}\]

The greedy method begins with the empty set for \(T\) and adds at every iteration the column that most minimizes the loss until \(T\) is of size \(k\). The swapping method begins with an initial set \(T\) of size \(k\) and iteratively attempts to swap out each entry of \(T\) for the column that most improves the loss (strictly).

Note

The greedy method is generally significantly faster than the swapping method. However, the swapping method yields a much more accurate solution to the CSS problem. We recommend using the greedy method only if the swapping method is too time-consuming or an accurate solution is not necessary.

Parameters:
S(p, p) ndarray

Positive semi-definite matrix \(\Sigma\).

subset_sizeint, optional

Subset size \(k\). It must satisfy the following conditions for each method type:

  • "greedy": must be an integer.

  • "swapping": must satisfy the conditions for "greedy" if subset is None. Otherwise, it is ignored.

Default is None.

subsetndarray, optional

Initial subset \(T\). This argument is only used by the swapping method. If None, the greedy method is used to first initialize a subset of size subset_size. Default is None.

methodstr, optional

Search method to identify the optimal \(T\). It must be one of the following:

  • "greedy": greedy method.

  • "swapping": swapping method.

Default is "swapping".

lossstr, optional

Loss type. It must be one of the following:

  • "least_squares": least squares loss.

  • "subset_factor": subset factor loss.

  • "min_det": minimum determinant loss.

Default is "least_squares".

max_itersint, optional

Maximum number of cycles. Default is int(1e3).

n_threadsint, optional

Number of threads. Default is 1.

Returns:
state

The resulting state after running the solver.