adelie.state.multigaussian_naive

adelie.state.multigaussian_naive(*, X: MatrixNaiveBase32 | MatrixNaiveBase64, y: ndarray, X_means: ndarray, y_var: float, resid: ndarray, resid_sum: float, constraints: list[ConstraintBase32 | ConstraintBase64], groups: ndarray, group_sizes: ndarray, alpha: float, penalty: ndarray, weights: ndarray, offsets: ndarray, screen_set: ndarray, screen_beta: ndarray, screen_is_active: ndarray, active_set_size: int, active_set: ndarray, rsq: float, lmda: float, grad: ndarray, lmda_path: ndarray | None = None, lmda_max: float | None = None, max_iters: int = 100000, tol: float = 1e-07, adev_tol: float = 0.9, ddev_tol: float = 0, newton_tol: float = 1e-12, newton_max_iters: int = 1000, n_threads: int = 1, early_exit: bool = True, intercept: bool = True, screen_rule: str = 'pivot', min_ratio: float = 0.01, lmda_path_size: int = 100, max_screen_size: int | None = None, max_active_size: int | None = None, pivot_subset_ratio: float = 0.1, pivot_subset_min: int = 1, pivot_slack_ratio: float = 1.25)

Creates a MultiGaussian, naive method state object.

Define the following quantities:

  • \(\tilde{X} = X\otimes I_K\) if intercept is False, and otherwise \([1 \otimes I_K, X \otimes I_K]\).

  • \(\tilde{y}\) as the row-major flattening of \(y-\eta^0\).

  • \(\tilde{W} = K^{-1} (W \otimes I_K)\).

Parameters:
X : (n, p) Union[MatrixNaiveBase32, MatrixNaiveBase64]

Feature matrix. It is typically one of the matrices defined in adelie.matrix submodule.

y : (n, K) ndarray

Response matrix.

Note

This is the original response matrix, without offsets applied!

X_means : ((p+intercept)*K,) ndarray

Column means (weighted by \(\tilde{W}\)) of \(\tilde{X}\).

y_var : float

The average of the variances of the \(K\) response vectors, where the variance of the k th response is given by \(\|y_{k,c} - \eta_{k,c}^0\|_W^2\) and \(y_{k,c}\) (resp. \(\eta_{k,c}^0\)) is the k th column of \(y\) (resp. \(\eta^0\)), centered if intercept is True. This is only used for outputting the training \(R^2\) relative to this value, i.e. this quantity is the "null" model MSE.
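
As a hedged illustration of this definition in plain NumPy (not adelie's internal computation):

import numpy as np

def null_mse(y, offsets, W, intercept=True):
    # average W-weighted variance of the K offsetted responses
    Z = y - offsets                   # (n, K)
    if intercept:
        Z = Z - W @ Z                 # center each column by its W-weighted mean
    return float(np.mean(W @ Z**2))   # ||z_k||_W^2 averaged over k = 1, ..., K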

resid : (n*K,) ndarray

Residual \(\tilde{y} - \tilde{X} \beta\) where \(\beta\) is given by screen_beta.

resid_sum : float

Weighted (by \(\tilde{W}\)) sum of resid.
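
A self-contained NumPy sketch of these two quantities (the uniform weights and dense \(\tilde{X}\) are illustrative assumptions):

import numpy as np

rng = np.random.default_rng(0)
n, p, K = 4, 3, 2
X_tilde = np.kron(rng.normal(size=(n, p)), np.eye(K))   # no intercept, for brevity
y_tilde = rng.normal(size=(n, K)).reshape(-1)
W_tilde = np.full(n * K, 1.0 / (n * K))                  # K^{-1}(W (x) I_K), uniform W

beta = np.zeros(p * K)               # screen_beta scattered into the full vector
resid = y_tilde - X_tilde @ beta     # the resid parameter
resid_sum = float(W_tilde @ resid)   # the resid_sum parameter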

constraints : (G,) list[Union[ConstraintBase32, ConstraintBase64]], optional

List of constraints for each group. constraints[i] is the constraint object corresponding to group i. If constraints[i] is None, then the i th group is unconstrained. If None, every group is unconstrained.

groups : (G,) ndarray

List of starting indices to each group where G is the number of groups. groups[i] is the starting index of the i th group.

group_sizes : (G,) ndarray

List of group sizes corresponding to each element of groups. group_sizes[i] is the size of the i th group.
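
For example, one common grouping for the multi-response problem, assumed here purely for illustration, ties each feature's K coefficients into a single group, so starting indices advance in steps of K:

import numpy as np

p, K = 3, 2
groups = np.arange(p) * K       # [0, 2, 4]: group i starts at coefficient i*K
group_sizes = np.full(p, K)     # each group spans the K responses of one feature
G = len(groups)                 # number of groups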

alpha : float

Elastic net parameter. It must be in the range \([0,1]\).

penalty : (G,) ndarray

Penalty factor for each group in the same order as groups. It must be a non-negative vector.

weights : (n,) ndarray

Observation weights \(W\). The weights must sum to 1.

offsets : (n, K) ndarray

Observation offsets \(\eta^0\).

screen_set : (s,) ndarray

List of indices into groups that correspond to the screen groups. screen_set[i] is the i th screen group. screen_set must contain at least the true (optimal) active groups when the regularization is given by lmda.

screen_beta : (ws,) ndarray

Coefficient vector on the screen set. screen_beta[b:b+p] is the coefficient block for the i th screen group, where k = screen_set[i], b = screen_begins[i], and p = group_sizes[k]. The values can be arbitrary, but it is recommended that they be close to the solution at lmda.

screen_is_active : (s,) ndarray

Boolean vector that indicates whether each screen group in groups is active or not. screen_is_active[i] is True if and only if screen_set[i] is active.

active_set_size : int

Number of active groups. active_set[i] is only well-defined for i in the range [0, active_set_size).

active_set : (G,) ndarray

List of indices into screen_set that correspond to active groups. screen_set[active_set[i]] is the i th active group. An active group is one with a non-zero coefficient block, that is, for the i th active group, screen_beta[b:b+p] != 0 where j = active_set[i], k = screen_set[j], b = screen_begins[j], and p = group_sizes[k].
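
The following sketch illustrates these indexing conventions with made-up values; screen_begins is the derived running offset of each screen group's block within screen_beta:

import numpy as np

groups      = np.array([0, 2, 4, 6])
group_sizes = np.array([2, 2, 2, 2])
screen_set  = np.array([0, 2])                  # screen groups: 0 and 2
screen_begins = np.concatenate(
    ([0], np.cumsum(group_sizes[screen_set])[:-1])
)                                               # block offsets into screen_beta
screen_beta = np.array([0.5, -0.1, 0.0, 0.0])   # group 2's block is all zero

# active groups are exactly those with a non-zero coefficient block
active_set = np.array([
    j for j in range(len(screen_set))
    if np.any(screen_beta[screen_begins[j]:screen_begins[j] + group_sizes[screen_set[j]]])
])                                              # -> [0]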

rsq : float

The change in unnormalized \(R^2\) given by \(\|\tilde{y}-\tilde{X}\beta_{\mathrm{old}}\|_{\tilde{W}}^2 - \|\tilde{y}-\tilde{X}\beta_{\mathrm{curr}}\|_{\tilde{W}}^2\). Usually, \(\beta_{\mathrm{old}} = 0\) and \(\beta_{\mathrm{curr}}\) is given by screen_beta.

lmda : float

The last regularization parameter that the solver attempted to solve.

grad : ((p+intercept)*K,) ndarray

The full gradient \(\tilde{X}^\top \tilde{W} (\tilde{y} - \tilde{X}\beta)\) where \(\beta\) is given by screen_beta.

lmda_path : (L,) ndarray, optional

The regularization path to solve for. The full path is not considered if early_exit is True. It is recommended that the path is sorted in decreasing order. If None, the path will be generated. Default is None.

lmda_max : float, optional

The smallest \(\lambda\) such that the true solution is zero for all coefficients that have a non-vanishing group lasso penalty (\(\ell_2\)-norm). If None, it will be computed. Default is None.
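
To make grad and lmda_max concrete, here is a hedged NumPy sketch of the gradient at \(\beta = 0\) together with the usual group-lasso formula for the smallest all-zeroing \(\lambda\); adelie's internal computation may differ in its details:

import numpy as np

rng = np.random.default_rng(0)
n, p, K = 4, 3, 2
X_tilde = np.kron(rng.normal(size=(n, p)), np.eye(K))   # no intercept, for brevity
y_tilde = rng.normal(size=(n, K)).reshape(-1)
W_tilde = np.full(n * K, 1.0 / (n * K))

grad = X_tilde.T @ (W_tilde * y_tilde)          # gradient at beta = 0
groups, group_sizes = np.arange(p) * K, np.full(p, K)
alpha, penalty = 1.0, np.ones(p)
lmda_max = max(
    np.linalg.norm(grad[g:g + s]) / (alpha * penalty[i])
    for i, (g, s) in enumerate(zip(groups, group_sizes))
    if penalty[i] > 0
)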

max_iters : int, optional

Maximum number of coordinate descents. Default is int(1e5).

tol : float, optional

Coordinate descent convergence tolerance. Default is 1e-7.

adev_tol : float, optional

Percent deviance explained tolerance. If the training percent deviance explained exceeds this quantity and early_exit is True, then the solver terminates. Default is 0.9.

ddev_tol : float, optional

Difference in percent deviance explained tolerance. If the difference of the last two training percent deviance explained exceeds this quantity and early_exit is True, then the solver terminates. Default is 0.

newton_tol : float, optional

Convergence tolerance for the BCD update. Default is 1e-12.

newton_max_iters : int, optional

Maximum number of iterations for the BCD update. Default is 1000.

n_threads : int, optional

Number of threads. Default is 1.

early_exit : bool, optional

True if the function should early exit based on training percent deviance explained. Default is True.

min_ratio : float, optional

The ratio between the largest and smallest \(\lambda\) in the regularization sequence if it is to be generated. Default is 1e-2.

lmda_path_size : int, optional

Number of regularizations in the path if it is to be generated. Default is 100.
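
If the path is generated, a sequence consistent with min_ratio and lmda_path_size is log-spaced from lmda_max down to min_ratio * lmda_max, e.g. (a sketch of the convention, not adelie's exact code):

import numpy as np

lmda_max, min_ratio, lmda_path_size = 1.0, 1e-2, 100
lmda_path = np.geomspace(lmda_max, min_ratio * lmda_max, lmda_path_size)  # decreasing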

intercept : bool, optional

True if the function should fit with intercept for each class. Default is True.

screen_rule : str, optional

The type of screening rule to use. It must be one of the following options:

  • "strong": adds groups whose active scores are above the strong threshold.

  • "pivot": adds groups whose active scores are above the pivot cutoff with slack.

Default is "pivot".

max_screen_size : int, optional

Maximum number of screen groups allowed. The function will return a valid state and guarantees that the screen set size is less than or equal to max_screen_size. If None, it will be set to the total number of groups. Default is None.

max_active_size : int, optional

Maximum number of active groups allowed. The function will return a valid state and guarantees that the active set size is less than or equal to max_active_size. If None, it will be set to the total number of groups. Default is None.

pivot_subset_ratio : float, optional

If screening takes place, then the (1 + pivot_subset_ratio) * s largest active scores are used to determine the pivot point where s is the current screen set size. It is only used if screen_rule="pivot". Default is 0.1.

pivot_subset_min : int, optional

If screening takes place, then at least pivot_subset_min number of active scores are used to determine the pivot point. It is only used if screen_rule="pivot". Default is 1.

pivot_slack_ratio : float, optional

If screening takes place, then pivot_slack_ratio number of groups with the next smallest (new) active scores below the pivot point are also added to the screen set as slack. It is only used if screen_rule="pivot". Default is 1.25.

Returns:
wrap

Wrapper state object.
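
In practice this state object is usually constructed internally by the high-level solver rather than by hand. The following end-to-end sketch uses adelie's public API; exact signatures may vary across versions:

import adelie as ad
import numpy as np

rng = np.random.default_rng(0)
n, p, K = 100, 10, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=(n, K))

# grpnet builds the multigaussian_naive state internally and solves the path
state = ad.grpnet(
    X=ad.matrix.dense(X, method="naive"),
    glm=ad.glm.multigaussian(y),
)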