adelie.state.gaussian_naive

adelie.state.gaussian_naive(*, X: MatrixNaiveBase32 | MatrixNaiveBase64, y: ndarray, X_means: ndarray, y_mean: float, y_var: float, resid: ndarray, resid_sum: float, constraints: list[ConstraintBase32 | ConstraintBase64], groups: ndarray, group_sizes: ndarray, alpha: float, penalty: ndarray, weights: ndarray, offsets: ndarray, screen_set: ndarray, screen_beta: ndarray, screen_is_active: ndarray, active_set_size: int, active_set: ndarray, rsq: float, lmda: float, grad: ndarray, lmda_path: ndarray | None = None, lmda_max: float | None = None, max_iters: int = 100000, tol: float = 1e-07, adev_tol: float = 0.9, ddev_tol: float = 0, newton_tol: float = 1e-12, newton_max_iters: int = 1000, n_threads: int = 1, early_exit: bool = True, intercept: bool = True, screen_rule: str = 'pivot', min_ratio: float = 0.01, lmda_path_size: int = 100, max_screen_size: int | None = None, max_active_size: int | None = None, pivot_subset_ratio: float = 0.1, pivot_subset_min: int = 1, pivot_slack_ratio: float = 1.25)

Creates a Gaussian, naive method state object.

Define the following quantities:

\(X_c\) as \(X\) if intercept is False and otherwise the column-centered version of \(X\).

\(y_c\) as \(y - \eta^0\) if intercept is False and otherwise the centered version of \(y - \eta^0\).

Parameters:

X(n, p) Union[MatrixNaiveBase32, MatrixNaiveBase64]

Feature matrix. It is typically one of the matrices defined in the adelie.matrix submodule.

y(n,) ndarray

Response vector.

Note

This is the original response vector, not the offset-adjusted one!

X_means(p,) ndarray

Column means of X (weighted by \(W\)).

y_meanfloat

Mean of the offset-adjusted response vector \(y-\eta^0\) (weighted by \(W\)), i.e. \(\mathbf{1}^\top W (y-\eta^0)\).

y_varfloat

Variance of the offset-adjusted response vector \(y-\eta^0\) (weighted by \(W\)), i.e. \(\|y_c\|_{W}^2\). It is used only to report the training \(R^2\) relative to this value, i.e. this quantity is the “null” model MSE.
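The summary quantities X_means, y_mean, and y_var can be sketched with plain NumPy. This is only an illustration of the definitions above: it uses a dense array as a stand-in for the MatrixNaiveBase object, and the data are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 6, 3
X = rng.normal(size=(n, p))      # dense stand-in for the feature matrix (assumption)
y = rng.normal(size=n)           # original response, offsets not subtracted
offsets = np.zeros(n)            # eta^0
weights = np.full(n, 1.0 / n)    # W, sums to 1
intercept = True

X_means = weights @ X                          # weighted column means, shape (p,)
y_off = y - offsets                            # y - eta^0
y_mean = float(weights @ y_off)                # 1^T W (y - eta^0)
y_c = y_off - y_mean if intercept else y_off   # centered only when intercept is True
y_var = float(weights @ y_c**2)                # ||y_c||_W^2
```

With intercept=True, the weighted sum of y_c is zero up to rounding, which is a quick sanity check on the inputs.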

resid(n,) ndarray

Residual \(y_c - X \beta\) where \(\beta\) is given by screen_beta.

resid_sumfloat

Weighted (by \(W\)) sum of resid.

constraints(G,) list[Union[ConstraintBase32, ConstraintBase64]]

List of constraints for each group. constraints[i] is the constraint object corresponding to group i. If constraints[i] is None, then the i th group is unconstrained. If constraints itself is None, every group is unconstrained.

groups(G,) ndarray

List of starting indices to each group where G is the number of groups. groups[i] is the starting index of the i th group.

group_sizes(G,) ndarray

List of group sizes corresponding to each element of groups. group_sizes[i] is the size of the i th group.
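As a small illustration of how groups and group_sizes relate, the starting indices are the cumulative sums of the preceding group sizes. The sizes below are made up for the example.

```python
import numpy as np

# Three groups covering p = 6 features: sizes 2, 1, and 3.
group_sizes = np.array([2, 1, 3])

# groups[i] is the index of the first feature in group i.
groups = np.concatenate([[0], np.cumsum(group_sizes)[:-1]])

# Feature indices that belong to group 2.
g = 2
members = np.arange(groups[g], groups[g] + group_sizes[g])
```

Here groups is [0, 2, 3] and group 2 spans features 3, 4, and 5.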

alphafloat

Elastic net parameter. It must be in the range \([0,1]\).

penalty(G,) ndarray

Penalty factor for each group in the same order as groups. It must be a non-negative vector.
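To make the roles of alpha and penalty concrete, the sketch below evaluates a group elastic net penalty of the form \(\lambda \sum_g \omega_g (\alpha \|\beta_g\|_2 + \frac{1-\alpha}{2} \|\beta_g\|_2^2)\), where \(\omega_g\) is penalty[g]. This form is an assumption based on the usual group elastic net and is not stated on this page; the helper name is hypothetical and is not part of adelie.

```python
import numpy as np

def group_elastic_net_penalty(beta, groups, group_sizes, penalty, alpha, lmda):
    """Hypothetical helper: evaluate the assumed group elastic net penalty."""
    total = 0.0
    for begin, size, w in zip(groups, group_sizes, penalty):
        norm = np.linalg.norm(beta[begin:begin + size])   # ||beta_g||_2
        total += w * (alpha * norm + 0.5 * (1.0 - alpha) * norm**2)
    return lmda * total

beta = np.array([3.0, 4.0, 0.0])
groups = np.array([0, 2])
group_sizes = np.array([2, 1])
penalty = np.array([1.0, 2.0])

value = group_elastic_net_penalty(beta, groups, group_sizes, penalty, alpha=0.5, lmda=2.0)
```

Under this form, alpha = 1 gives a pure group lasso penalty and alpha = 0 a pure ridge penalty, and a group with penalty[g] = 0 is left unpenalized.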

weights(n,) ndarray

Observation weights \(W\). The weights must sum to 1.

offsets(n,) ndarray

Observation offsets \(\eta^0\).

screen_set(s,) ndarray

List of indices into groups that correspond to the screen groups. screen_set[i] is the i th screen group. screen_set must contain at least the true (optimal) active groups when the regularization is given by lmda.

screen_beta(ws,) ndarray

Coefficient vector on the screen set. screen_beta[b:b+p] is the coefficient for the i th screen group where k = screen_set[i], b = screen_begins[i], and p = group_sizes[k]. Here screen_begins[i] is the starting position of the i th screen group within screen_beta. The values can be arbitrary, but it is recommended that they be close to the solution at lmda.

screen_is_active(s,) ndarray

Boolean vector that indicates whether each screen group in groups is active or not. screen_is_active[i] is True if and only if screen_set[i] is active.

active_set_sizeint

Number of active groups. active_set[i] is only well-defined for i in the range [0, active_set_size).

active_set(G,) ndarray

List of indices into screen_set that correspond to active groups. screen_set[active_set[i]] is the i th active group. An active group is one with a non-zero coefficient block, that is, for every i th active group, screen_beta[b:b+p] is not identically zero where j = active_set[i], k = screen_set[j], b = screen_begins[j], and p = group_sizes[k].
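The indexing between screen_set, screen_beta, screen_is_active, and active_set can be sketched as follows. This is a toy illustration: screen_begins is built here as the running offset of the screen group sizes, and "active" follows the non-zero-block definition given above. All values are made up.

```python
import numpy as np

groups = np.array([0, 2, 3])
group_sizes = np.array([2, 1, 3])

screen_set = np.array([2, 0])                      # screen groups: group 2, then group 0
screen_sizes = group_sizes[screen_set]             # sizes 3 and 2
screen_begins = np.concatenate([[0], np.cumsum(screen_sizes)[:-1]])

# Coefficients laid out block by block in screen order (ws = 5 entries).
screen_beta = np.array([0.0, 0.0, 0.0, 0.5, -1.0])

# A screen group is active when its coefficient block is not identically zero.
screen_is_active = np.array([
    bool(np.any(screen_beta[b:b + s] != 0))
    for b, s in zip(screen_begins, screen_sizes)
])
active_set = np.flatnonzero(screen_is_active)      # indices into screen_set
active_set_size = active_set.size

# Scatter the screen coefficients back into a full-length (p,) vector.
beta = np.zeros(group_sizes.sum())
for k, b, s in zip(screen_set, screen_begins, screen_sizes):
    beta[groups[k]:groups[k] + s] = screen_beta[b:b + s]
```

In the actual state, active_set has length G and only its first active_set_size entries are meaningful; the sketch keeps just the meaningful entries.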

rsqfloat

The change in unnormalized \(R^2\) given by \(\|y_c-X_c\beta_{\mathrm{old}}\|_{W}^2 - \|y_c-X_c\beta_{\mathrm{curr}}\|_{W}^2\). Usually, \(\beta_{\mathrm{old}} = 0\) and \(\beta_{\mathrm{curr}}\) is given by screen_beta.
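A minimal sketch of the rsq definition, using a dense array in place of the matrix object and made-up data. Dividing by y_var gives the normalized training \(R^2\); the least-squares fit is used only to produce some non-zero coefficient vector for the example.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 20, 3
X = rng.normal(size=(n, p))
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=n)
weights = np.full(n, 1.0 / n)            # uniform weights summing to 1

X_c = X - weights @ X                    # column-centered features
y_c = y - weights @ y                    # centered response (offsets are zero here)
y_var = float(weights @ y_c**2)

def weighted_sq_norm(v, w):
    """||v||_W^2 for a diagonal weight matrix W = diag(w)."""
    return float(w @ v**2)

beta_old = np.zeros(p)
beta_curr, *_ = np.linalg.lstsq(X_c, y_c, rcond=None)

rsq = (weighted_sq_norm(y_c - X_c @ beta_old, weights)
       - weighted_sq_norm(y_c - X_c @ beta_curr, weights))
r2 = rsq / y_var                         # normalized training R^2
```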

lmdafloat

The last regularization parameter that was attempted to be solved.

grad(p,) ndarray

The full gradient \(X_c^\top W (y_c - X_c\beta)\) where \(\beta\) is given by screen_beta.
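The gradient can be sketched directly from its definition. The second computation below uses only the uncentered X together with resid and resid_sum; it agrees with the first because the weights sum to 1, which is one plausible reason the state carries resid_sum (this motivation is an inference, not something the page states). A dense array stands in for the matrix object and the data are made up.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 8, 4
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
w = rng.uniform(size=n)
weights = w / w.sum()                    # normalize so the weights sum to 1
beta = rng.normal(size=p)                # full-length coefficient vector

X_means = weights @ X
X_c = X - X_means
y_c = y - weights @ y                    # intercept=True, zero offsets

# Definition: X_c^T W (y_c - X_c beta)
grad = X_c.T @ (weights * (y_c - X_c @ beta))

# Same quantity from the uncentered matrix, resid, and resid_sum.
resid = y_c - X @ beta
resid_sum = float(weights @ resid)
grad_alt = X.T @ (weights * resid) - X_means * resid_sum
```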

lmda_path(L,) ndarray, optional

The regularization path to solve for. The full path is not considered if early_exit is True. It is recommended that the path is sorted in decreasing order. If None, the path will be generated. Default is None.

lmda_maxfloat, optional

The smallest \(\lambda\) such that the true solution is zero for all coefficients that have a non-vanishing group lasso penalty (\(\ell_2\)-norm). If None, it will be computed. Default is None.
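For intuition, lmda_max can be sketched from the standard group lasso optimality condition: with alpha > 0, a penalized group g stays at zero when \(\|\mathrm{grad}_g\|_2 \le \lambda \alpha \omega_g\), where grad is evaluated with all penalized coefficients at zero and \(\omega_g\) is penalty[g]. The helper below is hypothetical and assumes alpha > 0; adelie's own computation may differ, for example in how unpenalized groups or alpha = 0 are handled.

```python
import numpy as np

def lmda_max_from_grad(grad, groups, group_sizes, penalty, alpha):
    """Hypothetical helper: smallest lambda keeping all penalized groups at zero."""
    if alpha <= 0:
        raise ValueError("this sketch assumes alpha > 0")
    ratios = [
        np.linalg.norm(grad[begin:begin + size]) / (alpha * w)
        for begin, size, w in zip(groups, group_sizes, penalty)
        if w > 0                          # unpenalized groups impose no bound
    ]
    return float(max(ratios))

grad = np.array([3.0, 4.0, 1.0])          # gradient at the all-zero solution
groups = np.array([0, 2])
group_sizes = np.array([2, 1])
penalty = np.array([1.0, 2.0])

lmda_max = lmda_max_from_grad(grad, groups, group_sizes, penalty, alpha=0.5)
```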

max_itersint, optional

Maximum number of coordinate descent iterations. Default is int(1e5).

tolfloat, optional

Coordinate descent convergence tolerance. Default is 1e-7.

adev_tolfloat, optional

Percent deviance explained tolerance. If the training percent deviance explained exceeds this quantity and early_exit is True, then the solver terminates. Default is 0.9.

ddev_tolfloat, optional

Difference in percent deviance explained tolerance. If the difference between the last two training percent deviance explained values is smaller than this quantity and early_exit is True, then the solver terminates. Default is 0.

newton_tolfloat, optional

Convergence tolerance for the BCD update. Default is 1e-12.

newton_max_itersint, optional

Maximum number of iterations for the BCD update. Default is 1000.

n_threadsint, optional

Number of threads. Default is 1.

early_exitbool, optional

True if the function should exit early based on training percent deviance explained. Default is True.

min_ratiofloat, optional

The ratio of the smallest to the largest \(\lambda\) in the regularization sequence if it is to be generated. Default is 1e-2.

lmda_path_sizeint, optional

Number of regularizations in the path if it is to be generated. Default is 100.
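When lmda_path is None, the path is generated from lmda_max, min_ratio, and lmda_path_size. A minimal sketch, assuming the generated values are log-spaced from lmda_max down to min_ratio * lmda_max (the spacing is an assumption; this page does not state it), with a made-up lmda_max:

```python
import numpy as np

lmda_max = 2.5            # made-up value for illustration
min_ratio = 1e-2
lmda_path_size = 100

# Decreasing, log-spaced sequence from lmda_max to min_ratio * lmda_max.
lmda_path = np.geomspace(lmda_max, lmda_max * min_ratio, lmda_path_size)
```

The result is sorted in decreasing order, which matches the recommendation for user-supplied paths.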

interceptbool, optional

True if the function should fit with intercept. Default is True.

screen_rulestr, optional

The type of screening rule to use. It must be one of the following options:

"strong": adds groups whose active scores are above the strong threshold.

"pivot": adds groups whose active scores are above the pivot cutoff with slack.

Default is "pivot".

max_screen_sizeint, optional

Maximum number of screen groups allowed. The function will return a valid state and guarantees that the screen set size is less than or equal to max_screen_size. If None, it will be set to the total number of groups. Default is None.

max_active_sizeint, optional

Maximum number of active groups allowed. The function will return a valid state and guarantees that the active set size is less than or equal to max_active_size. If None, it will be set to the total number of groups. Default is None.

pivot_subset_ratiofloat, optional

If screening takes place, then the (1 + pivot_subset_ratio) * s largest active scores are used to determine the pivot point where s is the current screen set size. It is only used if screen_rule="pivot". Default is 0.1.

pivot_subset_minint, optional

If screening takes place, then at least pivot_subset_min number of active scores are used to determine the pivot point. It is only used if screen_rule="pivot". Default is 1.

pivot_slack_ratiofloat, optional

If screening takes place, then pivot_slack_ratio number of groups with next smallest (new) active scores below the pivot point are also added to the screen set as slack. It is only used if screen_rule="pivot". Default is 1.25.

Returns:

wrap: Wrapper state object.