adelie.state.multiglm_naive

adelie.state.multiglm_naive(*, X: MatrixNaiveBase32 | MatrixNaiveBase64, glm: GlmMultiBase32 | GlmMultiBase64, constraints: list[ConstraintBase32 | ConstraintBase64], groups: ndarray, group_sizes: ndarray, alpha: float, penalty: ndarray, offsets: ndarray, screen_set: ndarray, screen_beta: ndarray, screen_is_active: ndarray, active_set_size: int, active_set: ndarray, lmda: float, grad: ndarray, eta: ndarray, resid: ndarray, loss_full: float, loss_null: float | None = None, lmda_path: ndarray | None = None, lmda_max: float | None = None, irls_max_iters: int = 10000, irls_tol: float = 1e-07, max_iters: int = 100000, tol: float = 1e-07, adev_tol: float = 0.9, ddev_tol: float = 0, newton_tol: float = 1e-12, newton_max_iters: int = 1000, n_threads: int = 1, early_exit: bool = True, intercept: bool = True, screen_rule: str = 'pivot', min_ratio: float = 0.01, lmda_path_size: int = 100, max_screen_size: int | None = None, max_active_size: int | None = None, pivot_subset_ratio: float = 0.1, pivot_subset_min: int = 1, pivot_slack_ratio: float = 1.25)

Creates a state object for the multi-response GLM, naive method.

Define the following quantities:

  • \(\tilde{X} = X \otimes I_K\) if intercept is False, and \([\mathbf{1} \otimes I_K,\; X \otimes I_K]\) otherwise.

  • \(\tilde{y}\) as the row-major flattened version of \(y\).

  • \(\tilde{W} = K^{-1} (W \otimes I_K)\).

  • \(\tilde{\eta}\) as the row-major flattened version of \(\eta\), and similarly \(\tilde{\eta}^0\) for \(\eta^0\).

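As a concrete illustration, here is a minimal NumPy sketch of these quantities, assuming a dense X of shape (n, p), observation weights W, and a natural parameter matrix of shape (n, K); the shapes and variable names are illustrative, not part of the API:

    import numpy as np

    n, p, K = 4, 3, 2
    X = np.random.randn(n, p)      # feature matrix
    W = np.full(n, 1.0 / n)        # observation weights
    eta = np.random.randn(n, K)    # natural parameter matrix

    I_K = np.eye(K)

    # X_tilde: X ⊗ I_K without intercept, [1 ⊗ I_K, X ⊗ I_K] with intercept
    X_tilde = np.kron(X, I_K)
    X_tilde_int = np.hstack([np.kron(np.ones((n, 1)), I_K), X_tilde])

    # W_tilde = K^{-1} (W ⊗ I_K), written out as a diagonal matrix
    W_tilde = np.kron(np.diag(W), I_K) / K

    # eta_tilde: row-major flattening of eta, shape (n*K,)
    eta_tilde = eta.ravel(order="C")

The same row-major flattening defines \(\tilde{y}\) and \(\tilde{\eta}^0\).
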
Parameters:
X : (n, p) Union[MatrixNaiveBase32, MatrixNaiveBase64]

Feature matrix. It is typically one of the matrices defined in the adelie.matrix submodule.

glm : Union[GlmMultiBase32, GlmMultiBase64]

Multi-response GLM object. It is typically one of the GLM classes defined in the adelie.glm submodule.

constraints : (G,) list[Union[ConstraintBase32, ConstraintBase64]], optional

List of constraints for each group. constraints[i] is the constraint object corresponding to group i. If constraints[i] is None, then the i th group is unconstrained. If None, every group is unconstrained.

groups : (G,) ndarray

List of starting indices to each group where G is the number of groups. groups[i] is the starting index of the i th group.

group_sizes : (G,) ndarray

List of group sizes corresponding to each element of groups. group_sizes[i] is the size of the i th group.

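For example, a hypothetical grouping that splits five coefficients into a group of size 2 followed by a group of size 3 would be encoded as:

    import numpy as np

    groups = np.array([0, 2])       # starting index of each group
    group_sizes = np.array([2, 3])  # group 0 covers [0, 2), group 1 covers [2, 5)
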
alpha : float

Elastic net parameter. It must be in the range \([0,1]\).

penalty : (G,) ndarray

Penalty factor for each group in the same order as groups. It must be a non-negative vector.

offsets : (n, K) ndarray

Observation offsets \(\eta^0\).

screen_set : (s,) ndarray

List of indices into groups that correspond to the screen groups. screen_set[i] is the i th screen group. screen_set must contain at least the true (optimal) active groups when the regularization is given by lmda.

screen_beta : (ws,) ndarray

Coefficient vector on the screen set. screen_beta[b:b+p] is the coefficient block for the i th screen group where k = screen_set[i], b = screen_begins[i], and p = group_sizes[k]. The values can be arbitrary, but it is recommended that they be close to the solution at lmda.

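Note that screen_begins is not an argument here; it can be derived from screen_set and group_sizes. A sketch with hypothetical values:

    import numpy as np

    group_sizes = np.array([2, 3, 1, 4])   # hypothetical
    screen_set = np.array([2, 0, 3])       # hypothetical screen groups
    sizes = group_sizes[screen_set]        # sizes of the screen groups
    screen_begins = np.concatenate(([0], np.cumsum(sizes)[:-1]))
    # screen_beta[screen_begins[i] : screen_begins[i] + sizes[i]] is then
    # the coefficient block of the i th screen group
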
screen_is_active : (s,) ndarray

Boolean vector that indicates whether each screen group in groups is active or not. screen_is_active[i] is True if and only if screen_set[i] is active.

active_set_size : int

Number of active groups. active_set[i] is only well-defined for i in the range [0, active_set_size).

active_set : (G,) ndarray

List of indices into screen_set that correspond to active groups. screen_set[active_set[i]] is the i th active group. An active group is one with a non-zero coefficient block, that is, for each active group i, screen_beta[b:b+p] != 0 where j = active_set[i], k = screen_set[j], b = screen_begins[j], and p = group_sizes[k].

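A sketch of the indexing chain with hypothetical values; only the first active_set_size entries of active_set are meaningful:

    import numpy as np

    screen_set = np.array([2, 0, 3])   # hypothetical screen groups
    active_set = np.array([1, 2, 0])   # indices into screen_set
    active_set_size = 2
    active_groups = screen_set[active_set[:active_set_size]]  # array([0, 3])
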
lmda : float

The last regularization parameter that was attempted to be solved.

grad : ((p+intercept)*K,) ndarray

The full gradient \(-\tilde{X}^\top \nabla \ell(\tilde{\eta})\) where \(\tilde{\eta}\) is given by eta.

eta : (n*K,) ndarray

The natural parameter \(\tilde{\eta} = \tilde{X}\beta + \tilde{\eta}^0\) where \(\beta\) and \(\tilde{\eta}^0\) are given by screen_beta and offsets.

resid : (n*K,) ndarray

Residual \(-\nabla \ell(\tilde{\eta})\) where \(\tilde{\eta}\) is given by eta.

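Since resid stores \(-\nabla \ell(\tilde{\eta})\), the grad argument above is simply \(\tilde{X}^\top\) applied to resid. A dense sketch, using an explicit Kronecker product as a stand-in for the actual matrix classes:

    import numpy as np

    n, p, K = 4, 3, 2
    X_tilde = np.kron(np.random.randn(n, p), np.eye(K))  # dense stand-in, no intercept
    resid = np.random.randn(n * K)                       # stands in for -∇ℓ(η̃)
    grad = X_tilde.T @ resid                             # shape (p*K,)
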
loss_full : float

Full loss \(\ell(\eta^\star)\) where \(\eta^\star\) is the minimizer.

loss_null : float, optional

Null loss. If intercept is True, this is \(\ell(\mathbf{1} \beta_0^{\star\top} + \eta^0)\) from fitting an intercept-only model with one intercept per class; otherwise, it is \(\ell(\eta^0)\). If None, it will be computed. Default is None.

lmda_path : (L,) ndarray, optional

The regularization path to solve for. The full path is not considered if early_exit is True. It is recommended that the path be sorted in decreasing order. If None, the path will be generated. Default is None.

lmda_max : float, optional

The smallest \(\lambda\) such that the true solution is zero for all coefficients that have a non-vanishing group lasso penalty (\(\ell_2\)-norm). If None, it will be computed. Default is None.

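For reference, the usual group-lasso characterization of this quantity is the largest group-wise gradient norm rescaled by the elastic net and penalty factors. A sketch that mirrors this definition (not necessarily adelie's exact internal routine):

    import numpy as np

    grad = np.random.randn(10)             # hypothetical full gradient
    groups = np.array([0, 4, 7])
    group_sizes = np.array([4, 3, 3])
    penalty = np.array([1.0, 1.0, 1.0])
    alpha = 1.0
    norms = np.array([np.linalg.norm(grad[g:g + s])
                      for g, s in zip(groups, group_sizes)])
    mask = penalty > 0                     # only groups with a non-vanishing penalty
    lmda_max = np.max(norms[mask] / (alpha * penalty[mask]))
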
irls_max_iters : int, optional

Maximum number of IRLS iterations. Default is int(1e4).

irls_tol : float, optional

IRLS convergence tolerance. Default is 1e-7.

max_iters : int, optional

Maximum number of coordinate descents. Default is int(1e5).

tol : float, optional

Coordinate descent convergence tolerance. Default is 1e-7.

adev_tol : float, optional

Percent deviance explained tolerance. If the training percent deviance explained exceeds this quantity and early_exit is True, then the solver terminates. Default is 0.9.

ddev_tol : float, optional

Difference in percent deviance explained tolerance. If the difference between the last two training percent deviance explained values exceeds this quantity and early_exit is True, then the solver terminates. Default is 0.

newton_tol : float, optional

Convergence tolerance for the BCD update. Default is 1e-12.

newton_max_iters : int, optional

Maximum number of iterations for the BCD update. Default is 1000.

n_threads : int, optional

Number of threads. Default is 1.

early_exit : bool, optional

True if the solver should exit early based on training percent deviance explained. Default is True.

min_ratio : float, optional

The ratio between the largest and smallest \(\lambda\) in the regularization sequence if it is to be generated. Default is 1e-2.

lmda_path_size : int, optional

Number of regularizations in the path if it is to be generated. Default is 100.

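Together with lmda_max, these two parameters describe a log-spaced path. A sketch of how such a path can be generated, mirroring the descriptions above (not necessarily the exact internal routine):

    import numpy as np

    lmda_max = 1.0                        # hypothetical
    min_ratio, lmda_path_size = 1e-2, 100
    lmda_path = lmda_max * np.logspace(0, np.log10(min_ratio), lmda_path_size)
    # decreasing from lmda_max down to min_ratio * lmda_max
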
intercept : bool, optional

True if the function should fit with an intercept. Default is True.

screen_rule : str, optional

The type of screening rule to use. It must be one of the following options:

  • "strong": adds groups whose active scores are above the strong threshold.

  • "pivot": adds groups whose active scores are above the pivot cutoff with slack.

Default is "pivot".

max_screen_size : int, optional

Maximum number of screen groups allowed. The function will return a valid state and guarantees that the screen set size is at most max_screen_size. If None, it will be set to the total number of groups. Default is None.

max_active_size : int, optional

Maximum number of active groups allowed. The function will return a valid state and guarantees that the active set size is at most max_active_size. If None, it will be set to the total number of groups. Default is None.

pivot_subset_ratio : float, optional

If screening takes place, then the (1 + pivot_subset_ratio) * s largest active scores are used to determine the pivot point where s is the current screen set size. It is only used if screen_rule="pivot". Default is 0.1.

pivot_subset_min : int, optional

If screening takes place, then at least pivot_subset_min active scores are used to determine the pivot point. It is only used if screen_rule="pivot". Default is 1.

pivot_slack_ratio : float, optional

If screening takes place, then pivot_slack_ratio number of groups with the next smallest (new) active scores below the pivot point are also added to the screen set as slack. It is only used if screen_rule="pivot". Default is 1.25.

Returns:
wrap

Wrapper state object.