adelie.state.multiglm_naive
- adelie.state.multiglm_naive(*, X: MatrixNaiveBase32 | MatrixNaiveBase64, glm: GlmMultiBase32 | GlmMultiBase64, constraints: list[ConstraintBase32 | ConstraintBase64], groups: ndarray, group_sizes: ndarray, alpha: float, penalty: ndarray, offsets: ndarray, screen_set: ndarray, screen_beta: ndarray, screen_is_active: ndarray, active_set_size: int, active_set: ndarray, lmda: float, grad: ndarray, eta: ndarray, resid: ndarray, loss_full: float, loss_null: float | None = None, lmda_path: ndarray | None = None, lmda_max: float | None = None, irls_max_iters: int = 10000, irls_tol: float = 1e-07, max_iters: int = 100000, tol: float = 1e-07, adev_tol: float = 0.9, ddev_tol: float = 0, newton_tol: float = 1e-12, newton_max_iters: int = 1000, n_threads: int = 1, early_exit: bool = True, intercept: bool = True, screen_rule: str = 'pivot', min_ratio: float = 0.01, lmda_path_size: int = 100, max_screen_size: int | None = None, max_active_size: int | None = None, pivot_subset_ratio: float = 0.1, pivot_subset_min: int = 1, pivot_slack_ratio: float = 1.25)
Creates a multi-response GLM, naive method state object.
Define the following quantities:
- \(\tilde{X} = X \otimes I_K\) if intercept is False, and otherwise \([1 \otimes I_K, X \otimes I_K]\).
- \(\tilde{y}\) as the flattened version of \(y\) in row-major order.
- \(\tilde{W} = K^{-1} (W \otimes I_K)\).
- \(\tilde{\eta}\) as the flattened version of \(\eta\) in row-major order, and similarly for \(\tilde{\eta}^0\).
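The quantities above can be reproduced with plain NumPy to check shapes and orderings. The following is a minimal sketch, not adelie code: the dimensions, the random data, and the reading of \(W\) as a diagonal matrix of observation weights are assumptions made for illustration. It shows that multiplying \(\tilde{X}\) by a row-major flattened coefficient vector gives the row-major flattened \(\eta\).

```python
import numpy as np

# Illustrative sizes (assumed): n observations, p features, K responses.
n, p, K = 4, 3, 2
rng = np.random.default_rng(0)
X = rng.normal(size=(n, p))
B = rng.normal(size=(p, K))       # coefficients, one column per response
b0 = rng.normal(size=K)           # one intercept per response
eta0 = rng.normal(size=(n, K))    # offsets
w = np.full(n, 1.0 / n)           # assumed: W = diag(w) holds observation weights

I_K = np.eye(K)

# X_tilde = [1 (x) I_K, X (x) I_K], i.e. the intercept=True case.
X_tilde = np.hstack([np.kron(np.ones((n, 1)), I_K), np.kron(X, I_K)])

# W_tilde = K^{-1} (W (x) I_K).
W_tilde = np.kron(np.diag(w), I_K) / K

# Row-major ("C" order) flattening of the (n, K) offsets.
eta0_tilde = eta0.ravel(order="C")

# Stack intercepts first, then the row-major flattened coefficients.
beta = np.concatenate([b0, B.ravel(order="C")])

eta_tilde = X_tilde @ beta + eta0_tilde   # shape (n*K,)
eta = X @ B + b0 + eta0                   # shape (n, K), b0 broadcast over rows

print(X_tilde.shape, eta_tilde.shape)
print(np.allclose(eta_tilde, eta.ravel(order="C")))
```

Note that the number of columns of `X_tilde` is `(p + 1) * K`, which matches the documented shape `((p+intercept)*K,)` of grad when intercept is True.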
- Parameters:
- X (n, p) Union[MatrixNaiveBase32, MatrixNaiveBase64]
Feature matrix. It is typically one of the matrices defined in the adelie.matrix submodule.
- glm Union[GlmMultiBase32, GlmMultiBase64]
Multi-response GLM object. It is typically one of the GLM classes defined in the adelie.glm submodule.
- constraints (G,) list[Union[ConstraintBase32, ConstraintBase64]]
List of constraints for each group. constraints[i] is the constraint object corresponding to group i. If constraints[i] is None, then the i-th group is unconstrained. If None, every group is unconstrained.
- groups (G,) ndarray
List of starting indices to each group where G is the number of groups. groups[i] is the starting index of the i-th group.
- group_sizes (G,) ndarray
List of group sizes corresponding to each element of groups. group_sizes[i] is the size of the i-th group.
- alpha float
Elastic net parameter. It must be in the range \([0,1]\).
- penalty (G,) ndarray
Penalty factor for each group in the same order as groups. It must be a non-negative vector.
- offsets (n, K) ndarray
Observation offsets \(\eta^0\).
- screen_set (s,) ndarray
List of indices into groups that correspond to the screen groups. screen_set[i] is the i-th screen group. screen_set must contain at least the true (optimal) active groups when the regularization is given by lmda.
- screen_beta (ws,) ndarray
Coefficient vector on the screen set. screen_beta[b:b+p] is the coefficient for the i-th screen group where k = screen_set[i], b = screen_begins[i], and p = group_sizes[k]. The values can be arbitrary, but it is recommended that they be close to the solution at lmda.
- screen_is_active (s,) ndarray
Boolean vector that indicates whether each screen group in groups is active or not. screen_is_active[i] is True if and only if screen_set[i] is active.
- active_set_size int
Number of active groups. active_set[i] is only well-defined for i in the range [0, active_set_size).
- active_set (G,) ndarray
List of indices into screen_set that correspond to active groups. screen_set[active_set[i]] is the i-th active group. An active group is one with a non-zero coefficient block, that is, for every i-th active group, screen_beta[b:b+p] != 0 where j = active_set[i], k = screen_set[j], b = screen_begins[j], and p = group_sizes[k].
- lmda float
The last regularization parameter for which a solve was attempted.
- grad ((p+intercept)*K,) ndarray
The full gradient \(-\tilde{X}^\top \nabla \ell(\tilde{\eta})\) where \(\tilde{\eta}\) is given by eta.
- eta (n*K,) ndarray
The natural parameter \(\tilde{\eta} = \tilde{X}\beta + \tilde{\eta}^0\) where \(\beta\) and \(\tilde{\eta}^0\) are given by screen_beta and offsets, respectively.
- resid (n*K,) ndarray
Residual \(-\nabla \ell(\tilde{\eta})\) where \(\tilde{\eta}\) is given by eta.
- loss_full float
Full loss \(\ell(\eta^\star)\) where \(\eta^\star\) is the minimizer.
- loss_null float, optional
Null loss \(\ell(\mathbf{1} \beta_0^{\star\top} + \eta^0)\) from fitting an intercept-only model (if intercept is True), where an intercept is given for each class, and otherwise \(\ell(\eta^0)\). If None, it will be computed. Default is None.
- lmda_path (L,) ndarray, optional
The regularization path to solve for. The full path is not considered if early_exit is True. It is recommended that the path is sorted in decreasing order. If None, the path will be generated. Default is None.
- lmda_max float, optional
The smallest \(\lambda\) such that the true solution is zero for all coefficients that have a non-vanishing group lasso penalty (\(\ell_2\)-norm). If None, it will be computed. Default is None.
- irls_max_iters int, optional
Maximum number of IRLS iterations. Default is int(1e4).
- irls_tol float, optional
IRLS convergence tolerance. Default is 1e-7.
- max_iters int, optional
Maximum number of coordinate descent iterations. Default is int(1e5).
- tol float, optional
Coordinate descent convergence tolerance. Default is 1e-7.
- adev_tol float, optional
Percent deviance explained tolerance. If the training percent deviance explained exceeds this quantity and early_exit is True, then the solver terminates. Default is 0.9.
- ddev_tol float, optional
Difference in percent deviance explained tolerance. If the difference between the last two training percent deviance explained values exceeds this quantity and early_exit is True, then the solver terminates. Default is 0.
- newton_tol float, optional
Convergence tolerance for the BCD update. Default is 1e-12.
- newton_max_iters int, optional
Maximum number of iterations for the BCD update. Default is 1000.
- n_threads int, optional
Number of threads. Default is 1.
- early_exit bool, optional
True if the function should early exit based on training percent deviance explained. Default is True.
- min_ratio float, optional
The ratio between the largest and smallest \(\lambda\) in the regularization sequence if it is to be generated. Default is 1e-2.
- lmda_path_size int, optional
Number of regularizations in the path if it is to be generated. Default is 100.
- intercept bool, optional
True if the function should fit with intercept. Default is True.
- screen_rule str, optional
The type of screening rule to use. It must be one of the following options:
  - "strong": adds groups whose active scores are above the strong threshold.
  - "pivot": adds groups whose active scores are above the pivot cutoff with slack.
Default is "pivot".
- max_screen_size int, optional
Maximum number of screen groups allowed. The function will return a valid state and guarantees that the screen set size is less than or equal to max_screen_size. If None, it will be set to the total number of groups. Default is None.
- max_active_size int, optional
Maximum number of active groups allowed. The function will return a valid state and guarantees that the active set size is less than or equal to max_active_size. If None, it will be set to the total number of groups. Default is None.
- pivot_subset_ratio float, optional
If screening takes place, then the (1 + pivot_subset_ratio) * s largest active scores are used to determine the pivot point, where s is the current screen set size. It is only used if screen_rule="pivot". Default is 0.1.
- pivot_subset_min int, optional
If screening takes place, then at least pivot_subset_min number of active scores are used to determine the pivot point. It is only used if screen_rule="pivot". Default is 1.
- pivot_slack_ratio float, optional
If screening takes place, then pivot_slack_ratio number of groups with next smallest (new) active scores below the pivot point are also added to the screen set as slack. It is only used if screen_rule="pivot". Default is 1.25.
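The index conventions relating groups, screen_set, screen_beta, screen_is_active, and active_set can be traced on a small example. The sketch below uses only NumPy and made-up values. The array screen_begins is referenced in the parameter descriptions but not defined on this page, so computing it as the cumulative offsets of the screen group sizes is an assumption, as is padding active_set to length G.

```python
import numpy as np

# Four groups covering a coefficient vector of length 10 (made-up values).
groups = np.array([0, 2, 5, 6])        # starting index of each group
group_sizes = np.array([2, 3, 1, 4])   # size of each group
G = groups.size

# Screen set: indices into `groups`.
screen_set = np.array([1, 3])

# Assumption: screen_beta packs the screen groups back to back, so that
# screen_begins[i] is the offset of the i-th screen group's block.
screen_sizes = group_sizes[screen_set]
screen_begins = np.concatenate([[0], np.cumsum(screen_sizes)[:-1]])

# Coefficients on the screen set; only the block of group 3 is non-zero.
screen_beta = np.zeros(screen_sizes.sum())
screen_beta[3:7] = [0.5, 0.0, -1.0, 0.25]

# A screen group is active when its coefficient block is not all zero.
screen_is_active = np.array([
    bool(np.any(screen_beta[b:b + s] != 0))
    for b, s in zip(screen_begins, screen_sizes)
])

# Active set: indices into `screen_set`; only the first active_set_size
# entries are meaningful (the padding to length G is an assumption).
active = np.flatnonzero(screen_is_active)
active_set_size = active.size
active_set = np.zeros(G, dtype=int)
active_set[:active_set_size] = active

# Scatter the screen coefficients back into a full-length vector.
beta = np.zeros(group_sizes.sum())
for i, k in enumerate(screen_set):
    b, size = screen_begins[i], group_sizes[k]
    beta[groups[k]:groups[k] + size] = screen_beta[b:b + size]

print(screen_begins, screen_is_active, active_set[:active_set_size])
print(beta)
```

Here `screen_set[active_set[0]]` recovers group 3, the only group with a non-zero block.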
- Returns:
- wrap
Wrapper state object.
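Before constructing the state, it can help to check the group arguments against the requirements stated in the parameter descriptions: alpha in \([0,1]\), a non-negative penalty, and groups and group_sizes of matching shape. The helper below is hypothetical and not part of adelie. It additionally assumes that the groups are contiguous and cover all coefficients, which this page does not state explicitly.

```python
import numpy as np

def check_group_args(groups, group_sizes, penalty, alpha, n_coefs):
    """Hypothetical helper (not an adelie function): return a list of problems found."""
    groups = np.asarray(groups, dtype=int)
    group_sizes = np.asarray(group_sizes, dtype=int)
    penalty = np.asarray(penalty, dtype=float)
    problems = []

    # groups, group_sizes, and penalty are all documented with shape (G,).
    if not (groups.shape == group_sizes.shape == penalty.shape):
        problems.append("groups, group_sizes, and penalty must share shape (G,)")
        return problems

    # Documented: alpha must be in the range [0, 1].
    if not 0.0 <= alpha <= 1.0:
        problems.append("alpha must be in [0, 1]")

    # Documented: penalty must be a non-negative vector.
    if np.any(penalty < 0):
        problems.append("penalty must be non-negative")

    # Assumption: group i ends where group i+1 starts, and the groups
    # together cover indices 0 .. n_coefs - 1.
    ends = groups + group_sizes
    if (groups.size == 0 or groups[0] != 0
            or np.any(ends[:-1] != groups[1:]) or ends[-1] != n_coefs):
        problems.append("groups must partition range(n_coefs) contiguously")

    return problems

ok = check_group_args([0, 2, 5], [2, 3, 1], [1.0, 1.0, 0.0], alpha=0.5, n_coefs=6)
bad = check_group_args([0, 2, 5], [2, 3, 1], [1.0, -1.0, 0.0], alpha=1.5, n_coefs=7)
print(ok)
print(bad)
```

Returning a list of messages rather than raising on the first failure makes it easier to see every inconsistent argument at once.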