adelie.adelie_core.state.StateMultiGaussianNaive64
- class adelie.adelie_core.state.StateMultiGaussianNaive64
Core state class for MultiGaussian, naive method.
Methods
__init__(*args, **kwargs): Overloaded function.
solve(self, arg0, arg1): Solves the state-specific problem.
Attributes
X: Feature matrix.
X_means: Column means of X (weighted by \(W\)).
abs_grad: The \(\ell_2\) norms of (corrected) grad across each group.
active_set: List of indices into screen_set that correspond to active groups.
active_set_size: Number of active groups.
active_sizes: Active set size for every saved solution.
adev_tol: Percent deviance explained tolerance.
alpha: Elastic net parameter.
benchmark_fit_active: Fit time on the active set for each iteration.
benchmark_fit_screen: Fit time on the screen set for each iteration.
benchmark_invariance: Invariance time for each iteration.
benchmark_kkt: KKT time for each iteration.
benchmark_screen: Screen time for each iteration.
betas: betas[i] is the solution at lmdas[i].
constraint_buffer_size: Max constraint buffer size.
constraints: List of constraints for each group.
ddev_tol: Difference in percent deviance explained tolerance.
devs: devs[i] is the (normalized) \(R^2\) at betas[i].
dual_groups: List of starting indices to each dual group (length G, where G is the number of groups).
duals: duals[i] is the dual at lmdas[i].
early_exit: True if the function should exit early based on the training percent deviance explained.
grad: The full gradient \(-X^\top \nabla \ell(\eta)\).
group_sizes: List of group sizes corresponding to each element in groups.
groups: List of starting indices to each group (length G, where G is the number of groups).
intercept: True if the function should fit with an intercept.
intercepts: intercepts[i] is the intercept at lmdas[i] for each class.
lmda: The last regularization parameter at which a solve was attempted.
lmda_max: The smallest \(\lambda\) such that the true solution is zero for all coefficients that have a non-vanishing group lasso penalty (\(\ell_2\)-norm).
lmda_path: The regularization path to solve for.
lmda_path_size: Number of regularizations in the path if it is to be generated.
lmdas: lmdas[i] is the regularization \(\lambda\) used for the i-th solution.
loss_full: Full loss \(-\frac{1}{2} \|y\|_W^2\).
loss_null: Null loss \(-\frac{1}{2} \overline{y}^2\) where \(\overline{y}\) is given by y_mean.
max_active_size: Maximum number of active groups allowed.
max_iters: Maximum number of coordinate descent iterations.
max_screen_size: Maximum number of screen groups allowed.
min_ratio: The ratio of the smallest to the largest \(\lambda\) in the regularization sequence if it is to be generated.
multi_intercept: True if an intercept is added for each response.
n_classes: Number of classes.
n_threads: Number of threads.
n_valid_solutions: Number of valid solutions for each iteration.
newton_max_iters: Maximum number of iterations for the BCD update.
newton_tol: Convergence tolerance for the BCD update.
penalty: Penalty factor for each group in the same order as groups.
pivot_slack_ratio: If screening takes place, then pivot_slack_ratio number of groups with the next smallest (new) active scores below the pivot point are also added to the screen set as slack.
pivot_subset_min: If screening takes place, then at least pivot_subset_min active scores are used to determine the pivot point.
pivot_subset_ratio: If screening takes place, then the (1 + pivot_subset_ratio) * s largest active scores are used to determine the pivot point, where s is the current screen set size.
resid: Residual \(y_c - X \beta\) where \(\beta\) is given by screen_beta.
resid_sum: Weighted (by \(W\)) sum of resid.
rsq: The change in unnormalized \(R^2\) given by \(\|y_c-X_c\beta_{\mathrm{old}}\|_{W}^2 - \|y_c-X_c\beta_{\mathrm{curr}}\|_{W}^2\).
screen_X_means: Column means of \(X\) for screen groups (weighted by \(W\)).
screen_begins: List of indices that index a corresponding list of values for each screen group.
screen_beta: Coefficient vector on the screen set.
screen_hashset: Hash set containing the same values as screen_set.
screen_is_active: Boolean vector that indicates whether each screen group in groups is active or not.
screen_rule: Screening rule type.
screen_set: List of indices into groups that correspond to the screen groups.
screen_sizes: Screen set size for every saved solution.
screen_transforms: List of \(V_k\) where \(V_k\) is from the SVD of \(\sqrt{W} X_{c,k}\) along the screen groups \(k\) and for possibly column-centered (weighted by \(W\)) \(X_k\).
screen_vars: List of \(D_k^2\) where \(D_k\) is from the SVD of \(\sqrt{W} X_{c,k}\) along the screen groups \(k\) and for possibly column-centered (weighted by \(W\)) \(X_k\).
setup_lmda_max: True if the function should set up \(\lambda_\max\).
setup_lmda_path: True if the function should set up the regularization path.
tol: Coordinate descent convergence tolerance.
weights: Observation weights \(W\).
y_mean: Mean of the response vector \(y\) (weighted by \(W\)), i.e. \(\mathbf{1}^\top W y\).
y_var: Variance of the response vector \(y\) (weighted by \(W\)), i.e. \(\|y_c\|_{W}^2\).
- __init__(*args, **kwargs)
Overloaded function.
__init__(self: adelie.adelie_core.state.StateMultiGaussianNaive64, n_classes: int, multi_intercept: bool, X: adelie.adelie_core.matrix.MatrixNaiveBase64, X_means: numpy.ndarray[numpy.float64[1, n]], y_mean: float, y_var: float, resid: numpy.ndarray[numpy.float64[1, n]], resid_sum: float, constraints: adelie.adelie_core.constraint.VectorConstraintBase64, groups: numpy.ndarray[numpy.int64[1, n]], group_sizes: numpy.ndarray[numpy.int64[1, n]], dual_groups: numpy.ndarray[numpy.int64[1, n]], alpha: float, penalty: numpy.ndarray[numpy.float64[1, n]], weights: numpy.ndarray[numpy.float64[1, n]], lmda_path: numpy.ndarray[numpy.float64[1, n]], lmda_max: float, min_ratio: float, lmda_path_size: int, max_screen_size: int, max_active_size: int, pivot_subset_ratio: float, pivot_subset_min: int, pivot_slack_ratio: float, screen_rule: str, max_iters: int, tol: float, adev_tol: float, ddev_tol: float, newton_tol: float, newton_max_iters: int, early_exit: bool, setup_lmda_max: bool, setup_lmda_path: bool, intercept: bool, n_threads: int, screen_set: numpy.ndarray[numpy.int64[1, n]], screen_beta: numpy.ndarray[numpy.float64[1, n]], screen_is_active: numpy.ndarray[bool[1, n]], active_set_size: int, active_set: numpy.ndarray[numpy.int64[1, n]], rsq: float, lmda: float, grad: numpy.ndarray[numpy.float64[1, n]]) -> None
__init__(self: adelie.adelie_core.state.StateMultiGaussianNaive64, arg0: adelie.adelie_core.state.StateMultiGaussianNaive64) -> None
- solve(self: adelie.adelie_core.state.StateMultiGaussianNaive64, arg0: bool, arg1: Callable[[adelie.adelie_core.state.StateMultiGaussianNaive64], bool]) -> dict
Solves the state-specific problem.
- X
Feature matrix.
- X_means
Column means of X (weighted by \(W\)).
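A minimal NumPy sketch of the weighted column means, assuming the observation weights are normalized to sum to 1 (the same convention under which y_mean equals \(\mathbf{1}^\top W y\)). The helper name weighted_column_means is hypothetical and not part of adelie.

```python
import numpy as np

def weighted_column_means(X: np.ndarray, weights: np.ndarray) -> np.ndarray:
    """Hypothetical helper: column means of X weighted by W = diag(weights).

    Assumes the weights sum to 1, so the weighted mean of column j is
    sum_i weights[i] * X[i, j].
    """
    return weights @ X

rng = np.random.default_rng(0)
n, p = 6, 3
X = rng.standard_normal((n, p))
weights = np.full(n, 1.0 / n)  # uniform weights summing to 1

X_means = weighted_column_means(X, weights)
# Subtracting these means gives the column-centered X_c, whose weighted means vanish.
X_c = X - X_means
print(np.allclose(weights @ X_c, 0.0))  # True
```

With uniform weights this reduces to the ordinary column mean; non-uniform weights simply tilt each mean toward the heavily weighted rows.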
- abs_grad
The \(\ell_2\) norms of (corrected) grad across each group. abs_grad[i] is given by np.linalg.norm(grad[g:g+gs] - lmda * penalty[i] * (1-alpha) * beta[g:g+gs] - correction), where g = groups[i], gs = group_sizes[i], beta is the full solution vector represented by screen_beta, and correction is the output from calling constraints[i].gradient().
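The NumPy sketch below evaluates the abs_grad formula for the special case where no group carries a constraint, so correction is taken to be zero. That is an assumption made for illustration; with constraints present, correction would come from constraints[i].gradient(). The helper name abs_grad_unconstrained and the toy arrays are hypothetical.

```python
import numpy as np

def abs_grad_unconstrained(grad, beta, groups, group_sizes, penalty, lmda, alpha):
    """Sketch of abs_grad[i] assuming every group is unconstrained (correction = 0)."""
    out = np.empty(len(groups))
    for i, (g, gs) in enumerate(zip(groups, group_sizes)):
        # Subtract the ridge part of the elastic net penalty from the group's gradient block.
        corrected = grad[g:g+gs] - lmda * penalty[i] * (1 - alpha) * beta[g:g+gs]
        out[i] = np.linalg.norm(corrected)
    return out

groups = np.array([0, 2, 3])
group_sizes = np.array([2, 1, 2])
penalty = np.ones(3)
grad = np.array([3.0, 4.0, -1.0, 0.0, 2.0])
beta = np.zeros(5)  # at beta = 0 the ridge correction vanishes

print(abs_grad_unconstrained(grad, beta, groups, group_sizes, penalty, lmda=0.5, alpha=0.9))
# [5. 1. 2.]
```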
- active_set
List of indices into screen_set that correspond to active groups. screen_set[active_set[i]] is the i-th active group. An active group is one with a non-zero coefficient block, that is, for every i-th active group, screen_beta[b:b+p] is not identically zero, where j = active_set[i], k = screen_set[j], b = screen_begins[j], and p = group_sizes[k].
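The chain of indices above (active_set into screen_set, then screen_begins into screen_beta) can be followed mechanically. This sketch does so on toy arrays; the helper active_blocks and the data are hypothetical and only mirror the indexing described here.

```python
import numpy as np

def active_blocks(active_set, active_set_size, screen_set, screen_begins, screen_beta, group_sizes):
    """Sketch: map each active group k to its coefficient block in screen_beta."""
    blocks = {}
    for i in range(active_set_size):      # only [0, active_set_size) is well-defined
        j = active_set[i]                 # position within the screen set
        k = screen_set[j]                 # index of the group
        b = screen_begins[j]              # start of the block in screen_beta
        p = group_sizes[k]                # block length
        blocks[int(k)] = screen_beta[b:b+p]
    return blocks

group_sizes = np.array([2, 1, 3])
screen_set = np.array([2, 0])             # groups 2 and 0 are in the screen set
screen_begins = np.array([0, 3])          # group 2 uses [0, 3), group 0 uses [3, 5)
screen_beta = np.array([0.5, 0.0, -1.0, 0.0, 0.0])
active_set = np.array([0, 1])             # entries past active_set_size are ignored
active_set_size = 1

blocks = active_blocks(active_set, active_set_size, screen_set, screen_begins, screen_beta, group_sizes)
print(blocks)
```

Only group 2 is reported, and its block is not identically zero, consistent with the definition of an active group.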
- active_set_size
Number of active groups. active_set[i] is only well-defined for i in the range [0, active_set_size).
- active_sizes
Active set size for every saved solution.
- adev_tol
Percent deviance explained tolerance.
- alpha
Elastic net parameter.
- benchmark_fit_active
Fit time on the active set for each iteration.
- benchmark_fit_screen
Fit time on the screen set for each iteration.
- benchmark_invariance
Invariance time for each iteration.
- benchmark_kkt
KKT time for each iteration.
- benchmark_screen
Screen time for each iteration.
- betas
betas[i] is the solution at lmdas[i].
- constraint_buffer_size
Max constraint buffer size. Equivalent to np.max([0 if c is None else c.buffer_size() for c in constraints]).
- constraints
List of constraints for each group. constraints[i] is the constraint object corresponding to group i.
- ddev_tol
Difference in percent deviance explained tolerance.
- devs
devs[i] is the (normalized) \(R^2\) at betas[i].
- dual_groups
List of starting indices to each dual group (length G, where G is the number of groups). dual_groups[i] is the starting index of the i-th dual group.
- duals
duals[i] is the dual at lmdas[i].
- early_exit
True if the function should exit early based on the training percent deviance explained.
- grad
The full gradient \(-X^\top \nabla \ell(\eta)\).
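For intuition, assume a weighted squared-error loss \(\ell(\eta) = \frac{1}{2}\sum_i w_i (y_i - \eta_i)^2\) on a single response vector with \(\eta = X\beta\). This is an illustrative assumption, not necessarily the exact normalization used internally. Then \(\nabla \ell(\eta) = -W(y - \eta)\), so the full gradient reduces to \(X^\top W (y - \eta)\). The sketch checks this against a finite-difference derivative; the function names are hypothetical.

```python
import numpy as np

def loss(eta, y, w):
    # Assumed weighted squared-error loss: 0.5 * sum_i w_i (y_i - eta_i)^2
    return 0.5 * np.sum(w * (y - eta) ** 2)

def full_grad(X, y, w, beta):
    # -X^T grad_loss(eta) with eta = X beta; for this loss it equals X^T W (y - eta)
    eta = X @ beta
    return X.T @ (w * (y - eta))

rng = np.random.default_rng(1)
n, p = 8, 4
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
w = np.full(n, 1.0 / n)
beta = rng.standard_normal(p)

# Central-difference approximation of -d/d(beta) loss(X beta).
eps = 1e-6
fd = np.array([
    -(loss(X @ (beta + eps * e), y, w) - loss(X @ (beta - eps * e), y, w)) / (2 * eps)
    for e in np.eye(p)
])
print(np.allclose(full_grad(X, y, w, beta), fd, atol=1e-6))  # True
```

At an unpenalized weighted least-squares solution this gradient is zero, which is the usual normal-equations condition.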
- group_sizes
List of group sizes corresponding to each element in groups. group_sizes[i] is the group size of the i-th group.
- groups
List of starting indices to each group (length G, where G is the number of groups). groups[i] is the starting index of the i-th group.
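Because groups stores the starting index of each group, group_sizes can be recovered by differencing consecutive starting indices, assuming the groups are sorted, contiguous, and together cover all p features. A minimal sketch with a hypothetical helper name:

```python
import numpy as np

def sizes_from_starts(groups: np.ndarray, p: int) -> np.ndarray:
    """Hypothetical helper: group sizes from sorted, contiguous group start indices."""
    return np.diff(np.append(groups, p))

groups = np.array([0, 2, 3, 7])   # G = 4 groups over p = 10 features
group_sizes = sizes_from_starts(groups, p=10)
print(group_sizes)  # [2 1 4 3]
```

The i-th group then occupies the column block X[:, groups[i]:groups[i] + group_sizes[i]].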
- intercept
True if the function should fit with an intercept.
- intercepts
intercepts[i] is the intercept at lmdas[i] for each class.
- lmda
The last regularization parameter at which a solve was attempted.
- lmda_max
The smallest \(\lambda\) such that the true solution is zero for all coefficients that have a non-vanishing group lasso penalty (\(\ell_2\)-norm).
- lmda_path
The regularization path to solve for.
- lmda_path_size
Number of regularizations in the path if it is to be generated.
- lmdas
lmdas[i] is the regularization \(\lambda\) used for the i-th solution.
- loss_full
Full loss \(-\frac{1}{2} \|y\|_W^2\).
- loss_null
Null loss \(-\frac{1}{2} \overline{y}^2\) where \(\overline{y}\) is given by y_mean.
- max_active_size
Maximum number of active groups allowed.
- max_iters
Maximum number of coordinate descent iterations.
- max_screen_size
Maximum number of screen groups allowed.
- min_ratio
The ratio of the smallest to the largest \(\lambda\) in the regularization sequence if it is to be generated.
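When the path is generated rather than supplied, lmda_max, min_ratio, and lmda_path_size together determine it. The sketch below assumes a log-spaced (geometric) grid from lmda_max down to min_ratio * lmda_max, a common convention in glmnet-style solvers; the exact spacing used internally is not stated on this page, so treat it as an illustration. The helper name geometric_path is hypothetical.

```python
import numpy as np

def geometric_path(lmda_max: float, min_ratio: float, lmda_path_size: int) -> np.ndarray:
    """Assumed log-spaced path: lmda_max * min_ratio ** (t / (size - 1)), t = 0..size-1."""
    if lmda_path_size < 2:
        return np.array([lmda_max])
    t = np.arange(lmda_path_size) / (lmda_path_size - 1)
    return lmda_max * min_ratio ** t

path = geometric_path(lmda_max=2.0, min_ratio=1e-2, lmda_path_size=5)
print(path)
```

Consecutive values share a constant ratio, so the grid is denser (in absolute terms) near the small-\(\lambda\) end where the solution changes most.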
- multi_intercept
True if an intercept is added for each response.
- n_classes
Number of classes.
- n_threads
Number of threads.
- n_valid_solutions
Number of valid solutions for each iteration.
- newton_max_iters
Maximum number of iterations for the BCD update.
- newton_tol
Convergence tolerance for the BCD update.
- penalty
Penalty factor for each group in the same order as groups.
- pivot_slack_ratio
If screening takes place, then pivot_slack_ratio number of groups with the next smallest (new) active scores below the pivot point are also added to the screen set as slack.
- pivot_subset_min
If screening takes place, then at least pivot_subset_min active scores are used to determine the pivot point.
- pivot_subset_ratio
If screening takes place, then the (1 + pivot_subset_ratio) * s largest active scores are used to determine the pivot point, where s is the current screen set size.
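The two pivot subset parameters can be read together as a rule for how many of the largest active scores feed the pivot computation. The sketch below assumes they combine as max(pivot_subset_min, int((1 + pivot_subset_ratio) * s)), capped at the number of available scores. Both the combination and the rounding are assumptions, the helper name pivot_subset is hypothetical, and the pivot rule itself is not reproduced.

```python
import numpy as np

def pivot_subset(scores: np.ndarray, s: int, pivot_subset_ratio: float, pivot_subset_min: int) -> np.ndarray:
    """Return the largest active scores assumed to be used for locating the pivot point."""
    k = max(pivot_subset_min, int((1 + pivot_subset_ratio) * s))
    k = min(k, scores.size)                 # cannot use more scores than exist
    return np.sort(scores)[::-1][:k]        # k largest scores, in decreasing order

scores = np.array([0.3, 2.0, 1.1, 0.05, 0.8, 1.7])
print(pivot_subset(scores, s=2, pivot_subset_ratio=0.5, pivot_subset_min=1))
```

With s = 2 and pivot_subset_ratio = 0.5, three scores are kept; pivot_subset_min only matters when the screen set is small.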
- resid
Residual \(y_c - X \beta\) where \(\beta\) is given by screen_beta.
- resid_sum
Weighted (by \(W\)) sum of resid.
- rsq
The change in unnormalized \(R^2\) given by \(\|y_c-X_c\beta_{\mathrm{old}}\|_{W}^2 - \|y_c-X_c\beta_{\mathrm{curr}}\|_{W}^2\).
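A small NumPy sketch of resid, resid_sum, and the rsq change for a single response vector, assuming weights that sum to 1 and a dense coefficient vector in place of screen_beta. The weighted least-squares fit is only a stand-in for a solver update, not adelie's BCD step.

```python
import numpy as np

def weighted_sq_norm(v: np.ndarray, w: np.ndarray) -> float:
    # ||v||_W^2 = sum_i w_i * v_i^2
    return float(np.sum(w * v ** 2))

rng = np.random.default_rng(2)
n, p = 10, 3
w = np.full(n, 1.0 / n)                # weights assumed to sum to 1
X = rng.standard_normal((n, p))
y = rng.standard_normal(n)
X_means = w @ X
X_c = X - X_means                      # weighted column-centering
y_c = y - w @ y                        # weighted centering of the response

beta_old = np.zeros(p)
# Weighted least-squares fit on the centered data stands in for a solver update.
beta_curr = np.linalg.lstsq(np.sqrt(w)[:, None] * X_c, np.sqrt(w) * y_c, rcond=None)[0]

resid = y_c - X @ beta_curr            # residual as written above (uncentered X)
resid_sum = float(np.sum(w * resid))   # weighted sum of resid
rsq = weighted_sq_norm(y_c - X_c @ beta_old, w) - weighted_sq_norm(y_c - X_c @ beta_curr, w)
print(resid_sum, rsq)
```

Since y_c has zero weighted mean, resid_sum equals -X_means @ beta_curr here, and rsq is non-negative because beta_curr minimizes the weighted residual norm.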
- screen_X_means
Column means of \(X\) for screen groups (weighted by \(W\)).
- screen_begins
List of indices that index a corresponding list of values for each screen group. screen_begins[i] is the starting index corresponding to the i-th screen group. From this index, reading group_sizes[screen_set[i]] elements will grab the values corresponding to the full i-th screen group block.
- screen_beta
Coefficient vector on the screen set. screen_beta[b:b+p] is the coefficient block for the i-th screen group, where k = screen_set[i], b = screen_begins[i], and p = group_sizes[k].
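If the screen group blocks are laid out back to back in screen_set order (an assumption consistent with the description above, not a documented guarantee), screen_begins is the exclusive cumulative sum of the screen group sizes, and the same offsets expand screen_beta back into a full coefficient vector. The helper names are hypothetical.

```python
import numpy as np

def make_screen_begins(screen_set, group_sizes):
    # Exclusive prefix sum of the screen groups' sizes, in screen_set order.
    sizes = group_sizes[screen_set]
    return np.cumsum(sizes) - sizes

def scatter_screen_beta(screen_beta, screen_set, screen_begins, groups, group_sizes, p_total):
    # Expand the compact screen_beta into a full-length coefficient vector.
    beta = np.zeros(p_total)
    for i, k in enumerate(screen_set):
        g, gs, b = groups[k], group_sizes[k], screen_begins[i]
        beta[g:g+gs] = screen_beta[b:b+gs]
    return beta

groups = np.array([0, 2, 3])
group_sizes = np.array([2, 1, 3])        # p_total = 6
screen_set = np.array([2, 0])
screen_begins = make_screen_begins(screen_set, group_sizes)
screen_beta = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
beta_full = scatter_screen_beta(screen_beta, screen_set, screen_begins, groups, group_sizes, 6)
print(screen_begins, beta_full)
```

Groups outside the screen set stay at zero in the expanded vector, which matches their role as excluded from the current fit.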
- screen_hashset
Hash set containing the same values as screen_set.
- screen_is_active
Boolean vector that indicates whether each screen group in groups is active or not. screen_is_active[i] is True if and only if screen_set[i] is active.
- screen_rule
Screening rule type.
- screen_set
List of indices into groups that correspond to the screen groups. screen_set[i] is the i-th screen group.
- screen_sizes
Screen set size for every saved solution.
- screen_transforms
List of \(V_k\) where \(V_k\) is from the SVD of \(\sqrt{W} X_{c,k}\) along the screen groups \(k\) and for possibly column-centered (weighted by \(W\)) \(X_k\). It only needs to be properly initialized for groups with size > 1. screen_transforms[i] is \(V_k\) for the i-th screen group, where k = screen_set[i].
- screen_vars
List of \(D_k^2\) where \(D_k\) is from the SVD of \(\sqrt{W} X_{c,k}\) along the screen groups \(k\) and for possibly column-centered (weighted by \(W\)) \(X_k\). screen_vars[b:b+p] is \(D_k^2\) for the i-th screen group, where k = screen_set[i], b = screen_begins[i], and p = group_sizes[k].
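A sketch of how \(V_k\) and \(D_k^2\) relate to one screen group's block, assuming weights that sum to 1 and weighted column-centering. np.linalg.svd returns the singular values as a 1-D array and \(V^\top\) as its third output. The check confirms that \(V_k\) diagonalizes the weighted Gram matrix \(X_{c,k}^\top W X_{c,k}\) with eigenvalues \(D_k^2\). The helper name group_transform is hypothetical.

```python
import numpy as np

def group_transform(X_k: np.ndarray, w: np.ndarray, center: bool = True):
    """Return (V_k, D_k^2) from the SVD of sqrt(W) X_{c,k} for one group block."""
    X_ck = X_k - w @ X_k if center else X_k        # weighted column-centering
    _, d, vt = np.linalg.svd(np.sqrt(w)[:, None] * X_ck, full_matrices=False)
    return vt.T, d ** 2

rng = np.random.default_rng(3)
n = 12
w = np.full(n, 1.0 / n)
X_k = rng.standard_normal((n, 3))     # a single group of size 3

V_k, D2_k = group_transform(X_k, w)
X_ck = X_k - w @ X_k
gram = X_ck.T @ (w[:, None] * X_ck)   # X_{c,k}^T W X_{c,k}
print(np.allclose(V_k.T @ gram @ V_k, np.diag(D2_k)))  # True
```

Rotating a group's coordinates by \(V_k\) makes its weighted Gram matrix diagonal, which is one reason a per-group transform is convenient for block updates.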
- setup_lmda_max
True if the function should set up \(\lambda_\max\).
- setup_lmda_path
True if the function should set up the regularization path.
- tol
Coordinate descent convergence tolerance.
- weights
Observation weights \(W\).
- y_mean
Mean of the response vector \(y\) (weighted by \(W\)), i.e. \(\mathbf{1}^\top W y\).
- y_var
Variance of the response vector \(y\) (weighted by \(W\)), i.e. \(\|y_c\|_{W}^2\).
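The response summaries above can be computed directly for a single response vector. The sketch assumes the weights sum to 1, so \(\mathbf{1}^\top W y\) is a genuine weighted mean. Under that assumption \(\|y\|_W^2 = \overline{y}^2 + \|y_c\|_W^2\), which links y_var to loss_full and loss_null: loss_null - loss_full = y_var / 2.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 20
y = rng.standard_normal(n)
w = rng.random(n)
w /= w.sum()                           # normalize weights to sum to 1 (assumption)

y_mean = w @ y                         # 1^T W y
y_c = y - y_mean                       # centered response
y_var = np.sum(w * y_c ** 2)           # ||y_c||_W^2
loss_full = -0.5 * np.sum(w * y ** 2)  # -1/2 ||y||_W^2
loss_null = -0.5 * y_mean ** 2         # -1/2 ybar^2

print(np.isclose(loss_null - loss_full, 0.5 * y_var))  # True
```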