adelie.solver.grpnet

adelie.solver.grpnet(X: ndarray | MatrixNaiveBase32 | MatrixNaiveBase64, glm: GlmBase32 | GlmBase64 | GlmMultiBase32 | GlmMultiBase64, *, constraints: list[ConstraintBase32 | ConstraintBase64] | None = None, groups: ndarray | None = None, alpha: float = 1, penalty: ndarray | None = None, offsets: ndarray | None = None, lmda_path: ndarray | None = None, irls_max_iters: int = 10000, irls_tol: float = 1e-07, max_iters: int = 100000, tol: float = 1e-07, adev_tol: float = 0.9, ddev_tol: float = 0, newton_tol: float = 1e-12, newton_max_iters: int = 1000, n_threads: int = 1, early_exit: bool = True, intercept: bool = True, screen_rule: str = 'pivot', min_ratio: float = 0.01, lmda_path_size: int = 100, max_screen_size: int | None = None, max_active_size: int | None = None, pivot_subset_ratio: float = 0.1, pivot_subset_min: int = 1, pivot_slack_ratio: float = 1.25, check_state: bool = False, progress_bar: bool = True, warm_start=None, exit_cond: Callable | None = None)

Solves the group elastic net problem via the naive method.

The group elastic net problem minimizes the following:

\[\begin{split}\begin{align*} \mathrm{minimize}_{\beta, \beta_0} \quad& \ell(\eta) + \lambda \sum\limits_{g=1}^G \omega_g \left( \alpha \|\beta_g\|_2 + \frac{1-\alpha}{2} \|\beta_g\|_2^2 \right) \\ \text{subject to} \quad& \eta = X\beta + \beta_0 \mathbf{1} + \eta^0 \end{align*}\end{split}\]

where \(\beta_0\) is the intercept, \(\beta\) is the coefficient vector, \(X\) is the feature matrix, \(\eta^0\) is a fixed offset vector, \(\lambda \geq 0\) is the regularization parameter, \(G\) is the number of groups, \(\omega \geq 0\) is the penalty factor, \(\alpha \in [0,1]\) is the elastic net parameter, \(\beta_g\) are the coefficients for the \(g\)-th group, and \(\ell(\cdot)\) is the loss function defined by a GLM.

For multi-response problems (i.e. when \(y\) is 2-dimensional) such as in multigaussian or multinomial, the group elastic net problem minimizes the following:

\[\begin{split}\begin{align*} \mathrm{minimize}_{\beta, \beta_0} \quad& \ell(\eta) + \lambda \sum\limits_{g=1}^G \omega_g \left( \alpha \|\beta_g\|_2 + \frac{1-\alpha}{2} \|\beta_g\|_2^2 \right) \\ \text{subject to} \quad& \mathrm{vec}(\eta^\top) = (X\otimes I_K) \beta + (\mathbf{1}\otimes I_K) \beta_0 + \mathrm{vec}(\eta^{0\top}) \end{align*}\end{split}\]

where \(\mathrm{vec}(\cdot)\) is the operator that flattens its matrix argument into a vector in column-major order. Note that if intercept is True, then an intercept for each class is provided as additional unpenalized features in the data matrix and the global intercept is turned off.
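As a quick illustration, here is a minimal sketch of a single-response Gaussian fit. The data is synthetic and the alias ad for the adelie package is a common convention; this is one way to call the solver, not the only one:

    import numpy as np
    import adelie as ad

    n, p = 100, 20
    rng = np.random.default_rng(0)
    X = rng.normal(size=(n, p))
    y = X[:, :5] @ rng.normal(size=5) + 0.1 * rng.normal(size=n)

    # Lasso (alpha=1) over an automatically generated path;
    # by default, every feature is its own group.
    state = ad.solver.grpnet(X=X, glm=ad.glm.gaussian(y))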

Parameters:
X : (n, p) Union[ndarray, MatrixNaiveBase32, MatrixNaiveBase64]

Feature matrix. It is typically one of the matrices defined in adelie.matrix submodule or numpy.ndarray.

glm : Union[GlmBase32, GlmBase64, GlmMultiBase32, GlmMultiBase64]

GLM object. It is typically one of the GLM classes defined in adelie.glm submodule.

constraints : (G,) list[Union[ConstraintBase32, ConstraintBase64]], optional

List of constraints for each group. constraints[i] is the constraint object corresponding to group i. If constraints[i] is None, then the i-th group is unconstrained. If None, every group is unconstrained. Default is None.
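For example, a sketch of an explicit list equivalent to the default (the group count G is illustrative, and any concrete constraint objects are left to the user):

    G = 10  # number of groups (illustrative)
    # Every group unconstrained -- equivalent to constraints=None.
    constraints = [None] * G
    # To constrain group j, replace constraints[j] with a constraint
    # object (e.g., one built from the adelie.constraint submodule).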

groups : (G,) ndarray, optional

List of starting indices of each group, where G is the number of groups. groups[i] is the starting index of the i-th group. If glm is of multi-response type, then only two types of groupings are allowed:

  • "grouped": coefficients for each predictor is grouped across the classes.

  • "ungrouped": every coefficient is its own group.

Default is None, in which case it is set to np.arange(p) if y is single-response and "grouped" if multi-response. A custom single-response grouping is sketched below.
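A sketch of a custom grouping for a single-response problem (the sizes are arbitrary):

    import numpy as np

    p = 6
    # Three groups over the p = 6 features: {0, 1}, {2, 3, 4}, {5}.
    groups = np.array([0, 2, 5])
    # Group sizes are implied by consecutive starting indices and p:
    group_sizes = np.diff(np.concatenate([groups, [p]]))  # [2, 3, 1]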

alpha : float, optional

Elastic net parameter. It must be in the range \([0,1]\). Default is 1.

penalty : (G,) ndarray, optional

Penalty factor for each group in the same order as groups. It must be a non-negative vector. Default is None, in which case it is set to np.sqrt(group_sizes).
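Continuing the grouping sketch above, the default penalty factor would be computed as:

    # Default penalty: square root of each group's size.
    penalty = np.sqrt(group_sizes)  # approx. [1.414, 1.732, 1.0]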

offsets : (n,) or (n, K) ndarray, optional

Observation offsets \(\eta^0\). Default is None, in which case it is set to np.zeros(n) if y is single-response and np.zeros((n, K)) if multi-response.

lmda_path : (L,) ndarray, optional

The regularization path to solve for. It is recommended that the path be sorted in decreasing order. If early_exit is True, the solver may terminate before reaching the end of the path. If None, the path is generated automatically. Default is None.
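For instance, a user-supplied path in decreasing order (the endpoints are arbitrary):

    import numpy as np

    # 50 log-spaced values from 1.0 down to 1e-3, in decreasing order.
    lmda_path = np.geomspace(1.0, 1e-3, num=50)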

irls_max_iters : int, optional

Maximum number of IRLS iterations. This parameter is only used if glm is not of gaussian type. Default is int(1e4).

irls_tol : float, optional

IRLS convergence tolerance. This parameter is only used if glm is not of gaussian type. Default is 1e-7.

max_iters : int, optional

Maximum number of coordinate descents. Default is int(1e5).

tol : float, optional

Coordinate descent convergence tolerance. Default is 1e-7.

adev_tol : float, optional

Percent deviance explained tolerance. If early_exit is True, the solver stops once the training percent deviance explained exceeds this value. Default is 0.9.

ddev_tol : float, optional

Difference in percent deviance explained tolerance. If early_exit is True, the solver stops once the increase in training percent deviance explained between consecutive path solutions falls below this value. Default is 0.

newton_tol : float, optional

Convergence tolerance for the BCD update. Default is 1e-12.

newton_max_iters : int, optional

Maximum number of iterations for the BCD update. Default is 1000.

n_threads : int, optional

Number of threads. Default is 1.

early_exit : bool, optional

True if the solver should exit early based on training percent deviance explained. Default is True.

min_ratio : float, optional

The ratio of the smallest to the largest \(\lambda\) in the regularization sequence if it is to be generated. Default is 1e-2.

lmda_path_size : int, optional

Number of regularizations in the path if it is to be generated. Default is 100.
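Taken together with min_ratio, a plausible sketch of the generated path, assuming the conventional log-spaced construction (lmda_max denotes the data-dependent value at which all penalized groups are zero):

    import numpy as np

    def sketch_generated_path(lmda_max, min_ratio=1e-2, lmda_path_size=100):
        # Log-spaced from lmda_max down to min_ratio * lmda_max.
        return np.geomspace(lmda_max, min_ratio * lmda_max, num=lmda_path_size)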

intercept : bool, optional

True if the function should fit with intercept. If y is multi-response, then an intercept for each class is added and the global intercept is turned off. Default is True.

screen_rule : str, optional

The type of screening rule to use. It must be one of the following options:

  • "strong": adds groups whose active scores are above the strong threshold.

  • "pivot": adds groups whose active scores are above the pivot cutoff with slack.

Default is "pivot".

max_screen_size : int, optional

Maximum number of screen groups allowed. The function will return a valid state whose screen set size is guaranteed to be at most max_screen_size. If None, it is set to the total number of groups. Default is None.

max_active_size : int, optional

Maximum number of active groups allowed. The function will return a valid state whose active set size is guaranteed to be at most max_active_size. If None, it is set to the total number of groups. Default is None.

pivot_subset_ratio : float, optional

If screening takes place, then the (1 + pivot_subset_ratio) * s largest active scores are used to determine the pivot point, where s is the current screen set size. It is only used if screen_rule="pivot". Default is 0.1.

pivot_subset_min : int, optional

If screening takes place, then at least pivot_subset_min active scores are used to determine the pivot point. It is only used if screen_rule="pivot". Default is 1.

pivot_slack_ratio : float, optional

If screening takes place, then a number of additional groups proportional to pivot_slack_ratio, taken in order of the next smallest (new) active scores below the pivot point, is also added to the screen set as slack. It is only used if screen_rule="pivot". Default is 1.25.

check_state : bool, optional

True if the state should be checked for inconsistencies before calling the solver. Default is False.

Warning

The check may take a long time if the inputs are big!

progress_bar : bool, optional

True to enable the progress bar. Default is True.

warm_start : optional

If no warm-start is provided, the initial solution is set to 0 and the other invariant quantities are set accordingly. Otherwise, the warm-start is used to extract all necessary state variables. If a warm-start is used, the user must still provide consistent inputs; that is, the warm-start will not overwrite most arguments passed into this function. However, changing configuration settings such as tolerance levels is well-defined. Default is None.

Note

The primary use-case is when a user already called the function with warm_start=False but would like to continue fitting down a longer path of regularizations. This way, the user does not have to restart the fit at the beginning, but can simply continue from the last returned state, as sketched after the warning below.

Warning

We have only tested warm-starts in the setting described in the note above, that is, when lmda_path and possibly static configurations have changed. Use with caution in other settings!
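A sketch of that continuation use-case, reusing X, y, and the imports from the first example above (the path endpoints are illustrative):

    # First fit over a short path.
    state = ad.solver.grpnet(
        X=X,
        glm=ad.glm.gaussian(y),
        lmda_path=np.geomspace(1.0, 1e-1, num=20),
    )

    # Continue down a longer, finer path from the last returned state.
    state = ad.solver.grpnet(
        X=X,
        glm=ad.glm.gaussian(y),
        lmda_path=np.geomspace(1.0, 1e-3, num=60),
        warm_start=state,
    )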

exit_cond : Callable, optional

If not None, it must be a callable object that takes a single argument: the current state object, of the same type as the return value. During the optimization, after the solution at each regularization value is obtained, exit_cond(state) is evaluated as an opportunity for the user to stop the solver early based on their own rule (see the sketch after the note below). Default is None.

Note

The algorithm early exits if exit_cond(state) evaluates to True or the built-in early exit function evaluates to True (if early_exit is True).
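For example, here is a sketch of a rule that stops after a fixed number of path solutions, reusing the setup from the first example and assuming the state exposes the solved regularization values as state.lmdas (verify the attribute name against your version of adelie):

    def stop_after_25(state):
        # Stop once solutions at 25 regularization values are available.
        return len(state.lmdas) >= 25

    state = ad.solver.grpnet(
        X=X,
        glm=ad.glm.gaussian(y),
        exit_cond=stop_after_25,
    )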

Returns:
state

The resulting state after running the solver.