adelie.diagnostic.gradient_norms

adelie.diagnostic.gradient_norms(grads: ndarray, betas: csr_matrix, duals: csr_matrix, lmdas: ndarray, *, constraints: list[ConstraintBase32 | ConstraintBase64] | None = None, groups: ndarray | None = None, alpha: float = 1, penalty: ndarray | None = None)

Computes the group-wise gradient norms.

The group-wise gradient norm is given by \(\hat{h} \in \mathbb{R}^{G}\) where

\[\begin{align*} \hat{h}_g = \| \hat{\gamma}_g - \lambda (1-\alpha) \omega_g \beta_g - \phi_g'(\beta_g)^\top \mu_g \|_2 \quad g=1,\ldots, G \end{align*}\]

where \(\hat{\gamma}_g\) is the gradient block for group \(g\) as in adelie.diagnostic.gradients(), \(\lambda\) is the regularization parameter, \(\alpha\) is the elastic net proportion, \(\omega_g\) is the penalty factor for group \(g\), \(\beta_g\) is the coefficient block for group \(g\), \(\phi_g\) is the constraint function for group \(g\), and \(\mu_g\) is the dual block for group \(g\).
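As a rough illustration of the formula above, the following NumPy sketch evaluates \(\hat{h}_g\) for a single \(\lambda\) in the unconstrained case, where the \(\phi_g'(\beta_g)^\top \mu_g\) term is dropped. The helper name `gradient_norms_unconstrained` and its dense-array inputs are assumptions made for illustration; this is not adelie's implementation.

```python
import numpy as np

def gradient_norms_unconstrained(grad, beta, lmda, groups, group_sizes,
                                 alpha=1.0, penalty=None):
    """Hypothetical helper: h_g for one lambda with no constraints."""
    grad = np.asarray(grad, dtype=float)
    beta = np.asarray(beta, dtype=float)
    if penalty is None:
        # Documented default for the penalty factor.
        penalty = np.sqrt(group_sizes)
    norms = np.empty(len(groups))
    for g, (start, size) in enumerate(zip(groups, group_sizes)):
        block = slice(start, start + size)
        # gamma_g - lambda * (1 - alpha) * omega_g * beta_g
        resid = grad[block] - lmda * (1 - alpha) * penalty[g] * beta[block]
        norms[g] = np.linalg.norm(resid)
    return norms

grad = np.array([3.0, 4.0, 1.0])
beta = np.array([1.0, 0.0, 2.0])
groups = np.array([0, 2])       # starting index of each group
group_sizes = np.array([2, 1])  # sizes implied by the starts and p = 3

# With alpha = 1 the ridge term vanishes, leaving plain group-wise norms.
h_lasso = gradient_norms_unconstrained(
    grad, beta, lmda=0.5, groups=groups, group_sizes=group_sizes,
    alpha=1.0, penalty=np.array([1.0, 2.0]),
)
# With alpha < 1 the ridge term shifts each block before the norm is taken.
h_enet = gradient_norms_unconstrained(
    grad, beta, lmda=0.5, groups=groups, group_sizes=group_sizes,
    alpha=0.5, penalty=np.array([1.0, 2.0]),
)
```

With \(\alpha = 1\) the result reduces to the Euclidean norm of each gradient block, which is a convenient sanity check.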

Parameters:
grads : (L, p) or (L, p, K) ndarray

Gradients.

betas : (L, p) or (L, p*K) csr_matrix

Coefficient vectors \(\beta\).

duals : (L, d) csr_matrix

Dual vectors \(\mu\).

lmdas : (L,) ndarray

Regularization parameters \(\lambda\).

constraints : (G,) list[Union[ConstraintBase32, ConstraintBase64]], optional

List of constraints for each group. constraints[i] is the constraint object corresponding to group i. If constraints[i] is None, then the i-th group is unconstrained. If constraints is None, every group is unconstrained. Default is None.

groups : (G,) ndarray, optional

List of starting indices of each group, where G is the number of groups. groups[i] is the starting index of the i-th group. If the gradient is of multi-response type, then only two types of groupings are allowed:

  • "grouped": coefficients for each predictor are grouped across the classes.

  • "ungrouped": every coefficient is its own group.

Default is None, in which case it is set to np.arange(p) if the gradient is single-response and "grouped" if multi-response.
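To make the starting-index convention concrete, here is a small sketch for the single-response case showing how group sizes follow from the starting indices and the number of coefficients p. The variable names are illustrative assumptions and are not part of the adelie API.

```python
import numpy as np

p = 5                          # number of coefficients (assumed for illustration)
groups = np.array([0, 2, 3])   # groups cover [0, 2), [2, 3), [3, 5)

# Each group's size is the gap to the next starting index (or to p for the last).
group_sizes = np.diff(np.append(groups, p))

# The documented single-response default: every coefficient is its own group.
default_groups = np.arange(p)
default_sizes = np.diff(np.append(default_groups, p))
```

Here `group_sizes` comes out as `[2, 1, 2]`, and the default grouping yields five groups of size one.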

alpha : float, optional

Elastic net parameter \(\alpha\). It must be in the range \([0,1]\). Default is 1.

penalty : (G,) ndarray, optional

Penalty factor for each group in the same order as groups. It must be a non-negative vector. Default is None, in which case, it is set to np.sqrt(group_sizes).
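The documented constraints on alpha and penalty, together with the default penalty, can be sketched as follows. The function `resolve_penalty` is a hypothetical helper written for illustration; adelie may validate its inputs differently.

```python
import numpy as np

def resolve_penalty(alpha, penalty, group_sizes):
    """Hypothetical helper: check alpha and fill in the default penalty."""
    if not 0.0 <= alpha <= 1.0:
        raise ValueError("alpha must lie in [0, 1]")
    if penalty is None:
        # Documented default: square root of each group's size.
        penalty = np.sqrt(group_sizes)
    penalty = np.asarray(penalty, dtype=float)
    if np.any(penalty < 0):
        raise ValueError("penalty must be non-negative")
    return penalty

group_sizes = np.array([4, 1, 9])
default_penalty = resolve_penalty(alpha=1.0, penalty=None, group_sizes=group_sizes)
```

For group sizes `[4, 1, 9]`, the default penalty factors are `[2, 1, 3]`.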

Returns:
norms : (L, G) ndarray

Gradient norms.
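To show how the (L, G) output shape arises, the sketch below evaluates the unconstrained, single-response norms for a whole path of L regularization values at once. It uses dense arrays in place of csr_matrix inputs and omits the constraint term, so it is an assumption-laden illustration and not a substitute for calling the function itself.

```python
import numpy as np

grads = np.array([[3.0, 4.0, 1.0],
                  [0.0, 0.0, 2.0]])    # (L, p) gradients
betas = np.array([[1.0, 0.0, 2.0],
                  [0.0, 0.0, 0.0]])    # (L, p) coefficients, dense for simplicity
lmdas = np.array([0.5, 0.25])          # (L,) regularization path
groups = np.array([0, 2])              # (G,) starting indices
penalty = np.array([1.0, 2.0])         # (G,) penalty factors
alpha = 0.5

p = grads.shape[1]
group_sizes = np.diff(np.append(groups, p))

# Expand the per-group penalty to one value per coefficient.
penalty_per_coef = np.repeat(penalty, group_sizes)

# gamma - lambda * (1 - alpha) * omega * beta, row by row over the path.
resid = grads - lmdas[:, None] * (1 - alpha) * penalty_per_coef * betas

# Sum squares within each group (segments start at `groups`), then take the root.
norms = np.sqrt(np.add.reduceat(resid ** 2, groups, axis=1))
```

Each row of `norms` corresponds to one entry of `lmdas` and each column to one group, matching the documented (L, G) layout.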