Introduction to Group Elastic Net
In this notebook, we give a brief overview of the group elastic net problem that adelie solves.
Single-Response Group Elastic Net
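The single-response group elastic net problem is given by
\[
\begin{aligned}
\mathrm{minimize}_{\beta, \beta_0} \quad & \ell(\eta) + \lambda \sum_{g=1}^G \omega_g \left( \alpha \|\beta_g\|_2 + \frac{1-\alpha}{2} \|\beta_g\|_2^2 \right) \\
\text{subject to} \quad & \eta = X \beta + \beta_0 \mathbf{1} + \eta^0
\end{aligned}
\]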
where \(\beta_0\) is the intercept, \(\beta\) is the coefficient vector, \(X\) is the feature matrix, \(\eta^0\) is a fixed offset vector, \(\lambda \geq 0\) is the regularization parameter, \(G\) is the number of groups, \(\omega \geq 0\) is the penalty factor, \(\alpha \in [0,1]\) is the elastic net parameter, and \(\beta_g\) are the coefficients for the \(g\)-th group. \(\ell(\cdot)\) is the loss function defined by the GLM. As an example, the Gaussian GLM (ad.glm.gaussian) defines the loss function as
\[
\ell(\eta) = \sum_{i=1}^n w_i \left( -y_i \eta_i + \frac{\eta_i^2}{2} \right)
\]
where \(w \geq 0\) is the observation weight vector, \(y\) is the response vector, and \(\eta\) is the linear prediction vector as in the optimization problem above.
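To make the objective concrete, the following NumPy sketch evaluates the group elastic net objective with a Gaussian loss at a given \((\beta_0, \beta)\). It is only an illustration and not how adelie evaluates or solves the problem: the function names, the `lmda` argument, and the encoding of groups as start indices plus sizes are assumptions made for this example. The loss is written as \(\sum_i w_i(-y_i \eta_i + \eta_i^2/2)\), which differs from the weighted least-squares loss \(\frac{1}{2}\sum_i w_i (y_i - \eta_i)^2\) only by a constant that does not depend on \(\eta\).

```python
import numpy as np

def gaussian_loss(eta, y, w):
    """Weighted Gaussian loss: sum_i w_i * (-y_i * eta_i + eta_i**2 / 2)."""
    return float(np.sum(w * (-y * eta + 0.5 * eta**2)))

def group_elastic_net_penalty(beta, groups, group_sizes, omega, alpha):
    """sum_g omega_g * (alpha * ||beta_g||_2 + (1 - alpha) / 2 * ||beta_g||_2^2).

    `groups` holds the start index of each group and `group_sizes` its length
    (an encoding assumed for this sketch).
    """
    total = 0.0
    for g, (start, size) in enumerate(zip(groups, group_sizes)):
        norm = np.linalg.norm(beta[start:start + size])
        total += omega[g] * (alpha * norm + 0.5 * (1.0 - alpha) * norm**2)
    return float(total)

def objective(beta0, beta, X, y, w, offset, lmda, groups, group_sizes, omega, alpha):
    """Loss plus penalty, with eta = X beta + beta0 * 1 + offset."""
    eta = X @ beta + beta0 + offset
    penalty = group_elastic_net_penalty(beta, groups, group_sizes, omega, alpha)
    return gaussian_loss(eta, y, w) + lmda * penalty

# Tiny example: two observations, three features split into groups {0, 1} and {2}.
X = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 1.0]])
y = np.array([1.0, 2.0])
w = np.array([0.5, 0.5])
offset = np.zeros(2)
beta = np.array([3.0, 4.0, 0.0])
groups, group_sizes = [0, 2], [2, 1]
omega = np.array([1.0, 1.0])
alpha, lmda = 0.5, 2.0

value = objective(0.0, beta, X, y, w, offset, lmda, groups, group_sizes, omega, alpha)
print(value)
```

Note that the second group contributes nothing to the penalty because its coefficients are exactly zero; the unsquared \(\ell_2\) norm is what encourages entire groups to be zeroed out.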
Specifically for the Gaussian GLM, we employ a specialized optimizer based on coordinate descent to solve the group elastic net problem. For other general GLMs, we use a proximal Newton method, which leads to an Iteratively Reweighted Least Squares (IRLS) algorithm. That is, we iteratively perform a quadratic approximation to \(\ell(\cdot)\), which yields a sequence of Gaussian GLM group elastic net problems that we solve using our special solver based on coordinate descent.
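The quadratic approximation step can be sketched as follows. The example uses a logistic loss purely as an illustrative non-Gaussian GLM; the function names and the exact form of the working weights and working response are assumptions of this sketch, not a description of adelie's internals. Expanding \(\ell\) to second order around the current \(\eta\) gives, up to a constant, a weighted least-squares loss \(\frac{1}{2}\sum_i h_i (z_i - \eta_i)^2\), which is a Gaussian problem with new weights \(h\) and new response \(z\).

```python
import numpy as np

def logistic_loss(eta, y, w):
    """sum_i w_i * (-y_i * eta_i + log(1 + exp(eta_i)))."""
    return float(np.sum(w * (-y * eta + np.logaddexp(0.0, eta))))

def irls_working_quantities(eta, y, w):
    """Working weights h and working response z at the expansion point eta."""
    mu = 1.0 / (1.0 + np.exp(-eta))   # mean under the logistic model
    grad = w * (mu - y)               # gradient of the loss in eta
    hess = w * mu * (1.0 - mu)        # diagonal of the Hessian in eta
    z = eta - grad / hess             # one Newton step in eta
    return hess, z

def quadratic_surrogate(eta_new, eta, y, w):
    """Second-order expansion of the loss around eta, evaluated at eta_new."""
    hess, z = irls_working_quantities(eta, y, w)
    # The constant makes the surrogate agree with the loss at eta itself.
    const = logistic_loss(eta, y, w) - 0.5 * np.sum(hess * (z - eta) ** 2)
    return float(const + 0.5 * np.sum(hess * (z - eta_new) ** 2))

eta = np.array([0.3, -1.2, 2.0])
y = np.array([1.0, 0.0, 1.0])
w = np.array([0.2, 0.5, 0.3])
step = 1e-3 * np.array([1.0, -2.0, 0.5])

exact = logistic_loss(eta + step, y, w)
approx = quadratic_surrogate(eta + step, eta, y, w)
print(exact, approx)
```

Because \(h\) and \(z\) depend on the current \(\eta\), they must be recomputed at every outer iteration, and each recomputation defines a fresh Gaussian subproblem.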
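The Gaussian GLM also admits a different algorithm, which we call the covariance method, that uses summary statistics rather than individual-level data. The covariance method solves the following problem:
\[
\mathrm{minimize}_{\beta} \quad \frac{1}{2} \beta^\top A \beta - v^\top \beta + \lambda \sum_{g=1}^G \omega_g \left( \alpha \|\beta_g\|_2 + \frac{1-\alpha}{2} \|\beta_g\|_2^2 \right)
\]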
This method would be equivalent to the usual single-response Gaussian group elastic net problem if \(A \equiv X_c^\top W X_c\) and \(v \equiv X_c^\top W y_c\), where \(X_c\) is the column-centered version of \(X\) and \(y_c\) is the centered version of \(y-\eta^0\), with the means computed using the weights \(W\) (the centering applies only if an intercept is to be fit).
This method only works for the Gaussian case since the proximal Newton method changes the weights \(W\) at every IRLS iteration, so that without access to \(X\), it is not possible to compute the new “\(A\)” and “\(v\)”.
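The equivalence between the covariance form and the individual-level Gaussian problem can be checked numerically. The sketch below assumes \(W = \mathrm{diag}(w)\), an intercept that is fit, and the least-squares form \(\frac{1}{2}\sum_i w_i (y_i - \eta_i)^2\) of the Gaussian loss; all function names are hypothetical. With the intercept set to its minimizer, the Gaussian loss equals \(\frac{1}{2}\beta^\top A \beta - v^\top \beta\) plus a constant that does not depend on \(\beta\), so adding the same penalty to both yields the same minimizer.

```python
import numpy as np

def covariance_inputs(X, y, w, offset):
    """Build A = X_c^T W X_c and v = X_c^T W y_c using weighted centering."""
    w_norm = w / w.sum()
    X_c = X - w_norm @ X          # subtract weighted column means
    r = y - offset
    y_c = r - w_norm @ r          # subtract weighted mean of y - offset
    A = X_c.T @ (w[:, None] * X_c)
    v = X_c.T @ (w * y_c)
    return A, v

def covariance_loss(beta, A, v):
    """Quadratic form 0.5 * beta^T A beta - v^T beta."""
    return float(0.5 * beta @ A @ beta - v @ beta)

def profiled_gaussian_loss(beta, X, y, w, offset):
    """0.5 * sum_i w_i (y_i - eta_i)^2 with the intercept at its minimizer."""
    r = y - offset - X @ beta
    beta0 = np.sum(w * r) / np.sum(w)   # optimal intercept for this beta
    return float(0.5 * np.sum(w * (r - beta0) ** 2))

rng = np.random.default_rng(0)
n, p = 20, 3
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
w = rng.uniform(0.5, 1.5, size=n)
offset = rng.normal(size=n)
A, v = covariance_inputs(X, y, w, offset)

# The gap between the two losses should be the same for any beta.
beta_a, beta_b = rng.normal(size=p), rng.normal(size=p)
gap_a = profiled_gaussian_loss(beta_a, X, y, w, offset) - covariance_loss(beta_a, A, v)
gap_b = profiled_gaussian_loss(beta_b, X, y, w, offset) - covariance_loss(beta_b, A, v)
print(gap_a, gap_b)
```

Once \(A\) and \(v\) are formed, neither \(X\) nor \(y\) is needed again, which is what makes the method usable with summary statistics alone.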
Multi-Response Group Elastic Net
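The multi-response group elastic net problem is given by
\[
\begin{aligned}
\mathrm{minimize}_{\beta, \beta_0} \quad & \ell(\eta) + \lambda \sum_{g=1}^G \omega_g \left( \alpha \|\beta_g\|_2 + \frac{1-\alpha}{2} \|\beta_g\|_2^2 \right) \\
\text{subject to} \quad & \mathrm{vec}(\eta^\top) = (X \otimes I_K) \beta + (\mathbf{1} \otimes I_K) \beta_0 + \mathrm{vec}(\eta^{0\top})
\end{aligned}
\]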
where \(\mathrm{vec}(\cdot)\) is the operator that flattens a matrix into a vector in column-major order, and \(A \otimes B\) is the Kronecker product operator. The more familiar (but equivalent) constraint form is
\[
\eta = X B + \mathbf{1} \beta_0^\top + \eta^0
\]
where \(\beta \equiv \mathrm{vec}(B^\top)\). This way, we have possibly different linear predictions for each response. Note that if an intercept is included in the model, an intercept is added for each response.
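The correspondence between the matrix form and the vectorized Kronecker form can be verified with plain NumPy. This is a sketch under the assumption that the two constraints read \(\eta = XB + \mathbf{1}\beta_0^\top + \eta^0\) and \(\mathrm{vec}(\eta^\top) = (X \otimes I_K)\beta + (\mathbf{1} \otimes I_K)\beta_0 + \mathrm{vec}(\eta^{0\top})\), where \(K\) is the number of responses; it builds the Kronecker products densely, which is only sensible for small examples.

```python
import numpy as np

rng = np.random.default_rng(0)
n, p, K = 5, 3, 2                      # observations, features, responses
X = rng.normal(size=(n, p))
B = rng.normal(size=(p, K))            # one coefficient column per response
beta0 = rng.normal(size=K)             # one intercept per response
offset = rng.normal(size=(n, K))

# Matrix form: broadcasting beta0 over rows is the same as adding 1 beta0^T.
eta = X @ B + beta0 + offset

# Vectorized form. A row-major flatten of B equals vec(B^T) in column-major terms.
beta = B.reshape(-1)
eta_vec = (
    np.kron(X, np.eye(K)) @ beta
    + np.kron(np.ones((n, 1)), np.eye(K)) @ beta0
    + offset.reshape(-1)
)

# eta_vec should match vec(eta^T), i.e. the row-major flatten of eta.
print(np.allclose(eta_vec, eta.reshape(-1)))
```

With this ordering, the \(K\) coefficients belonging to one feature sit next to each other in \(\beta\), so a group can naturally consist of a feature's coefficients across all responses.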
As indicated above, the multi-response group elastic net problem is technically of the same form as the single-response group elastic net problem. In fact, adelie reuses the single-response solver for multi-response problems by modifying the inputs appropriately (e.g. using ad.matrix.kronecker_eye to represent \(X \otimes I_K\)). For the MultiGaussian family, we wrap the specialized single-response Gaussian solver; for general multi-response GLMs, we wrap the single-response GLM solver.