# __Introduction to Group Elastic Net__

In this notebook, we give a brief overview of the group elastic net problem that `adelie` solves.

## __Single-Response Group Elastic Net__

The single-response group elastic net problem is given by
$$
\begin{align*}
 \mathrm{minimize}_{\beta, \beta_0} \quad&
 \ell(\eta) + \lambda \sum\limits_{g=1}^G \omega_g \left(
 \alpha \|\beta_g\|_2 + \frac{1-\alpha}{2} \|\beta_g\|_2^2
 \right)
 \\\text{subject to}\quad&
 \eta = X \beta + \beta_0 \mathbf{1} + \eta^0
\end{align*}
$$
where 
$\beta_0$ is the intercept,
$\beta$ is the coefficient vector,
$X$ is the feature matrix,
$\eta^0$ is a fixed offset vector,
$\lambda \geq 0$ is the regularization parameter,
$G$ is the number of groups,
$\omega \geq 0$ is the penalty factor,
$\alpha \in [0,1]$ is the elastic net parameter,
$\beta_g$ is the coefficient vector for the $g$-th group,
and $\ell(\cdot)$ is the loss function defined by the GLM.
As an example, the Gaussian GLM 
([ad.glm.gaussian](https://jamesyang007.github.io/adelie/generated/adelie.glm.gaussian.html))
defines the loss function as
$$
\begin{align*}
 \ell(\eta)
 &=
 \sum\limits_{i=1}^n w_i \left(
 -y_i \eta_i + \frac{\eta_i^2}{2}
 \right)
\end{align*}
$$
where
$w \geq 0$ is the observation weight vector,
$y$ is the response vector,
and $\eta$ is the linear prediction vector as in the optimization problem above.
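
To make the notation concrete, the following is a minimal NumPy sketch that evaluates the single-response objective with the Gaussian loss. It is an illustration only and does not use the `adelie` API: the function names, the `groups` argument (a list of index lists, one per group), and the name `lmda` for $\lambda$ are assumptions made for this example.

```python
import numpy as np

def gaussian_loss(eta, y, w):
    """Gaussian GLM loss: sum_i w_i * (-y_i * eta_i + eta_i^2 / 2)."""
    return np.sum(w * (-y * eta + 0.5 * eta ** 2))

def group_elastic_net_penalty(beta, groups, omega, lmda, alpha):
    """lmda * sum_g omega_g * (alpha * ||beta_g||_2 + (1 - alpha) / 2 * ||beta_g||_2^2)."""
    total = 0.0
    for idx, omega_g in zip(groups, omega):
        norm = np.linalg.norm(beta[idx])  # Euclidean norm of the g-th coefficient block
        total += omega_g * (alpha * norm + 0.5 * (1.0 - alpha) * norm ** 2)
    return lmda * total

def objective(beta, beta0, X, y, w, offset, groups, omega, lmda, alpha):
    """Loss plus penalty, with eta = X beta + beta0 * 1 + offset."""
    eta = X @ beta + beta0 + offset
    return gaussian_loss(eta, y, w) + group_elastic_net_penalty(
        beta, groups, omega, lmda, alpha
    )
```

Setting `alpha=1` leaves only the group lasso term, while `alpha=0` leaves only the ridge term.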

Specifically for the Gaussian GLM, we employ a specialized optimizer based on coordinate descent
to solve the group elastic net problem.
For all other GLMs, we use a proximal Newton method,
which leads to an Iteratively Reweighted Least Squares (IRLS) algorithm.
That is, we iteratively perform a quadratic approximation to $\ell(\cdot)$, 
which yields a sequence of Gaussian GLM group elastic net problems
that we solve using our special solver based on coordinate descent.
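
The quadratic approximation step can be sketched as follows, using the binomial (logistic) loss $\sum_i w_i (-y_i \eta_i + \log(1 + e^{\eta_i}))$ as an assumed example of a non-Gaussian GLM. A second-order Taylor expansion at the current $\eta^{(k)}$ gives, up to an additive constant, a Gaussian loss with working weights and a working response. The helper names below are hypothetical, and this is not necessarily how `adelie` organizes the computation internally.

```python
import numpy as np

def binomial_loss(eta, y, w):
    """Binomial GLM loss: sum_i w_i * (-y_i * eta_i + log(1 + exp(eta_i)))."""
    return np.sum(w * (-y * eta + np.logaddexp(0.0, eta)))

def gaussian_loss(eta, y, w):
    """Gaussian GLM loss: sum_i w_i * (-y_i * eta_i + eta_i^2 / 2)."""
    return np.sum(w * (-y * eta + 0.5 * eta ** 2))

def irls_working_quantities(eta, y, w):
    """Working weights and response of the quadratic approximation at eta.

    Assumes the Hessian is strictly positive (fitted means not exactly 0 or 1).
    """
    mu = 1.0 / (1.0 + np.exp(-eta))  # fitted means
    grad = w * (mu - y)              # gradient of the binomial loss in eta
    hess = w * mu * (1.0 - mu)       # diagonal of its Hessian in eta
    w_work = hess
    z_work = eta - grad / hess
    return w_work, z_work
```

By construction, `gaussian_loss(eta, z_work, w_work)` has the same gradient and Hessian in `eta` as `binomial_loss(eta, y, w)` at the expansion point, so each IRLS iteration reduces to a weighted Gaussian problem.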

The Gaussian GLM also admits a different algorithm, which we call _the covariance method_,
that uses summary statistics rather than individual-level data.
The covariance method solves the following problem:
$$
\begin{align*}
 \mathrm{minimize}_{\beta} \quad&
 \frac{1}{2} \beta^\top A \beta
 - v^\top \beta
 + 
 \lambda \sum\limits_{g=1}^G \omega_g \left(
 \alpha \|\beta_g\|_2 + \frac{1-\alpha}{2} \|\beta_g\|_2^2
 \right)
\end{align*}
$$
This problem is equivalent to the usual single-response Gaussian group elastic net problem
when $A \equiv X_c^\top W X_c$ and $v \equiv X_c^\top W y_c$,
where $W$ is the diagonal matrix of observation weights $w$,
$X_c$ is the column-centered version of $X$,
and $y_c$ is the centered version of $y-\eta^0$.
The centering applies only if an intercept is to be fit, with the means computed using the weights $W$.
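
As a sanity check of this equivalence, the sketch below builds $A$ and $v$ from individual-level data with NumPy and compares the covariance-form quadratic against the Gaussian loss with the intercept set to its optimal value. The two should differ only by a constant that does not depend on $\beta$. The function names and the `intercept` flag are assumptions for illustration and are not part of `adelie`.

```python
import numpy as np

def covariance_inputs(X, y, w, offset, intercept=True):
    """Summary statistics A = X_c^T W X_c and v = X_c^T W y_c."""
    r = y - offset
    if intercept:
        w_sum = w.sum()
        X_c = X - (w @ X) / w_sum  # subtract weighted column means
        y_c = r - (w @ r) / w_sum  # subtract weighted mean of y - offset
    else:
        X_c, y_c = X, r
    A = X_c.T @ (w[:, None] * X_c)
    v = X_c.T @ (w * y_c)
    return A, v

def profiled_gaussian_loss(beta, X, y, w, offset):
    """Gaussian loss with the intercept set to its optimal value for this beta."""
    r = y - offset
    beta0 = (w @ (r - X @ beta)) / w.sum()
    eta = X @ beta + beta0 + offset
    return np.sum(w * (-y * eta + 0.5 * eta ** 2))

rng = np.random.default_rng(0)
n, p = 20, 4
X = rng.normal(size=(n, p))
y = rng.normal(size=n)
offset = rng.normal(size=n)
w = rng.uniform(0.5, 1.5, size=n)
w /= w.sum()

A, v = covariance_inputs(X, y, w, offset)
gaps = []
for _ in range(2):
    beta = rng.normal(size=p)
    gaps.append(profiled_gaussian_loss(beta, X, y, w, offset) - (0.5 * beta @ A @ beta - v @ beta))
print(np.isclose(gaps[0], gaps[1]))  # the gap is the same for both beta vectors
```

Since the penalty is identical in both formulations, a constant gap in the loss means both problems share the same minimizer in $\beta$.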

This method only works for the Gaussian case since the proximal Newton method
changes the weights $W$ at every IRLS iteration,
so that without access to $X$, it is not possible to compute the new "$A$" and "$v$".

## __Multi-Response Group Elastic Net__

The multi-response group elastic net problem is given by
$$
\begin{align*}
 \mathrm{minimize}_{\beta, \beta_0} \quad&
 \ell(\eta) + \lambda \sum\limits_{g=1}^G \omega_g \left(
 \alpha \|\beta_g\|_2 + \frac{1-\alpha}{2} \|\beta_g\|_2^2
 \right)
 \\\text{subject to}\quad&
 \mathrm{vec}(\eta^\top) = (X \otimes I_K) \beta + (\mathbf{1} \otimes I_K) \beta_0 + \mathrm{vec}(\eta^{0\top})
\end{align*}
$$
where $\mathrm{vec}(\cdot)$ is the operator that flattens a matrix into a vector in column-major order,
and $A \otimes B$ is the Kronecker product of $A$ and $B$.
The more familiar (but equivalent) constraint form is
$$
\begin{align*}
 \eta = X B + \mathbf{1} \beta_0^\top + \eta^0
\end{align*}
$$
where $\beta \equiv \mathrm{vec}(B^\top)$.
This way, we have possibly different linear predictions for each response.
Note that if an intercept is included in the model, an intercept is added for each response.
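
The equivalence of the two constraint forms can be checked numerically. The sketch below uses dense NumPy arrays and `np.kron` purely for illustration, with hypothetical function names. It relies on the fact that $\mathrm{vec}(M^\top)$ of a matrix $M$ is the row-major flattening of $M$, which is what `M.ravel()` returns by default.

```python
import numpy as np

def eta_matrix_form(X, B, beta0, offset):
    """eta = X B + 1 beta0^T + eta^0, returned as an (n, K) matrix."""
    return X @ B + beta0[None, :] + offset

def eta_kronecker_form(X, B, beta0, offset):
    """vec(eta^T) = (X kron I_K) beta + (1 kron I_K) beta0 + vec(eta^{0 T})."""
    n, K = offset.shape
    I_K = np.eye(K)
    beta = B.ravel()  # beta = vec(B^T)
    vec_eta_t = (
        np.kron(X, I_K) @ beta
        + np.kron(np.ones((n, 1)), I_K) @ beta0
        + offset.ravel()  # vec(eta^{0 T})
    )
    return vec_eta_t.reshape(n, K)  # undo the flattening to recover eta

rng = np.random.default_rng(0)
n, p, K = 5, 3, 2
X = rng.normal(size=(n, p))
B = rng.normal(size=(p, K))
beta0 = rng.normal(size=K)
offset = rng.normal(size=(n, K))

same = np.allclose(
    eta_matrix_form(X, B, beta0, offset),
    eta_kronecker_form(X, B, beta0, offset),
)
print(same)
```

Forming $X \otimes I_K$ densely costs $K^2$ times the memory of $X$, so this is only suitable for small checks like the one above.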

As indicated above, the multi-response group elastic net problem is technically of the same form
as the single-response group elastic net problem.
In fact, `adelie` reuses the single-response solver for multi-response problems
by modifying the inputs appropriately 
(e.g. using [ad.matrix.kronecker_eye](https://jamesyang007.github.io/adelie/generated/adelie.matrix.kronecker_eye.html) to represent $X \otimes I_K$).
For the MultiGaussian family, we wrap the specialized single-response Gaussian solver,
and for general multi-response GLMs, we wrap the single-response GLM solver.