Generalized Linear Model#
The Generalized Linear Model (GLM) defines the loss function in group elastic net problems. Many GLMs have already been implemented in the adelie.glm submodule. In this notebook, we show how to use GLM objects and write custom GLM classes.
[1]:
import adelie as ad
import numpy as np
Single-Response GLM#
The most commonly used GLMs are of single-response type. Some examples of single-response GLMs are the Gaussian, Binomial, Poisson, and Cox models. As an example, we show below how to construct a Gaussian GLM object.
[2]:
n = 100 # number of observations
seed = 0 # seed
np.random.seed(seed)
y = np.random.normal(0, 1, n)
glm = ad.glm.gaussian(y=y)
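Other single-response families in adelie.glm are constructed in the same way from the response vector. The brief sketch below assumes the binomial and poisson constructors mirror ad.glm.gaussian and take the response via y; the binomial family expects a 0/1 response and the poisson family a count response.

# hypothetical usage sketch: other single-response families
y_binom = np.random.binomial(1, 0.5, n).astype(float)  # 0/1 response, cast to float
glm_binom = ad.glm.binomial(y=y_binom)

y_pois = np.random.poisson(1.0, n).astype(float)  # count response, cast to float
glm_pois = ad.glm.poisson(y=y_pois)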
It is also possible to specify observation weights \(w\). This may be useful if the user wishes to incorporate prior knowledge that some data points carry more signal or information than others. By default, our GLM classes initialize the weights to be uniform across all data points. Moreover, the weights are normalized to sum to \(1\). The Gaussian solvers require the weights to sum to \(1\); the general GLM solvers do not have the same requirement, but all types of solvers gain numerical stability if the weights are normalized. For these reasons, we enforce that the weights are normalized.
[3]:
# raises warning about normalization!
w = np.random.uniform(0, 1, n)
glm = ad.glm.gaussian(y=y, weights=w)
/Users/jhyang/sandbox/adelie/adelie/glm.py:32: UserWarning: Normalizing weights to sum to 1.
warnings.warn("Normalizing weights to sum to 1.")
[4]:
# no warnings!
w = w / np.sum(w)
glm = ad.glm.gaussian(y=y, weights=w)
The single-response GLM classes expose a few member functions required by the GLM solver. For example, .gradient() is a member function that computes the (negative) gradient of the loss function. We show below the equivalence between calling .gradient() and the corresponding numpy code.
[5]:
eta = np.random.normal(0, 1, n)
grad = np.empty(n)
glm.gradient(eta, grad)
assert np.allclose(
grad,
w * (y - eta),
)
It is worth mentioning that the .hessian() member function computes only the diagonal of the full hessian matrix. It would be too computationally burdensome to compute the full matrix, and moreover, to invert it during the proximal Newton step in our solver. Hence, as an approximation, we only request the diagonal of the matrix. Due to the warm-starting nature of our algorithm, this approximation appears to be sufficient and does not result in divergence issues.
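As a quick illustration, .hessian() follows the same output-container convention as .gradient(). The short sketch below assumes it takes the linear predictor, the gradient, and an output container; for the Gaussian family, the diagonal is simply the weight vector (exactly what the custom Gaussian class later in this notebook computes).

# sketch: hessian diagonal for the Gaussian GLM constructed above
hess = np.empty(n)
glm.hessian(eta, grad, hess)
assert np.allclose(
    hess,
    w,
)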
For the full set of methods, we refer the readers to GlmBase64.
Multi-Response GLM#
The multi-response GLM classes correspond to GLMs with multiple responses. Some examples are multigaussian and multinomial GLMs. We show below an example of constructing a multigaussian GLM object.
[6]:
n = 100 # number of observations
K = 4 # number of responses
y = np.random.normal(0, 1, (n, K))
glm = ad.glm.multigaussian(y=y, weights=w)
Multi-response GLMs work similarly to single-response GLMs except that the arguments to the member functions have different shapes. For example, the .gradient() member function expects eta to be of shape (n, K) rather than (n,) as before. We show below an example of calling .gradient().
[7]:
eta = np.random.normal(0, 1, (n, K))
grad = np.empty((n, K))
glm.gradient(eta, grad)
assert np.allclose(
grad,
w[:, None] * (y - eta) / K,
)
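The remaining member functions follow the same shape convention. As a minimal sketch (assuming .hessian() takes the same (eta, grad, hess) arguments as in the single-response case), the output container is also of (n, K) shape:

# sketch: the hessian diagonal is stored in an (n, K) container
hess = np.empty((n, K))
glm.hessian(eta, grad, hess)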
For the full set of methods, we refer the readers to GlmMultiBase64.
Custom GLM Class#
A key feature of adelie is the ability for users to define their own GLM classes. Such a custom class can be implemented in either C++ or Python! In this section, we show how to implement a custom single-response GLM class for the Gaussian family equivalent to ad.glm.gaussian. All of the discussion carries through for the multi-response GLM classes.
We first show the full code for our custom GLM class.
[8]:
class Gaussian(ad.glm.GlmBase64):
    def __init__(self, y, w=None):
        self.y = y
        # default to uniform weights; always normalize to sum to 1
        self.w = (
            np.full(y.shape[0], 1 / y.shape[0])
            if w is None else
            w / np.sum(w)
        )
        # MUST call base class __init__!
        ad.glm.GlmBase64.__init__(self, "my_gaussian", self.y, self.w)

    def gradient(self, eta, grad):
        # (negative) gradient of the loss, written in-place into grad
        grad[...] = self.w * (self.y - eta)

    def hessian(self, eta, grad, hess):
        # diagonal of the hessian, written in-place into hess
        hess[...] = self.w

    def loss(self, eta):
        # loss evaluated at the linear predictor eta
        return np.sum(self.w * (-self.y * eta + 0.5 * eta ** 2))

    def loss_full(self):
        # loss evaluated at the saturated model (eta = y)
        return -0.5 * np.sum(self.w * self.y ** 2)
We remark on a few important points:

- The custom GLM class must inherit from the provided base class ad.glm.GlmBase64. Users interested in 32-bit floats should inherit from ad.glm.GlmBase32 instead; however, beware of the numerical instability of 32-bit floats! You may observe strange behaviors in the optimizer, so use with caution! Users implementing multi-response GLMs must inherit from ad.glm.GlmMultiBase64 or ad.glm.GlmMultiBase32.
- The base class constructor must be called exactly as shown above, without the use of super()! This is a quirk of the pybind11 package, which we rely on for exporting C++ classes to Python.
- The base class requires a reference to y and w; however, the Python interpreter will not increment the reference counter for C++ classes holding references. To avoid memory lifetime issues, the user must save a reference to these variables on the Python side (e.g. self.y = y and self.w = w).
- Many of the member functions are given the output container in which to store the result of a computation. Hence, we use the syntax out[...] = expression to modify the container in-place. A common pitfall is to write out = expression, which only redirects the local variable out to point to a different location in memory. This low-level interface is used for memory efficiency.
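Before plugging the custom class into the solver, we can optionally sanity-check that it agrees with the built-in ad.glm.gaussian. The sketch below uses freshly generated data (the variable names are only for illustration) and assumes .gradient() and .loss() are exposed on the built-in GLM objects just as on the base class.

# sketch: compare the custom Gaussian against the built-in one
y_chk = np.random.normal(0, 1, n)
w_chk = np.random.uniform(0, 1, n)
w_chk = w_chk / np.sum(w_chk)
eta_chk = np.random.normal(0, 1, n)

my_glm = Gaussian(y=y_chk, w=w_chk)
ref_glm = ad.glm.gaussian(y=y_chk, weights=w_chk)

grad_my = np.empty(n)
grad_ref = np.empty(n)
my_glm.gradient(eta_chk, grad_my)
ref_glm.gradient(eta_chk, grad_ref)

assert np.allclose(grad_my, grad_ref)
assert np.allclose(my_glm.loss(eta_chk), ref_glm.loss(eta_chk))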
We now show that the GLM object can be passed into our existing solver ad.grpnet with no further changes to the code. We first generate our data and call ad.grpnet.
[9]:
n = 100 # number of observations
p = 1000 # number of features
X = np.random.normal(0, 1, (n, p))
y = X[:, -1] * np.random.normal(0, 1) + np.random.normal(0, 1, n)
[10]:
state = ad.grpnet(
X=X,
glm=Gaussian(y=y),
)
/Users/jhyang/sandbox/adelie/adelie/matrix.py:259: UserWarning: Detected matrix to be C-contiguous. Performance may improve with F-contiguous matrix.
warnings.warn(
44%|████ | 44/100 [00:00:00<00:00:00, 6335.87it/s] [dev:90.7%]
We can compare solutions with our Gaussian solver. To demonstrate an exact match in coefficients, we pass in a Gaussian GLM object with the optimization flag turned off so that the general GLM solver is used rather than our specialized Gaussian solver.
[11]:
state_exp = ad.grpnet(
X=X,
glm=ad.glm.gaussian(y=y, opt=False),
)
44%|████ | 44/100 [00:00:00<00:00:00, 7097.11it/s] [dev:90.7%]
[12]:
assert np.allclose(
state.betas.toarray(),
state_exp.betas.toarray(),
)
Based on our experience, the part of the solver that interacts with the GLM object is not speed critical, so there is seldom a noticeable performance cost to using a Python GLM class. However, for the best performance, users may be interested in porting the code to C++ and exporting the bindings. We refer the readers to the glm directory in our C++ backend for numerous examples of how to extend the base classes in C++. The associated binding code can be found in glm.cpp.