adelie.cv.cv_grpnet

adelie.cv.cv_grpnet(X: ndarray | MatrixNaiveBase32 | MatrixNaiveBase64, glm: GlmBase32 | GlmBase64 | GlmMultiBase32 | GlmMultiBase64, *, n_threads: int = 1, early_exit: bool = False, min_ratio: float = 0.1, lmda_path_size: int = 100, n_folds: int = 5, seed: int | None = None, **grpnet_params)

Solves the group elastic net with K-fold cross-validation using the naive method.

This function was written with the intent that glm is one of the GLMs defined in adelie.glm. In particular, we assume that the observation weights w associated with glm have the property that if w[i] == 0, then the i-th prediction \(\eta_i\) is ignored in the computation of the loss.
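The weight property above can be illustrated with a minimal sketch. It uses a hypothetical weighted squared-error loss written in plain NumPy (weighted_gaussian_loss is not part of adelie) and shows that changing \(\eta_i\) where w[i] == 0 leaves the loss unchanged.

```python
import numpy as np

def weighted_gaussian_loss(y, eta, w):
    """Hypothetical loss: sum_i w[i] * (y[i] - eta[i])**2 / 2."""
    return 0.5 * np.sum(w * (y - eta) ** 2)

y = np.array([1.0, 2.0, 3.0, 4.0])
eta = np.array([0.5, 2.5, 2.0, 4.0])
w = np.array([0.5, 0.5, 0.0, 0.0])  # last two observations have zero weight

loss = weighted_gaussian_loss(y, eta, w)

# Perturb the predictions only where the weight is zero.
eta_perturbed = eta.copy()
eta_perturbed[2:] += 100.0
loss_perturbed = weighted_gaussian_loss(y, eta_perturbed, w)

# The zero-weight predictions do not contribute to the loss.
print(loss == loss_perturbed)
```

A GLM whose loss does not behave this way would presumably give misleading results here, since the function relies on this assumption.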

Parameters:
X : (n, p) Union[ndarray, MatrixNaiveBase32, MatrixNaiveBase64]

Feature matrix. It is typically one of the matrices defined in adelie.matrix submodule or numpy.ndarray.

glm : Union[GlmBase32, GlmBase64, GlmMultiBase32, GlmMultiBase64]

GLM object. It is typically one of the GLM classes defined in adelie.glm submodule.

n_threads : int, optional

Number of threads. Default is 1.

early_exit : bool, optional

Whether to exit early based on the training deviance explained. Unlike in adelie.solver.grpnet(), the default value is False. Internally, we construct a common regularization path that roughly contains every path generated from the individual training folds. If early_exit is True, some training folds may stop before fitting some of the smaller \(\lambda\)'s, in which case an extrapolation method based on adelie.diagnostic.coefficient() is used. The user may then see a flat component at the right of the loss curve and must be aware that it may be due to the extrapolation giving the same coefficients. To avoid misinterpretation of the CV loss curve by the general user, early exiting is disabled by default and every training fold is fit on the entire (common) path. Default is False.
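Why extrapolation can flatten the loss curve can be sketched as follows. The sketch assumes, purely for illustration, that coefficients beyond a fold's last fitted \(\lambda\) are held at their last fitted value. It uses np.interp, which clamps outside the fitted range, as a stand-in; it does not call adelie.diagnostic.coefficient(), and all values are made up.

```python
import numpy as np

# Hypothetical coefficient of one feature along a fold's truncated path.
fold_lmdas = np.array([1.0, 0.8, 0.6])  # decreasing; this fold stopped early at 0.6
fold_beta = np.array([0.0, 0.3, 0.5])

# Common path shared by all folds; it extends to smaller lambdas.
common_lmdas = np.array([1.0, 0.8, 0.6, 0.4, 0.2])

# np.interp requires increasing x-coordinates, so reverse the fold's path.
# Outside the fitted range it returns the nearest endpoint value.
beta_on_common = np.interp(common_lmdas, fold_lmdas[::-1], fold_beta[::-1])

print(beta_on_common)
```

Because the extrapolated coefficients are identical at the last lambdas, the validation loss computed from them is identical too, which shows up as a flat segment.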

min_ratio : float, optional

The ratio of the smallest to the largest \(\lambda\) in the regularization sequence. Unlike in adelie.solver.grpnet(), the default value is increased, because CV tends to pick a \(\lambda\) early in the path. If the loss curve does not look bowl-shaped, the user may decrease this value to fit further down the regularization path. Default is 1e-1.

lmda_path_size : int, optional

Number of \(\lambda\) values in the regularization path. Default is 100.
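How min_ratio and lmda_path_size shape the path can be sketched as below. The sketch assumes a geometrically spaced sequence from a largest value lmda_max down to min_ratio * lmda_max. The helper make_lmda_path and the value of lmda_max are hypothetical, and the spacing adelie actually uses may differ.

```python
import numpy as np

def make_lmda_path(lmda_max, min_ratio=0.1, lmda_path_size=100):
    # Assumption: geometric spacing from lmda_max down to min_ratio * lmda_max.
    return lmda_max * np.geomspace(1.0, min_ratio, num=lmda_path_size)

# Default-like settings: the path ends at 10% of the largest lambda.
path = make_lmda_path(lmda_max=2.0)

# Decreasing min_ratio extends the path to smaller lambdas.
deeper_path = make_lmda_path(lmda_max=2.0, min_ratio=0.01)

print(path[0], path[-1], deeper_path[-1])
```

Under this assumption, lowering min_ratio with lmda_path_size fixed also makes the grid coarser, so it may be worth increasing lmda_path_size alongside it.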

n_folds : int, optional

Number of CV folds. Default is 5.

seed : int, optional

Seed for random number generation. If None, the seed is not explicitly set. Default is None.
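As an illustration of how n_folds and seed typically interact, here is a sketch of one common way to build a K-fold partition: shuffle the observation indices with a seeded generator, then split them into nearly equal parts. This shows the general technique, not adelie's internal code, and kfold_indices is a hypothetical helper.

```python
import numpy as np

def kfold_indices(n, n_folds=5, seed=None):
    # With seed=None the generator is seeded from OS entropy,
    # so the partition changes from run to run.
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    # Split into n_folds parts whose sizes differ by at most one.
    return np.array_split(order, n_folds)

folds = kfold_indices(n=23, n_folds=5, seed=0)
folds_again = kfold_indices(n=23, n_folds=5, seed=0)

print([len(f) for f in folds])
```

Fixing the seed makes the partition, and hence the CV loss curve, reproducible across runs.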

**grpnet_params : optional

Parameters to adelie.solver.grpnet(). The following cannot be specified:

  • ddev_tol: internally enforced to be 0. Otherwise, the solver may stop too early when early_exit=True.

Returns:
result : CVGrpnetResult

Result of running K-fold CV.