linear_model

All linear models operate through the basic structure provided by the base class, which performs the bootstrapping, model fitting, feature intersection, and model averaging steps of the UoI algorithm. The derived classes simply supply the base class with estimator objects that perform the actual fits (e.g., UoI_Lasso supplies Lasso and LinearRegression objects for the selection and estimation modules, respectively).
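
For concreteness, here is a minimal usage sketch of this shared interface with synthetic data (not taken from the package's own examples); any of the derived classes documented below can be substituted for UoI_Lasso.

    import numpy as np
    from pyuoi.linear_model.lasso import UoI_Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(200, 10))             # design matrix
    beta = np.zeros(10)
    beta[:3] = [1.0, -2.0, 3.0]                # sparse ground-truth coefficients
    y = X @ beta + 0.1 * rng.normal(size=200)

    model = UoI_Lasso(n_boots_sel=24, n_boots_est=24, estimation_score='r2')
    model.fit(X, y)          # bootstrapping, selection, and estimation happen here
    print(model.coef_)       # model-averaged, typically sparse coefficient estimates
    print(model.intercept_)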

Base Classes

The base class for all linear models is AbstractUoILinearModel. Two intermediate derived classes are also provided: AbstractUoILinearRegressor (for lasso and elastic net) and AbstractUoIGeneralizedLinearRegressor (for logistic and Poisson regression).

class pyuoi.linear_model.base.AbstractUoIGeneralizedLinearRegressor(n_boots_sel=24, n_boots_est=24, selection_frac=0.9, estimation_frac=0.9, stability_selection=1.0, estimation_score='acc', estimation_target=None, copy_X=True, fit_intercept=True, standardize=True, random_state=None, max_iter=None, tol=None, shared_support=True, comm=None, logger=None)[source]

An abstract base class for UoI linear classifier classes.

intersect(coef, thresholds)[source]

Intersect coefficients across all thresholds.

This implementation will account for multi-class classification.

class pyuoi.linear_model.base.AbstractUoILinearModel(n_boots_sel=24, n_boots_est=24, selection_frac=0.9, estimation_frac=0.9, stability_selection=1.0, fit_intercept=True, standardize=True, shared_support=True, max_iter=None, tol=None, random_state=None, comm=None, logger=None)[source]

An abstract base class for UoI linear_model classes.

Parameters
  • n_boots_sel (int) – The number of data bootstraps to use in the selection module. Increasing this number will make selection more strict.

  • n_boots_est (int) – The number of data bootstraps to use in the estimation module. Increasing this number will relax selection and decrease variance.

  • selection_frac (float) – The fraction of the dataset to use for training in each resampled bootstrap, during the selection module. Small values of this parameter imply larger “perturbations” to the dataset.

  • estimation_frac (float) – The fraction of the dataset to use for training in each resampled bootstrap, during the estimation module. The remaining data is used to obtain validation scores. Small values of this parameter imply larger “perturbations” to the dataset.

  • stability_selection (int, float, or array-like) – If int, treated as the number of bootstraps that a feature must appear in to guarantee placement in selection profile. If float, must be between 0 and 1, and is instead the proportion of bootstraps. If array-like, must consist of either ints or floats between 0 and 1. In this case, each entry in the array-like object will act as a separate threshold for placement in the selection profile.

  • fit_intercept (bool) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (e.g. data is expected to be already centered).

  • standardize (bool) – If True, the regressors X will be standardized before regression by subtracting the mean and dividing by their standard deviations.

  • shared_support (bool) – For models with more than one output (multinomial logistic regression) this determines whether all outputs share the same support or can have independent supports.

  • max_iter (int) – Maximum number of iterations for iterative fitting methods.

  • tol (float) – Stopping criteria for solver.

  • random_state (int, RandomState instance, or None) – The seed of the pseudo random number generator that selects a random feature to update. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

  • comm (MPI communicator) – If passed, the selection and estimation steps are parallelized (see the sketch following this parameter list).

  • logger (Logger) – The logger to use for messages when verbose=True in fit. If None is passed, a logger that writes to sys.stdout will be used.
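
A hedged sketch of passing these shared constructor arguments. Since the base class is abstract, the concrete UoI_Lasso subclass is used; mpi4py is assumed to be installed, and comm can simply be omitted to run serially.

    import logging
    from mpi4py import MPI
    from pyuoi.linear_model.lasso import UoI_Lasso

    logger = logging.getLogger("uoi")  # receives messages when verbose=True in fit
    model = UoI_Lasso(
        n_boots_sel=24,
        n_boots_est=24,
        stability_selection=[0.6, 0.8, 1.0],  # array-like: one selection threshold per entry
        random_state=0,                       # reproducible bootstraps
        comm=MPI.COMM_WORLD,                  # parallelize the selection and estimation steps
        logger=logger,
    )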

coef_

Estimated coefficients for the linear regression problem.

Type

array, shape (n_features,) or (n_targets, n_features)

intercept_

Independent term in the linear model.

Type

float

supports_

Boolean array indicating whether a given regressor (column) is selected for estimation for a given regularization parameter value (row).

Type

ndarray, shape (n_supports, n_features)

fit(X, y, stratify=None, verbose=False)[source]

Fit data according to the UoI algorithm.

Parameters
  • X (ndarray or scipy.sparse matrix, (n_samples, n_features)) – The design matrix.

  • y (ndarray, shape (n_samples,)) – Response vector. Will be cast to X’s dtype if necessary. Currently, this implementation does not handle multiple response variables.

  • stratify (array-like or None) – Ensures groups of samples are allotted to training/test sets proportionally. Labels for each group must be an int greater than zero. Must be of size equal to the number of samples, with further restrictions on the number of groups.

  • verbose (bool) – A switch indicating whether the fitting should print out messages displaying progress.

get_n_coef(X, y)[source]

Return the number of coefficients that will be estimated.

This is determined by the shape of X.

abstract intersect(coef, thresholds)[source]

Intersect coefficients across all thresholds.

uoi_selection_sweep(X, y, reg_param_values)[source]

Perform selection regression on a dataset over a sweep of regularization parameter values.

Parameters
  • X (ndarray or scipy.sparse matrix, shape (n_samples, n_features)) – The design matrix.

  • y (ndarray, shape (n_samples,)) – Response vector.

  • reg_param_values (list of dicts) – A list of dictionaries containing the regularization parameter values to iterate over.

Returns

coefs – Predicted parameter values for each regularization strength.

Return type

ndarray, shape (n_param_values, n_features)
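
As a hedged illustration of the expected reg_param_values format: it is a list of dictionaries, one per regularization setting, whose keys are the hyperparameter names of the selection estimator (scikit-learn's alpha for a Lasso-based model). In normal use this method is called internally by fit.

    import numpy as np
    from pyuoi.linear_model.lasso import UoI_Lasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = X[:, 0] - 2 * X[:, 1] + 0.1 * rng.normal(size=100)

    # One dictionary per regularization value to sweep over.
    reg_param_values = [{'alpha': 1.0}, {'alpha': 0.1}, {'alpha': 0.01}]
    coefs = UoI_Lasso().uoi_selection_sweep(X, y, reg_param_values)
    print(coefs.shape)  # (n_param_values, n_features) = (3, 5)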

class pyuoi.linear_model.base.AbstractUoILinearRegressor(n_boots_sel=24, n_boots_est=24, selection_frac=0.9, estimation_frac=0.9, stability_selection=1.0, estimation_score='r2', estimation_target=None, copy_X=True, fit_intercept=True, standardize=True, random_state=None, max_iter=None, tol=None, comm=None, logger=None)[source]

An abstract base class for UoI linear regression classes.

intersect(coef, thresholds)[source]

Intersect coefficients across all thresholds.

Lasso

The UoI_Lasso object provides the base class with a Lasso object for the selection module and a LinearRegression object for the estimation module. Additionally, a wrapper around the pycasso solver is provided as the PycLasso class.

class pyuoi.linear_model.lasso.PycLasso(alphas=None, fit_intercept=True, max_iter=1000, tol=0.0001)[source]

Lasso using the pycasso solver. Solves for an entire regularization path at once.

Parameters
  • alphas (nd-array) – The regularization path. Defaults to None for compatibility with UoI, but needs to be set prior to fitting.

  • fit_intercept (bool) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations.

  • max_iter (int) – Maximum number of iterations for pycasso solver.

  • tol (float) – Stopping criteria for solver.

coef_

Estimated coefficients for the linear regression problem.

Type

ndarray, shape (n_features,) or (n_targets, n_features)

intercept_

Independent term in the linear model.

Type

float

fit(X, y)[source]

Fit data according to the pycasso object.

Parameters
  • X (ndarray, (n_samples, n_features)) – The design matrix.

  • y (ndarray, shape (n_samples,)) – Response vector. Will be cast to X’s dtype if necessary. Currently, this implementation does not handle multiple response variables.

predict(X)[source]

Predicts responses given a design matrix.

Parameters

X (ndarray, (n_samples, n_features)) – The design matrix.

Returns

y – Predicted response vector.

Return type

ndarray, shape (n_samples,)

set_params(**kwargs)[source]

Sets the parameters of this estimator.
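
A minimal sketch of using PycLasso directly. The regularization path must be set before fitting, since the solver computes the entire path at once; this assumes the optional pycasso dependency is installed.

    import numpy as np
    from pyuoi.linear_model.lasso import PycLasso

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = X[:, 0] + 0.1 * rng.normal(size=100)

    # The path must be supplied up front ("needs to be set prior to fitting").
    lasso = PycLasso(alphas=np.logspace(0, -3, num=10), fit_intercept=True)
    lasso.fit(X, y)
    print(lasso.coef_)   # coefficient estimates along the regularization path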

class pyuoi.linear_model.lasso.UoI_Lasso(n_boots_sel=24, n_boots_est=24, selection_frac=0.9, estimation_frac=0.9, n_lambdas=48, stability_selection=1.0, estimation_score='r2', estimation_target=None, eps=0.001, warm_start=True, copy_X=True, fit_intercept=True, standardize=True, max_iter=1000, tol=0.0001, random_state=None, comm=None, logger=None, solver='cd')[source]

UoILasso solver.

Parameters
  • n_boots_sel (int) – The number of data bootstraps/resamples to use in the selection module. Increasing this number will make selection more strict.

  • n_boots_est (int) – The number of data bootstraps/resamples to use in the estimation module. Increasing this number will relax selection and decrease variance.

  • n_lambdas (int) – The number of regularization values to use for selection.

  • selection_frac (float) – The fraction of the dataset to use for training in each resampled bootstrap, during the selection module. Small values of this parameter imply larger “perturbations” to the dataset.

  • estimation_frac (float) – The fraction of the dataset to use for training in each resampled bootstrap, during the estimation module. The remaining data is used to obtain validation scores. Small values of this parameter imply larger “perturbations” to the dataset.

  • stability_selection (int, float, or array-like) – If int, treated as the number of bootstraps that a feature must appear in to guarantee placement in selection profile. If float, must be between 0 and 1, and is instead the proportion of bootstraps. If array-like, must consist of either ints or floats between 0 and 1. In this case, each entry in the array-like object will act as a separate threshold for placement in the selection profile.

  • estimation_score (string, "r2" | "AIC" | "AICc" | "BIC") – Objective used to choose the best estimates per bootstrap.

  • estimation_target (string, "train" | "test") – Decide whether to assess the estimation_score on the train or test data across each bootstrap. By default, a sensible choice is made based on the chosen estimation_score.

  • warm_start (bool) – When set to True, reuse the solution of the previous call to fit as initialization; otherwise, just erase the previous solution.

  • eps (float) – Length of the lasso path. eps=1e-3 means that lambda_min / lambda_max = 1e-3.

  • copy_X (bool) – If True, X will be copied; else, it may be overwritten.

  • fit_intercept (bool) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (e.g. data is expected to be already centered).

  • standardize (bool) – If True, the regressors X will be standardized before regression by subtracting the mean and dividing by their standard deviations. This parameter is equivalent to normalize in scikit-learn models.

  • max_iter (int) – Maximum number of iterations for iterative fitting methods.

  • tol (float) – Stopping criteria for solver.

  • random_state (int, RandomState instance, or None) – The seed of the pseudo random number generator that selects a random feature to update. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

  • comm (MPI communicator) – If passed, the selection and estimation steps are parallelized.

  • logger (Logger) – The logger to use for messages when verbose=True in fit. If None is passed, a logger that writes to sys.stdout will be used.

  • solver (string, 'cd' | 'pyc') – If cd, will use the scikit-learn lasso implementation (via coordinate descent). If pyc, will use PycLasso, built on the pycasso path-wise solver.

coef_

Estimated coefficients for the linear regression problem.

Type

nd-array, shape (n_features,) or (n_targets, n_features)

intercept_

Independent term in the linear model.

Type

float

supports_

Boolean array indicating whether a given regressor (column) is selected for estimation for a given regularization parameter value (row).

Type

ndarray, shape (n_supports, n_features)

uoi_selection_sweep(X, y, reg_param_values)[source]

Overrides the base class selection sweep to accommodate the pycasso path-wise solver.
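
A hedged sketch of the UoI_Lasso-specific options: the size of the lambda grid, the estimation score, and the choice of solver ('pyc' requires the optional pycasso dependency).

    from pyuoi.linear_model.lasso import UoI_Lasso

    # Coordinate-descent (scikit-learn) selection with BIC-based estimation.
    cd_model = UoI_Lasso(n_lambdas=48, estimation_score='BIC', solver='cd')

    # Path-wise pycasso selection via the PycLasso wrapper described above.
    pyc_model = UoI_Lasso(n_lambdas=48, estimation_score='r2', solver='pyc')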

Elastic Net

The UoI_ElasticNet object provides the base class with an ElasticNet object for the selection module and a LinearRegression object for the estimation module.

class pyuoi.linear_model.elasticnet.UoI_ElasticNet(n_boots_sel=24, n_boots_est=24, selection_frac=0.9, estimation_frac=0.9, n_lambdas=48, alphas=array([0.5]), stability_selection=1.0, estimation_score='r2', estimation_target=None, warm_start=True, eps=0.001, copy_X=True, fit_intercept=True, standardize=True, max_iter=1000, tol=0.0001, random_state=None, comm=None, logger=None)[source]

UoIElasticNet solver.

Parameters
  • n_boots_sel (int) – The number of data bootstraps to use in the selection module. Increasing this number will make selection more strict.

  • n_boots_est (int) – The number of data bootstraps to use in the estimation module. Increasing this number will relax selection and decrease variance.

  • selection_frac (float) – The fraction of the dataset to use for training in each resampled bootstrap, during the selection module. Small values of this parameter imply larger “perturbations” to the dataset.

  • estimation_frac (float) – The fraction of the dataset to use for training in each resampled bootstrap, during the estimation module. The remaining data is used to obtain validation scores. Small values of this parameter imply larger “perturbations” to the dataset.

  • n_lambdas (int) – The number of regularization values to use for selection.

  • alphas (list or ndarray) – The parameter that trades off L1 versus L2 regularization for a given lambda.

  • stability_selection (int, float, or array-like) – If int, treated as the number of bootstraps that a feature must appear in to guarantee placement in selection profile. If float, must be between 0 and 1, and is instead the proportion of bootstraps. If array-like, must consist of either ints or floats between 0 and 1. In this case, each entry in the array-like object will act as a separate threshold for placement in the selection profile.

  • estimation_score (string, "r2" | "AIC" | "AICc" | "BIC") – Objective used to choose the best estimates per bootstrap.

  • estimation_target (string, "train" | "test") – Decide whether to assess the estimation_score on the train or test data across each bootstrap. By default, a sensible choice is made based on the chosen estimation_score.

  • warm_start (bool) – When set to True, reuse the solution of the previous call to fit as initialization; otherwise, just erase the previous solution.

  • eps (float) – Length of the lasso path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.

  • copy_X (bool) – If True, X will be copied; else, it may be overwritten.

  • fit_intercept (bool) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations (e.g. data is expected to be already centered).

  • standardize (bool) – If True, the regressors X will be standardized before regression by subtracting the mean and dividing by their standard deviations.

  • max_iter (int) – Maximum number of iterations for iterative fitting methods.

  • tol (float) – Stopping criteria for solver.

  • random_state (int, RandomState instance, or None) – The seed of the pseudo random number generator that selects a random feature to update. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

  • comm (MPI communicator) – If passed, the selection and estimation steps are parallelized.

  • logger (Logger) – The logger to use for messages when verbose=True in fit. If None is passed, a logger that writes to sys.stdout will be used.

coef_

Estimated coefficients for the linear regression problem.

Type

array, shape (n_features,) or (n_targets, n_features)

intercept_

Independent term in the linear model.

Type

float

supports_

Boolean array indicating whether a given regressor (column) is selected for estimation for a given regularization parameter value (row).

Type

ndarray, shape (n_supports, n_features)

get_reg_params(X, y)[source]

Calculates the regularization parameters (alpha and lambda) to be used for the provided data.

Note that the Elastic Net penalty is given by

\[\frac{1}{2 n_{\text{samples}}} \|y - Xb\|^2_2 + \lambda \left(\alpha \|b\|_1 + \frac{1 - \alpha}{2} \|b\|^2_2\right)\]

where lambda and alpha are regularization parameters.

scikit-learn does not use these names; it denotes alpha by ‘l1_ratio’ and lambda by ‘alpha’.

Parameters
  • X (array-like, shape (n_samples, n_features)) – The design matrix.

  • y (array-like, shape (n_samples)) – The response vector.

Returns

reg_params – A list containing dictionaries with the value of each (lambda, alpha) describing the type of regularization to impose. The keys adhere to scikit-learn’s terminology (lambda->alpha, alpha->l1_ratio). This allows easy passing into the ElasticNet object.

Return type

a list of dictionaries
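
A hedged sketch of the grid this method produces and its scikit-learn naming. get_reg_params is normally called internally during fit; it is shown here only to illustrate the returned dictionary format.

    import numpy as np
    from pyuoi.linear_model.elasticnet import UoI_ElasticNet

    rng = np.random.default_rng(0)
    X = rng.normal(size=(100, 5))
    y = X[:, 0] - X[:, 2] + 0.1 * rng.normal(size=100)

    enet = UoI_ElasticNet(n_lambdas=5, alphas=np.array([0.25, 0.75]))
    reg_params = enet.get_reg_params(X, y)
    print(reg_params[0])  # e.g. {'alpha': <a lambda value>, 'l1_ratio': 0.25}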

Logistic Regression

The UoI_L1Logistic class uses a custom logistic regression solver, a modified orthant-wise limited-memory quasi-Newton (OWL-QN) algorithm, for both the selection and estimation modules. No regularization is applied during estimation.

class pyuoi.linear_model.logistic.UoI_L1Logistic(n_boots_sel=24, n_boots_est=24, selection_frac=0.9, estimation_frac=0.9, n_C=48, stability_selection=1.0, estimation_score='acc', estimation_target=None, multi_class='auto', shared_support=True, warm_start=False, eps=1e-05, fit_intercept=True, standardize=True, max_iter=10000, tol=0.001, random_state=None, comm=None, logger=None)[source]

UoIL1-Logistic model.

Parameters
  • n_boots_sel (int) – The number of data bootstraps to use in the selection module. Increasing this number will make selection more strict.

  • n_boots_est (int) – The number of data bootstraps to use in the estimation module. Increasing this number will relax selection and decrease variance.

  • selection_frac (float) – The fraction of the dataset to use for training in each resampled bootstrap, during the selection module. Small values of this parameter imply larger “perturbations” to the dataset.

  • estimation_frac (float) – The fraction of the dataset to use for training in each resampled bootstrap, during the estimation module. The remaining data is used to obtain validation scores. Small values of this parameter imply larger “perturbations” to the dataset.

  • n_C (int) – The number of regularization values to use for selection.

  • stability_selection (int, float, or array-like) – If int, treated as the number of bootstraps that a feature must appear in to guarantee placement in selection profile. If float, must be between 0 and 1, and is instead the proportion of bootstraps. If array-like, must consist of either ints or floats between 0 and 1. In this case, each entry in the array-like object will act as a separate threshold for placement in the selection profile.

  • estimation_score (string, "acc" | "log" | "AIC" | "AICc" | "BIC") – Objective used to choose the best estimates per bootstrap.

  • estimation_target (string, "train" | "test") – Decide whether to assess the estimation_score on the train or test data across each bootstrap. By default, a sensible choice is made based on the chosen estimation_score.

  • multi_class (string, "auto" | "multinomial") – For “multinomial”, the loss minimized is the multinomial loss fit across the entire probability distribution, even when the data is binary. “auto” selects binary if the data is binary, and otherwise selects “multinomial”.

  • shared_support (bool) – For models with more than one output (multinomial logistic regression) this determines whether all outputs share the same support or can have independent supports.

  • warm_start (bool) – When set to True, reuse the solution of the previous call to fit as initialization; otherwise, just erase the previous solution.

  • eps (float) – Length of the L1 path. eps=1e-5 means that alpha_min / alpha_max = 1e-5.

  • fit_intercept (bool) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations.

  • standardize (bool) – If True, the regressors X will be standardized before regression by subtracting the mean and dividing by their standard deviations.

  • max_iter (int) – Maximum number of iterations for iterative fitting methods.

  • tol (float) – Stopping criteria for solver.

  • random_state (int, RandomState instance, or None) – The seed of the pseudo random number generator that selects a random feature to update. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

  • comm (MPI communicator) – If passed, the selection and estimation steps are parallelized.

  • logger (Logger) – The logger to use for messages when verbose=True in fit. If None is passed, a logger that writes to sys.stdout will be used.

coef_

Estimated coefficients for the linear regression problem.

Type

array, shape (n_features,) or (n_targets, n_features)

intercept_

Independent term in the linear model.

Type

float

supports_

Boolean array indicating whether a given regressor (column) is selected for estimation for a given regularization parameter value (row).

Type

ndarray, shape (n_supports, n_features)
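
A minimal classification sketch with synthetic binary labels. This assumes the fitted estimator exposes the scikit-learn style predict method.

    import numpy as np
    from pyuoi.linear_model.logistic import UoI_L1Logistic

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 8))
    logits = X[:, 0] - 2 * X[:, 1]
    y = (rng.uniform(size=300) < 1 / (1 + np.exp(-logits))).astype(int)

    clf = UoI_L1Logistic(n_C=48, estimation_score='acc', shared_support=True)
    clf.fit(X, y)
    print(clf.coef_)          # sparse coefficient estimates
    y_pred = clf.predict(X)   # predicted class labels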

Poisson Regression

The poisson module provides a Poisson regression solver that uses either coordinate descent or a modified orthant-wise limited memory quasi-Newton solver. UoI_Poisson uses Poisson objects for both selection and estimation; however, the estimation module uses no regularization penalties.

class pyuoi.linear_model.poisson.Poisson(alpha=1.0, l1_ratio=1.0, fit_intercept=True, standardize=False, max_iter=1000, tol=1e-05, warm_start=False, solver='lbfgs')[source]

Generalized Linear Model with a log link (exponential mean function), i.e. Poisson regression, trained with an L1/L2 regularizer (i.e. elastic net penalty).

The log-likelihood of the Poisson GLM is optimized by performing coordinate descent on a linearized quadratic approximation. See Chapter 5 of Hastie, Tibshirani, and Wainwright (2016) for more details.

Parameters
  • alpha (float) – Constant that multiplies the L1 term. Defaults to 1.0.

  • l1_ratio (float) – Float between 0 and 1 acting as a scaling between l1 and l2 penalties. For l1_ratio = 0 the penalty is an L2 penalty. For l1_ratio = 1 it is an L1 penalty. For 0 < l1_ratio < 1, the penalty is a combination of L1 and L2 penalties.

  • fit_intercept (bool) – Whether to fit an intercept or not.

  • standardize (bool) – If True, centers the columns of the design matrix across samples and rescales them to have a standard deviation of 1.

  • tol (float) – The tolerance for the optimization: if the updates are smaller than tol, the optimization code checks the dual gap for optimality and continues until it is smaller than tol.

  • warm_start (bool) – When set to True, reuse the solution of the previous call to fit as initialization, otherwise, just erase the previous solution.

  • solver (string, 'lbfgs' | 'cd') – The solver to use. Options are ‘lbfgs’ (orthant-wise LBFGS) and ‘cd’ (coordinate descent).

coef_

The fitted parameter vector.

Type

ndarray, shape (n_features,)

intercept_

The fitted intercept.

Type

float

static adjusted_response(X, y, coef, intercept=0)[source]

Calculates the adjusted response when posing the fitting procedure as Iteratively Reweighted Least Squares (Newton update on log likelihood).

Parameters
  • X (array-like, shape (n_samples, n_features)) – The design matrix.

  • y (array-like, shape (n_samples)) – The response vector.

  • coef (array-like, shape (n_features)) – Current estimate of the parameters.

  • intercept (float) – The current estimate of the intercept.

Returns

  • w (array-like, shape (n_samples)) – Weights for samples. The log-likelihood for a GLM, when posed as a linear regression problem, requires reweighting the samples.

  • z (array-like, shape (n_samples)) – Working response. The linearized response when rewriting coordinate descent for the Poisson likelihood as iteratively reweighted least squares.
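
For reference, a hedged sketch of the standard IRLS quantities for a Poisson GLM with a log link, written for the conventional (n_samples, n_features) design-matrix layout; the helper name is hypothetical and the internal implementation may differ in detail.

    import numpy as np

    def poisson_irls_quantities(X, y, coef, intercept=0.0):
        """Hypothetical helper illustrating the IRLS weights and working response."""
        eta = X @ coef + intercept   # linear predictor
        mu = np.exp(eta)             # conditional mean under the log link
        w = mu                       # sample weights for the reweighted least squares
        z = eta + (y - mu) / mu      # working (adjusted) response
        return w, z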

fit(X, y, sample_weight=None)[source]

Fit the Poisson GLM.

Parameters
  • X (ndarray, shape (n_samples, n_features)) – The design matrix.

  • y (ndarray, shape (n_samples,)) – Response vector. Will be cast to X’s dtype if necessary. Currently, this implementation does not handle multiple response variables.

  • sample_weight (array-like, shape (n_samples,)) – Array of weights assigned to the individual samples. If None, then each sample is provided an equal weight.

predict(X)[source]

Predicts the response variable given a design matrix. The output is the mode of the Poisson distribution.

Parameters

X (array_like, shape (n_samples, n_features)) – Design matrix to predict on.

Returns

mode – The predicted response values, i.e. the modes.

Return type

array_like, shape (n_samples)

predict_mean(X)[source]

Calculates the mean response variable given a design matrix.

Parameters

X (array_like, shape (n_samples, n_features)) – Design matrix to predict on.

Returns

mu – The predicted response values, i.e. the conditional means.

Return type

array_like, shape (n_samples)

static soft_threshold(X, threshold)[source]

Performs the soft-thresholding necessary for coordinate descent lasso updates.

Parameters
  • X (array-like, shape (n_features, n_samples)) – Matrix to be thresholded.

  • threshold (float) – Soft threshold.

Returns

X_soft_threshold – Soft thresholded X.

Return type

array-like
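
A hedged, standalone sketch of the elementwise soft-thresholding operator, S(x, t) = sign(x) * max(|x| - t, 0), which is the operation this static method performs; not necessarily the package's exact implementation.

    import numpy as np

    def soft_threshold(X, threshold):
        """Elementwise soft-thresholding used in coordinate-descent lasso updates."""
        return np.sign(X) * np.maximum(np.abs(X) - threshold, 0.0)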

class pyuoi.linear_model.poisson.UoI_Poisson(n_boots_sel=24, n_boots_est=24, n_lambdas=48, alphas=array([1.0]), selection_frac=0.8, estimation_frac=0.8, stability_selection=1.0, estimation_score='log', estimation_target=None, solver='lbfgs', warm_start=True, eps=0.001, tol=1e-05, copy_X=True, fit_intercept=True, standardize=True, max_iter=1000, random_state=None, comm=None, logger=None)[source]

UoIPoisson solver.

Parameters
  • n_boots_sel (int) – The number of data bootstraps to use in the selection module. Increasing this number will make selection more strict.

  • n_boots_est (int) – The number of data bootstraps to use in the estimation module. Increasing this number will relax selection and decrease variance.

  • selection_frac (float) – The fraction of the dataset to use for training in each resampled bootstrap, during the selection module. Small values of this parameter imply larger “perturbations” to the dataset.

  • estimation_frac (float) – The fraction of the dataset to use for training in each resampled bootstrap, during the estimation module. The remaining data is used to obtain validation scores. Small values of this parameter imply larger “perturbations” to the dataset.

  • n_lambdas (int) – The number of regularization values to use for selection.

  • alphas (list or ndarray of floats) – The parameter that trades off L1 versus L2 regularization for a given lambda.

  • stability_selection (int, float, or array-like) – If int, treated as the number of bootstraps that a feature must appear in to guarantee placement in selection profile. If float, must be between 0 and 1, and is instead the proportion of bootstraps. If array-like, must consist of either ints or floats between 0 and 1. In this case, each entry in the array-like object will act as a separate threshold for placement in the selection profile.

  • estimation_score (string, "log" | "AIC" | "AICc" | "BIC") – Objective used to choose the best estimates per bootstrap.

  • estimation_target (string, "train" | "test") – Decide whether to assess the estimation_score on the train or test data across each bootstrap. By default, a sensible choice is made based on the chosen estimation_score.

  • solver (string, 'lbfgs' | 'cd') – The solver to use. Options are ‘lbfgs’ (orthant-wise LBFGS) and ‘cd’ (coordinate descent).

  • warm_start (bool) – When set to True, reuse the solution of the previous call to fit as initialization; otherwise, just erase the previous solution.

  • eps (float) – Length of the lasso path. eps=1e-3 means that alpha_min / alpha_max = 1e-3.

  • tol (float) – Stopping criteria for solver.

  • copy_X (boolean) – If True, X will be copied; else, it may be overwritten.

  • fit_intercept (boolean) – Whether to calculate the intercept for this model. If set to False, no intercept will be used in calculations.

  • standardize (bool) – If True, the regressors X will be standardized before regression by subtracting the mean and dividing by their standard deviations.

  • max_iter (int) – Maximum number of iterations for iterative fitting methods.

  • random_state (int, RandomState instance, or None) – The seed of the pseudo random number generator that selects a random feature to update. If int, random_state is the seed used by the random number generator; If RandomState instance, random_state is the random number generator; If None, the random number generator is the RandomState instance used by np.random.

  • comm (MPI communicator) – If passed, the selection and estimation steps are parallelized.

  • logger (Logger) – The logger to use for messages when verbose=True in fit. If None is passed, a logger that writes to sys.stdout will be used.

coef_

Estimated coefficients for the linear regression problem.

Type

ndarray, shape (n_features,) or (n_targets, n_features)

intercept_

Independent term in the linear model.

Type

float

supports_

Boolean array indicating whether a given regressor (column) is selected for estimation for a given regularization parameter value (row).

Type

ndarray, shape (n_supports, n_features)

get_reg_params(X, y)[source]

Calculates the regularization parameters (alpha and lambda) to be used for the provided data.

Parameters
  • X (array-like, shape (n_samples, n_features)) – The design matrix.

  • y (array-like, shape (n_samples)) – The response vector.

Returns

reg_params – A list containing dictionaries with the value of each (lambda, alpha) describing the type of regularization to impose. The keys adhere to scikit-learn’s terminology (lambda->alpha, alpha->l1_ratio). This allows easy passing into the Poisson object.

Return type

a list of dictionaries
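
Finally, a minimal usage sketch for UoI_Poisson with synthetic count data (not taken from the package's own examples).

    import numpy as np
    from pyuoi.linear_model.poisson import UoI_Poisson

    rng = np.random.default_rng(0)
    X = rng.normal(size=(300, 6))
    rate = np.exp(0.5 * X[:, 0] - 0.5 * X[:, 1])  # sparse ground-truth rates
    y = rng.poisson(rate)                         # count responses

    model = UoI_Poisson(n_lambdas=48, estimation_score='log', solver='lbfgs')
    model.fit(X, y)
    print(model.coef_)       # sparse coefficient estimates
    print(model.intercept_)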