pyuoi.datasets

Dataset utility functions for the pyuoi package.

Testing Utilities

pyuoi.datasets.make_classification(n_samples=100, n_features=20, n_informative=2, n_classes=2, shared_support=False, random_state=None, w_scale=1.0, include_intercept=False)

Make a linear classification dataset.

Parameters
  • n_samples (int) – The number of samples to make.

  • n_features (int) – The number of features to use.

  • n_informative (int) – The number of features with non-zero weights.

  • n_classes (int) – The number of classes.

  • shared_support (bool) – If True, all classes share the same random support. If False, each class has its own randomly chosen support.

  • random_state (int, np.random.RandomState instance, or None) – Random number seed or state.

  • w_scale (float) – The model parameter matrix, w, will be drawn from a normal distribution with std=w_scale.

  • include_intercept (bool) – If True, includes an intercept in the model. If False, the intercept is set to 0.
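The effect of shared_support on the weight matrix can be illustrated with a minimal, standard-library sketch. This is not pyuoi's implementation: the helper name sketch_weights, the seed argument, and the (n_classes, n_features) layout of w are assumptions made for illustration, and the shape pyuoi actually uses may differ.

```python
import random


def sketch_weights(n_features=20, n_informative=2, n_classes=2,
                   shared_support=False, w_scale=1.0, seed=None):
    """Hypothetical helper: build a sparse weight matrix, one row per class."""
    rng = random.Random(seed)
    # One support that every class reuses when shared_support is True.
    shared = rng.sample(range(n_features), n_informative)
    w = []
    for _ in range(n_classes):
        if shared_support:
            support = shared
        else:
            # Each class draws its own support independently.
            support = rng.sample(range(n_features), n_informative)
        row = [0.0] * n_features
        for j in support:
            # Non-zero weights come from a normal distribution with std=w_scale.
            row[j] = rng.gauss(0.0, w_scale)
        w.append(row)
    return w


w_shared = sketch_weights(shared_support=True, seed=0)
w_separate = sketch_weights(shared_support=False, seed=0)
```

With shared_support=True, every row of w_shared has non-zero entries in the same columns. In w_separate, the columns may or may not coincide across classes.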

pyuoi.datasets.make_linear_regression(n_samples=100, n_features=5, n_informative=2, X_loc=3.0, X_scale=1.0, snr=5.0, beta=None, beta_low=1.0, beta_high=3.0, include_intercept=False, random_state=None)

Make a linear regression dataset.

Parameters
  • n_samples (int) – The number of samples to make.

  • n_features (int) – The number of features to use.

  • n_informative (int) – The number of features with non-zero weights.

  • X_loc (float) – The mean of the features in the design matrix.

  • X_scale (float) – The standard deviation of the features in the design matrix.

  • snr (float) – The signal-to-noise ratio, which determines the variance of the noise term. A higher snr yields a lower noise variance.

  • beta (np.ndarray or None) – The beta values to use. If None, beta values will be drawn from a uniform distribution.

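  • beta_low (float) – The lower bound of the uniform distribution from which beta values are drawn when beta is None.

  • beta_high (float) – The upper bound of the uniform distribution from which beta values are drawn when beta is None.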

  • include_intercept (bool) – If True, includes an intercept in the model. If False, the intercept is set to 0.

  • random_state (int, np.random.RandomState instance, or None) – Random number seed or state.

Returns

  • X (ndarray, shape (n_samples, n_features)) – The design matrix.

  • y (ndarray, shape (n_samples,)) – The response vector.

  • beta (ndarray, shape (n_features,)) – The feature coefficients.

  • intercept (float) – The intercept. If include_intercept is False, then intercept is zero.
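The generative process described by these parameters can be sketched with the standard library alone. This is an illustration, not pyuoi's code: the helper name sketch_linear_regression and the seed argument are hypothetical, the intercept is fixed at zero, and the rule "noise variance = signal variance / snr" is an assumption about how snr is applied.

```python
import random
import statistics


def sketch_linear_regression(n_samples=100, n_features=5, n_informative=2,
                             X_loc=3.0, X_scale=1.0, snr=5.0,
                             beta_low=1.0, beta_high=3.0, seed=None):
    """Hypothetical helper: sparse linear model with SNR-controlled noise."""
    rng = random.Random(seed)
    # Only n_informative coefficients are non-zero, drawn uniformly.
    beta = [0.0] * n_features
    for j in rng.sample(range(n_features), n_informative):
        beta[j] = rng.uniform(beta_low, beta_high)
    # Design matrix with normally distributed features.
    X = [[rng.gauss(X_loc, X_scale) for _ in range(n_features)]
         for _ in range(n_samples)]
    signal = [sum(x * b for x, b in zip(row, beta)) for row in X]
    # Assumption: the noise variance is the signal variance divided by snr.
    noise_std = (statistics.pvariance(signal) / snr) ** 0.5
    y = [s + rng.gauss(0.0, noise_std) for s in signal]
    intercept = 0.0  # mirrors include_intercept=False
    return X, y, beta, intercept


X, y, beta, intercept = sketch_linear_regression(seed=0)
```

The return order (X, y, beta, intercept) follows the Returns list above, so a caller can compare fitted coefficients against the ground-truth beta.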

pyuoi.datasets.make_poisson_regression(n_samples=100, n_features=5, n_informative=2, X_loc=0.0, X_scale=0.125, beta=None, beta_shape=1.0, beta_scale=3.0, include_intercept=False, random_state=None)

Make a Poisson regression dataset.

Parameters
  • n_samples (int) – The number of samples to make.

  • n_features (int) – The number of features to use.

  • n_informative (int) – The number of features with non-zero weights.

  • X_loc (float) – The mean of the features in the design matrix.

  • X_scale (float) – The standard deviation of the features in the design matrix.

  • beta (np.ndarray or None) – The beta values to use. If None, beta values will be drawn from a gamma distribution.

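  • beta_shape (float) – The shape parameter of the gamma distribution from which beta values are drawn when beta is None.

  • beta_scale (float) – The scale parameter of the gamma distribution from which beta values are drawn when beta is None.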

  • include_intercept (bool) – If True, includes an intercept in the model. If False, the intercept is set to 0.

  • random_state (int, np.random.RandomState instance, or None) – Random number seed or state.

Returns

  • X (ndarray, shape (n_samples, n_features)) – The design matrix.

  • y (ndarray, shape (n_samples,)) – The response vector.

  • beta (ndarray, shape (n_features,)) – The feature coefficients.

  • intercept (float) – The intercept. If include_intercept is False, then intercept is zero.
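A Poisson regression dataset of this kind can be sketched as follows, again using only the standard library. The helper names poisson_draw and sketch_poisson_regression and the seed argument are hypothetical, and the log link (rate = exp(intercept + X·beta)) is an assumption based on the usual Poisson regression model rather than something stated above.

```python
import math
import random


def poisson_draw(rng, rate):
    """Count unit-rate exponential arrivals that occur before time `rate`."""
    count, t = 0, rng.expovariate(1.0)
    while t < rate:
        count += 1
        t += rng.expovariate(1.0)
    return count


def sketch_poisson_regression(n_samples=100, n_features=5, n_informative=2,
                              X_loc=0.0, X_scale=0.125,
                              beta_shape=1.0, beta_scale=3.0, seed=None):
    """Hypothetical helper: sparse Poisson model with gamma-distributed betas."""
    rng = random.Random(seed)
    # Only n_informative coefficients are non-zero, drawn from a gamma distribution.
    beta = [0.0] * n_features
    for j in rng.sample(range(n_features), n_informative):
        beta[j] = rng.gammavariate(beta_shape, beta_scale)
    X = [[rng.gauss(X_loc, X_scale) for _ in range(n_features)]
         for _ in range(n_samples)]
    intercept = 0.0  # mirrors include_intercept=False
    y = []
    for row in X:
        # Assumption: log link, so the Poisson rate is exp(linear predictor).
        eta = intercept + sum(x * b for x, b in zip(row, beta))
        y.append(poisson_draw(rng, math.exp(eta)))
    return X, y, beta, intercept


X, y, beta, intercept = sketch_poisson_regression(seed=0)
```

The small default X_scale keeps the linear predictor, and therefore the exponentiated rate, in a modest range, which is one plausible reason the defaults differ from those of make_linear_regression.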