MPI
MPI (Message Passing Interface) is a parallel computing interface that can be
used through the mpi4py
library in Python. Currently, the models in the
linear_model
module can take advantage of MPI parallelism during model
fitting. We assume some familiarity with using mpi4py
here.
During the UoI feature selection step, many models are fit across bootstraps and regularization parameters. These can all potentially be done in parallel using MPI. Similarly, during UoI estimation, many models are fit across bootstraps and supports. These can also be done in parallel.
Using MPI parallelism requires mpi4py
to be installed. In your code, the
two extra things you will need to do to use MPI parallelism is 1) to make sure
the dataset is on all ranks and 2) pass an MPI communicator into the model.
Broadcasting the dataset to all ranks
PyUoI provides helper functions to share data across MPI ranks. The two strategies we support are 1) load the data from a HDF5 file and 2) the user can load the data on a single rank by hand and broadcast the data.
Loading data from an HDF5 file
from pyuoi.mpi_utils import load_data_MPI
# file with keys 'X' and 'y'
h5_file = 'my_file.h5'
X, y = load_data_MPI(h5_file)
Loading data by hand
from mpi4py import MPI
import numpy as np
from pyuoi.mpi_utils import Bcast_from_root
comm = MPI.COMM_WORLD
rank = comm.rank
X = None
y = None
if rank == 0:
# file with keys 'X' and 'y'
data = np.load('my_file.npz')
X = data['X']
y = data['y']
X = Bcast_from_root(X, comm)
y = Bcast_from_root(y, comm)
Fitting with MPI parallelism
Fitting models with MPI parallelism is similar to fitting models with no parallelism.
from mpi4py import MPI
from pyuoi.mpi_utils import load_data_MPI
from pyuoi.linear_model import UoI_Lasso
comm = MPI.COMM_WORLD
rank = comm.rank
# file with keys 'X' and 'y'
h5_file = 'my_file.h5'
X, y = load_data_MPI(h5_file)
model = UoI_Lasso(comm=comm)
model.fit(X, y)
# model will now have fit parameters across all ranks