MPI-Based Block Vectors and Matrices

PyNumero’s MPI-based block vectors and matrices (MPIBlockVector and MPIBlockMatrix) behave very similarly to BlockVector and BlockMatrix. The primary difference is in construction. With MPIBlockVector and MPIBlockMatrix, each block is owned by either a single process/rank or all processes/ranks.
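
For instance, the rank_owner argument assigns each block to a rank at construction time. A minimal sketch (the vector name v and the block layout here are arbitrary, not taken from the example below):

from mpi4py import MPI
from pyomo.contrib.pynumero.sparse.mpi_block_vector import MPIBlockVector

comm = MPI.COMM_WORLD
# blocks 0 and 1 are owned by ranks 0 and 1; -1 marks a block owned by all ranks
v = MPIBlockVector(3, rank_owner=[0, 1, -1], mpi_comm=comm)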

Consider the following example (in a file called “parallel_vector_ops.py”).

import numpy as np
from mpi4py import MPI
from pyomo.contrib.pynumero.sparse.mpi_block_vector import MPIBlockVector


def main():
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    
    owners = [2, 0, 1, -1]
    x = MPIBlockVector(4, rank_owner=owners, mpi_comm=comm)
    x.set_block(owners.index(rank), np.ones(3)*(rank + 1))
    x.set_block(3, np.array([1, 2, 3]))
    
    y = MPIBlockVector(4, rank_owner=owners, mpi_comm=comm)
    y.set_block(owners.index(rank), np.ones(3)*(rank + 1))
    y.set_block(3, np.array([1, 2, 3]))
    
    z1: MPIBlockVector = x + y  # add x and y
    z2 = x.dot(y)  # dot product
    z3 = np.abs(x).max()  # infinity norm
    
    z1_local = z1.make_local_copy()
    if rank == 0:
        print(z1_local.flatten())
        print(z2)
        print(z3)

    return z1_local, z2, z3


if __name__ == '__main__':
    main()

This example can be run with

mpirun -np 3 python -m mpi4py parallel_vector_ops.py

The output is

[6. 6. 6. 2. 2. 2. 4. 4. 4. 2. 4. 6.]
56.0
3

Note that the make_local_copy() method communicates all of the blocks to every rank; it is not efficient and should only be used for debugging.

The -1 in owners means that the block at that index (index 3 in this example) is owned by all processes. A non-negative value indicates that the block at that index is owned by the process whose rank equals that value. In this example, rank 0 owns block 1, rank 1 owns block 2, and rank 2 owns block 0. Block 3 is owned by all ranks. Note that a block should only be set by a process/rank that owns it; blocks owned by all ranks, such as block 3 here, must be set on every rank.
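
One way to respect this rule, sketched below under the same three-rank setup as the example above (the vector w and the zero-valued blocks are just for illustration), is to loop over the ownership list and set only the blocks that the current rank owns or that are shared by all ranks:

import numpy as np
from mpi4py import MPI
from pyomo.contrib.pynumero.sparse.mpi_block_vector import MPIBlockVector

comm = MPI.COMM_WORLD
rank = comm.Get_rank()

owners = [2, 0, 1, -1]
w = MPIBlockVector(4, rank_owner=owners, mpi_comm=comm)
for ndx, owner in enumerate(owners):
    # set a block only if this rank owns it; blocks marked -1 are set on every rank
    if owner == rank or owner == -1:
        w.set_block(ndx, np.zeros(3))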

The operations performed with an MPIBlockVector are identical to the same operations performed with a BlockVector (or even a NumPy array), except that they are now executed in parallel.
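
For example, continuing from parallel_vector_ops.py above (x and y as defined there; the names u, s, and nrm are only for illustration), the usual NumPy-style expressions apply directly:

u = 2.0 * x - y          # elementwise arithmetic, computed block by block in parallel
s = (x * y).sum()        # elementwise product followed by a global reduction
nrm = np.sqrt(x.dot(x))  # two-norm obtained from the parallel dot product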

MPIBlockMatrix construction is very similar. Consider the following example in a file called “parallel_matvec.py”.

import numpy as np
from mpi4py import MPI
from pyomo.contrib.pynumero.sparse.mpi_block_vector import MPIBlockVector
from pyomo.contrib.pynumero.sparse.mpi_block_matrix import MPIBlockMatrix
from scipy.sparse import random


def main():
    comm = MPI.COMM_WORLD
    rank = comm.Get_rank()
    
    owners = [0, 1, 2, -1]
    x = MPIBlockVector(4, rank_owner=owners, mpi_comm=comm)
    
    owners = np.array([[ 0, -1, -1, 0],
                       [-1,  1, -1, 1],
                       [-1, -1,  2, 2]])
    a = MPIBlockMatrix(3, 4, rank_ownership=owners, mpi_comm=comm)
    
    np.random.seed(0)
    x.set_block(3, np.random.uniform(-10, 10, size=10))
    
    np.random.seed(rank)
    x.set_block(rank, np.random.uniform(-10, 10, size=10))
    a.set_block(rank, rank, random(10, 10, density=0.1))
    a.set_block(rank, 3, random(10, 10, density=0.1))
    
    b = a * x  # parallel matrix-vector product
    
    # check the answer
    local_x = x.make_local_copy().flatten()
    local_a = a.to_local_array()
    local_b = b.make_local_copy().flatten()
    
    err = np.abs(local_a.dot(local_x) - local_b).max()

    if rank == 0:
        print('error: ', err)

    return err


if __name__ == '__main__':
    main()

This example can be run with

mpirun -np 3 python -m mpi4py parallel_matvec.py

The output is

error:  4.440892098500626e-16

The most difficult part of using MPIBlockVector and MPIBlockMatrix is determining the best structure and rank ownership to maximize parallel efficiency.
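
As a rough starting point, the hypothetical sketch below builds ownership arrays for a structure with several independent diagonal blocks coupled through one shared block: the diagonal blocks are distributed round-robin over the available ranks and only the coupling block is replicated (n_blocks, vec_owners, and mat_owners are illustrative names, not part of the PyNumero API):

import numpy as np
from mpi4py import MPI

comm = MPI.COMM_WORLD
size = comm.Get_size()
n_blocks = 6  # number of independent diagonal blocks

# vector: one block per diagonal block plus one coupling block shared by all ranks
vec_owners = [i % size for i in range(n_blocks)] + [-1]

# matrix: mark every block as shared (-1), then assign each block row's diagonal
# and coupling blocks to the rank that owns the corresponding vector block
mat_owners = np.full((n_blocks, n_blocks + 1), -1, dtype=np.int64)
for i in range(n_blocks):
    mat_owners[i, i] = i % size
    mat_owners[i, n_blocks] = i % size

With a layout like this, each rank performs the multiplications for its own block rows, mirroring the pattern used in parallel_matvec.py above.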