COLLATED IO CONTAINERS INTRINSICS

v2512: New and improved parallel operation

Collated-IO

The code for handling the collated format has been updated and streamlined. The changes aim to reduce memory overhead for transfers and reduce some of the parallel coordination (synchronization points) between ranks. Non-collated, master-only writing now uses a polling dispatch to write file content as it becomes available.

In addition to the original collated writing, this version also includes the option to use an MPI-IO backend for the collated format. The files created are fully backward-compatible with existing and previous versions of the format, i.e., collated files can be created with MPI-IO and read with older versions of OpenFOAM.

For one-off use, the backend can be selected on the command line via the collated backend optimization switch, e.g.

mpirun -np NN redistributePar \
    -decompose -parallel \
    -fileHandler collated -opt-switch collated.backend=1

The corresponding entry in the etc/controlDict can be adjusted for persistent use.

OffsetRange and GlobalOffset

The new OffsetRange and globalOffset containers help simplify global addressing with minimal memory and communication overhead. Both are simple, lightweight collections of start/size/total information that provide the semantics of what is termed slab or hyperslab addressing.

An example usage:

globalOffset cellSlab = mesh.nCells();
globalOffset pointSlab = mesh.nPoints();
globalOffset faceSlab = mesh.nFaces();

Foam::reduceOffsets(comm, cellSlab, pointSlab, faceSlab);

Info<< "nCells:" << cellSlab.total() << nl;

// OR

List<globalOffset> slabs = ...;

Foam::reduceOffsets(comm, slabs);
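
The start/size/total idea behind slab addressing can be illustrated without any OpenFOAM or MPI code. The following is a minimal sketch only: SlabSketch and its member functions are hypothetical names invented for this illustration and are not the OpenFOAM OffsetRange or globalOffset interface.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for a slab descriptor (NOT the OpenFOAM class).
// It stores only three numbers, regardless of the number of ranks.
struct SlabSketch
{
    std::int64_t start;  // global index of the first local element
    std::int64_t size;   // number of local elements on this rank
    std::int64_t total;  // number of elements summed over all ranks

    // True if the global index falls inside this rank's slab
    bool contains(std::int64_t globalIdx) const
    {
        return globalIdx >= start && globalIdx < start + size;
    }

    // Convert a local index to its global counterpart
    std::int64_t toGlobal(std::int64_t localIdx) const
    {
        assert(localIdx >= 0 && localIdx < size);
        return start + localIdx;
    }

    // Convert a global index owned by this rank back to a local index
    std::int64_t toLocal(std::int64_t globalIdx) const
    {
        assert(contains(globalIdx));
        return globalIdx - start;
    }
};
```

With a slab of start 100 and size 50, local index 10 maps to global index 110, and global index 150 is already outside the slab.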

MPI Intrinsics

This version includes several small, but useful changes to the Pstream library (the intermediate interface to MPI that allows separation of the high-level code from a specific MPI version and/or MPI vendor).

Scan and Exscan

The Pstream classes now have an interface to the MPI_Scan and MPI_Exscan (exclusive scan) functions, which were previously not covered. They were not particularly missed until now, but the exclusive scan function (with summation) provides a particularly useful and scalable method for generating globally consistent offsets, with a lower communication overhead than the globalIndex currently requires.
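
To make the role of the exclusive scan concrete, the sketch below models it serially: entry r of the input is the local size on rank r, and entry r of the output is the global start offset that an exclusive scan with summation would deliver to that rank. The function name exclusiveScanOffsets is hypothetical and the code does not call MPI or the Pstream interface. Note that MPI_Exscan leaves the result on rank 0 undefined, so real code has to set that offset to zero explicitly. The serial model does this by starting the running sum at zero.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Serial model of an exclusive prefix sum over per-rank sizes.
// sizes[r]  : local element count on rank r
// result[r] : sum of sizes[0..r-1], i.e. the global start offset of rank r
std::vector<std::int64_t> exclusiveScanOffsets
(
    const std::vector<std::int64_t>& sizes
)
{
    std::vector<std::int64_t> offsets(sizes.size(), 0);

    std::int64_t running = 0;  // rank 0 always starts at offset zero
    for (std::size_t rank = 0; rank < sizes.size(); ++rank)
    {
        assert(sizes[rank] >= 0);  // element counts cannot be negative
        offsets[rank] = running;   // excludes this rank's own contribution
        running += sizes[rank];
    }

    return offsets;
}
```

For per-rank sizes 4, 0, 7, 3 the resulting offsets are 0, 4, 4, 11. Each rank receives a single value, whereas an all-gather approach delivers the sizes of every rank to every rank.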

Broadcast

This version extends the broadcast functionality to support broadcasting primitive data with any rank acting as the root. This allows a coordinated algorithm where different processor ranks can take charge and distribute data.

find_first, find_last

The new UPstream::find_first() and UPstream::find_last() methods provide a convenient and low-cost means of coordinating control between different processor ranks. For example:

// Find the first rank with a valid reference point

label refCelli = mesh.findCell(<point>, ...);

label ranking = UPstream::find_first((refCelli >= 0), communicator);

if (ranking < 0)
{
    // Nobody found it - try something else
    ...
}

// Lowest rank takes the lead
if (ranking != UPstream::myProcNo(communicator))
{
    refCelli = -1;
}

This example could also be extended to decide on broadcasting information to other ranks.
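
The selection logic of the example can be modelled serially to show which values each rank ends up with. This is a sketch under stated assumptions: findFirstRank, findLastRank and keepLowestRankOnly are hypothetical helper names, the vectors stand in for one value per rank, and the highest-rank behaviour of the find_last model is inferred from its name rather than taken from the release notes. The negative return value for "no rank matched" follows the example above.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Serial model: flags[r] is the condition as evaluated on rank r.
// Returns the lowest rank whose flag is set, or -1 if no rank matched.
int findFirstRank(const std::vector<bool>& flags)
{
    for (std::size_t rank = 0; rank < flags.size(); ++rank)
    {
        if (flags[rank]) return static_cast<int>(rank);
    }
    return -1;
}

// Assumed counterpart: the highest rank whose flag is set, or -1 if none.
int findLastRank(const std::vector<bool>& flags)
{
    for (std::size_t rank = flags.size(); rank-- > 0;)
    {
        if (flags[rank]) return static_cast<int>(rank);
    }
    return -1;
}

// refCells[r] is the reference cell found on rank r (-1 if not found).
// Only the lowest rank with a valid cell keeps it, all others reset to -1.
std::vector<long> keepLowestRankOnly(std::vector<long> refCells)
{
    std::vector<bool> flags;
    for (const long celli : refCells) flags.push_back(celli >= 0);

    const int leader = findFirstRank(flags);

    for (std::size_t rank = 0; rank < refCells.size(); ++rank)
    {
        if (static_cast<int>(rank) != leader) refCells[rank] = -1;
    }

    // Either nobody found a cell, or the leader still holds a valid one
    assert(leader < 0 || refCells[leader] >= 0);
    return refCells;
}
```

If ranks 1 and 3 both find a cell, only rank 1 keeps its value. The elected rank number is known on every rank, which is what allows it to serve as the root of a subsequent broadcast.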

reduceOffset, reduceOffsets

The new Foam::reduceOffset() and Foam::reduceOffsets() functions address the problem of defining globally-consistent offset ranges while minimizing memory and communication overhead. They leverage the newly added Exscan function to determine the rank-specific offsets, and combine that with the updated broadcast method to communicate the total size to everyone.

Compared to the existing globalIndex, this new approach replaces the MPI Allgather communication pattern with Exscan and broadcast for improved scalability. Multiple collections of offsets can also be calculated en masse with the reduceOffsets() function. This function combines all the values together for communication, which means it still requires exactly two MPI function calls.
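
The two-step pattern for several offset collections at once can be sketched as a serial model. Everything here is illustrative: RankSlabs and reduceOffsetsSketch are hypothetical names, the outer vector stands in for the ranks, and the choice of the last rank as the broadcast root is an assumption (it is the one rank that can form the total as start + size without further communication). The actual OpenFOAM implementation may differ.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Per-rank result: one start offset and one global total per collection
struct RankSlabs
{
    std::vector<std::int64_t> start;
    std::vector<std::int64_t> total;
};

// sizes[r][k] is the local size of collection k (cells, points, ...) on rank r
std::vector<RankSlabs> reduceOffsetsSketch
(
    const std::vector<std::vector<std::int64_t>>& sizes
)
{
    const std::size_t nRanks = sizes.size();
    std::vector<RankSlabs> result(nRanks);
    if (nRanks == 0) return result;

    const std::size_t nFields = sizes[0].size();

    // Step 1: models ONE element-wise exclusive scan over the packed buffer
    std::vector<std::int64_t> running(nFields, 0);
    for (std::size_t rank = 0; rank < nRanks; ++rank)
    {
        assert(sizes[rank].size() == nFields);  // same packing on all ranks
        result[rank].start = running;
        for (std::size_t k = 0; k < nFields; ++k)
        {
            running[k] += sizes[rank][k];
        }
    }

    // Step 2: models ONE broadcast of the packed totals. After the loop,
    // 'running' equals start + size on the last rank (assumed root).
    for (std::size_t rank = 0; rank < nRanks; ++rank)
    {
        result[rank].total = running;
    }

    return result;
}
```

The number of modelled communication steps stays at two no matter how many collections are packed together. Only the message length grows.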