v2606: New and improved parallel operation
In this release, procFaces has been extended to accept an optional nMasters (or nProcessorsPerMaster) setting. This enables similar behaviour to masterCoarsest by agglomerating processors at the coarsest level until only nMasters or fewer remain. With nMasters 1, the behaviour is identical to masterCoarsest since no inter-processor boundaries will remain.
Effect
Running the simpleFoam pitzDaily tutorial with the following configuration:
p
{
    solver          GAMG;
    ..
    processorAgglomerator procFaces;
    nMasters        4;
}
With the debug switch enabled:
DebugSwitches
{
    GAMGAgglomeration 1;
}
               nCells     nFaces/nCells  nInterfaces nIntFaces/nCells profile
Level  nProcs  avg  max   avg    max     avg  max    avg     max      avg
-----  ------  ---  ---   ---    ---     ---  ---    ---     ---      ---
0      16      764  770   1.926  1.927   3.5  5      0.1025  0.1299   2.044e+04
1      16      381  385   1.964  2.143   3.5  5      0.1686  0.2158   8367
2      16      189  192   2.292  2.774   3.5  5      0.2689  0.349    3062
3      16      93   96    2.321  2.645   3.5  5      0.4439  0.5532   1028
4      16      45   47    2.33   2.489   3.5  5      0.6251  0.8043   362
5      4       89   90    2.497  2.596   1.5  2      0.2982  0.4432   902.5
6      4       41   43    2.298  2.429   1.5  2      0.4132  0.65     272.2
Master-coarsest with compact masters
In this release, masters can be forced to be allocated compactly using the new compactMasters option:
p
{
    solver          GAMG;
    ..
    processorAgglomerator masterCoarsest;
    nMasters        4;
    compactMasters  true;
}
In a case with 16 processors and 4 masters, the default master allocation places masters on processors 0, 4, 8, and 12 (visible with the masterCoarsest debug switch):
master  nProcs  procIDs
0       4       (1 2 3)
4       4       (5 6 7)
8       4       (9 10 11)
12      4       (13 14 15)
With compactMasters enabled, masters are assigned to the lowest-numbered processors:
master  nProcs  procIDs
0       4       (4 5 6)
1       4       (7 8 9)
2       4       (10 11 12)
3       4       (13 14 15)
Effect
Without compact masters:
               nCells     nFaces/nCells  nInterfaces nIntFaces/nCells profile
Level  nProcs  avg  max   avg    max     avg  max    avg     max      avg
-----  ------  ---  ---   ---    ---     ---  ---    ---     ---      ---
0      16      764  770   1.926  1.927   3.5  5      0.1025  0.1299   2.044e+04
1      16      381  385   1.964  2.143   3.5  5      0.1686  0.2158   8367
2      16      189  192   2.292  2.774   3.5  5      0.2689  0.349    3062
3      16      93   96    2.321  2.645   3.5  5      0.4439  0.5532   1028
4      16      45   47    2.33   2.489   3.5  5      0.6251  0.8043   362
5      4       89   90    2.536  2.596   1.5  2      0.2191  0.2889   949.5
6      4       41   43    2.36   2.429   1.5  2      0.2897  0.4      291.2
With compact masters:
               nCells     nFaces/nCells  nInterfaces nIntFaces/nCells profile
Level  nProcs  avg  max   avg    max     avg  max    avg     max      avg
-----  ------  ---  ---   ---    ---     ---  ---    ---     ---      ---
0      16      764  770   1.926  1.927   3.5  5      0.1025  0.1299   2.044e+04
1      16      381  385   1.964  2.143   3.5  5      0.1686  0.2158   8367
2      16      189  192   2.292  2.774   3.5  5      0.2689  0.349    3062
3      16      93   96    2.321  2.645   3.5  5      0.4439  0.5532   1028
4      16      45   47    2.33   2.489   3.5  5      0.6251  0.8043   362
5      4       89   90    2.319  2.433   3    3      0.653   0.8621   599.8
6      4       41   43    2.029  2.116   3    3      0.9538  1.268    172.5
The compact masters configuration results in more inter-processor boundaries on the agglomerated levels (an average nInterfaces of 3 rather than 1.5) and a higher ratio of interface faces to cells, while the reported profile is lower (599.8 rather than 949.5 at level 5). For larger agglomerations, e.g. 64 cores, this effect diminishes. However, there is no guarantee that equal-sized clusters are produced; on large decompositions, clusters of unequal size may still create bottlenecks.
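The two master allocations listed earlier for 16 processors and 4 masters can be reproduced with a few lines of index arithmetic. The following is a minimal sketch, not the OpenFOAM implementation: the MasterGroup type and the allocateMasters function are hypothetical names introduced for illustration, and the sketch assumes that the processor count is an exact multiple of the number of masters.

```cpp
#include <vector>

// Hypothetical types and names, for illustration only (not OpenFOAM API).
struct MasterGroup
{
    int master;                 // processor acting as master
    std::vector<int> procIDs;   // the other processors in the cluster
};

// Assumption: nProcs is an exact multiple of nMasters.
std::vector<MasterGroup> allocateMasters(int nProcs, int nMasters, bool compact)
{
    // Processors per cluster, master included
    const int groupSize = nProcs / nMasters;

    std::vector<MasterGroup> groups;
    for (int i = 0; i < nMasters; ++i)
    {
        MasterGroup g;

        // Default: the master is the first processor of each contiguous block.
        // Compact: the masters occupy processors 0 .. nMasters-1.
        g.master = compact ? i : i*groupSize;

        // First non-master processor belonging to this cluster
        const int first =
            compact ? nMasters + i*(groupSize - 1) : g.master + 1;

        for (int j = 0; j < groupSize - 1; ++j)
        {
            g.procIDs.push_back(first + j);
        }
        groups.push_back(g);
    }
    return groups;
}
```

Calling allocateMasters(16, 4, false) yields masters 0, 4, 8 and 12, each followed by its three neighbours, while allocateMasters(16, 4, true) yields masters 0 to 3 with the remaining processors 4 to 15 split into consecutive blocks of three.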
The Pstream interface to MPI includes several improvements in this release:
- probeMessage(): simplified parameters for a regular MPI_Probe.
- probeMessages(): probe and return sizes from multiple sources simultaneously.
The IPstream (input/receive stream) can now be called with a “receive from any” mode, and its receive buffer can also be released. This enables algorithms to use IPstream to probe and receive, then recover the buffer for forwarding or storage with delayed deserialisation.
Additional align, tell, seek, and other methods on the input and output Pstreams allow them to be used for composite output with rewriting, enabling more flexible aggregated data handling.
Reduced MPI overhead in field function objects
The function objects fieldExtent, fieldMinMax, and fieldStatistics have been updated to significantly reduce the overall number of MPI operations and lower memory overhead. Although these function objects are not in critical code paths, they were selected as initial candidates for assessing the types of hidden overheads that remain in the code. The key improvements are:
- the intermediate volume field required for the mag() operation is avoided in most cases
- bounding box reductions are now bundled together, resulting in exactly two MPI reductions instead of 2*(nPatches+1)
- fieldMinMax and fieldStatistics now use a single MPI_Allgather instead of six separate ones
Further work in this area is expected as these types of hidden overheads become more noticeable with increasing core counts.
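The saving from bundling the bounding box reductions can be illustrated without MPI. The sketch below is a simulation under stated assumptions, not the function object code: reduceElementwise stands in for one element-wise collective reduction across ranks, the Boxes type and both reduce functions are hypothetical names, and it is assumed that there is one box per patch plus one for the internal field, each stored as three minimum and three maximum components.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Flattened bounding boxes held by one rank: 3 doubles per box.
struct Boxes { std::vector<double> mins, maxs; };

inline double minOp(double a, double b) { return std::min(a, b); }
inline double maxOp(double a, double b) { return std::max(a, b); }

// Stand-in for ONE element-wise reduction across ranks (no MPI involved).
// perRank[r] is the send buffer of rank r; all buffers have equal length.
template<class BinaryOp>
std::vector<double> reduceElementwise
(
    const std::vector<std::vector<double>>& perRank,
    BinaryOp op,
    int& nReductions
)
{
    ++nReductions;   // count one collective call, whatever the buffer size
    std::vector<double> result(perRank.front());
    for (std::size_t r = 1; r < perRank.size(); ++r)
        for (std::size_t i = 0; i < result.size(); ++i)
            result[i] = op(result[i], perRank[r][i]);
    return result;
}

// Unbundled: one min and one max reduction per box, i.e. 2*nBoxes calls.
Boxes reducePerBox(const std::vector<Boxes>& perRank, int& nReductions)
{
    const std::size_t nBoxes = perRank.front().mins.size()/3;
    Boxes out;
    for (std::size_t b = 0; b < nBoxes; ++b)
    {
        std::vector<std::vector<double>> mins, maxs;
        for (const Boxes& rank : perRank)
        {
            const double* lo = rank.mins.data() + 3*b;
            const double* hi = rank.maxs.data() + 3*b;
            mins.emplace_back(lo, lo + 3);
            maxs.emplace_back(hi, hi + 3);
        }
        const std::vector<double> lo = reduceElementwise(mins, minOp, nReductions);
        const std::vector<double> hi = reduceElementwise(maxs, maxOp, nReductions);
        out.mins.insert(out.mins.end(), lo.begin(), lo.end());
        out.maxs.insert(out.maxs.end(), hi.begin(), hi.end());
    }
    return out;
}

// Bundled: all minima in one buffer, all maxima in another, so two calls.
Boxes reduceBundled(const std::vector<Boxes>& perRank, int& nReductions)
{
    std::vector<std::vector<double>> mins, maxs;
    for (const Boxes& rank : perRank)
    {
        mins.push_back(rank.mins);
        maxs.push_back(rank.maxs);
    }
    Boxes out;
    out.mins = reduceElementwise(mins, minOp, nReductions);
    out.maxs = reduceElementwise(maxs, maxOp, nReductions);
    return out;
}
```

Both routes return the same boxes. With two patches (three boxes) the per-box route counts six reductions and the bundled route counts two, regardless of how many boxes are packed into the buffers.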
Specialised reduction
A specialised Foam::reduce for MinMax with sumOp has been added. This passes directly to the corresponding MPI reductions without any intermediate tree communication or serialisation/deserialisation overhead.
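To see why a min/max range lends itself to direct reductions, the sketch below assumes that summing two ranges means taking the smallest range that covers both. Under that assumption, folding the ranges together one at a time gives the same answer as one minimum reduction over the lower bounds plus one maximum reduction over the upper bounds. The Range type and the function names are hypothetical and are not the OpenFOAM MinMax class; the loops merely stand in for the collective operations.

```cpp
#include <algorithm>
#include <limits>
#include <vector>

// Hypothetical stand-in for a min/max range (not the OpenFOAM MinMax class).
struct Range
{
    double min;
    double max;
};

// An empty range: combining it with any other range returns that range.
inline Range emptyRange()
{
    return {
        std::numeric_limits<double>::max(),
        std::numeric_limits<double>::lowest()
    };
}

// Assumed meaning of "sum": the smallest range covering both inputs.
inline Range combine(const Range& a, const Range& b)
{
    return {std::min(a.min, b.min), std::max(a.max, b.max)};
}

// Route 1: fold the per-rank ranges together one at a time.
Range reduceByCombine(const std::vector<Range>& perRank)
{
    Range r = emptyRange();
    for (const Range& x : perRank) r = combine(r, x);
    return r;
}

// Route 2: treat the bounds independently. The first loop stands in for a
// minimum reduction, the second for a maximum reduction.
Range reduceByMinAndMax(const std::vector<Range>& perRank)
{
    Range r = emptyRange();
    for (const Range& x : perRank) r.min = std::min(r.min, x.min);
    for (const Range& x : perRank) r.max = std::max(r.max, x.max);
    return r;
}
```

Because the two bounds never interact, the second route needs no custom combination operator and no serialised intermediate values, which is the property the specialised reduction relies on.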

