OpenFOAM® v1906: New and improved parallel operation


New random decomposition method

The new random decomposition method puts each cell on a randomly selected processor. It produces an extremely bad decomposition that is useful for testing purposes, e.g. to stress-test parallel implementations, making it easier to detect parallel inconsistencies.

The new method is set in the system/decomposeParDict dictionary:

//- The total number of domains (mandatory)
numberOfSubdomains  2;

//- The decomposition method (mandatory)
method          random;

and can be visualised by, e.g., running decomposePar -cellDist and post-processing the resulting cellDist field. The following image shows the decomposition of the lid-driven cavity tutorial with 5 processors, coloured by destination processor number:


Extended distributed triangulated surface functionality

Triangulated surfaces are employed to describe geometry, both as input to meshing tools, e.g. snappyHexMesh, and for post-processing, e.g. iso-surfaces. The distributedTriSurfaceMesh variant is suitable for parallel workflows, where each processor holds only a subset of the triangles.


In this release the distributedTriSurfaceMesh has been extended to:

  • no longer require the separate surfaceRedistributePar utility for decomposition
  • allow inside/outside tests, so it can be used e.g. to perform volume refinement in snappyHexMesh
  • speed up various queries and improve memory efficiency

The distributedTriSurfaceMesh is specified in the geometry section of the snappyHexMeshDict dictionary:

        type distributedTriSurfaceMesh;

This will read either:

  • an already decomposed surface: processorDDD/constant/triSurface/sphere.stl and corresponding processorDDD/constant/triSurface/sphere.stlDict. The latter dictionary contains e.g. the bounding box of the local surface.
  • the undecomposed constant/triSurface/sphere.stl and distribute it; this is the most frequently used operation.
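Such an entry sits inside the geometry sub-dictionary of snappyHexMeshDict; a minimal sketch, assuming the surface file is the constant/triSurface/sphere.stl mentioned above:

```
geometry
{
    sphere.stl
    {
        type distributedTriSurfaceMesh;
    }
}
```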

The following images show the difference for a simple test case employing both a triSurfaceMesh and a distributedTriSurfaceMesh:

[Picture] [Picture]


An undecomposed distributedTriSurfaceMesh currently requires the decomposition method in the system/decomposeParDict to be set to hierarchical. This can be automated using the new conditionals in dictionaries, e.g. to use scotch when running decomposePar and hierarchical otherwise:

#ifeq $FOAM_APPLICATION decomposePar
    method scotch;
#else
    method hierarchical;
    n  (5 2 1);
#endif


Easier interfacing to Metis decomposition library

Some Linux distributions, including openSUSE and Red Hat, come with a pre-compiled version of Metis; on 64-bit OS versions it may have been compiled with 64-bit integers. OpenFOAM can use this directly thanks to the new automatic integer conversion in OpenFOAM's Metis interface.

To use a system version of Metis, update OpenFOAM's Metis configuration to point to the system installation, then compile the Metis interface:

cd $FOAM_SRC/parallel/decompose
./Allwmake

Test with any parallel case by setting the method in the system/decomposeParDict to metis. Typical output comprises:

Calculating distribution of cells
Selecting decompositionMethod metis [4]
Metis can be used in a parallel context, e.g. via redistributePar and snappyHexMesh, but will combine the graph onto the master processor to determine the distribution.


New profiling tools

An option has been added to perform basic profiling of parallel runs. This is currently outside the profiling framework and might change in the future. The profiling is selected using the parProfiling function object. A typical set-up would be:

    profiling
    {
        type  parProfiling;

        libs  ("libutilityFunctionObjects.so");

        // Report stats on exit only (instead of every time step)
        executeControl  onEnd;
        writeControl    none;
    }

A typical output:

    reduce    : avg = 72.7133s
                min = 2.37s (processor 0)
                max = 88.29s (processor 4)
    all-all   : avg = 14.9633s
                min = 11.04s (processor 5)
                max = 17.18s (processor 4)

The profiling consists of two timings:
  • one related to global reductions, i.e. involving all processors; and
  • one related to processor-processor communications.

The interesting measure here is the amount of time spent waiting for reductions, which dominates this calculation. Note that the timings are a basic measure only: the heuristic used to decide which category a given time belongs to is not exact, so the accuracy of the timings is quite low.