OpenFOAM® v1712: New and improved parallel operation

31/12/2017

New collated file input and output

OpenFOAM’s input and output (I/O) system has received an update that provides users with the choice of how to read and write OpenFOAM data files. All I/O is now redirected through an object called fileHandler. This file handler can be selected through a standard run-time selection model. This release ships with three file handlers which differ in how they handle parallel operation:

  • uncollated : This is the normal behaviour where in parallel every processor writes its own processorXXX directory.
  • masterUncollated : Special version of uncollated that performs all I/O on the master, and therefore does not require NFS (but is slower).
  • collated : This writes the data for all processors into a single file. This drastically reduces the number of files generated for parallel runs: for each output time a single field file is assembled, as opposed to one file per field per processor. Looking at the $FOAM_CASE directory, instead of seeing directories processor0 ... processorN, a single processors directory is written.
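For illustration (the time and field names here are hypothetical), the case layout changes roughly as follows:

    # uncollated (default)              # collated
    processor0/0.1/U                    processors/0.1/U
    processor1/0.1/U
    ...
    processorN/0.1/U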

Note that the available fileHandlers can be extended via user libraries loaded through the libs entry in the system/controlDict, and then enabled by overriding the fileHandler in the OptimisationSwitches - see below.

The fileHandler can be selected (in order of precedence):

  • through the environment variable FOAM_FILEHANDLER
  • as a setting in the OptimisationSwitches dictionary located in the $WM_PROJECT_DIR/etc/controlDict file, the per-user override in the ~/.OpenFOAM/v1712/controlDict file, or the per-case override in the $FOAM_CASE/system/controlDict file

    OptimisationSwitches
    {
        //- Parallel IO file handler
        //  uncollated (default), collated or masterUncollated
        fileHandler uncollated;
    }
  • as a command line option -fileHandler:

    decomposePar -fileHandler collated
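Since the environment variable takes the highest precedence, the handler can be switched for a whole session without editing any dictionary; a minimal shell sketch:

```shell
# Select the collated handler for every subsequent OpenFOAM command
# run from this shell; FOAM_FILEHANDLER overrides any controlDict
# setting and any per-case default
export FOAM_FILEHANDLER=collated
echo "selected fileHandler: ${FOAM_FILEHANDLER}"
```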

From a user perspective, the collated option may be beneficial when the sheer number of files becomes problematic; note that collated operation is generally no faster than using NFS. To activate the collated file handler, two additional entries should be updated in the OptimisationSwitches dictionary:

  • maxThreadFileBufferSize must be set to a non-zero value, defined as the size in bytes of the buffer. If the buffer is too small to hold all of the data, the data is instead received inside the write thread, which requires the MPI library to provide full thread support.

    If the value remains at zero, no threading is used for the writing, which can lead to a severe bottleneck:

    OptimisationSwitches
    {
        //- collated: thread buffer size for queued file writes.
        //  If set to 0, or not sufficient for the file size,
        //  threading is not used. Default (v1712): 0
        maxThreadFileBufferSize 1e9;
    }
  • maxMasterFileBufferSize is a buffer size in bytes. Ideally this should be set large enough to receive all of the file data to avoid switching to a blocking/scheduled transfer method:

    OptimisationSwitches
    {
        //- collated | masterUncollated: non-blocking buffer size.
        //  If the file exceeds this buffer size scheduled transfer is used.
        //  Default: 1e9
        maxMasterFileBufferSize 1e9;
    }
  • Note that the buffer sizes cannot exceed 2 GB due to an MPI limit.
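Collecting the settings above, a per-case override in $FOAM_CASE/system/controlDict might look like this (a sketch; 1e9 is the value suggested above, not a tuned recommendation):

    OptimisationSwitches
    {
        //- Use the collated file handler for this case
        fileHandler             collated;

        //- Enable threaded writing (the v1712 default of 0 disables it)
        maxThreadFileBufferSize 1e9;

        //- Non-blocking transfer buffer on the master
        maxMasterFileBufferSize 1e9;
    }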

Note that for collated operation:

  • the binary format should be used; ascii is supported but produces larger files. Note that writeCompression is not beneficial for binary files and is not supported by e.g. decomposePar
  • MPI is initialised using full thread support i.e. enabling MPI communication in any thread. However, communication from a spawned thread is only used if the data size is larger than the maxThreadFileBufferSize - see above. We have observed problems using this with various versions of openmpi (1.10, 2.1, 3.0) when using Infiniband (IB). However, TCP is generally robust.
  • collated can read the uncollated and collated formats but will write collated format.
  • files can be converted using the foamFormatConvert utility e.g. to convert mesh and fields in collated format back to uncollated format:

    mpirun -np 4 foamFormatConvert -fileHandler uncollated -parallel
  • ascii collated files look editable but are not! They are a binary dump of one (ascii or binary) buffer per processor.
  • all regIOobjects that implement streaming, i.e. that implement readData/writeData, are supported.

Source code
$FOAM_SRC/OpenFOAM/global/fileOperations
Examples
$FOAM_TUTORIALS/IO/fileHandler

Attribution
This development was led by Mattijs Janssens - see commit 85f12f
Integration
This functionality is still evolving and will continue to be developed in future releases

Updated decomposePar utility

Updates to the decomposePar utility provide greater control over the decomposition for parallel runs, particularly for multi-region cases. In earlier releases, all regions were decomposed into the same number of sub-domains; if one region was much smaller, e.g. for surface film, conjugate heat transfer etc., this strategy could prove inefficient. The updates provide:

  • region-by-region decomposition method;
  • region-by-region specification for the number of processor domains; and
  • simplified and more consistent specification of the decomposition model coefficients.

numberOfSubdomains 2048;
method metis;

regions
{
    heater
    {
        numberOfSubdomains 2;
        method      hierarchical;
        coeffs
        {
            n           (2 1 1);
        }
    }

    "*.solid"
    {
        numberOfSubdomains 16;
        method      scotch;
    }
}

Note that the top-level numberOfSubdomains entry remains mandatory, since this specifies the number of domains for the entire simulation. The individual regions may use the same number or fewer domains. The numberOfSubdomains entry within a region specification is only needed if the value differs from the top-level value.
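A minimal sketch of system/decomposeParDict for a multi-region case (region names hypothetical), where only one small region deviates from the top-level settings:

    numberOfSubdomains  4;
    method              scotch;

    regions
    {
        // Only the film region deviates; all other regions use the
        // top-level numberOfSubdomains and method
        film
        {
            numberOfSubdomains 2;
            method      hierarchical;
            coeffs
            {
                n   (2 1 1);
            }
        }
    }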

Source code
$FOAM_SRC/parallel/
Examples
$FOAM_TUTORIALS/heatTransfer/chtMultiRegionFoam/snappyMultiRegionHeater

Improved multi-level decomposition

The multiLevel decomposition method provides a general means of successively decomposing with different methods. Each application of a decomposition method is termed a level. For example,

multiLevelCoeffs
{
    nodes
    {
        numberOfSubdomains 128;
        method      hierarchical;
        coeffs
        {
            n (16 4 2);
        }
    }

    cpus
    {
        numberOfSubdomains 2;
        method      scotch;
    }

    cores
    {
        numberOfSubdomains 8;
        method      metis;
    }
}

For cases where the same method is applied at each level, this can also be conveniently written in a much shorter form:

numberOfSubdomains 2048;
method      multiLevel;
multiLevelCoeffs
{
    method      scotch;
    domains     (128 2 8);
}

When the product of the specified domains is smaller than numberOfSubdomains but divides it exactly, the quotient is used as the first level. This can make it easier to manage when changing the number of domains for the simulation:

numberOfSubdomains 1024;
method      multiLevel;
multiLevelCoeffs
{
    method      scotch;
    domains     (2 8); //< inferred as domains (64 2 8);
}
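The inference of the omitted first level is simple integer arithmetic; a sketch of the calculation for the example above:

```shell
# Sketch: inferring the omitted first multiLevel level
total=1024              # numberOfSubdomains
rest=$((2 * 8))         # product of the specified domains (2 8)
first=$((total / rest))
echo "inferred first level: ${first}"   # prints: inferred first level: 64
```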

Source code
$FOAM_SRC/parallel/decompose/decompositionMethods/multiLevelDecomp

New interface to the KaHIP decomposition library

This release includes a new domain decomposition interface to the KaHIP (Karlsruhe High Quality Partitioning) library. The KaHIP sources are included in the ThirdParty pack. The method can be selected using, e.g.:

numberOfSubdomains   N;
method               kahip;
kahipCoeffs
{
    config          fast; // fast | eco | strong
    imbalance       0.01;
}

Source code
$FOAM_SRC/parallel/decompose/kahipDecomp

New parallel processing support

Additional improvements to the parallel processing include:

  • Improved reporting of slave processes: the default behaviour is to report per-host-cpu usage, e.g.

    nProcs : 6
    Hosts  :
    (
        (muttley17 6)
    )
    Note: to use the verbose format including PID information, set the InfoSwitch in the $WM_PROJECT_DIR/etc/controlDict file:

    InfoSwitches
    {
        ...
        // Report hosts used (parallel)
        // - 0 = none
        // - 1 = per-host-count, but unsorted
        // - 2 = long output of "slave.pid"
        writeHosts  2;
        ...
    }
  • MPI configuration for CRAY-MPICH and better handling of both INTELMPI and CRAY-MPICH when building ptscotch.
  • Support linking metis, scotch static libraries, which helps with build systems such as EasyBuild.
  • Fixed a configuration problem finding scotch headers on Ubuntu
  • Third-party clang now works with OpenMP
  • mirrorMesh (see Issue #587) and refineMesh now operate in parallel