DARPA/DOE HPC Challenge Benchmark version 1.4.2

Piotr Luszczek*

October 12, 2012

1  Introduction

This is a suite of benchmarks that measure the performance of the processor, the memory subsystem, and the interconnect. For details, refer to the HPC Challenge web site (http://icl.cs.utk.edu/hpcc/).

In essence, HPC Challenge consists of a number of tests each of which measures performance of a different aspect of the system.

If you are familiar with the High Performance Linpack (HPL) benchmark code (see the HPL web site: http://www.netlib.org/benchmark/hpl/), you can reuse the build script file (the input for the make(1) command) and the input file that you already have for HPL. The HPC Challenge benchmark includes HPL and uses its build script and input files with only slight modifications. The most important change is to the line that sets the TOPdir variable. For HPC Challenge, the variable’s value should always be ../../.. regardless of what it was in the HPL build script file.
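As a rough illustration of the TOPdir change, the sketch below rewrites that one line in a copy of a build script. The file names and the sample file contents are placeholders invented for the example (a real HPL build script has many more lines); only the resulting TOPdir value comes from the text above.

```shell
# Work in a scratch directory so nothing real is modified.
workdir=$(mktemp -d)

# Stand-in for an existing HPL build script (hypothetical contents).
printf '%s\n' 'TOPdir       = $(HOME)/hpl' 'ARCH         = Unix' > "$workdir/Make.Unix.hpl"

# Replace whatever value TOPdir had with ../../.. and leave all other lines alone.
sed 's|^TOPdir[[:space:]]*=.*|TOPdir       = ../../..|' \
    "$workdir/Make.Unix.hpl" > "$workdir/Make.Unix"

grep '^TOPdir' "$workdir/Make.Unix"   # prints: TOPdir       = ../../..
```

Editing the line by hand in a text editor achieves the same result; the sed form is merely convenient when several build scripts need the same change.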

2  Compiling

The first step is to create a build script file that reflects the characteristics of your machine. This file is reused by all the components of the HPC Challenge suite. The build script file should be created in the hpl directory. This directory contains instructions (the files README and INSTALL) on how to create the build script file for your system. The hpl/setup directory contains many examples of build script files. A recommended approach is to copy one of them to the hpl directory and modify it only if it does not work as is.

The build script file has a name that starts with the Make. prefix and usually ends with a suffix that identifies the target system. For example, if the suffix chosen for the system is Unix, the file should be named Make.Unix.

To build the benchmark executable (for the system named Unix) type: make arch=Unix. This command should be run in the top directory (not in the hpl directory). It will look in the hpl directory for the build script file and use it to build the benchmark executable.
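Put together, the build steps above might look like the following sketch. It assumes the current directory is the top directory of the HPC Challenge source tree; the template name Make.Linux_PII_CBLAS is only an example assumed to be present in hpl/setup, so substitute whichever file there is closest to your system.

```shell
arch=Unix                                  # suffix chosen for this system
template=hpl/setup/Make.Linux_PII_CBLAS    # example template name (assumption)

if [ -f "$template" ]; then
    # Create the build script in the hpl directory; edit it afterwards if
    # the compilers, MPI, or BLAS settings do not match your machine.
    cp "$template" "hpl/Make.$arch"
    # Build from the top directory, not from hpl.
    make arch="$arch"
else
    echo "$template not found; run this from the top directory" >&2
fi
```

The guard around cp and make is there only so that the sketch fails with a readable message when run from the wrong directory.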

The runtime behavior of the HPC Challenge source code may be configured at compile time by defining a few C preprocessor symbols. They can be defined by adding appropriate options to the CCNOOPT and CCFLAGS make variables. The former controls options for source code files that need to be compiled without aggressive optimizations to ensure accurate generation of system-specific parameters. The latter applies to the rest of the files, which need good compiler optimization for best performance. To define a symbol S, the majority of compilers require the option -DS. Currently, the following options are available in the HPC Challenge source code:
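To illustrate only the -DS mechanism (not any particular option), the sketch below appends a definition to both make variables in a stand-in build script. EXAMPLE_SYMBOL is a hypothetical name, and the two sample lines are simplified placeholders for what a real build script contains.

```shell
# Work in a scratch directory so nothing real is modified.
workdir=$(mktemp -d)

# Stand-in for the relevant lines of hpl/Make.<arch> (hypothetical contents).
printf '%s\n' 'CCNOOPT      = $(HPL_DEFS)' \
              'CCFLAGS      = $(HPL_DEFS) -O3' > "$workdir/Make.Unix.orig"

# Append -DEXAMPLE_SYMBOL (a made-up symbol) to the end of both variables.
sed -e 's|^\(CCNOOPT[[:space:]]*=.*\)$|\1 -DEXAMPLE_SYMBOL|' \
    -e 's|^\(CCFLAGS[[:space:]]*=.*\)$|\1 -DEXAMPLE_SYMBOL|' \
    "$workdir/Make.Unix.orig" > "$workdir/Make.Unix"

grep -E '^CC(NOOPT|FLAGS)' "$workdir/Make.Unix"
```

Adding the definition to both variables keeps the symbol visible to every source file, whichever of the two sets of options it is compiled with.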

3  Runtime Configuration

The HPC Challenge benchmark is driven by a short input file named hpccinf.txt that is almost the same as the input file for HPL (customarily called HPL.dat). Refer to the file hpl/www/tuning.html for details about the input file for HPL. A sample input file is included with the HPC Challenge distribution.

The differences between HPL’s input file and HPC Challenge’s input file can be summarized as follows:

The additional lines in the HPC Challenge input file (compared to the HPL input file) are:

Just for completeness, here is the list of lines of the HPC Challenge’s input file and brief description of their meaning:

4  Running

The exact way to run the HPC Challenge benchmark depends on the MPI implementation and system details. An example command to run the benchmark could look like this: mpirun -np 4 hpcc. The meaning of the command’s components is as follows:

After the run, a file called hpccoutf.txt is created. It contains results of the benchmark. This file should be uploaded through the web form at the HPC Challenge website.

5  Source Code Changes across Versions (ChangeLog)

5.1  Version 1.4.3 (2013-08-26)

  1. Increased the size of scratch vector for local FFT tests that was missed in the previous version (reported by SGI).
  2. Added Makefile for Blue Gene/P contributed by Vasil Tsanov.

5.2  Version 1.4.2 (2012-10-12)

  1. Increased sizes of scratch vectors for local FFT tests to account for runs on systems with large main memory (reported by IBM, SGI and Intel).
  2. Reduced vector size for local FFT tests due to larger scratch space needed.
  3. Added a type cast to prevent overflow of a 32-bit integer vector size in FFT data generation routine (reported by IBM).
  4. Fixed variable types to handle array sizes that overflow 32-bit integers in RandomAccess (reported by IBM and SGI).
  5. Changed time-bound code to be used by default in Global RandomAccess and allowed for it to be switched off with a compile time flag if necessary.
  6. Code cleanup to allow compilation without warnings of RandomAccess test.
  7. Changed communication code in PTRANS to avoid large message sizes that caused problems in some MPI implementations.
  8. Updated documentation in README.txt and README.html files.

5.3  Version 1.4.1 (2010-06-01)

  1. Added optimized variants of RandomAccess that use Linear Congruential Generator for random number generation.
  2. Made corrections to comments that provide definition of the RandomAccess test.
  3. Removed initialization of the main array from the timed section of optimized versions of RandomAccess.
  4. Fixed the length of the vector used to compute error when using MPI implementation from FFTW.
  5. Added global reduction to error calculation in MPI FFT to achieve more accurate error estimate.
  6. Updated documentation in README.

5.4  Version 1.4.0 (2010-03-26)

  1. Added new variant of RandomAccess that uses Linear Congruential Generator for random number generation.
  2. Rearranged the order of benchmarks so that HPL component runs last and may be aborted if the performance of other components was not satisfactory. RandomAccess is now first to assist in tuning the code.
  3. Added global initialization and finalization routines that allow external software and hardware components to be properly initialized and finalized without changing the rest of the HPCC testing harness.
  4. Lack of hpccinf.txt is no longer reported as error but as a warning.

5.5  Version 1.3.2 (2009-03-24)

  1. Fixed memory leaks in G-RandomAccess driver routine.
  2. Made the check for 32-bit vector sizes in G-FFT optional. MKL allows for 64-bit vector sizes in its FFTW wrapper.
  3. Fixed memory bug in single-process FFT.
  4. Updated documentation (README).

5.6  Version 1.3.1 (2008-12-09)

  1. Fixed a dead-lock problem in FFT component due to use of wrong communicator.
  2. Fixed the 32-bit random number generator in PTRANS that was using 64-bit routines from HPL.

5.7  Version 1.3.0 (2008-11-13)

  1. Updated HPL component to use HPL 2.0 source code
    1. Replaced 32-bit Pseudo Random Number Generator (PRNG) with a 64-bit one.
    2. Replaced 3 numerical checks of the solution residual with a single one.
    3. Added support for 64-bit systems with large memory sizes (previously, index calculations would overflow 32-bit integers).
  2. Introduced a limit on FFT vector sizes so that they fit in a 32-bit integer (only applicable when using FFTW version 2).

5.8  Version 1.2.0 (2007-06-25)

  1. Changes in the FFT component:
    1. Added flexibility in choosing vector sizes and processor counts: the code can now handle powers of 2, 3, and 5 in both sequential and parallel tests.
    2. FFTW can now run with the ESTIMATE (not just MEASURE) flag: it might produce worse performance results but often reduces the time to run the test and causes less memory fragmentation.
  2. Changes in the DGEMM component:
    1. Added more comprehensive checking of the numerical properties of the test’s results.
  3. Changes in the RandomAccess component:
    1. Removed time-bound functionality: only runs that perform complete computation are now possible.
    2. Made the timing more accurate: main array initialization is not counted towards performance timing.
    3. Cleaned up the code: some non-portable C language constructs have been removed.
    4. Added new algorithms: algorithms from Sandia based on a hypercube network topology can now be chosen at compile time, which results in much better performance on many types of parallel systems.
    5. Fixed potential resource leaks by adding function calls required by the MPI standard.
  4. Changes in the HPL component:
    1. Cleaned up reporting of numerics: more accurate printing of scaled residual formula.
  5. Changes in the PTRANS component:
    1. Added randomization of virtual process grids to measure bandwidth of the network more accurately.
  6. Miscellaneous changes:
    1. Added better support for Windows-based clusters by taking advantage of Win32 API.
    2. Added custom memory allocator to deal with memory fragmentation on some systems.
    3. Added better reporting of configuration options in the output file.

5.9  Version 1.0.0 (2005-06-11)

5.10  Version 0.8beta (2004-10-19)

5.11  Version 0.8alpha (2004-10-15)

5.12  Version 0.6beta (2004-08-21)

5.13  Version 0.6alpha (2004-05-31)

5.14  Version 0.5beta (2003-12-01)

5.15  Version 0.4alpha (2003-11-13)

5.16  Version 0.3alpha (2004-11-05)


* University of Tennessee Knoxville, Innovative Computing Laboratory
