I measured the memory bandwidth of a server using the popular STREAM benchmark. I compiled STREAM with the array size set to 500 million elements. The binary names call this "500MB", but with STREAM's default 8-byte double elements each of the three arrays actually occupies about 4 GB. The number of threads accessing memory was controlled by setting the environment variable OMP_NUM_THREADS to 1, 2, 5, 10, 20, 30 and 50.
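As a quick sanity check on the array size, the total footprint can be worked out in the shell. This is a minimal sketch, assuming STREAM's default 8-byte element type and its three arrays; stream_footprint_bytes is a hypothetical helper name, not part of STREAM.

```shell
#!/bin/bash
# Memory footprint of STREAM's three arrays (a, b, c).
# Assumption: 8-byte elements (STREAM's default "double" type).
# stream_footprint_bytes is a hypothetical helper, not part of STREAM.
stream_footprint_bytes() {
    local elements=$1
    local bytes_per_element=${2:-8}
    echo $(( elements * bytes_per_element * 3 ))
}

total=$(stream_footprint_bytes 500000000)
echo "Total: $total bytes, about $(( total / 1000000000 )) GB"
```

Static arrays of this size are well past 2 GB, which is most likely why the compile command needs -mcmodel=medium, and the server must have enough free RAM to hold all three arrays.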
I compiled STREAM with the following command:
gcc -m64 -mcmodel=medium -O -fopenmp stream.c \
    -DSTREAM_ARRAY_SIZE=500000000 -DNTIMES=[10-1000] \
    -o stream_multi_threaded_500MB_[10-1000]TIMES

# Compiled Stream packages
# stream_multi_threaded_500MB_10TIMES
# stream_multi_threaded_500MB_100TIMES
# stream_multi_threaded_500MB_1000TIMES
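The [10-1000] in the command is a placeholder rather than shell syntax, so each binary needs its own gcc call. One way to script that is sketched below, assuming stream.c is in the current directory; build_stream and the DRY_RUN switch are hypothetical names added for illustration, and the loop only prints the commands unless DRY_RUN=0 is set.

```shell
#!/bin/bash
# Build one STREAM binary per NTIMES value (10, 100, 1000).
# build_stream and DRY_RUN are hypothetical, added for illustration:
# DRY_RUN=1 (the default here) only prints each gcc command,
# DRY_RUN=0 actually runs it and expects stream.c in the current directory.
DRY_RUN=${DRY_RUN:-1}

build_stream() {
    local ntimes=$1
    local cmd=(gcc -m64 -mcmodel=medium -O -fopenmp stream.c
               -DSTREAM_ARRAY_SIZE=500000000 "-DNTIMES=${ntimes}"
               -o "stream_multi_threaded_500MB_${ntimes}TIMES")
    if [ "$DRY_RUN" = 1 ]; then
        echo "${cmd[*]}"      # show the command without running it
    else
        "${cmd[@]}"           # run the compiler
    fi
}

for ntimes in 10 100 1000; do
    build_stream "$ntimes"
done
```

Running it with DRY_RUN=0 should leave the three binaries listed above in the current directory.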
Above, I compiled multiple versions of STREAM so that I could see the effect of varying the iteration count from 10 to 1000. Then I created a wrapper bash script to execute STREAM and collect its output:
#!/bin/bash
#################################################
# STREAM Harness to analyze memory bandwidth
#################################################
bench_home=/$USER/stream
out_home=$bench_home/out
bench_exec=stream_multi_threaded_500MB_1000TIMES
host=$(hostname)

echo "Running Test: $bench_exec"

# Timer: run the given command and report its wall-clock time
elapsed()
{
    (( seconds = SECONDS ))
    "$@"
    (( seconds = SECONDS - seconds ))
    (( etime_seconds = seconds % 60 ))
    (( etime_minutes = ( seconds - etime_seconds ) / 60 % 60 ))
    (( etime_hours = seconds / 3600 ))
    echo "Elapsed time: ${etime_hours}h ${etime_minutes}m ${etime_seconds}s"
}

# Run STREAM once per thread count, saving each run's output to its own file
mem_stream()
{
    mkdir -p "$out_home"    # make sure the output directory exists
    for n in 1 2 5 10 20 30 50
    do
        export OMP_NUM_THREADS=$n
        "$bench_home/$bench_exec" > "$out_home/$host.memory.$n.txt"
        echo "Thread $OMP_NUM_THREADS complete"
    done
}

# Main
elapsed mem_stream
exit 0
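Once the harness has finished, the per-thread output files can be condensed into one line per thread count. The sketch below assumes the standard STREAM result table, in which each kernel row (Copy, Scale, Add, Triad) starts with the kernel name and a colon, followed by the best rate in MB/s. stream_rate and summarize are hypothetical helper names, not part of STREAM.

```shell
#!/bin/bash
# Print the "Best Rate MB/s" of one STREAM kernel from a saved output file.
# Assumes the standard STREAM result table, where each kernel row looks like:
#   <Kernel>:  <best rate MB/s>  <avg time>  <min time>  <max time>
# stream_rate and summarize are hypothetical helpers, not part of STREAM.
stream_rate() {
    local kernel=$1
    local file=$2
    awk -v k="${kernel}:" '$1 == k { print $2 }' "$file"
}

# Walk the files written by the harness and print one line per thread count.
summarize() {
    local out_home=$1
    local host=$2
    local kernel=${3:-Add}
    local n file
    for n in 1 2 5 10 20 30 50; do
        file="$out_home/$host.memory.$n.txt"
        [ -f "$file" ] || continue    # skip thread counts that were not run
        printf '%s threads: %s MB/s\n' "$n" "$(stream_rate "$kernel" "$file")"
    done
}

# Example call, using the paths from the harness:
#   summarize "/$USER/stream/out" "$(hostname)" Add
```

Passing Copy, Scale or Triad as the third argument reports that kernel instead of Add.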
Sample result – ADD:
Hope this write-up will help you get on the right path with STREAM.