Using IPM
Profiling your code with IPM
Note: If you are working on NSF TeraGrid machines, please consult this NSF quickstart guide in addition to the information below.
IPM can be used in one of two modes: statically or dynamically.
Static Usage - in this case the user's code needs to be relinked:
mpicc my_mpi_code.c -L/path/to/ipm/lib -lipm
Dynamic Usage - no code recompilation needed:
csh syntax
setenv LD_PRELOAD /path/to/ipm/lib/libipm.so
mpirun ./a.out
unsetenv LD_PRELOAD
bash syntax
LD_PRELOAD=/path/to/ipm/lib/libipm.so mpirun ./a.out
IPM is controlled via environment variables and through MPI_Pcontrol.
Environment Variables
Variable | Values | Description
IPM_REPORT | terse (default) | Aggregate wallclock time, memory usage, and flops are reported along with the percentage of wallclock time spent in MPI calls.
 | full | Each HPM counter is reported, as are all of wallclock, user, system, and MPI time. The contribution of each MPI call to the communication time is given.
 | none | No report.
IPM_MPI_THRESHOLD | 0.0 < x < 1.0 | Only report MPI routines using more than x% of the total MPI time.
IPM_HPM | 1,2,3,4,scan | POWER3 allows four different event sets. Use this environment variable to pick the event set, or select scan to use different event sets on different tasks. The scan option allows greater coverage of the HPM counters, but for codes with load imbalance or MPMD models uniform sampling may be more accurate. The scan option extrapolates to full totals based on the sampled event sets.
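For example, to request the full report and filter out minor MPI routines (csh syntax; the 0.05 threshold value is illustrative):
setenv IPM_REPORT full
setenv IPM_MPI_THRESHOLD 0.05
mpirun ./a.out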
MPI_Pcontrol
The first argument to MPI_Pcontrol determines what action will be taken by IPM.
Arguments | Description
1,"label" | start code region "label"
-1,"label" | exit code region "label"
0,"label" | invoke custom event "label"
Code Regions
Defining code regions and events:
C:
MPI_Pcontrol( 1, "proc_a");
MPI_Pcontrol(-1, "proc_a");
MPI_Pcontrol( 0, "tag_a");
FORTRAN:
call mpi_pcontrol( 1, "proc_a"//char(0))
call mpi_pcontrol(-1, "proc_a"//char(0))
call mpi_pcontrol( 0, "tag_a"//char(0))
(Fortran label strings must be null terminated.)
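Putting this together, a minimal C sketch of instrumenting two phases of an MPI code (the region names "compute" and "exchange" are illustrative, not IPM-defined):

#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    MPI_Pcontrol( 1, "compute");    /* enter code region "compute" */
    /* ... computational kernel ... */
    MPI_Pcontrol(-1, "compute");    /* exit code region "compute" */

    MPI_Pcontrol( 1, "exchange");   /* enter code region "exchange" */
    /* ... communication phase ... */
    MPI_Pcontrol(-1, "exchange");   /* exit code region "exchange" */

    MPI_Finalize();
    return 0;
}

IPM can then attribute time and counter data to each labelled region in its report.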
Post-processing IPM output
By default IPM produces a summary of the performance information for
the application on stdout. IPM also generates an XML file that can be
used to generate a graphical webpage. This can be produced one of two
ways:
- Generate the webpage on the cluster where IPM ran, then ftp the html to a local site:
  - build ploticus for the cluster head node; it is available at http://ploticus.sourceforge.net
  - setenv IPM_KEYFILE /path/to/ipm/ipm_key
  - /path/to/ipm/bin/ipm_parse -html xmlfile
  This will generate a directory named something like
  a.out_1_nwright.1231369287.321103.0_ipm_unknown
  Tar up that directory, ftp it to your laptop, untar it, and open index.html.
- Move the IPM xml file to your local machine and generate the html on your laptop/desktop.
  The IPM xml file will be named something like
  your_username.1231369287.321103.0, e.g. nwright.1231369287.321103.0
  - install ploticus on your local machine: http://ploticus.sourceforge.net (note: you can do this under cygwin on Windows)
  - put a copy of IPM on your local machine - there is no need to compile it; you just need access to the ipm_parse script and keyfile
  - setenv IPM_KEYFILE /path/to/ipm/ipm_key
  - /path/to/ipm/bin/ipm_parse -html xmlfile
  - you will be left with a directory containing an index.html file that you can open with your favorite browser.
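As a concrete sketch of the second approach (csh syntax, using the example file name above):
setenv IPM_KEYFILE /path/to/ipm/ipm_key
/path/to/ipm/bin/ipm_parse -html nwright.1231369287.321103.0
Then open index.html in the resulting directory with your browser.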
Using Hardware Performance Counters with IPM
IPM provides a method of collecting data from hardware performance counters, using either the PAPI interface or, on AIX systems, the PMAPI interface.
Within IPM several default counter groups are defined for each type of processor. These are listed below and are accessed by setting the IPM_HPM environment variable to the number corresponding to the desired group. In addition, the user can choose their own set of counters by setting IPM_HPM to a comma-separated list of the desired measurements, e.g.:
setenv IPM_HPM PAPI_FP_OPS,PAPI_TOT_INS,PAPI_L1_DCM,PAPI_L1_DCA
In this case the user is responsible for choosing a valid group of counters (as defined by papi_avail, for example).
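The PAPI counters available on a given machine can be listed with the papi_avail utility that ships with PAPI, for example:
papi_avail | grep PAPI_FP_OPS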
The detailed hardware performance counter settings for various platforms can be found in this file.