TAU profiling in ICE
- 1 Introduction
- 2 General list of steps for TAU profiling in ICE
- 3 ICE specific notes
Tuning and Analysis Utilities (TAU) is a profiling and tracing toolkit for performance analysis of parallel programs. It is developed jointly by the University of Oregon, Los Alamos National Laboratory, and Research Centre Julich, ZAM, Germany. Its website is https://www.cs.uoregon.edu/research/tau/home.php
TAU is capable of gathering performance information through instrumentation of functions, methods, basic blocks, and statements as well as event-based sampling. All C++ language features are supported including templates and namespaces. The API also provides selection of profiling groups for organizing and controlling instrumentation. The instrumentation can be inserted in the source code using an automatic instrumentor tool based on the Program Database Toolkit (PDT), dynamically using DyninstAPI, at runtime in the Java Virtual Machine, or manually using the instrumentation API.
TAU's profile visualization tool, paraprof, provides graphical displays of all the performance analysis results, in aggregate and single node/context/thread forms. The user can quickly identify sources of performance bottlenecks in the application using the graphical interface. In addition, TAU can generate event traces that can be displayed with the following trace tools:
- Vampir (https://www.vampir.eu/)
- Paraver (http://www.bsc.es/computer-sciences/performance-tools/paraver)
- JumpShot (http://www.mcs.anl.gov/research/projects/perfvis/software/viewers/)
General list of steps for TAU profiling in ICE
Installation of TAU on host system (platform/architecture specific)
TAU binaries and wrappers are required on the specific hosts where the
code to be profiled is going to be run and tested. This step is host dependent (or at least
architecture dependent, e.g. x86_64). For parallel runs, an MPI installation
is also needed on the platform where profiling information will be collected. Installing
TAU generates scripts (called wrappers) that enable auto-instrumentation of the
code by invoking the local compilers and adding the TAU libraries. Note that TAU can be compiled in user space; however,
it is best if it is installed by the system administrator.
Input for this step: TAU source code from TAU web-site (https://www.cs.uoregon.edu/research/tau/downloads.php)
Output from this step: tau_exec and other wrappers for compilers
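Before moving on, it can help to confirm that the installation actually put the wrappers on the search path. The following is a minimal sketch, not part of TAU or ICE: the helper name find_tau_tools is hypothetical, and the tool names are simply the ones mentioned in this page (a given installation may provide only some of them).

```python
import shutil

# Tool names taken from this page; an installation may provide only a subset.
TAU_TOOLS = ["tau_exec", "tau_cc.sh", "tau_cxx.sh", "tau_f90.sh", "tau_f77.sh"]


def find_tau_tools(tools=TAU_TOOLS, path=None):
    """Map each tool name to its resolved location, or None if it is not found.

    `path` is an optional search path string; by default the PATH
    environment variable is used.
    """
    return {name: shutil.which(name, path=path) for name in tools}


if __name__ == "__main__":
    # Print one line per tool so missing wrappers are easy to spot.
    for name, location in find_tau_tools().items():
        print(f"{name:12s} {location or 'NOT FOUND'}")
```

If a wrapper is reported as not found, the usual cause is that the TAU bin directory for the host architecture has not been added to PATH.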
Code recompilation with TAU wrappers
The code to be profiled needs to be recompiled. Instead of
the standard compilers (e.g. gcc, g++, mpicc etc.), the equivalent TAU
wrappers (made available from Step 2.1) are used. These are typically
named tau_cc.sh, tau_cxx.sh, tau_f90.sh, and tau_f77.sh.
This step requires changing the makefile(s) and/or changing the environment
at the time of ./configure.
Code developers and experienced code users should be able to perform
this step independently. However, most end users will need some help
from the system administrator to recompile the code with the correct
modifications to the makefiles and configure step. As mentioned above,
this involves specifying the path to the TAU libraries and machine file,
and setting some environment variables. See the TAU documentation for more details.
As a result, recompilation with the TAU wrappers auto-instruments the code and produces
the binaries required for generating profiles at run time.
Input for this step: TAU installation (Step 2.1), source code for the application
Output from this step: Instrumented binary for the application.
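The environment changes described in this step can be sketched as follows. This is an illustration under stated assumptions, not the procedure for any particular code: TAU_MAKEFILE and TAU_OPTIONS are environment variables read by the TAU compiler wrappers, CC/CXX/FC/F77 are the conventional variables honored by most configure scripts and makefiles, and the helper name tau_build_env and the makefile path used in the usage note are hypothetical.

```python
import os


def tau_build_env(tau_makefile, base_env=None, tau_options=None):
    """Return a copy of the environment with the compilers swapped for TAU wrappers.

    tau_makefile: path to the TAU stub makefile for the chosen configuration
                  (site specific; the caller must supply the real path).
    base_env:     environment to start from; defaults to the current one.
    tau_options:  optional string passed to the wrappers via TAU_OPTIONS.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TAU_MAKEFILE"] = tau_makefile
    if tau_options:
        env["TAU_OPTIONS"] = tau_options
    # Conventional build variables; a project may use different names.
    env["CC"] = "tau_cc.sh"
    env["CXX"] = "tau_cxx.sh"
    env["FC"] = "tau_f90.sh"
    env["F77"] = "tau_f77.sh"
    return env
```

A build could then be driven with, for example, subprocess.run(["./configure"], env=tau_build_env("/path/to/Makefile.tau"), check=True) followed by the same call for make, where /path/to/Makefile.tau stands in for the real makefile location on the host.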
Run TAU-compiled code binaries to collect profiles
This step can be performed independently by the end user. The major requirement is the set of TAU-generated executables/binaries from the previous step (these binaries differ from the standard ones because they have been auto-instrumented for profiling data collection). The tests (either the standard ones for the application or special ones designed to collect profiling data) can then be run on the host architecture to collect the profiling data.
The profiling data can be collected for a variety of probes by specifying the correct binary and flag at test run-time.
  -io                                Track I/O
  -memory                            Track memory allocation/deallocation
  -memory_debug                      Enable memory debugger
  -cuda                              Track GPU events via CUDA
  -cupti                             Track GPU events via CUPTI (also see env. variable TAU_CUPTI_API)
  -opencl                            Track GPU events via OpenCL
  -openacc                           Track GPU events via OpenACC (currently PGI only)
  -ompt                              Track OpenMP events via OMPT interface
  -armci                             Track ARMCI events via PARMCI
  -ebs                               Enable event-based sampling
  -ebs_period=<count>                Sampling period (default 1000)
  -ebs_source=<counter>              Counter (default itimer)
  -um                                Enable Unified Memory events via CUPTI
  -T <CUPTI,DISABLE,PROFILE,SERIAL>  Specify TAU tags
  -loadlib=<file.so>                 Specify additional load library
  -XrunTAUsh-<options>               Specify TAU library directly
  -gdb                               Run program in the gdb debugger
Some additional environment variable settings may be required at run time for correct data collection. Please see the TAU documentation for details.
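As an illustration of how the flags listed above combine into a run command, the sketch below assembles a tau_exec invocation as an argument list. It assumes the usual pattern of placing tau_exec and its flags directly in front of the application, after any MPI launcher; the function name tau_exec_command, the application name ./app, and the mpirun launcher line are hypothetical, and only a subset of the flags is covered.

```python
import shlex

# Simple on/off flags copied from the option list on this page.
KNOWN_FLAGS = {
    "io", "memory", "memory_debug", "cuda", "cupti", "opencl",
    "openacc", "ompt", "armci", "ebs", "um", "gdb",
}


def tau_exec_command(app_argv, flags=(), ebs_period=None, launcher=()):
    """Build an argument list: [launcher...] tau_exec [flags...] app [args...]."""
    unknown = [flag for flag in flags if flag not in KNOWN_FLAGS]
    if unknown:
        raise ValueError(f"unknown tau_exec flag(s): {unknown}")
    cmd = list(launcher) + ["tau_exec"]
    cmd += [f"-{flag}" for flag in flags]
    if ebs_period is not None:
        cmd.append(f"-ebs_period={ebs_period}")
    return cmd + list(app_argv)


if __name__ == "__main__":
    # Hypothetical example: 4 MPI ranks, tracking I/O and memory.
    cmd = tau_exec_command(
        ["./app"], flags=["io", "memory"], launcher=["mpirun", "-np", "4"]
    )
    print(" ".join(shlex.quote(part) for part in cmd))
```

Returning a list rather than a single string keeps the result usable with subprocess.run without shell quoting problems.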
Analysis and visualization of the profiling results
The profiling results from the previous step are collected in profile files (such as profile.0.0.0). The end user can analyze and characterize these results using paraprof and other TAU tools.
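TAU names its profile files profile.<node>.<context>.<thread>, so profile.0.0.0 holds the data for node 0, context 0, thread 0. A minimal sketch of locating the profile files in a run directory before handing them to a viewer is shown below; the function name find_profiles is hypothetical, and the sketch only inspects file names, not file contents.

```python
import re
from pathlib import Path

# Matches names such as profile.0.0.0 (node, context, thread).
PROFILE_NAME = re.compile(r"^profile\.(\d+)\.(\d+)\.(\d+)$")


def find_profiles(directory):
    """Return (node, context, thread, path) tuples sorted by node, context, thread."""
    found = []
    for entry in Path(directory).iterdir():
        match = PROFILE_NAME.match(entry.name)
        if match and entry.is_file():
            node, context, thread = (int(group) for group in match.groups())
            found.append((node, context, thread, entry))
    # Sort numerically so that node 10 comes after node 9.
    return sorted(found, key=lambda item: item[:3])


if __name__ == "__main__":
    for node, context, thread, path in find_profiles("."):
        print(f"node={node} context={context} thread={thread} file={path}")
```

An empty result usually means the run was not made with an instrumented binary or was started from a different working directory.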
ICE specific notes
Paraprof and other visualization tools will be included, with the ability for the user to specify the TAU installation directory and host architecture. Alternatively, the TAU-instrumented binaries can be made available directly by the system administrators. The tests can either be run through ICE, or the results can be made available to ICE directly at a given location. ICE will allow end users and developers to visually interpret the results and re-run the tests.
Paraprof is written in Java, so its integration into the ICE code base should be relatively straightforward. The visualization of profile files (e.g. profile.0.0.0) is independent of the other steps, such as TAU installation, code recompilation, and data collection. Therefore, this step can be implemented as a separate module that allows users to specify the location of their profiling information and analyze it independently of all other steps. If alternate files are made available (other architectures and other conditions), they can be compared within the ICE framework.