


TAU profiling in ICE


Tuning and Analysis Utilities (TAU) is a profiling and tracing toolkit for performance analysis of parallel programs. It is developed jointly by the University of Oregon, Los Alamos National Laboratory, and Research Centre Julich, ZAM, Germany. Its web-site is

TAU can gather performance information through instrumentation of functions, methods, basic blocks, and statements, as well as through event-based sampling. All C++ language features are supported, including templates and namespaces. The API also lets users select profiling groups to organize and control instrumentation. Instrumentation can be inserted into the source code automatically with an instrumentor based on the Program Database Toolkit (PDT), dynamically with DyninstAPI, at runtime in the Java Virtual Machine, or manually with the instrumentation API.

TAU's profile visualization tool, paraprof, provides graphical displays of all performance analysis results, both in aggregate and per node/context/thread. Using the graphical interface, the user can quickly identify the sources of performance bottlenecks in the application. In addition, TAU can generate event traces that can be displayed with the following trace tools:

General list of steps for TAU profiling in ICE

Installation of TAU on host system (platform/architecture specific)

TAU binaries and wrappers are required on the specific hosts where the code to be profiled will be run and tested. This step is host-dependent (or at least architecture-dependent, e.g. x86_64). For parallel runs, an MPI installation is also needed on the platform where the profiling information will be collected. Installing TAU generates scripts (called wrappers) that enable auto-instrumentation of the code by invoking the local compilers and adding the TAU libraries. Note that TAU can be compiled in user space; however, it is best installed by the system administrator.

Input for this step: TAU source code from TAU web-site (
Output from this step: tau_exec and other wrappers for compilers
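Because the TAU build must match the host, a quick sanity check before moving on is to confirm that tau_exec is reachable and to note the host architecture. The sketch below uses only the Python standard library and assumes tau_exec was installed onto the PATH (or into a directory you pass in); find_tau_exec and describe_host are hypothetical helpers, not part of TAU.

```python
import platform
import shutil
from typing import Optional


def find_tau_exec(search_path: Optional[str] = None) -> Optional[str]:
    """Return the full path of tau_exec if found, else None.

    search_path is an os.pathsep-separated list of directories; when it is
    None, shutil.which falls back to the PATH environment variable.
    """
    return shutil.which("tau_exec", path=search_path)


def describe_host() -> str:
    """Return the machine architecture string reported by Python, e.g. 'x86_64'.

    TAU builds are architecture specific, so this should match the
    architecture TAU was configured for on this host.
    """
    return platform.machine()


if __name__ == "__main__":
    location = find_tau_exec()
    if location is None:
        print(f"tau_exec not found; install TAU for this host ({describe_host()})")
    else:
        print(f"tau_exec found at {location} on a {describe_host()} host")
```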

Code recompilation with TAU wrappers

The code to be profiled must be recompiled using the equivalent TAU wrappers (made available in Step 2.1) instead of the standard compilers (e.g. gcc, g++, mpicc). These are typically of the type,,, and etc. This step requires changing the makefile(s) and/or setting environment variables when running ./configure.

Code developers and experienced users should be able to perform this step independently. However, most end users will need help from the system administrator to recompile the code with the correct modifications to the makefiles and the configure step. As mentioned above, this involves specifying the path to the TAU libraries and machine file, and setting some environment variables. See the TAU documentation for more details.

As a result, recompiling with the TAU wrappers auto-instruments the code and produces binaries that generate profiles at run time.

Input for this step: TAU installation (Step 2.1), source code for the application.
Output from this step: Instrumented binary for the application.
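As an illustration of the ./configure route, the sketch below builds an environment in which CC and CXX (the variables autoconf-style configure scripts read) point at TAU wrappers. The wrapper names are left as parameters because they depend on the TAU build. TAU_MAKEFILE is, to our understanding, the variable TAU's wrapper scripts read to select a TAU configuration, but check the TAU documentation for your version; tau_configure_env is a hypothetical helper.

```python
import os
from typing import Dict, Mapping, Optional


def tau_configure_env(
    cc_wrapper: str,
    cxx_wrapper: str,
    tau_makefile: str,
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of base_env (default: os.environ) using TAU wrappers.

    CC and CXX are the standard compiler variables honoured by ./configure.
    TAU_MAKEFILE (assumed, see lead-in) tells the wrappers which TAU build
    (MPI, PDT, ...) to link against. The input mapping is never modified.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["CC"] = cc_wrapper
    env["CXX"] = cxx_wrapper
    env["TAU_MAKEFILE"] = tau_makefile
    return env
```

For example, the returned mapping can be passed as env= to subprocess.run(["./configure"], ...), after which make builds the instrumented binary with the wrappers in place of the standard compilers.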

Run TAU-compiled code binaries to collect profiles

This step can be performed independently by the end user. The main requirement is the set of TAU-instrumented executables from the previous step (these differ from the standard binaries because they have been auto-instrumented to collect profiling data). The tests (either the application's standard tests or ones designed specifically for profiling) can then be run on the host architecture to collect the profiling data.

Profiling data can be collected for a variety of probes by running the instrumented binary through tau_exec with the appropriate flags at test run time:

       -io                     Track I/O
       -memory                 Track memory allocation/deallocation
       -memory_debug           Enable memory debugger
       -cuda                   Track GPU events via CUDA
       -cupti                  Track GPU events via CUPTI (Also see env. variable TAU_CUPTI_API)
       -opencl                 Track GPU events via OpenCL
       -openacc                Track GPU events via OpenACC (currently PGI only)
       -ompt                   Track OpenMP events via OMPT interface
       -armci                  Track ARMCI events via PARMCI
       -ebs                    Enable event-based sampling
       -ebs_period=<count>     Sampling period (default 1000)
       -ebs_source=<counter>   Counter (default itimer)
       -um                     Enable Unified Memory events via CUPTI
       -loadlib=<>             Specify additional load library
       -XrunTAUsh-<options>    Specify TAU library directly
       -gdb                    Run program in the gdb debugger
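A typical MPI launch places tau_exec between the launcher and the program (for example, mpirun -np 4 tau_exec -io ./app). The sketch below assembles such a command from the argument-free flags listed above; tau_exec_command and KNOWN_FLAGS are hypothetical names, and the assumption that -ebs_period only takes effect together with -ebs should be checked against the TAU documentation.

```python
from typing import List, Optional, Sequence

# Argument-free flags copied from the tau_exec option list above.
KNOWN_FLAGS = {
    "-io", "-memory", "-memory_debug", "-cuda", "-cupti", "-opencl",
    "-openacc", "-ompt", "-armci", "-ebs", "-um", "-gdb",
}


def tau_exec_command(
    program: Sequence[str],
    flags: Sequence[str] = (),
    launcher: Sequence[str] = (),
    ebs_period: Optional[int] = None,
) -> List[str]:
    """Build [launcher..., "tau_exec", flags..., program...] as an argv list."""
    unknown = [flag for flag in flags if flag not in KNOWN_FLAGS]
    if unknown:
        raise ValueError(f"unrecognized tau_exec flags: {unknown}")
    cmd = list(launcher) + ["tau_exec"] + list(flags)
    if ebs_period is not None:
        # Assumption: a sampling period is only meaningful with -ebs enabled.
        if "-ebs" not in flags:
            cmd.append("-ebs")
        cmd.append(f"-ebs_period={ebs_period}")
    return cmd + list(program)


if __name__ == "__main__":
    # Prints: mpirun -np 4 tau_exec -io -memory ./app
    print(" ".join(tau_exec_command(["./app"], ["-io", "-memory"], ["mpirun", "-np", "4"])))
```

Returning an argv list rather than a single string keeps the command safe to pass straight to subprocess.run without shell quoting.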

Some additional environment variables may need to be set at run time for correct data collection. See the TAU documentation for details.
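As an example of such settings, the sketch below prepares a run-time environment using PROFILEDIR (where profile files are written), TAU_TRACE (enable event tracing) and TAU_CALLPATH (record call paths). These variable names come from our reading of TAU's environment-variable documentation and should be verified for the installed TAU version; tau_runtime_env is a hypothetical helper.

```python
import os
from typing import Dict, Mapping, Optional


def tau_runtime_env(
    profile_dir: str,
    trace: bool = False,
    callpath: bool = False,
    base_env: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return a copy of base_env (default: os.environ) with TAU run-time settings.

    The variable names are assumptions to be checked against the TAU docs.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["PROFILEDIR"] = profile_dir                  # directory for profile files
    env["TAU_TRACE"] = "1" if trace else "0"         # "1" = also write event traces
    env["TAU_CALLPATH"] = "1" if callpath else "0"   # "1" = record call paths
    return env
```

The result can be passed as env= to subprocess.run together with a tau_exec command line, so that each test run writes its profiles to a separate directory.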

Analysis and visualization of the profiling results

The profiling results from the previous step are written to profile files (such as profile.0.0.0). The end user can then analyze and characterize these results with paraprof and other TAU tools.
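Profile files follow a profile.&lt;node&gt;.&lt;context&gt;.&lt;thread&gt; naming pattern (matching the node/context/thread views in paraprof), so a tool can discover and order them before analysis. The sketch assumes all profiles of one run sit in a single directory; list_profiles is a hypothetical helper.

```python
import re
from pathlib import Path
from typing import List, Tuple

# Matches names such as profile.0.0.0 -> (node, context, thread).
PROFILE_NAME = re.compile(r"^profile\.(\d+)\.(\d+)\.(\d+)$")


def list_profiles(directory: str) -> List[Tuple[int, int, int, Path]]:
    """Return (node, context, thread, path) for each profile file, sorted."""
    found = []
    for path in Path(directory).iterdir():
        match = PROFILE_NAME.match(path.name)
        if match and path.is_file():
            node, context, thread = (int(group) for group in match.groups())
            found.append((node, context, thread, path))
    return sorted(found)
```

Sorting by integers rather than by file name keeps profile.10.0.0 after profile.2.0.0.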

ICE specific notes

Paraprof and other visualization tools will be included, with the ability for the user to specify the TAU installation directory and host architecture. Alternatively, the system administrators can provide the TAU-instrumented binaries directly. The tests can either be run through ICE, or their results can be placed in a location accessible to ICE. ICE will allow end users and developers to interpret the results visually and to re-run the tests.
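Resolving tool locations from a user-supplied installation directory and architecture could look like the sketch below. It assumes the conventional TAU layout in which executables live under &lt;TAU root&gt;/&lt;architecture&gt;/bin (for example x86_64/bin); the helper names are hypothetical and the layout should be confirmed on the target system.

```python
from pathlib import Path
from typing import Dict, List, Sequence


def tau_tool_paths(
    tau_root: str, arch: str, tools: Sequence[str] = ("paraprof", "tau_exec")
) -> Dict[str, Path]:
    """Map each tool name to <tau_root>/<arch>/bin/<tool> (assumed TAU layout)."""
    bin_dir = Path(tau_root) / arch / "bin"
    return {tool: bin_dir / tool for tool in tools}


def missing_tools(paths: Dict[str, Path]) -> List[str]:
    """Return the sorted names of tools whose expected file does not exist."""
    return sorted(name for name, path in paths.items() if not path.is_file())


if __name__ == "__main__":
    paths = tau_tool_paths("/opt/tau", "x86_64")  # hypothetical install location
    print("missing:", missing_tools(paths) or "none")
```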

Paraprof is written in Java, so integrating it into the ICE code base should be relatively straightforward. Visualizing the profile files (e.g. profile.0.0.0) is independent of the other steps, such as TAU installation, code recompilation, and data collection. This step can therefore be implemented as a separate module that lets users specify where their profiling data is located and analyze it independently of all other steps. If additional profile files are available (for example from other architectures or other run conditions), they can be compared within the ICE framework.
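Since visualization only needs the profile files, a standalone module can read them directly. The sketch below extracts per-function exclusive times from one text profile and compares two runs. The layout it expects (a count line, a "#" column-header line, then records of a quoted function name followed by calls, subroutines, exclusive time, inclusive time, profile calls and a GROUP attribute) is our assumption about TAU's text profile format and should be checked against real files; exclusive_times and compare are hypothetical names.

```python
import re
from typing import Dict, Iterable

# Assumed record layout:
# "name" calls subrs exclusive inclusive profilecalls GROUP="group"
RECORD = re.compile(
    r'^"(?P<name>.*)"\s+(?P<calls>\S+)\s+(?P<subrs>\S+)\s+'
    r'(?P<excl>\S+)\s+(?P<incl>\S+)\s+\S+\s+GROUP="(?P<group>[^"]*)"'
)


def exclusive_times(lines: Iterable[str]) -> Dict[str, float]:
    """Map function name to exclusive time from one profile file's lines."""
    it = iter(lines)
    count = int(next(it).split()[0])  # e.g. "2 templated_functions_MULTI_TIME"
    times: Dict[str, float] = {}
    parsed = 0
    for line in it:
        if parsed == count:
            break  # stop before aggregate / user-event sections
        if line.startswith("#"):
            continue  # column-header line
        match = RECORD.match(line)
        if match:
            times[match.group("name")] = float(match.group("excl"))
            parsed += 1
    return times


def compare(base: Dict[str, float], other: Dict[str, float]) -> Dict[str, float]:
    """Return other minus base exclusive time for every function in either run."""
    names = sorted(set(base) | set(other))
    return {name: other.get(name, 0.0) - base.get(name, 0.0) for name in names}
```

For example, exclusive_times applied to the profile.0.0.0 files of two runs yields two mappings that compare turns into per-function differences (negative values mean the function became faster).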
