Difference between revisions of "TAU profiling in ICE"

Revision as of 10:36, 17 October 2015

Introduction

Tuning and Analysis Utilities (TAU) is a profiling and tracing toolkit for performance analysis of parallel programs. It is developed jointly by the University of Oregon, Los Alamos National Laboratory, and Research Centre Julich, ZAM, Germany. Its website is https://www.cs.uoregon.edu/research/tau/home.php

TAU is capable of gathering performance information through instrumentation of functions, methods, basic blocks, and statements as well as event-based sampling. All C++ language features are supported including templates and namespaces. The API also provides selection of profiling groups for organizing and controlling instrumentation. The instrumentation can be inserted in the source code using an automatic instrumentor tool based on the Program Database Toolkit (PDT), dynamically using DyninstAPI, at runtime in the Java Virtual Machine, or manually using the instrumentation API.

TAU's profile visualization tool, paraprof, provides graphical displays of all the performance analysis results, in aggregate and single node/context/thread forms. The user can quickly identify sources of performance bottlenecks in the application using the graphical interface. In addition, TAU can generate event traces that can be displayed with the Vampir, Paraver or JumpShot trace visualization tools.

General list of steps for TAU profiling in ICE

Installation of TAU on host system (Platform/Architecture Specific)

TAU binaries and wrappers are required on the specific hosts where the code to be profiled will be run and tested. This step is host dependent (or at least architecture dependent, e.g. x86_64). For parallel runs, an MPI installation is also needed on the platform where profiling information will be collected. Installing TAU generates scripts (called wrappers) that enable auto-instrumentation of the code by invoking the local compilers and adding the TAU libraries. Note that TAU can be compiled in user space; however, it is best if it is installed by the system administrator.

Input for this step: TAU source code from TAU web-site (https://www.cs.uoregon.edu/research/tau/downloads.php)
Output from this step: tau_exec and other wrappers for compilers

Code recompilation with TAU wrappers

The code to be profiled needs to be recompiled with the equivalent TAU wrappers (made available from Step 1) instead of the standard compilers (e.g. gcc, g++, mpicc). The wrappers are typically named tau_cc.sh, tau_cxx.sh, tau_f90.sh, and tau_f77.sh. This step requires changing the makefile(s) and/or changing the environment at the time of ./configure.

Code developers and experienced code users should be able to perform this step independently. However, most end users will need some help from the system administrator to recompile the code with the correct modifications to the makefiles and the configure step. As mentioned above, this involves specifying the path to the TAU libraries and machine file, and setting some environment variables. See the TAU documentation for more details.

As a result, recompilation with the TAU wrappers auto-instruments the code and produces the binaries required to generate profiles at run time.

Input for this step: TAU installation (Step 1), source code for the application.
Output from this step: Instrumented binary for the application.
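The recompilation step can be sketched as a small Python helper that prepares a build with the wrappers in place of the standard compilers. This is a minimal sketch under stated assumptions, not the ICE implementation: the helper names are hypothetical, the mapping of wrappers onto the make variables CC, CXX, FC and F77 assumes a conventionally written makefile, and the TAU_MAKEFILE environment variable is expected to point at a stub makefile from the local TAU installation.

```python
import os

# Wrapper names come from the text above. Mapping them onto the make
# variables CC/CXX/FC/F77 is an assumption about the project's makefile.
TAU_WRAPPERS = {
    "CC": "tau_cc.sh",
    "CXX": "tau_cxx.sh",
    "FC": "tau_f90.sh",
    "F77": "tau_f77.sh",
}


def tau_build_env(tau_makefile, base_env=None):
    """Return a copy of the environment with TAU_MAKEFILE set.

    tau_makefile is the path to a TAU stub makefile; the right file
    depends on how TAU was configured on the host.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TAU_MAKEFILE"] = tau_makefile
    return env


def tau_make_command(targets=(), wrappers=None):
    """Return a make command line that overrides the compiler variables."""
    wrappers = TAU_WRAPPERS if wrappers is None else wrappers
    # Sort so the generated command line is stable between runs.
    overrides = [f"{var}={script}" for var, script in sorted(wrappers.items())]
    return ["make", *overrides, *targets]
```

The two results could then be passed to subprocess.run(tau_make_command(["all"]), env=tau_build_env(path), check=True), where path is whatever stub makefile the system administrator recommends. Projects that hard-code their compilers need their makefiles edited instead.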

Run TAU-compiled code binaries to collect profiles

This step can be performed independently by the end user. The major requirement is the TAU-generated executables/binaries from the previous step (these binaries differ from the standard ones because they have been auto-instrumented for profiling data collection). The tests (either the application's standard tests or special ones designed to collect profiling data) can then be run on the host architecture to collect the profiling data.

The profiling data can be collected for a variety of probes by specifying the correct binary and flag at test run-time.

       -io                     Track I/O
       -memory                 Track memory allocation/deallocation
       -memory_debug           Enable memory debugger
       -cuda                   Track GPU events via CUDA
       -cupti                  Track GPU events via CUPTI (Also see env. variable TAU_CUPTI_API)
       -opencl                 Track GPU events via OpenCL
       -openacc                Track GPU events via OpenACC (currently PGI only)
       -ompt                   Track OpenMP events via OMPT interface
       -armci                  Track ARMCI events via PARMCI
       -ebs                    Enable event-based sampling
       -ebs_period=<count>     Sampling period (default 1000)
       -ebs_source=<counter>   Counter (default itimer)
       -um                     Enable Unified Memory events via CUPTI
       -T <CUPTI,DISABLE,PROFILE,SERIAL> : Specify TAU tags
       -loadlib=<file.so>                : Specify additional load library
       -XrunTAUsh-<options>              : Specify TAU library directly
       -gdb                    Run program in the gdb debugger

Some additional environment variable settings may be required at run time for correct data collection. Please see the TAU documentation for details.
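As an illustration of how a test run might be assembled, the sketch below builds a tau_exec command line from the flags listed above and rejects anything not in that list. It is a hedged example, not ICE's launcher: the function name is hypothetical, mpirun with -np is an assumed MPI launcher (sites may use a different one), and the -T, -XrunTAUsh and -gdb options are left out for brevity.

```python
# Flags copied from the option list above.
SIMPLE_FLAGS = {
    "-io", "-memory", "-memory_debug", "-cuda", "-cupti", "-opencl",
    "-openacc", "-ompt", "-armci", "-ebs", "-um",
}
# Flags that carry a value in the form -name=value.
VALUE_FLAGS = {"-ebs_period", "-ebs_source", "-loadlib"}


def tau_exec_command(binary, flags=(), app_args=(), mpi_ranks=None):
    """Build an argument list for running `binary` under tau_exec.

    Raises ValueError for a flag that is not in the list above.
    """
    for flag in flags:
        name, sep, value = flag.partition("=")
        if sep:
            valid = name in VALUE_FLAGS and value != ""
        else:
            valid = name in SIMPLE_FLAGS
        if not valid:
            raise ValueError(f"unsupported tau_exec flag: {flag}")

    command = []
    if mpi_ranks is not None:
        # Assumed launcher; replace with the site's own (srun, aprun, ...).
        command += ["mpirun", "-np", str(mpi_ranks)]
    command += ["tau_exec", *flags, binary, *app_args]
    return command
```

For example, tau_exec_command("./app", ["-io", "-ebs"], mpi_ranks=4) yields the argument list for a four-rank run that tracks I/O and enables event-based sampling. The list can be handed to subprocess.run on a host where TAU is installed.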

Analysis and visualization of the profile results

The results from the previous step are collected in profile files (such as profile.0.0.0), which can be analyzed with the included paraprof and other tools.
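Before opening the results in a viewer, it can help to check which profile files a run actually produced. The sketch below assumes the usual TAU naming scheme profile.<node>.<context>.<thread> (consistent with the profile.0.0.0 example above) and lists the files found in a directory. The function name is hypothetical and the file contents are not parsed.

```python
import re
from pathlib import Path

# Matches names such as profile.0.0.0 (node, context, thread).
PROFILE_NAME = re.compile(r"^profile\.(\d+)\.(\d+)\.(\d+)$")


def find_profiles(directory):
    """Return sorted (node, context, thread, path) tuples for profile files."""
    found = []
    for path in Path(directory).iterdir():
        match = PROFILE_NAME.match(path.name)
        if match and path.is_file():
            node, context, thread = (int(group) for group in match.groups())
            found.append((node, context, thread, path))
    # Sort numerically so that node 10 comes after node 2.
    return sorted(found, key=lambda item: item[:3])
```

A quick sanity check is to compare the number of distinct node values with the number of MPI ranks used in the run. If the directory looks complete, paraprof can be started from it to load the profiles.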

ICE specific notes

Paraprof and other visualization tools will be included, with the ability for the user to specify the TAU installation directory and host architecture. Alternatively, the TAU-instrumented binaries can be made available to TAU. The tests can either be run through ICE, or the results can be made available to ICE directly at a specified location.

ICE will allow end users and developers to visually interpret the results.
