TAU profiling in ICE

Latest revision as of 20:10, 20 October 2015

Introduction

Tuning and Analysis Utilities (TAU) is a profiling and tracing toolkit for performance analysis of parallel programs. It is developed jointly by the University of Oregon, Los Alamos National Laboratory, and Research Centre Julich, ZAM, Germany. Its website is https://www.cs.uoregon.edu/research/tau/home.php

TAU is capable of gathering performance information through instrumentation of functions, methods, basic blocks, and statements as well as event-based sampling. All C++ language features are supported including templates and namespaces. The API also provides selection of profiling groups for organizing and controlling instrumentation. The instrumentation can be inserted in the source code using an automatic instrumentor tool based on the Program Database Toolkit (PDT), dynamically using DyninstAPI, at runtime in the Java Virtual Machine, or manually using the instrumentation API.

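TAU's profile visualization tool, paraprof, provides graphical displays of all the performance analysis results, in aggregate and single node/context/thread forms. The user can quickly identify sources of performance bottlenecks in the application using the graphical interface. In addition, TAU can generate event traces that can be displayed with the following trace tools:

* Vampir (https://www.vampir.eu/)
* Paraver (http://www.bsc.es/computer-sciences/performance-tools/paraver)
* JumpShot (http://www.mcs.anl.gov/research/projects/perfvis/software/viewers/)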

General list of steps for TAU profiling in ICE

Installation of TAU on host system (platform/architecture specific)

TAU binaries and wrappers are required on the specific hosts where the code to be profiled will be run and tested. This step is host dependent (or at least architecture dependent, e.g. x86_64). For parallel runs, an MPI installation is also needed on the platform where profiling information will be collected. Installing TAU generates scripts (called wrappers) that enable auto-instrumentation of the code by invoking the local compilers and adding the TAU libraries. Note that TAU can be compiled in user space; however, it is best if it is installed by the system administrator.

Input for this step: TAU source code from the TAU website (https://www.cs.uoregon.edu/research/tau/downloads.php)
Output from this step: tau_exec and other wrappers for compilers
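
One way to confirm that this step produced its outputs is to check that tau_exec and the compiler wrappers can be found on the search path. The following is a minimal sketch, not part of TAU or ICE: the function name missing_tau_tools and the example directory are hypothetical, and a given TAU build may install only some of the wrappers.

```python
import shutil

# Tool names taken from this page. A given TAU build may install only some
# of them (for example, no Fortran wrappers), so adjust the tuple as needed.
TAU_TOOLS = ("tau_exec", "tau_cc.sh", "tau_cxx.sh", "tau_f90.sh", "tau_f77.sh")


def missing_tau_tools(search_path=None, tools=TAU_TOOLS):
    """Return the names in `tools` that are not executable on `search_path`.

    `search_path` is an os.pathsep-separated string such as $PATH;
    None means "use the current PATH".
    """
    return [tool for tool in tools if shutil.which(tool, path=search_path) is None]
```

For example, missing_tau_tools("/opt/tau/x86_64/bin") (a hypothetical install location) returns an empty list when every listed tool is present in that directory.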

Code recompilation with TAU wrappers

The code to be profiled must be recompiled using the equivalent TAU wrappers (made available in Step 2.1) instead of the standard compilers (e.g. gcc, g++, mpicc). The wrappers are typically named tau_cc.sh, tau_cxx.sh, tau_f90.sh, and tau_f77.sh. This step requires changing the makefile(s) and/or the environment at the time of ./configure.

Code developers and experienced code users should be able to perform this step independently. Most end users, however, will need help from the system administrator to recompile the code with the correct modifications to the makefiles and the configure step. This involves specifying the path to the TAU libraries and machine file, and setting some environment variables. See the TAU documentation for more details.

As a result, recompilation with the TAU wrappers auto-instruments the code and produces the binaries required to generate profiles at run time.

Input for this step: TAU installation (Step 2.1), source code for the application.
Output from this step: Instrumented binary for the application.
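
The environment change described in this step can be sketched as a small helper that swaps the compiler variables for the TAU wrappers before ./configure or make is run. This is only an illustration under stated assumptions: the variable names CC, CXX, FC and F77 follow the usual autoconf convention and may not match a given project, the pairing of wrappers to languages is inferred from the wrapper names, TAU_MAKEFILE is the variable the TAU wrappers normally read to select a TAU configuration (check the TAU documentation for your installation), and the helper name tau_build_env is hypothetical.

```python
import os

# Compiler variables (autoconf convention, an assumption) mapped to the TAU
# wrapper scripts named on this page. The pairing is inferred from the
# wrapper names: cc -> C, cxx -> C++, f90/f77 -> Fortran.
TAU_WRAPPERS = {
    "CC": "tau_cc.sh",
    "CXX": "tau_cxx.sh",
    "FC": "tau_f90.sh",
    "F77": "tau_f77.sh",
}


def tau_build_env(tau_makefile, base_env=None):
    """Return a copy of the environment with compilers replaced by TAU wrappers.

    `tau_makefile` is the path to the TAU stub makefile for the desired
    configuration; its location depends on the local TAU installation.
    """
    env = dict(os.environ if base_env is None else base_env)
    env.update(TAU_WRAPPERS)
    env["TAU_MAKEFILE"] = tau_makefile
    return env
```

The returned dictionary can be passed as the env argument of subprocess.run when invoking ./configure and make, so the calling shell's own environment is left untouched.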

Run TAU-compiled code binaries to collect profiles

This step can be performed independently by the end user. The major requirement is the set of TAU-generated executables/binaries from the previous step (these binaries differ from the regular ones because they have been auto-instrumented for profiling data collection). The tests (either the application's standard tests or special ones designed for profiling) can then be run on the host architecture to collect the profiling data.

The profiling data can be collected for a variety of probes by specifying the correct binary and flags at test run time.

       -io                     Track I/O
       -memory                 Track memory allocation/deallocation
       -memory_debug           Enable memory debugger
       -cuda                   Track GPU events via CUDA
       -cupti                  Track GPU events via CUPTI (Also see env. variable TAU_CUPTI_API)
       -opencl                 Track GPU events via OpenCL
       -openacc                Track GPU events via OpenACC (currently PGI only)
       -ompt                   Track OpenMP events via OMPT interface
       -armci                  Track ARMCI events via PARMCI
       -ebs                    Enable event-based sampling
       -ebs_period=<count>     Sampling period (default 1000)
       -ebs_source=<counter>   Counter (default itimer)
       -um                     Enable Unified Memory events via CUPTI
       -T <CUPTI,DISABLE,PROFILE,SERIAL> : Specify TAU tags
       -loadlib=<file.so>                : Specify additional load library
       -XrunTAUsh-<options>              : Specify TAU library directly
       -gdb                    Run program in the gdb debugger

Some additional environment variable settings may be required at run time for correct data collection. Please see the TAU documentation for details.
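
As an illustration of how the flags above combine with a test run, the sketch below assembles a command line. It assumes the flags are passed to TAU's tau_exec launcher (named as an output of the installation step), placed in front of the program to be profiled. The helper name tau_exec_command, the program name ./app and the mpirun launcher in the usage note are hypothetical.

```python
def tau_exec_command(program, program_args=(), tau_flags=(), launcher=()):
    """Build an argument list of the form:

        [launcher ...] tau_exec [tau flags ...] program [program args ...]

    `launcher` is an optional prefix such as an MPI launcher and its options.
    Nothing is executed here; the list is meant for subprocess.run.
    """
    return [*launcher, "tau_exec", *tau_flags, program, *program_args]
```

For instance, tau_exec_command("./app", tau_flags=["-io", "-memory"]) yields ["tau_exec", "-io", "-memory", "./app"]. Passing launcher=["mpirun", "-np", "4"] puts the MPI launcher in front, assuming mpirun is the launcher used on the host.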

Analysis and visualization of the profiling results

The profiling results from the test runs in the previous step are collected in profile files (such as profile.0.0.0). The end user can analyze and characterize these results using paraprof and other TAU tools.
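
Before handing results to a viewer, it can help to list which profile files a run actually produced. The sketch below only inspects file names and does not read their contents. It assumes the three numeric fields in a name such as profile.0.0.0 are node, context and thread, matching the node/context/thread views mentioned in the Introduction. The function name find_profiles is hypothetical.

```python
import re
from pathlib import Path

# Matches names of the form profile.<node>.<context>.<thread>, e.g. profile.0.0.0
PROFILE_NAME = re.compile(r"^profile\.(\d+)\.(\d+)\.(\d+)$")


def find_profiles(directory):
    """Return sorted (node, context, thread, path) tuples for the profile
    files found directly inside `directory`."""
    found = []
    for path in Path(directory).iterdir():
        match = PROFILE_NAME.match(path.name)
        if match and path.is_file():
            node, context, thread = (int(group) for group in match.groups())
            found.append((node, context, thread, path))
    return sorted(found)
```

The sorted result makes it easy to spot a missing node or thread before opening the directory in paraprof.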

ICE specific notes

Paraprof and other visualization tools will be included, with the ability for the user to specify the TAU installation directory and host architecture. Alternatively, the TAU-instrumented binaries can be made available directly by the system administrators. The tests can either be run through ICE, or the results can be made available to ICE directly at a given location. ICE will allow end users and developers to visually interpret the results and re-run the tests.

Paraprof is written in Java, so its integration into the ICE code base should be relatively straightforward. The visualization of profile files (e.g. profile.0.0.0) is independent of the other steps, such as TAU installation, code recompilation, and data collection. This step can therefore be built as a separate module that lets users specify the location of their profiling information and analyze it independently of all other steps. If alternate files are made available (from other architectures or other conditions), they can be compared within the ICE framework.
