TAU profiling in ICE

Latest revision as of 20:10, 20 October 2015

Introduction

Tuning and Analysis Utilities (TAU) is a profiling and tracing toolkit for performance analysis of parallel programs. It is developed jointly by the University of Oregon, Los Alamos National Laboratory, and Research Centre Julich, ZAM, Germany. Its website is https://www.cs.uoregon.edu/research/tau/home.php

TAU is capable of gathering performance information through instrumentation of functions, methods, basic blocks, and statements as well as event-based sampling. All C++ language features are supported including templates and namespaces. The API also provides selection of profiling groups for organizing and controlling instrumentation. The instrumentation can be inserted in the source code using an automatic instrumentor tool based on the Program Database Toolkit (PDT), dynamically using DyninstAPI, at runtime in the Java Virtual Machine, or manually using the instrumentation API.

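TAU's profile visualization tool, paraprof, provides graphical displays of all the performance analysis results, in aggregate and single node/context/thread forms. The user can quickly identify sources of performance bottlenecks in the application using the graphical interface. In addition, TAU can generate event traces that can be displayed with the following trace tools:

* Vampir (https://www.vampir.eu/)
* Paraver (http://www.bsc.es/computer-sciences/performance-tools/paraver)
* JumpShot (http://www.mcs.anl.gov/research/projects/perfvis/software/viewers/)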

General list of steps for TAU profiling in ICE

Installation of TAU on host system (platform/architecture specific)

TAU binaries and wrappers are required on the specific hosts where the code to be profiled will be run and tested. This step is host dependent (or at least architecture dependent, e.g. x86_64). For parallel runs, an MPI installation is also needed on the platform where profiling information will be collected. Installing TAU generates scripts (called wrappers) that enable auto-instrumentation of the code by invoking the local compilers and adding the TAU libraries. Note that TAU can be compiled in user space; however, it is best if it is installed by the system administrator.

Input for this step: TAU source code from TAU web-site (https://www.cs.uoregon.edu/research/tau/downloads.php)
Output from this step: tau_exec and other wrappers for compilers
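
A quick way to confirm that this step produced its expected output is to check that the wrappers are reachable. The following is a minimal sketch, not part of TAU or ICE: the helper name missing_tau_tools is hypothetical, and the list of tool names is simply the set mentioned on this page (a given TAU build may install only some of them).

```python
import shutil

# Tool names taken from this page; a particular TAU build may not install all of them.
TAU_TOOLS = ["tau_exec", "tau_cc.sh", "tau_cxx.sh", "tau_f90.sh", "tau_f77.sh"]


def missing_tau_tools(search_path=None):
    """Return the TAU tools that cannot be found on search_path (default: $PATH)."""
    return [name for name in TAU_TOOLS
            if shutil.which(name, path=search_path) is None]


if __name__ == "__main__":
    missing = missing_tau_tools()
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All TAU tools listed above were found.")
```

If some tools are reported missing, the TAU bin directory for the host architecture probably still needs to be added to PATH.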

Code recompilation with TAU wrappers

The code to be profiled must be recompiled with the equivalent TAU wrappers (made available by the installation step above) instead of the standard compilers (e.g. gcc, g++, mpicc). The wrappers are typically named tau_cc.sh, tau_cxx.sh, tau_f90.sh, and tau_f77.sh. This step requires changing the makefile(s) and/or the environment used when running ./configure.

Code developers and experienced code users should be able to perform this step independently. Most end users, however, will need help from the system administrator to recompile the code with the correct modifications to the makefiles and the configure step. As mentioned above, this involves specifying the path to the TAU libraries and machine file, and setting some environment variables. See the TAU documentation for more details.

Recompilation with the TAU wrappers auto-instruments the code and produces the binaries required to generate profiles at run time.

Input for this step: TAU installation (previous step), source code for the application.
Output from this step: Instrumented binary for the application.
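
The recompilation step could be driven from a script along the following lines. This is a sketch under stated assumptions: the TAU wrapper scripts read the TAU_MAKEFILE environment variable (and optionally TAU_OPTIONS), but the helper names, the make variable names (CC, CXX, FC, F77) and the makefile path in the usage note are placeholders that depend on the application's build system and on where TAU was installed.

```python
import os


def tau_build_env(tau_makefile, tau_options=None, base_env=None):
    """Return a copy of the environment with the TAU build variables set.

    tau_makefile is the path to a Makefile.tau-* stub inside the TAU
    installation; the value passed in by the caller is site specific.
    """
    env = dict(os.environ if base_env is None else base_env)
    env["TAU_MAKEFILE"] = tau_makefile
    if tau_options:
        env["TAU_OPTIONS"] = tau_options
    return env


def make_command(target="all"):
    """Build a make command line that swaps in the TAU wrappers.

    Assumes the project's makefile honours CC, CXX, FC and F77;
    projects with other variable names need a different mapping.
    """
    return ["make", "CC=tau_cc.sh", "CXX=tau_cxx.sh",
            "FC=tau_f90.sh", "F77=tau_f77.sh", target]
```

A caller would then run something like subprocess.run(make_command(), env=tau_build_env("/path/to/tau/x86_64/lib/Makefile.tau-mpi-pdt"), check=True), where the makefile path is purely illustrative.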

Run TAU-compiled code binaries to collect profiles

This step can be performed independently by the end user. The main requirement is the set of TAU-generated executables from the previous step (these binaries differ from the normal ones because they have been auto-instrumented to collect profiling data). The tests (either the application's standard tests or special ones designed to collect profiling data) can then be run on the host architecture to collect the profiling data.

The profiling data can be collected for a variety of probes by specifying the correct binary and flag at test run time. The flags below are options accepted by tau_exec:

       -io                                 Track I/O
       -memory                             Track memory allocation/deallocation
       -memory_debug                       Enable memory debugger
       -cuda                               Track GPU events via CUDA
       -cupti                              Track GPU events via CUPTI (Also see env. variable TAU_CUPTI_API)
       -opencl                             Track GPU events via OpenCL
       -openacc                            Track GPU events via OpenACC (currently PGI only)
       -ompt                               Track OpenMP events via OMPT interface
       -armci                              Track ARMCI events via PARMCI
       -ebs                                Enable event-based sampling
       -ebs_period=<count>                 Sampling period (default 1000)
       -ebs_source=<counter>               Counter (default itimer)
       -um                                 Enable Unified Memory events via CUPTI
       -T <CUPTI,DISABLE,PROFILE,SERIAL>   Specify TAU tags
       -loadlib=<file.so>                  Specify additional load library
       -XrunTAUsh-<options>                Specify TAU library directly
       -gdb                                Run program in the gdb debugger

Some additional environment variable settings may be required at run time for correct data collection. See the TAU documentation for details.
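
To illustrate how the flags above combine with a test run, the sketch below assembles a tau_exec command line. The function name, the application name ./app, its argument input.dat, and the mpirun -np 4 launcher are all assumptions made for the example; the actual launcher and binary depend on the site and the application.

```python
import shlex


def tau_exec_command(app_argv, tau_flags=(), mpi_launcher=None):
    """Compose: [launcher ...] tau_exec [flags ...] app [app args ...]."""
    cmd = list(mpi_launcher or [])   # e.g. ["mpirun", "-np", "4"] for MPI runs
    cmd.append("tau_exec")
    cmd.extend(tau_flags)            # flags from the list above, e.g. "-io"
    cmd.extend(app_argv)             # instrumented binary and its arguments
    return cmd


# Hypothetical application and launcher, shown only to illustrate the ordering.
example = tau_exec_command(["./app", "input.dat"],
                           tau_flags=["-io", "-memory"],
                           mpi_launcher=["mpirun", "-np", "4"])
print(shlex.join(example))
```

Returning the command as a list rather than a single string lets it be passed directly to subprocess.run without shell quoting issues.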

Analysis and visualization of the profiling results

The profiling results from the test runs in the previous step are collected in profile files (such as profile.0.0.0). The end user can analyze and characterize these results with paraprof and other TAU tools.
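
For a quick look at a profile file without the graphical tools, a small parser can be sketched as follows. It assumes the plain-text layout that TAU profile files commonly use: a first line whose first token is the number of function rows, a header line starting with "#", then one row per function with a quoted name, numeric columns, and a GROUP field. That layout is an assumption here and should be checked against real files. The SAMPLE text is synthetic, written only for this sketch, and is not output from a real run.

```python
import re

# Assumed row layout: "name" calls subrs exclusive inclusive profilecalls GROUP="..."
_ROW = re.compile(
    r'^"(?P<name>.*)"\s+(?P<calls>\S+)\s+(?P<subrs>\S+)\s+'
    r'(?P<excl>\S+)\s+(?P<incl>\S+)\s+\S+\s+GROUP="(?P<group>.*)"\s*$'
)


def parse_profile(text):
    """Parse the function table of one profile file into a list of dicts."""
    lines = [ln for ln in text.splitlines() if ln.strip()]
    count = int(lines[0].split()[0])      # first token: number of function rows
    rows = []
    for line in lines[1:]:
        if len(rows) == count:            # stop before any later sections
            break
        if line.startswith("#"):          # column header line
            continue
        match = _ROW.match(line)
        if match is None:
            raise ValueError(f"unexpected profile line: {line!r}")
        rows.append({
            "name": match["name"],
            "calls": float(match["calls"]),
            "subroutines": float(match["subrs"]),
            "exclusive": float(match["excl"]),
            "inclusive": float(match["incl"]),
            "group": match["group"],
        })
    return rows


def top_exclusive(rows, n=5):
    """Return the n rows with the largest exclusive value."""
    return sorted(rows, key=lambda r: r["exclusive"], reverse=True)[:n]


# Synthetic sample in the assumed layout (made-up names and numbers).
SAMPLE = """2 templated_functions_MULTI_TIME
# Name Calls Subrs Excl Incl ProfileCalls
"main" 1 1 100 1000 0 GROUP="TAU_DEFAULT"
"compute(int)" 10 0 900 900 0 GROUP="TAU_USER"
0 aggregates
"""

for row in top_exclusive(parse_profile(SAMPLE)):
    print(row["name"], row["exclusive"])
```

Sorting by exclusive value is one simple way to surface hot spots; paraprof offers the same view graphically along with many others.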

ICE specific notes

Paraprof and other visualization tools will be included in ICE, with the ability for the user to specify the TAU installation directory and host architecture. Alternatively, the TAU-instrumented binaries can be made available directly by the system administrators. The tests can either be run through ICE, or the results can be made available to ICE directly at a given location. ICE will allow end users and developers to visually interpret the results and re-run the tests.

Paraprof is coded in Java, so its integration into the ICE code base should be relatively straightforward. The visualization of the profiling files (e.g. profile.0.0.0) is independent of the other steps, such as TAU installation, code recompilation, and data collection. This step can therefore be implemented as a separate module that lets users specify the location of their profiling information and analyze it independently of all other steps. If alternate files are made available (other architectures and other conditions), they can be compared within the ICE framework.
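
A separate module of this kind would first need to locate the profile files in a user-specified directory. The sketch below is one possible approach, not ICE code: it assumes the profile.<node>.<context>.<thread> naming seen in profile.0.0.0, and the function names find_profiles and compare_layouts are hypothetical.

```python
import re
from pathlib import Path

# Assumed naming scheme: profile.<node>.<context>.<thread>, e.g. profile.0.0.0
_PROFILE_NAME = re.compile(r"^profile\.(\d+)\.(\d+)\.(\d+)$")


def find_profiles(directory):
    """Map (node, context, thread) to the Path of each profile file found."""
    found = {}
    for path in Path(directory).iterdir():
        match = _PROFILE_NAME.match(path.name)
        if match and path.is_file():
            found[tuple(int(g) for g in match.groups())] = path
    return dict(sorted(found.items()))


def compare_layouts(dir_a, dir_b):
    """Return the keys found only in dir_a, only in dir_b, and in both."""
    a, b = set(find_profiles(dir_a)), set(find_profiles(dir_b))
    return sorted(a - b), sorted(b - a), sorted(a & b)
```

Comparing the sets of (node, context, thread) keys is a cheap first check that two runs, for example on different architectures, used the same process layout before their contents are compared.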
