TPTP-AG-20080229

Date

Stanislav will host a technical discussion on enhancement 209342 (Binary Data Transfer Format for Profiling), focusing on the design and implementation of the enhancement.

The new binary format is intended as a size-optimized alternative to the XML data format currently used by the TPTP Java Profiler, with the goal of improving scalability and performance.

Data stream descriptor: Basic attributes describing the data stream, such as ID, version, encoding, and endianness.
Messages: Individual binary messages consisting of:
1. Header: Describes the message including unique ID and message length.
2. Message attributes: Ordered integer, long, double, and null-terminated string attributes describing the message. Each message contains the CPU frequency (CPU ticks) for calculating the time stamp on the client side.
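
The header-plus-ordered-attributes layout described above can be sketched as follows. This is only an illustration under stated assumptions: the field widths, the message ID value, the attribute names (threadId, ticks, cpuTime, name), and the UTF-8 encoding are all hypothetical and are not taken from the enhancement's actual specification. The byte order is passed in explicitly, standing in for the endianness declared by the data stream descriptor.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;
import java.nio.charset.StandardCharsets;

final class BinaryMessageSketch {
    // Hypothetical message ID; the real IDs are not given in these notes.
    static final int METHOD_ENTRY_ID = 1;

    /** Writes a header (ID, body length) followed by the ordered attributes. */
    static byte[] encode(int threadId, long ticks, double cpuTime,
                         String name, ByteOrder order) {
        byte[] nameBytes = name.getBytes(StandardCharsets.UTF_8);
        // int + long + double + string bytes + null terminator
        int bodyLength = 4 + 8 + 8 + nameBytes.length + 1;
        ByteBuffer buf = ByteBuffer.allocate(8 + bodyLength).order(order);
        buf.putInt(METHOD_ENTRY_ID);   // header: message ID
        buf.putInt(bodyLength);        // header: length of the attributes
        buf.putInt(threadId);          // integer attribute
        buf.putLong(ticks);            // long attribute
        buf.putDouble(cpuTime);        // double attribute
        buf.put(nameBytes);            // string attribute...
        buf.put((byte) 0);             // ...terminated by a null byte
        return buf.array();
    }

    /** Reads the message back in the same order and returns a readable summary. */
    static String decode(byte[] data, ByteOrder order) {
        ByteBuffer buf = ByteBuffer.wrap(data).order(order);
        int id = buf.getInt();
        int length = buf.getInt();
        int threadId = buf.getInt();
        long ticks = buf.getLong();
        double cpuTime = buf.getDouble();
        int start = buf.position();
        int end = start;
        while (data[end] != 0) {       // scan to the null terminator
            end++;
        }
        String name = new String(data, start, end - start, StandardCharsets.UTF_8);
        return id + ":" + length + ":" + threadId + ":" + ticks + ":" + cpuTime + ":" + name;
    }

    public static void main(String[] args) {
        byte[] msg = encode(7, 123456789L, 0.5, "run", ByteOrder.BIG_ENDIAN);
        System.out.println(decode(msg, ByteOrder.BIG_ENDIAN));
    }
}
```

Because the attributes carry no per-field tags, the reader has to know the attribute order for each message ID in advance; that is where the size saving over XML comes from in this sketch.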

There are currently ~45 message formats following the Java Specification for Java Profiling (e.g. methodEntry and methodExit).

Performance:
- CPU time measurements in stand-alone mode show that performance has improved by only 30%, since most of the time is spent on I/O.
- A buffering or caching strategy could be implemented, but the implementation time and complexity considerably outweigh the benefit for a peripheral use case.

Capability:
- The handshake algorithm for backward compatibility is still outstanding, so Java Profiler Agents from TPTP 4.4 and earlier cannot be handled yet.
- Currently defaults to XML for controlled/enabled modes but binary for stand-alone mode.
- For controlled/enabled modes, the Java Profiler sends XML data, but if the client responds, the binary format is used. This passive approach needs to be replaced with a handshake algorithm.
- The format needs to default to XML for all modes, since this is the existing format and users cannot convert binary data to XML data.
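
One way to picture the explicit handshake with an XML default is the sketch below. It is hypothetical: the Format enum, the negotiate method, and the use of a null capability set to model an older agent that never answers the handshake are assumptions made for illustration, not the TPTP design.

```java
import java.util.EnumSet;
import java.util.Set;

final class FormatHandshakeSketch {
    enum Format { XML, BINARY }

    /**
     * Picks the data format for a session.
     * agentSupports is null when the agent never answered the handshake,
     * which in this sketch stands in for an older agent.
     */
    static Format negotiate(Format requested, Set<Format> agentSupports) {
        if (agentSupports == null) {
            return Format.XML;         // no handshake reply: keep the existing format
        }
        if (requested == Format.BINARY && agentSupports.contains(Format.BINARY)) {
            return Format.BINARY;      // both sides explicitly agreed on binary
        }
        return Format.XML;             // default for every other case
    }

    public static void main(String[] args) {
        Set<Format> newAgent = EnumSet.of(Format.XML, Format.BINARY);
        System.out.println(negotiate(Format.BINARY, newAgent));
        System.out.println(negotiate(Format.BINARY, null));
    }
}
```

In this sketch binary is used only when the client asks for it and the agent confirms support, so a user who never opts in always receives XML.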

Q: Is there a utility to convert binary to XML format (e.g. if the user generates binary data by mistake, would they need to rerun the trace)?
A: No. Setting the default mode to XML would solve this problem.

Q: For peer agent discovery, who controls which format is used? How do we handle mixed modes?
A: This use case has not been considered.

Stanislav to provide a cost-benefit analysis for:
- Compression benefits and complexity costs for removing unused fields.
- Performance costs for current binary data loaders.