New Resource Manager Proxy Protocol
Currently, the proxy protocol is a text-based protocol consisting of space-delimited ASCII tokens. This format is described in detail in the comments preceding the read() method of the org.eclipse.ptp.proxy.packet.ProxyPacket class, in the org.eclipse.ptp.proxy.protocol plugin.
The protocol will be modified to use a binary message format, reducing the number of bytes sent per message. To reduce message length further, messages will also be compressed using Huffman compression before being transmitted.
These changes do not fully address protocol optimization. In particular, the proxy may generate messages in which integer and bitset values are converted to ASCII strings. Placing integer and bitset operands directly into the message stream would require each argument to carry a type flag indicating whether it is a string, an integer, or a bitset, which adds one byte to every operand. These operand types probably occur infrequently enough that handling them specially offers little benefit. If the other optimizations prove inadequate, this decision will be revisited.
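To illustrate why the Huffman compression step shortens messages, the sketch below builds a Huffman code from the byte frequencies of a single message: frequent bytes get short codes and rare bytes get long ones. This design does not say whether the code table is fixed in advance or derived per message, nor how a table would be transmitted, so both are left out. The class HuffmanSketch is hypothetical, produces code strings rather than a packed bit stream, and is not the PTP implementation.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.PriorityQueue;

/** Hypothetical sketch: build a per-message Huffman code table from byte frequencies. */
final class HuffmanSketch {
    private static final class Node {
        final long weight;
        final int symbol;   // byte value 0..255, or -1 for an internal node
        final Node left, right;

        Node(long weight, int symbol, Node left, Node right) {
            this.weight = weight;
            this.symbol = symbol;
            this.left = left;
            this.right = right;
        }
    }

    /** Maps each byte value that occurs in data to its code, written as a string of '0'/'1'. */
    static Map<Integer, String> buildCodes(byte[] data) {
        long[] freq = new long[256];
        for (byte value : data) freq[value & 0xFF]++;
        PriorityQueue<Node> queue = new PriorityQueue<>((x, y) -> Long.compare(x.weight, y.weight));
        for (int s = 0; s < 256; s++) {
            if (freq[s] > 0) queue.add(new Node(freq[s], s, null, null));
        }
        Map<Integer, String> codes = new HashMap<>();
        if (queue.size() == 1) {              // a lone symbol still needs a 1-bit code
            codes.put(queue.poll().symbol, "0");
            return codes;
        }
        while (queue.size() > 1) {            // repeatedly merge the two lightest subtrees
            Node first = queue.poll();
            Node second = queue.poll();
            queue.add(new Node(first.weight + second.weight, -1, first, second));
        }
        if (!queue.isEmpty()) assign(queue.poll(), "", codes);
        return codes;
    }

    /** Walks the tree: a left edge appends '0', a right edge appends '1'. */
    private static void assign(Node node, String prefix, Map<Integer, String> codes) {
        if (node.symbol >= 0) {
            codes.put(node.symbol, prefix);
            return;
        }
        assign(node.left, prefix + "0", codes);
        assign(node.right, prefix + "1", codes);
    }

    /** Size of the compressed payload in bits, not counting how the table itself is sent. */
    static long encodedBits(byte[] data, Map<Integer, String> codes) {
        long bits = 0;
        for (byte value : data) bits += codes.get(value & 0xFF).length();
        return bits;
    }
}
```

For the 7-byte input "aaaabbc", 'a' receives a 1-bit code while 'b' and 'c' receive 2-bit codes, so the payload shrinks from 56 bits to 10 bits before the cost of describing the table is counted.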
Integer and short integer fields, with the exception of the message length, will be stored in varint format. This variable-length format has the advantage that smaller values require fewer bytes. Varint encoding is described in the Base 128 Varints section at http://code.google.com/apis/protocolbuffers/docs/encoding.html. Note that once a field is encoded as a varint, the distinction between the int and short data types disappears, since a short and an int with the same value have the same varint encoding. Also note that a varint must be decoded back to an integer before its value or sign is checked.
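A minimal sketch of Base 128 varint encoding as described on that page follows. The class name Varint is hypothetical and this is not the PTP implementation. It handles only non-negative values; how negative values (such as the negative argument lengths described below) are mapped to varints is not specified by this design.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;

/** Hypothetical helper illustrating Base 128 varints for non-negative int values. */
final class Varint {
    private Varint() {}

    /** Writes value 7 bits at a time, low-order group first; bit x'80' means "more bytes follow". */
    static void write(ByteArrayOutputStream out, int value) {
        if (value < 0) {
            throw new IllegalArgumentException("this sketch only encodes values >= 0");
        }
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Reads one varint byte by byte, stopping at the first byte whose x'80' bit is clear. */
    static int read(ByteBuffer in) {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 7) {
            int b = in.get() & 0xFF;   // throws BufferUnderflowException if the input is truncated
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return result;
            }
        }
        throw new IllegalArgumentException("varint longer than 5 bytes");
    }
}
```

For example, 300 encodes as the two bytes x'AC' x'02', while any value from 0 to 127 fits in a single byte. Because the decoder only learns where a varint ends by examining each byte in turn, a varint-encoded length would have to be read one byte at a time.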
The following changes will be made:
- The message length header is currently an 8-byte ASCII hexadecimal number. It will be replaced by a 4-byte integer in big-endian order, set to the length of the remaining message body. This integer is an exception to the rule that integers and short integers are stored in varint format. The length field specifies how many bytes must be read to retrieve the message body. If the length is a fixed 4-byte integer, the reader can fetch it with a single read and then attempt to read the entire message body with a single read as well. If the length were encoded as a varint, the reader would have to read it one byte at a time to reconstruct the value without inadvertently consuming bytes of the message body. Note that even though the code may attempt to read or write a 4-byte integer or a full message body in one call, correct socket programming requires the caller to check how many bytes were actually transferred and to read or write the remaining bytes as needed, because a network connection may transfer only part of the requested data in a single call.
- The message header currently contains a 4-digit hexadecimal command ID, an 8-digit hexadecimal transaction ID, and an 8-digit hexadecimal argument count, separated by ':'. This header will be replaced by the following structure:
  - Command ID (varint)
  - Transaction ID (varint)
  - Argument count (varint)
- Each argument currently consists of an 8-digit hexadecimal length, followed by ':' and the string containing the argument value, which may be a simple string or a key=value pair. This representation will be replaced by a length/value representation for both the key and the value component of the argument, with the length in varint format. If the length is zero, that component of the argument is not used. If the length is positive, it specifies the length of the string that follows, which has no trailing x'00' byte. If the length is negative, it is an index into an enumeration table (for the key component) or into a string table (for the value component), and no string bytes follow. To summarize, the key and value components of an argument each take one of the following forms:
  - Length = 0: the component value is omitted, and the component consists only of a single-byte varint with the value x'00'.
  - Length > 0: the component consists of the length in varint format followed by an ASCII string with no trailing x'00' byte.
  - Length < 0: the component consists only of an index into an enumeration table (for the key) or a string table (for the value), in varint format. No ASCII string follows.
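The three component forms above can be sketched as follows. This design does not say how a negative length is represented as a varint; the sketch assumes the ZigZag mapping that Protocol Buffers uses for its sint32 type, under which 0 still encodes as the single byte x'00'. The class and method names are hypothetical, and so is the choice to carry a table reference as the negative value itself, since the mapping from a negative length to a table position is not defined here.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

/**
 * Hypothetical encoder/decoder for one key or value component of an argument.
 * ASSUMPTION: the signed length is ZigZag-mapped before varint encoding.
 */
final class ArgComponent {
    final String text; // inline ASCII string (length > 0), otherwise null
    final int ref;     // negative table reference (length < 0), otherwise 0

    private ArgComponent(String text, int ref) {
        this.text = text;
        this.ref = ref;
    }

    static ArgComponent omitted() { return new ArgComponent(null, 0); }

    static ArgComponent string(String s) {
        return s.isEmpty() ? omitted() : new ArgComponent(s, 0);
    }

    static ArgComponent tableRef(int negativeRef) {
        if (negativeRef >= 0) throw new IllegalArgumentException("table references are negative");
        return new ArgComponent(null, negativeRef);
    }

    // ZigZag maps 0, -1, 1, -2, ... to 0, 1, 2, 3, ... so small negatives stay short.
    static int zigZag(int n) { return (n << 1) ^ (n >> 31); }
    static int unZigZag(int n) { return (n >>> 1) ^ -(n & 1); }

    static void writeVarint(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    static int readVarint(ByteBuffer in) {
        int result = 0;
        for (int shift = 0; shift < 32; shift += 7) {
            int b = in.get() & 0xFF;
            result |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) return result;
        }
        throw new IllegalArgumentException("malformed varint");
    }

    void encode(ByteArrayOutputStream out) {
        if (text != null) {
            byte[] bytes = text.getBytes(StandardCharsets.US_ASCII);
            writeVarint(out, zigZag(bytes.length));
            out.write(bytes, 0, bytes.length); // no trailing x'00'
        } else {
            writeVarint(out, zigZag(ref));      // 0 encodes as the single byte x'00'
        }
    }

    static ArgComponent decode(ByteBuffer in) {
        int length = unZigZag(readVarint(in)); // decode before testing the sign
        if (length == 0) return omitted();
        if (length < 0) return tableRef(length);
        byte[] bytes = new byte[length];
        in.get(bytes);
        return new ArgComponent(new String(bytes, StandardCharsets.US_ASCII), 0);
    }
}
```

A key=value argument would be written as two components back to back, key first. Under these assumptions, an enumerated key with reference -3 followed by the inline value "job" encodes as x'05' (ZigZag of -3), then x'06' (ZigZag of 3) and the bytes 'j' 'o' 'b'.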
The result of these optimization changes will be a message with the following format:
- 4-byte integer length, specifying the length of the following message body
- Message body
  - 1-byte flags
    - x'80': continuation bit indicating that another flag byte follows. It is zero in the last flag byte and set in every flag byte that is followed by another. Currently there is only a single flag byte, so this bit is always zero.
    - x'40': the message is compressed using Huffman compression
    - x'20': indicates to flow control (iteration 3) that this is a priority message
  - varint command ID
  - varint transaction ID
  - varint argument count
  - 0..n arguments, each formatted as described above
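The layout listed above, together with the partial-read rule for the length field, can be sketched as follows. The class ProxyFrame, its method names, and the flag constant names are hypothetical. The sketch assumes the command ID, transaction ID, and argument count are never negative, so it writes them as plain unsigned varints, and it takes each argument as an already-encoded byte array. It only records the compression flag; it does not compress, because which bytes the compression covers is not specified here. Java's OutputStream.write(byte[]) does not return until every byte is written, so only the read side needs an explicit loop; a non-blocking java.nio SocketChannel would need the same loop on the write side.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import java.nio.ByteBuffer;
import java.util.List;

/** Hypothetical sketch of the proxy message layout; not the PTP API. */
final class ProxyFrame {
    static final int FLAG_MORE_FLAGS = 0x80; // another flag byte follows (always 0 today)
    static final int FLAG_COMPRESSED = 0x40; // body is Huffman-compressed
    static final int FLAG_PRIORITY = 0x20;   // priority message for flow control

    private ProxyFrame() {}

    static void writeVarint(ByteArrayOutputStream out, int v) {
        while ((v & ~0x7F) != 0) {
            out.write((v & 0x7F) | 0x80);
            v >>>= 7;
        }
        out.write(v);
    }

    /** Builds the body: flags, command ID, transaction ID, argument count, then the arguments. */
    static byte[] encodeBody(int flags, int commandId, int transactionId, List<byte[]> encodedArgs) {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        body.write(flags & ~FLAG_MORE_FLAGS); // single flag byte, so the continuation bit is 0
        writeVarint(body, commandId);
        writeVarint(body, transactionId);
        writeVarint(body, encodedArgs.size());
        for (byte[] arg : encodedArgs) {
            body.write(arg, 0, arg.length);
        }
        return body.toByteArray();
    }

    /** Writes the 4-byte big-endian length followed by the body. */
    static void writeFrame(OutputStream out, byte[] body) throws IOException {
        byte[] length = ByteBuffer.allocate(4).putInt(body.length).array(); // big-endian by default
        out.write(length);
        out.write(body);
        out.flush();
    }

    /** Reads exactly n bytes, looping because one read() may return fewer bytes than requested. */
    static byte[] readExactly(InputStream in, int n) throws IOException {
        byte[] buf = new byte[n];
        int off = 0;
        while (off < n) {
            int got = in.read(buf, off, n - off);
            if (got < 0) {
                throw new EOFException("stream ended after " + off + " of " + n + " bytes");
            }
            off += got;
        }
        return buf;
    }

    /** Reads one frame: the fixed 4-byte length, then exactly that many body bytes. */
    static byte[] readFrame(InputStream in) throws IOException {
        int length = ByteBuffer.wrap(readExactly(in, 4)).getInt();
        if (length < 0) {
            throw new IOException("negative frame length " + length);
        }
        return readExactly(in, length);
    }
}
```

With a priority flag, command ID 300, transaction ID 7, and one 5-byte argument, the body is 10 bytes (x'20', x'AC' x'02', x'07', x'01', then the argument), and the frame on the wire is the length x'00' x'00' x'00' x'0A' followed by those 10 bytes.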