Difference between revisions of "User:Rick.barkhouse.oracle.com/VTD"
Revision as of 12:41, 14 December 2012
VTD-XML Investigation
VTD-XML (http://vtd-xml.sourceforge.net/) is a high-performance XML processing model that deals with XML in a binary form, instead of the traditional text form. VTD stands for Virtual Token Descriptor.
VTD-XML parses an XML document and builds an internal data structure representing the entire XML document in byte[] form. Each "token" of the XML document is represented by a 64-bit integer, the VTD record. The VTD format is characterized as follows:
- VTD is a binary format specification, not an API specification.
- A VTD record is a primitive data type (an integer multiple of 32 bits) that encodes the following parameters of a token in an XML file:
  - Starting offset
  - Length
  - Nesting depth
  - Token type
- VTD requires that the XML document be maintained intact in memory.
- Our current VTD record layout further specifies the following:
  - Use 64 bits as the primitive type (b63 ~ b0)
  - Big endian
  - Starting offset: 30 bits (b29 ~ b0); maximum value is 2^30 - 1 = 1G - 1
  - Length: 20 bits (b51 ~ b32); maximum value is 2^20 - 1 = 1M - 1
    - For some token types, the length field is split into:
      - Prefix length: 9 bits (b51 ~ b43); maximum value is 2^9 - 1 = 511
      - Q-name length: 11 bits (b42 ~ b32); maximum value is 2^11 - 1 = 2047
  - Depth: 8 bits (b59 ~ b52); maximum value is 2^8 - 1 = 255
  - Token type: 4 bits (b63 ~ b60)
  - Reserved bits: 2 bits (b31 ~ b30)
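The bit layout above can be illustrated with a little shifting and masking on a Java long. This is only a sketch for understanding the record format, not code from the VTD-XML library: the class name VtdRecordSketch and all of its methods are hypothetical, the token type value used in main is a placeholder, and UTF-8 is assumed for the document bytes. It also shows why the original document has to stay in memory: the record stores only an offset and a length, so the token text is read straight out of the original byte[].

```java
import java.nio.charset.StandardCharsets;

// Hypothetical helper illustrating the 64-bit record layout listed above.
// It is not part of the VTD-XML API.
final class VtdRecordSketch {
    private VtdRecordSketch() {}

    // Pack the four fields into one record; reserved bits b31 ~ b30 stay 0.
    public static long pack(int tokenType, int depth, int length, int offset) {
        return ((long) (tokenType & 0xF) << 60)      // b63 ~ b60
             | ((long) (depth & 0xFF) << 52)         // b59 ~ b52
             | ((long) (length & 0xFFFFF) << 32)     // b51 ~ b32
             | ((long) offset & 0x3FFFFFFFL);        // b29 ~ b0
    }

    public static int tokenType(long rec) { return (int) (rec >>> 60) & 0xF; }
    public static int depth(long rec)     { return (int) (rec >>> 52) & 0xFF; }
    public static int length(long rec)    { return (int) (rec >>> 32) & 0xFFFFF; }
    public static int offset(long rec)    { return (int) rec & 0x3FFFFFFF; }

    // For token types that split the length field into prefix and Q-name parts.
    public static int prefixLength(long rec) { return (int) (rec >>> 43) & 0x1FF; }
    public static int qnameLength(long rec)  { return (int) (rec >>> 32) & 0x7FF; }

    // Recover the token text from the untouched document bytes (UTF-8 assumed).
    public static String tokenText(byte[] doc, long rec) {
        return new String(doc, offset(rec), length(rec), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        byte[] doc = "<root>hi</root>".getBytes(StandardCharsets.UTF_8);
        // Element name "root": starts at byte 1, is 4 bytes long, depth 0.
        // Token type 0 is a placeholder value for this sketch.
        long rec = pack(0, 0, 4, 1);
        System.out.println(tokenText(doc, rec)); // prints "root"
    }
}
```

Because a record is a plain long, a whole document's tokens can live in a single long[] with no per-token objects, which is where much of the memory saving of this approach comes from.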
VTD-XML Core Concepts
Unmarshalling a VTD-XML document
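    import com.ximpleware.VTDGen;

    VTDGen vg = new VTDGen();

    // From an existing byte[] holding the XML document
    // (xmlBytes is a placeholder name for that array)
    vg.setDoc(xmlBytes);
    // true indicates namespace-aware parsing
    vg.parse(true);

    // - or -

    // From a file; false turns namespace awareness off
    vg.parseFile("old.xml", false);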