EclipseLink/Development/DBWS/ParseDDLDS

From Eclipsepedia

Jump to: navigation, search


Parsing DDL for Metadata Design Spec

Document History

Date Author Version Description & Notes
110601 Mike Norman 1.0
130612 Dave McCann 1.1

Overview

The primary job of the DBWSBuilder utility is to extract metadata from the database about some artifact (i.e. a Table or a StoredProcedure) that is to be used as the basis of a Web Service. The DBWSBuilder translates the retrieved information (field names, arguments, scalar or composite datatypes, etc.) into DBWS runtime artifacts (EclipseLink Project XML files, WSDL and XML schema, etc.) Thus it is crucial that DBWSBuilder get complete and accurate information from the database.

Broken Use Cases

The existing o.e.p.platform.database.oracle.publisher classes cannot handle the following use cases:

  • default values for StoredProcedure/StoredFunction arguments (information not available in Data Dictionary prior to 11g)
  • strongly-typed REF CURSOR as OUT argument when underlying RECORD type is also used as an argument or package-level global variable
CREATE OR REPLACE PACKAGE CURSOR_TEST AS
   TYPE EREC IS RECORD (
        FLAG PLS_INTEGER,
	EMPNO NUMBER(4),
        ENAME VARCHAR2(10),
        JOB VARCHAR2(9)
   );
   TYPE EREC_CURSOR IS REF CURSOR RETURN EREC;
   FUNCTION EFUNC(PARM1 IN VARCHAR2, PARM2 IN EREC) RETURN EREC_CURSOR;
END CURSOR_TEST;
  • object-graph distortion between optional arguments and overloading

Unfortunately, the information in the Data Dictionary views can be (under certain circumstances) incomplete or inaccurate, or the object-graph can be distorted. For example, an argument to a StoredProcedure may have a default value assigned to it, thus making it optional. This in turn means that the calling sequence can change (fewer arguments need be passed in):

CREATE OR REPLACE PACKAGE SOMEPKG AS
  TYPE EMP_REC IS RECORD(
    EMPNO EMP.EMPNO%TYPE,
    FNAME EMP.FNAME%TYPE,
    LNAME EMP.LNAME%TYPE
  );
  PROCEDURE DOSOMETHING(MYREC IN OUT EMP_REC, STUFF IN VARCHAR2,
    P_EMPNO IN NUMBER := 20, FLAG IN BOOLEAN DEFAULT FALSE,
    WHATEVER IN VARCHAR2 DEFAULT 'bogus');
END;

The object-graph contains four StoredProcedure 'pseudo' representations:

DOSOMETHING(MYREC,STUFF)
DOSOMETHING(MYREC,STUFF, P_EMPNO)
DOSOMETHING(MYREC,STUFF, P_EMPNO, FLAG)
DOSOMETHING(MYREC,STUFF, P_EMPNO, FLAG, WHATEVER)

however, there are an additional four 'pseudo' representations not generated:

DOSOMETHING(MYREC,STUFF, FLAG)
DOSOMETHING(MYREC,STUFF, WHATEVER)
DOSOMETHING(MYREC,STUFF, FLAG, WHATEVER)
DOSOMETHING(MYREC,STUFF, P_EMPNO, WHATEVER)

This highlights an issue where the number of in-memory objects could expand geometrically at the rate of 2N. In addition, if the StoredProcedure DOSOMETHING was overloaded, the 'pseudo' representations could conflict.

  • object-graph disconnect between PL/SQL composite arguments and JDBC composite arguments (bug 329435)
CREATE OR REPLACE PACKAGE TEST_PKG AS
   TYPE TESTREC IS TEST_EMP%ROWTYPE;
...
END;
 
CREATE OR REPLACE TYPE PHONE_OBJECT AS OBJECT (
  AREACODE VARCHAR2(3),
  PHONENUMBER VARCHAR2(20),
  PHONETYPE VARCHAR2(20)
);
CREATE OR REPLACE TYPE PHONE_LIST AS VARRAY(2) OF PHONE_OBJECT;
CREATE TABLE TEST_EMP (
    EMPNO NUMBER(4) NOT NULL,
    FNAME VARCHAR2(40),
    LNAME VARCHAR2(40),
    PHONES PHONE_LIST,
    PRIMARY KEY (EMPNO)
);

Internally, the type TESTREC points to table TEST_EMP, but when the field PHONES is referenced, the traversal algorithm breaks down.

Source DDL

All the information required is present in the source DDL for the database artifact. Other OS projects that reverse-engineer information from dictionary views have also discovered these problems (a discussion of the problem and decision to switch to parse source DDL)

Extracting DDL

There is a utility method available in the Oracle database DBMS_METADATA.GET_DDL(specType, typeName, schema) that has been available since 9i that successfully returns DDL for a wide-variety of artifacts:

  • package specification
SQL> SELECT DBMS_METADATA.GET_DDL('PACKAGE_SPEC', 'SOMEPACKAGE', SYS_CONTEXT('USERENV', 'CURRENT_USER')) AS RESULT FROM DUAL;
CREATE OR REPLACE PACKAGE "SCOTT"."SOMEPACKAGE" AS
  TYPE TBL1 IS TABLE OF VARCHAR2(111) INDEX BY BINARY_INTEGER;
  TYPE TBL2 IS TABLE OF NUMBER INDEX BY BINARY_INTEGER;
  TYPE ARECORD IS RECORD (
    T1 TBL1,
    T2 TBL2,
    T3 BOOLEAN
  );
  TYPE TBL3 IS TABLE OF ARECORD INDEX BY PLS_INTEGER;
  TYPE TBL4 IS TABLE OF TBL2 INDEX BY PLS_INTEGER;
  PROCEDURE P1(SIMPLARRAY IN TBL1, FOO IN VARCHAR2);
  PROCEDURE P2(OLD IN TBL2, NEW IN TBL2);
  PROCEDURE P3(RECARRAY IN TBL3);
  PROCEDURE P4(REC IN ARECORD);
  PROCEDURE P5(OLDREC IN ARECORD, NEWREC OUT ARECORD);
  PROCEDURE P6(BAR IN TBL4);
  PROCEDURE P7(SIMPLARRAY IN TBL1, FOO IN VARCHAR2);
  PROCEDURE P7(SIMPLARRAY IN TBL1, FOO IN VARCHAR2, BAR IN VARCHAR2);
  PROCEDURE P8(FOO IN VARCHAR2);
  PROCEDURE P8(FOO IN VARCHAR2, BAR IN VARCHAR2);
  FUNCTION F1(OLDREC IN ARECORD, FOO IN VARCHAR2) RETURN ARECORD;
  FUNCTION F2(OLD IN TBL2, SIMPLARRAY IN TBL1) RETURN TBL2;
  FUNCTION F3(SIMPLARRAY IN TBL1, OLDVAR IN VARCHAR2) RETURN VARCHAR2;
  FUNCTION F4(RECARRAY IN TBL3, OLDREC IN ARECORD) RETURN TBL3;
END SOMEPACKAGE;
  • table
SQL> SELECT DBMS_METADATA.GET_DDL('TABLE', 'EMP', SYS_CONTEXT('USERENV', 'CURRENT_USER')) AS RESULT FROM DUAL;
CREATE TABLE "SCOTT"."EMP"(
    "EMPNO" NUMBER(4,0),
    "ENAME" VARCHAR2(10),
    "JOB" VARCHAR2(9),
    "MGR" NUMBER(4,0),
    "HIREDATE" DATE,
    "SAL" NUMBER(7,2),
    "COMM" NUMBER(7,2),
    "DEPTNO" NUMBER(2,0)
) 
ALTER TABLE "SCOTT"."EMP" ADD CONSTRAINT "PK_EMP" PRIMARY KEY ("EMPNO") ENABLE;

NB - if the table 'EMP' is actually a materialized view, DBMS_METADATA still returns table-&-column information.

  • object types: structs, varrays, nested tables
SQL> SELECT DBMS_METADATA.GET_DDL('TYPE_SPEC', 'EMP_ADDRESS', SYS_CONTEXT('USERENV', 'CURRENT_USER')) AS RESULT FROM DUAL;
CREATE OR REPLACE TYPE "SCOTT"."EMP_ADDRESS" AS OBJECT (
 STREET       VARCHAR2(100),
 SUBURB       VARCHAR2(50),
 ADDR_REGION  REGION,
 POSTCODE     INTEGER
)
  • top-level procedures and functions.
SQL> SELECT DBMS_METADATA.GET_DDL('FUNCTION', 'BUILDTBL2', SYS_CONTEXT('USERENV', 'CURRENT_USER')) AS RESULT FROM DUAL;
CREATE OR REPLACE FUNCTION "SCOTT"."BUILDTBL2" (NUM IN INTEGER) RETURN SOMEPACKAGE_TBL2 AS
    L_DATA SOMEPACKAGE_TBL2 := SOMEPACKAGE_TBL2();
    BEGIN
        FOR I IN 1 .. NUM LOOP
         L_DATA.EXTEND;
         L_DATA(I) := I;
        END LOOP;
    RETURN L_DATA;
END;

The DDL returned is complete, no artifacts are missing nor is any type information distorted.

DBMS_METADATA Session filters

The amount of information presented in the DDL can be controlled - on a per-connection (a.k.a. 'session') level by DBMS_METADATA.SESSION_TRANSFORM's. For example, a table has information regarding its storage attributes and tablespace; however, a user may create a table without specifying any of these attributes, relying on the built-in defaults to be sufficient. The DDL retrieved by DBMS_METADATA.GET_DDL will show these attributes:

CREATE TABLE "SCOTT"."EMP"(
    "EMPNO" NUMBER(4,0),
    "ENAME" VARCHAR2(10),
    "JOB" VARCHAR2(9),
    "MGR" NUMBER(4,0),
    "HIREDATE" DATE,
    "SAL" NUMBER(7,2),
    "COMM" NUMBER(7,2),
    "DEPTNO" NUMBER(2,0)
) PCTFREE 10 PCTUSED 40 INITRANS 1 MAXTRANS 255 NOCOMPRESS LOGGING
STORAGE(INITIAL 65536 NEXT 1048576 MINEXTENTS 1 MAXEXTENTS 2147483645
PCTINCREASE 0 FREELISTS 1 FREELIST GROUPS 1 BUFFER_POOL DEFAULT)
TABLESPACE "USERS"

These attributes do not contribute any useful information about the structure and make up of the table (column names, column types, etc.) The transform parameters can be set to "cleanup" the displayed information:

SELECT DBMS_METADATA.SET_TRANSFORM_PARAM(DBMS_METADATA.SESSION_TRANSFORM,'STORAGE','FALSE') FROM DUAL;
SELECT DBMS_METADATA.SET_TRANSFORM_PARAM(DBMS_METADATA.SESSION_TRANSFORM,'TABLESPACE','FALSE') FROM DUAL;
SELECT DBMS_METADATA.SET_TRANSFORM_PARAM(DBMS_METADATA.SESSION_TRANSFORM,'SEGMENT_ATTRIBUTES','FALSE') FROM DUAL;
SELECT DBMS_METADATA.GET_DDL('TYPE_SPEC', 'EMP', SYS_CONTEXT('USERENV', 'CURRENT_USER')) AS RESULT FROM DUAL;
CREATE TABLE "SCOTT"."EMP"(
    "EMPNO" NUMBER(4,0),
    "ENAME" VARCHAR2(10),
    "JOB" VARCHAR2(9),
    "MGR" NUMBER(4,0),
    "HIREDATE" DATE,
    "SAL" NUMBER(7,2),
    "COMM" NUMBER(7,2),
    "DEPTNO" NUMBER(2,0)
)

The transform parameters are initially set as:

PRETTY = TRUE
SQLTERMINATOR = TRUE
CONSTRAINTS = TRUE
CONSTRAINTS_AS_ALTER = TRUE
REF_CONSTRAINTS = FALSE
SEGMENT_ATTRIBUTES = FALSE
STORAGE = FALSE
TABLESPACE = FALSE
SPECIFICATION = TRUE
BODY = FALSE

In order to 'future-proof' ourselves from future versions of DBMS_METADATA adding or changing the meaning of these arguments, the end-user of the Oracle DDL parser will be able to modify the list of transform parameters.

DBMS_METADATA failure

If for some reason DBMS_METADATA does not provide the required information, it should be possible to fail-over to use JDBC metadata that is used for supporting non-Oracle platforms. However JDBC metadata not support all use cases, especially those involving PL/SQL artifacts.

Parsing Technology

The technology and techniques to parse languages are well-known: the language is described by a grammar; a parser is constructed that produces an AST (Abstract Syntax Tree). The tree is traversed and required objects generated.

From Wikipedia ...
Extended Backus–Naur Form (EBNF) is a notation used for expressing context-free grammars: that is, a formal way to describe computer programming languages and other formal languages. The International Organization for Standardization has adopted an EBNF standard (ISO/IEC 14977). The earliest EBNF was originally developed by Niklaus Wirth. An EBNF consists of terminal symbol and non-terminal production rules which are the restrictions governing how terminal symbols can be combined into a legal sequence. Examples of terminal symbols include alphanumeric characters, punctuation marks, and white space characters. The EBNF defines production rules where sequences of symbols are respectively assigned to a Ttrminal and nonterminal symbols:
digit excluding zero = "1" | "2" | "3" | "4" | "5" | "6" | "7" | "8" | "9" ;
digit                = "0" | digit excluding zero ;

This production rule defines the nonterminal digit which is on the left side of the assignment. The vertical bar represents an alternative and the terminal symbols are enclosed with quotation marks followed by a semicolon as terminating character. Hence a Digit is a 0 or a digit excluding zero that can be 1 or 2 or 3 and so forth until 9.

A production rule can also include a sequence of terminals or nonterminals, each separated by a comma, expressions that may be omitted or repeated can be represented through curly braces { ... }:

natural number = digit excluding zero , { digit } ;

The first decision is whether or not to use a tool to generate a parser or to build a parser by hand:

  • hand-built parsers typically take more time and effort to build, refactor or extend; however, they usually provide custom behaviour that is specific to the domain
  • tool-built parsers typically are easier (time,effort) to build, refactor or extend; however, the wide variety of tools can paradoxically make decision-making difficult. Another factor is the size and complexity of the grammar - some languages require hundreds of productions to describe their syntax - in these cases, a tool-built parser has a big advantage over a hand-built parser.

Parser tool: JavaCC

JavaCC constructs LL(k) Java parsers by processing grammar files (.jj or .jjt extension) and generating suitable Java source files. Lexical scanning and AST-tree building is integrated (other tools such as Lex and YACC separate these into phases) and comprehensive error reporting is supported (accurate to the source line and column number).

The two main reasons to choose JavaCC over other tools (say for example ANTRL) are:

  1. reduced binary footprint - the Java source code generated by JavaCC does not rely on any APIs other than those provided by Java SE;
    in comparison, code generated by ANTLR requires a runtime package of approx. 80 classes.
  2. an Eclipse IDE plugin that makes dealing with .jj files as natural as dealing with .java files;
    in comparison, ANTLRWorks is a standalone tool without any integration into the Eclipse IDE

Changes to DBWS

The major change to DBWS is the elimination of the remaining JPublisher code in o.e.p.platform.database.oracle.publisher. Existing JPub metadata classes cannot exist without the surrounding JPub code; therefore a new set of metadata classes will be created in package o.e.p.tools.database.metadata.* to model all the artifacts described by the DDL: for example o.e.p.tools.database.metadata.TableType to represent a table, o.e.p.tools.database.metadata.FieldType to represent a field in a table or in an object type. Finally, the DDL grammar file (DDLParser.jjt, ~40 productions) generates about ~15 Java files.

The bulk of the work will be to:

  1. create a complete set of metadata model classes
  2. create complete JavaCC productions to generate said model classes
  3. create vistors for the model classes
  4. refactor DBWSBuilder (PLSQLHelperObjectsBuilder, AdvancedJDBCORDescriptorBuilder, PLSQLOXDescriptorBuilder, etc) to adapt to the new model classes and visitors.

Metadata model classes

When describing DDL artifacts, two basic APIs are required: a 'marker' interface to indicate that a class models a database artifact (o.e.p.tools.database.metadata.DatabaseType) and sub-interfaces to indicate if the artifact is composite (CompositeDatabaseType extends DatabaseType) or scalar (ScalarDatabaseType extends DatabaseType). A composite database artifact contains additional enclosed database artifacts while a scalar database artifact contains only the necessary information to describe itself. For example, the TableType mentioned above is composite, containing a list of FieldType's to describe its columns; FieldType in turn is composite, containing the name of the field as well as an encloded DatabaseType that describes what type of data the column holds. The class VarChar2Type is a scalar database artifact, it does not contain within itself another DatabaseType. Scalar database types can be stateful - e.g. one instance of VarChar2Type can have a different size compared to another instance - or singleton - e.g. ScalarDatabaseTypeEnum.BOOLEAN is identical for all instances so it is safe to share a single instance across different instances of composite DatabaseType's

Metamodel class hier.png

Translating Oracle DDL documentation to JavaCC productions

The DDL for tables, packages, types and other database artifacts is not documented in BNF form, but as railroad diagrams (please follow the link for details on how to read railroad diagrams).

For example, the following railroad diagram describes the DDL for table creation: Table ddl railroad.gif

The equivalent JavaCC-compatible BNF is:

// PLSQL reserved keywords (prefix with K_ to avoid naming conflicts) - NB: incomplete list
TOKEN: {  
          <K_A: "A">
        | <K_ADD: "ADD">
...
        | <K_CREATE: "CREATE">
        | <K_GLOBAL: "GLOBAL">
...
}
// separators and operators (prefix with O_ to avoid naming conflicts)
TOKEN: {
          <O_ASSIGN: ":=">
...
        | <O_DOT: ".">
...
}
// table at 'top-level'
TableType parseTable():
{TableType tableType = null;
 String schema = null;
 String tableName = null;}
{
    <K_CREATE> [ <K_GLOBAL> <K_TEMPORARY> ] <K_TABLE> 
        [LOOKAHEAD(2) schema=OracleObjectName() <O_DOT> ] tableName=OracleObjectName()
        {
            tableType = new TableType(tableName);
            if (schema != null) {
                tableType.setSchema(schema);
            }
        }
        <O_OPENPAREN> columnDeclarations(tableType) <O_CLOSEPAREN>
         [ <K_ON> <K_COMMIT > [<K_DELETE> | <K_PRESERVE> ] <K_ROWS> ] <O_SEMICOLON>
        ( alterDeclaration(tableType) <O_SEMICOLON> ) *
     <EOF>
     {
         typesRepository.setDatabaseType(tableName, tableType);
         return tableType;
     }
}

JavaCC allows one to interleave grammar production rules with Java code that is then embedded in the generated parser code.

Visitors

The classes in the package o.e.p.tools.database.metadata model the information parsed from the DDL. The classes are used to build an in-memory object-graph that is then visited by other parts of DBWS. This pattern ensures that there is no coupling between the DDL parser and DBWS - i.e. the DDL parser can be used by other projects.

All classes implement the org.eclipse.persistence.tools.oracleddl.metadata.visit.DatabaseTypeVisitable interface:

public interface DatabaseTypeVisitable {
 
    public void accept(DatabaseTypeVisitor visitor);
 
}

The actual Visitor interface has callbacks for every model type:

public interface DatabaseTypeVisitor {
 
	//scalar visit callbacks
	public void visit(BinaryType databaseType);
	public void visit(BlobType databaseType);
	public void visit(LongRawType databaseType);
	public void visit(RawType databaseType);
	public void visit(CharType databaseType);
...
	//composite visit callbacks
	public void beginVisit(ArgumentType databaseType);
	public void visit(ArgumentType databaseType);
	public void endVisit(ArgumentType databaseType);
 
	public void beginVisit(FieldType databaseType);
	public void visit(FieldType databaseType);
	public void endVisit(FieldType databaseType);
...

The default implementation of DatabaseTypeVisitor is in BaseDatabaseTypeVisitor:

public class BaseDatabaseTypeVisitor implements DatabaseTypeVisitor {
 
	//scalar visit callbacks
	public void visit(BinaryType databaseType) {
	}
	public void visit(BlobType databaseType) {
	}
...
	//composite visit callbacks
	public void beginVisit(ArgumentType databaseType) {
	}
	public void visit(ArgumentType databaseType) {
		beginVisit(databaseType);
		DatabaseType dt = databaseType.getDataType();
		if (dt != null) {
			dt.accept(this);
		}
		endVisit(databaseType);
	}
	public void endVisit(ArgumentType databaseType) {
	}
...

The following is a simple Visitor that walks the object-graph and determines if any of the artifacts are 'un-resolved'. Something is 'un-resolved' if (for example) some table has a column whose datatype is a VArray type. The DDL for the table has no information about the VArray type other than its name. Another pass of the DDL parser with the DDL for the VArray type is required.

package org.eclipse.persistence.tools.oracleddl.metadata.visit;
 
//javase imports
import java.util.ArrayList;
import java.util.List;
 
//DDL parser imports
import org.eclipse.persistence.tools.oracleddl.metadata.UnresolvedSizedType;
import org.eclipse.persistence.tools.oracleddl.metadata.UnresolvedType;
 
public class UnresolvedTypesVisitor extends BaseDatabaseTypeVisitor {
 
    protected List<String> unresolvedTypes = new ArrayList<String>();
 
    public List<String> getUnresolvedTypes() {
        return unresolvedTypes;
    }
 
    public void visit(UnresolvedType unresolvedType) {
        unresolvedTypes.add(unresolvedType.getTypeName());
    }
 
    public void visit(UnresolvedSizedType unresolvedType) {
        unresolvedTypes.add(unresolvedType.getTypeName());
    }
 
}

Metamodel visitors.png

Parser Util classes

Parser utils.png

Use Cases

Basic Table

The following example illustrates how the DDLParser can be used to generate a TableType instance based on Table DDL.

The DDL for SCOTT.EMP would be extracted as follows:

SQL> SELECT DBMS_METADATA.GET_DDL('TABLE', 'EMP', SYS_CONTEXT('USERENV', 'CURRENT_USER')) AS RESULT FROM DUAL;
CREATE TABLE "SCOTT"."EMP"(
    "EMPNO" NUMBER(4,0),
    "ENAME" VARCHAR2(10),
    "JOB" VARCHAR2(9),
    "MGR" NUMBER(4,0),
    "HIREDATE" DATE,
    "SAL" NUMBER(7,2),
    "COMM" NUMBER(7,2),
    "DEPTNO" NUMBER(2,0)
) 
ALTER TABLE "SCOTT"."EMP" ADD CONSTRAINT "PK_EMP" PRIMARY KEY ("EMPNO") ENABLE;

Once the DDL is obtained, the DDLParser can be used to generate the meta-model objects based on the DDL string:

String ddlString;
// set ddlString ...
 
DDLParser parser = new DDLParser(new StringReader(ddlString));
TableType tableType = parser.parseTable();