Rewrite PHP grammar in ANTLR

From Eclipsepedia

Revision as of 04:07, 28 March 2009 by Dustin.xu.gmail.com (Talk | contribs)


Abstract

Currently the PDT plugin uses JavaCUP to generate the parser for the PHP grammar. This proposal is about replacing JavaCUP with ANTLR and rewriting the PHP grammar accordingly.


Background

Why choose ANTLR?

1. An LR parser generator reports Reduce/Reduce and Shift/Reduce conflicts in a form that is not easy to understand. To understand an error caused by a Shift/Reduce conflict, you need to follow the debug trace. ANTLR uses LL(*) grammars, and its error reporting is more readable. Moreover, you can use ANTLRWorks, a GUI development environment, to debug your grammar.

2. ANTLR keeps the lexer and parser grammars together, written in the same notation: rule names that start with a capital letter are lexer rules, and rule names that start with a lowercase letter are parser rules. CUP, by contrast, has separate lexer (JFlex) and parser (CUP) grammars, so people need to learn two tools before writing a compiler.

3. ANTLR grammars are easier to write and to read than CUP grammars, especially once actions are added to the rules. For example, CUP requires explicit precedence declarations for tokens, while in ANTLR precedence follows from the structure of the grammar rules.

4. ANTLR has a new feature, tree pattern matching. Once we have a tree, we can use a filter to focus on the subtrees we care about instead of writing out the full tree grammar again. (http://www.antlr.org/wiki/display/ANTLR3/filter+tree+grammar+mode)

5. ANTLR is still actively developed and supported. (http://www.antlr.org)
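
To illustrate point 3, here is a minimal hand-written sketch in Java of how precedence can fall out of rule nesting rather than a separate precedence table. It is not ANTLR output and not PDT code: the class name `PrecedenceSketch` and the toy arithmetic grammar are assumptions made for illustration, and the comments show roughly what the corresponding ANTLR-style rules could look like.

```java
// Hypothetical sketch: precedence is encoded by how the rules nest,
// not by a separate precedence declaration as in CUP.
final class PrecedenceSketch {
    private final String input;
    private int pos = 0;

    private PrecedenceSketch(String input) {
        this.input = input.replace(" ", "");
    }

    // Parses and evaluates a whole expression, rejecting trailing input.
    static int evaluate(String text) {
        PrecedenceSketch parser = new PrecedenceSketch(text);
        int value = parser.expr();
        if (parser.pos != parser.input.length()) {
            throw new IllegalArgumentException("Unexpected input at position " + parser.pos);
        }
        return value;
    }

    // expr : term (('+' | '-') term)* ;   outermost rule, lowest precedence
    private int expr() {
        int value = term();
        while (peek() == '+' || peek() == '-') {
            char op = input.charAt(pos++);
            int right = term();
            value = (op == '+') ? value + right : value - right;
        }
        return value;
    }

    // term : atom (('*' | '/') atom)* ;   nested deeper, so it binds tighter
    private int term() {
        int value = atom();
        while (peek() == '*' || peek() == '/') {
            char op = input.charAt(pos++);
            int right = atom();
            value = (op == '*') ? value * right : value / right;
        }
        return value;
    }

    // atom : INT | '(' expr ')' ;
    private int atom() {
        if (peek() == '(') {
            pos++;
            int value = expr();
            if (peek() != ')') {
                throw new IllegalArgumentException("Expected ')' at position " + pos);
            }
            pos++;
            return value;
        }
        int start = pos;
        while (Character.isDigit(peek())) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a number at position " + pos);
        }
        return Integer.parseInt(input.substring(start, pos));
    }

    // Returns the current character, or '\0' at end of input.
    private char peek() {
        return pos < input.length() ? input.charAt(pos) : '\0';
    }
}
```

Calling `PrecedenceSketch.evaluate("1 + 2 * 3")` multiplies before adding only because `term` is called from inside `expr`; no operator table is consulted anywhere.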

Details

Current classes related to CUP:

1.AST1:

 a)../ast/nodes:
   Build AST for editor
 b)../ast/rewrite:
   Dynamically modifies the current AST to support checks such as Quick Fix.
    ASTRewriteAnalyzer.java
    SymbolsProvider.java
    TokenScanner.java
 c)../ast/scanner:
   These files are generated by CUP and are used for editor-related operations.

2.AST2:

 a)../compiler.ast.nodes:
   Build AST for compiler
 b)../compiler.ast.parser:
   These files are generated by CUP and are used for parsing from the compiler's perspective.

3.Document Parser:

 These classes are generated by JFlex and are used to support PHP API documentation.
 a)../documentModel
 b)../documentModel.parser

Plan:

1. Write the PHP grammar in ANTLR. Comparing the classes under "../ast/scanner" and "../compiler/ast/parser", I found that they are almost identical apart from some interface names. This is not surprising, because both are generated from the same grammar. The first step is therefore to write a correct PHP grammar in ANTLR. I plan to spend two weeks on this part.

2. Build an appropriate AST modeled on the current AST structure, as "../ast/nodes" and "../compiler.ast.nodes" do. These classes should be generated automatically once the rewrite rules are written in the ANTLR grammar. The AST plays an important role in later steps such as semantic checking, and there are currently two ASTs whose tree structures probably differ, so I plan one and a half weeks for each AST.

3. Construct a symbol table to resolve variables, methods, classes, and so on. This part is also very important. Based on my experience building symbol tables, I need three weeks for it: one week for the definition part and two weeks for the resolve part.

4. Finish the remaining parts, such as "../documentModel.parser", which appears to be a parser for PHP documentation. I expect this to take less than one week.

5. Test all the code in the remaining weeks, and complete the documentation as well as the comments in the source code.
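
The split between a definition part and a resolve part in the symbol table step can be sketched roughly as follows. This is only an illustration under stated assumptions: the class `Scope` and its methods are hypothetical names, not existing PDT classes, and real PHP scoping rules (for example, function bodies not seeing global variables without the `global` keyword) are not modeled.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch of a nested-scope symbol table; not a PDT class.
final class Scope {
    private final String name;
    private final Scope enclosing;   // null for the outermost (global) scope
    private final Map<String, String> symbols = new HashMap<>();

    Scope(String name, Scope enclosing) {
        this.name = name;
        this.enclosing = enclosing;
    }

    // Definition phase: record a symbol and its kind in this scope.
    void define(String symbol, String kind) {
        symbols.put(symbol, kind);
    }

    // Resolve phase: look in this scope first, then walk outward through
    // the enclosing scopes. Returns null when the symbol is unknown.
    String resolve(String symbol) {
        for (Scope scope = this; scope != null; scope = scope.enclosing) {
            String kind = scope.symbols.get(symbol);
            if (kind != null) {
                return kind + " in " + scope.name;
            }
        }
        return null;
    }
}
```

In a first pass over the AST, a walker would call `define` for each declaration it meets; in a second pass it would call `resolve` for each reference, for example `methodScope.resolve("Foo")` to find a class declared in the enclosing global scope. Running the two phases separately lets a reference find a declaration that appears later in the source.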