
Rewrite PHP grammar in ANTLR

Latest revision as of 03:10, 28 March 2009

Abstract

The PDT plugin currently uses JavaCUP to generate its parser for the PHP grammar. This proposal is to replace JavaCUP with ANTLR and rewrite the PHP grammar accordingly.


Why choose ANTLR?

1. When an LR parser generator encounters a reduce/reduce or shift/reduce conflict, the error it reports is not easy to understand. To find the cause of a shift/reduce conflict, you need to follow the debug trace. ANTLR uses LL(*) grammars and its error reporting is more readable. Moreover, you can use ANTLRWorks, a GUI development environment, to debug your grammar.

2. ANTLR keeps the lexer and parser grammars together, written in the same notation and distinguished by whether a rule name starts with an uppercase or a lowercase letter. CUP, by contrast, has separate lexer (JLex) and parser (CUP) grammars, so people need to learn two tools before writing a compiler.

3. ANTLR grammars are easier to write and easier to read than CUP grammars, especially once actions have been added to the rules. For example, CUP requires explicit precedence declarations for tokens, while in ANTLR this information is implied by the structure of the grammar.

4. ANTLR has a new feature, tree pattern matching. Once we have a tree, we can use a filter to focus on the subtrees we care about instead of writing out a full tree grammar again. (http://www.antlr.org/wiki/display/ANTLR3/filter+tree+grammar+mode)

5. ANTLR is still developed and supported today. (http://www.antlr.org)
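
Point 3 can be illustrated without any parser generator. In a top-down (LL-style) grammar, operator precedence follows from how the rules nest, so no separate precedence declaration is needed. The hand-written Java sketch below mirrors a small hypothetical expression grammar (shown in the comments); it is not ANTLR output and not PDT code, and the class and method names are invented for illustration only.

```java
/** Hand-written LL(1) evaluator: rule nesting alone encodes operator precedence. */
final class PrecedenceSketch {
    private final String src;
    private int pos = 0;

    PrecedenceSketch(String src) {
        this.src = src.replace(" ", ""); // ignore blanks to keep the lexer trivial
    }

    /** Parses and evaluates a complete expression. */
    static int eval(String text) {
        PrecedenceSketch p = new PrecedenceSketch(text);
        int value = p.expr();
        if (p.pos != p.src.length()) {
            throw new IllegalArgumentException("Unexpected input at offset " + p.pos);
        }
        return value;
    }

    // expr : term (('+' | '-') term)* ;   lowest precedence, left-associative
    private int expr() {
        int value = term();
        while (peek() == '+' || peek() == '-') {
            char op = src.charAt(pos++);
            int rhs = term();
            value = (op == '+') ? value + rhs : value - rhs;
        }
        return value;
    }

    // term : factor ('*' factor)* ;   binds tighter because expr calls term
    private int term() {
        int value = factor();
        while (peek() == '*') {
            pos++;
            value *= factor();
        }
        return value;
    }

    // factor : INT | '(' expr ')' ;
    private int factor() {
        if (peek() == '(') {
            pos++;
            int value = expr();
            expect(')');
            return value;
        }
        int start = pos;
        while (Character.isDigit(peek())) {
            pos++;
        }
        if (start == pos) {
            throw new IllegalArgumentException("Expected a number at offset " + pos);
        }
        return Integer.parseInt(src.substring(start, pos));
    }

    private char peek() {
        return pos < src.length() ? src.charAt(pos) : '\0';
    }

    private void expect(char c) {
        if (peek() != c) {
            throw new IllegalArgumentException("Expected '" + c + "' at offset " + pos);
        }
        pos++;
    }

    public static void main(String[] args) {
        System.out.println(eval("1 + 2 * 3"));   // prints 7
        System.out.println(eval("(1 + 2) * 3")); // prints 9
    }
}
```

Because `expr` only ever sees complete `term` results, multiplication is grouped before addition without any precedence table. An LR tool such as CUP would typically take a flat, ambiguous rule plus precedence declarations to get the same effect.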

Current classes related to JavaCUP

1. AST1:

 a) ../ast/nodes:
   Builds the AST for the editor.
 b) ../ast/rewrite:
   Dynamically modifies the current AST to support checks such as Quick Fix.
     ASTRewriteAnalyzer.java
     SymbolsProvider.java
     TokenScanner.java
 c) ../ast/scanner:
   These files are generated by CUP and are used for editor-related operations.

2. AST2:

 a) ../compiler.ast.nodes:
   Builds the AST for the compiler.
 b) ../compiler.ast.parser:
   These files are generated by CUP and are used for parsing from the compiler's perspective.

3. Document Parser:

 These classes are generated by JFlex and are used to support PHP API documentation.
 a) ../documentModel
 b) ../documentModel.parser

Plan

1. Write the PHP grammar in ANTLR. Comparing the classes under "../ast/scanner" and "../compiler/ast/parser", I find they are almost identical except for some interface names. This is not surprising, because both are generated from the same grammar. Therefore, the first step is to write a correct PHP grammar in ANTLR. I plan to take two weeks to finish this part.

2. Build an appropriate AST modeled on the current AST structures, as "../ast/nodes" and "../compiler.ast.nodes" do. These classes should be generated automatically once the rewrite rules have been written in the ANTLR grammar. The AST plays an important role in the following steps, such as semantic checking, and there are currently two ASTs whose tree structures I expect to differ somewhat, so I plan one and a half weeks for each AST.

3. Construct a symbol table to resolve variables, methods, classes, etc. This part is very important too. Based on my experience building symbol tables, I need three weeks for it: one week for the definition part and two weeks for the resolution part.

4. Finish the remaining parts, such as "../documentModel.parser", which appears to be a parser for PHP documentation. I expect less than one week to be enough for it.

5. Test all the code in the remaining weeks, and complete the documentation as well as the comments in the source code.
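
The two symbol table phases in step 3 could look roughly like the following Java sketch: a definition pass records each symbol in the scope where it is declared, and a resolution pass looks a name up in the current scope and then in the enclosing ones. This is a minimal illustration under simplifying assumptions, not the proposed design. The class `ScopeSketch` and its methods are hypothetical, symbol kinds are plain strings, and real PHP scoping (for example, functions not seeing global variables unless they are declared with `global`) would need extra handling.

```java
import java.util.HashMap;
import java.util.Map;

/** A scope maps names to symbol kinds and links to its enclosing scope (hypothetical design). */
final class ScopeSketch {
    private final String name;
    private final ScopeSketch parent; // null for the outermost scope
    private final Map<String, String> symbols = new HashMap<>();

    ScopeSketch(String name, ScopeSketch parent) {
        this.name = name;
        this.parent = parent;
    }

    /** Definition phase: record a symbol and its kind in this scope. */
    void define(String symbol, String kind) {
        symbols.put(symbol, kind);
    }

    /** Resolution phase: search this scope, then each enclosing scope; null if undefined. */
    String resolve(String symbol) {
        for (ScopeSketch scope = this; scope != null; scope = scope.parent) {
            String kind = scope.symbols.get(symbol);
            if (kind != null) {
                return kind + " in " + scope.name;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        ScopeSketch global = new ScopeSketch("global", null);
        global.define("Foo", "class");

        ScopeSketch function = new ScopeSketch("function bar", global);
        function.define("$x", "variable");

        System.out.println(function.resolve("$x"));  // prints "variable in function bar"
        System.out.println(function.resolve("Foo")); // prints "class in global"
        System.out.println(global.resolve("$x"));    // prints "null"
    }
}
```

Keeping definition and resolution as separate passes means a name can be resolved even when its declaration appears later in the file, which is one reason the resolution part may deserve the larger share of the schedule.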