


Difference between revisions of "Xtext/Documentation"


Revision as of 07:57, 18 September 2008

What is Xtext?

The TMF Xtext project provides a domain-specific language (the grammar language) for description of textual programming languages and domain-specific languages. It is tightly integrated with the Eclipse Modeling Framework (EMF) and leverages the Eclipse Platform in order to provide language-specific tool support. In contrast to common parser generators the grammar language is much simpler but is used to derive much more than just a parser and lexer. From a grammar the following is derived:

  • incremental, Antlr3-based parser and lexer
  • Ecore-based meta models (optional)
  • a serializer, used to serialize instances of such meta models back to a parseable textual representation
  • an implementation of the EMF Resource interface (based on the parser and the serializer)
  • a full-fledged integration of the language into Eclipse IDE
    • syntax coloring
    • navigation (F3, etc.)
    • code completion
    • outline views
    • code templates
    • folding, etc.

The generated artifacts are wired up through a dependency injection framework, which makes it easy to exchange certain functionality in a non-invasive manner. For example, if you don't like the default code assist implementation, you only need to provide an alternative implementation of the corresponding service and configure it via an Eclipse extension point.

The Grammar Language

At the heart of Xtext there is the grammar language. The grammar language is defined in itself, of course. The grammar can be found here [[1]].

It is a carefully designed DSL for describing textual languages, based on ANTLR's LL(*) algorithm. With it one mainly describes the concrete syntax and how to construct an in-memory model (the semantic model) from it.

First an example

To get an idea of how it works we'll start by implementing a [simple example] introduced by Martin Fowler. It's mainly about describing state machines used as the (un)lock mechanism of secret compartments.

One of those state machines could look like this:

 events
  doorClosed  D1CL
  drawOpened  D2OP
  lightOn     L1ON
  doorOpened  D1OP
  panelClosed PNCL
 end
 
 resetEvents
  doorOpened
 end
 
 commands
  unlockPanel PNUL
  lockPanel   PNLK
  lockDoor    D1LK
  unlockDoor  D1UL
 end
 
 state idle
  actions {unlockDoor lockPanel}
  doorClosed => active
 end
 
 state active
  drawOpened => waitingForLight
  lightOn    => waitingForDraw
 end
 
 state waitingForLight
  lightOn => unlockedPanel
 end
 
 state waitingForDraw
  drawOpened => unlockedPanel
 end
 
 state unlockedPanel
  actions {unlockPanel lockDoor}
  panelClosed => idle
 end

So we have a bunch of declared events, commands and states. Within states there are references to declared commands (the actions), which should be executed when entering such a state. There are also transitions, each consisting of a reference to an event and a reference to a state. Please read Martin's description if this is not clear enough.

In order to implement this language with Xtext you need to write the following grammar:

 language SecretCompartments
 
 generate secretcompartment "http://www.eclipse.org/secretcompartment"
 
 Statemachine :
  'events'
     events+=Event+
  'end'
  ('resetEvents'
     resetEvents+=[Event]+
  'end')?
  'commands'
     commands+=Command+
  'end'
  states+=State+;
 
 Event :
  name=ID code=ID;
 
 Command :
  name=ID code=ID;
 
 State :
  'state' name=ID
     ('actions' '{' actions+=[Command]+ '}')?
     transitions+=Transition*
  'end';
 
 Transition :
  event=[Event] '=>' state=[State];
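
To give a rough idea of the in-memory model this grammar describes, here is a minimal plain-Java sketch. It is only an illustration: Xtext derives an Ecore-based meta model, not these classes, and the class `SecretCompartmentModel` with its `sample()` helper is hypothetical. Each field mirrors one assignment in the grammar.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-ins for the model shape; not generated by Xtext.
class SecretCompartmentModel {

    static class Event {
        final String name; // name=ID
        final String code; // code=ID
        Event(String name, String code) { this.name = name; this.code = code; }
    }

    static class Command {
        final String name; // name=ID
        final String code; // code=ID
        Command(String name, String code) { this.name = name; this.code = code; }
    }

    static class Transition {
        final Event event; // event=[Event], a reference to a declared event
        final State state; // state=[State], a reference to a declared state
        Transition(Event event, State state) { this.event = event; this.state = state; }
    }

    static class State {
        final String name;                                       // name=ID
        final List<Command> actions = new ArrayList<>();         // actions+=[Command]+
        final List<Transition> transitions = new ArrayList<>();  // transitions+=Transition*
        State(String name) { this.name = name; }
    }

    static class Statemachine {
        final List<Event> events = new ArrayList<>();      // events+=Event+
        final List<Event> resetEvents = new ArrayList<>(); // resetEvents+=[Event]+
        final List<Command> commands = new ArrayList<>();  // commands+=Command+
        final List<State> states = new ArrayList<>();      // states+=State+
    }

    // Builds a fragment of the example state machine by hand.
    static Statemachine sample() {
        Statemachine machine = new Statemachine();
        Event doorClosed = new Event("doorClosed", "D1CL");
        machine.events.add(doorClosed);
        Command unlockDoor = new Command("unlockDoor", "D1UL");
        Command lockPanel = new Command("lockPanel", "PNLK");
        machine.commands.add(unlockDoor);
        machine.commands.add(lockPanel);

        // States are created first so that transitions can refer to them.
        State idle = new State("idle");
        State active = new State("active");
        idle.actions.add(unlockDoor);
        idle.actions.add(lockPanel);
        idle.transitions.add(new Transition(doorClosed, active)); // doorClosed => active
        machine.states.add(idle);
        machine.states.add(active);
        return machine;
    }
}
```

Note that `Transition` holds references to objects that are declared elsewhere, which corresponds to the square-bracket notation (`[Event]`, `[State]`) in the grammar.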

In the following the different concepts of the grammar language are explained. We refer to this grammar when useful.

Language Declaration

The first line

language SecretCompartments

declares the name of the language. Xtext leverages Java's class path mechanism. This means that the name can be any valid Java qualified name. The file name must correspond to the language name and have the file extension '*.xtext'. So it needs to be "SecretCompartments.xtext" and must be placed in the default package on the Java class path.

If you want to place it within a package (e.g. 'foo/SecretCompartment.xtext') the first line must read:

language foo.SecretCompartment

The first line can also be used to declare a super language to inherit from. This mechanism is described here.
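
The naming convention described in this section can be sketched in a few lines of plain Java. The class `GrammarFileNames` and its method are hypothetical and not part of Xtext; the sketch only assumes, as stated above, that dots in the language name map to directories on the class path and that the file extension is `.xtext`.

```java
// Hypothetical helper; illustrates the naming convention only.
class GrammarFileNames {

    // Maps a language name to the class path location of its grammar file.
    static String resourcePath(String languageName) {
        return languageName.replace('.', '/') + ".xtext";
    }

    public static void main(String[] args) {
        System.out.println(resourcePath("SecretCompartments"));
        System.out.println(resourcePath("foo.SecretCompartment"));
    }
}
```

With this mapping, `SecretCompartments` resolves to a file in the default package and `foo.SecretCompartment` to a file in the package `foo`.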

Rules

The parsing is based on ANTLR 3, which is a parser generator framework based on an [LL(*) algorithm]. Basically, parsing can be separated into the following phases:

  1. lexing
  2. parsing
  3. model construction
  4. linking
  5. validation

Lexer Rules

In the first phase a sequence of characters (the text input) is transformed into a sequence of so-called tokens. Each token consists of one or more characters and was matched by a particular lexer rule. In the secret compartments example there are no explicitly defined lexer rules, since it uses only a built-in lexer rule (the ID rule).

That rule is defined in the built-in super language (see Language Inheritance) as follows:

lexer ID : 
  "('^')?('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*"; 

It says that an ID token starts with a letter ('a'..'z'|'A'..'Z') or an underscore ('_'), followed by any number of letters, underscores and digits ('0'..'9'). Note that the body of this declaration is a black box in which you use ANTLR syntax directly. Please ignore the optional '^' for the moment.
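
For readers more familiar with regular expressions than with ANTLR syntax, the ID rule can be approximated in plain Java as follows. This is only a sketch: the class `IdLexerSketch` is hypothetical, the regular expression is a hand transcription of the rule body, and the real lexer is generated by ANTLR. Skipping whitespace here imitates the built-in WS rule.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical illustration of the ID rule; not the generated lexer.
class IdLexerSketch {

    // Optional '^', then a letter or underscore, then letters, underscores or digits.
    static final Pattern ID = Pattern.compile("\\^?[a-zA-Z_][a-zA-Z_0-9]*");

    // Returns true if the whole text is a single ID token.
    static boolean isId(String text) {
        return ID.matcher(text).matches();
    }

    // Splits the input into ID tokens, skipping blanks, tabs and line breaks.
    // Throws if a character cannot start an ID token.
    static List<String> lexIds(String input) {
        List<String> tokens = new ArrayList<>();
        Matcher matcher = ID.matcher(input);
        int pos = 0;
        while (pos < input.length()) {
            char c = input.charAt(pos);
            if (c == ' ' || c == '\t' || c == '\r' || c == '\n') {
                pos++;
                continue;
            }
            matcher.region(pos, input.length());
            if (!matcher.lookingAt()) {
                throw new IllegalArgumentException("No ID token at offset " + pos);
            }
            tokens.add(matcher.group());
            pos = matcher.end();
        }
        return tokens;
    }
}
```

For instance, `lexIds("doorClosed  D1CL")` yields the two tokens `doorClosed` and `D1CL`, while `isId("1abc")` is false because an ID must not start with a digit.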

This is the formal definition of lexer rules:

LexerRule :
  'lexer' name=ID ('returns' type=TypeRef)? ':' body=STRING ';'
;

Return types

A lexer rule returns a value, which defaults to a string (type ecore::EString). However, if you want to have a different type you can specify it. For instance, the built-in lexer rule 'INT' is defined like so:

lexer INT returns ecore::EInt : 
  "('0'..'9')+";

This says that the lexer rule INT returns instances of ecore::EInt. Any data type can be declared here, as long as it is an instance of ecore::EDataType. In order to tell the parser how to convert the lexed string to a value of the declared data type, you need to provide your own implementation of 'IValueConverterService'.

Have a look at [org/eclipse/xtext/builtin/conversion/XtextBuiltInConverters.java] to find out what such an implementation looks like.

The implementation needs to be registered as a service (see Service Framework).
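
The idea behind value conversion can be sketched in plain Java. Note that the interface below is hypothetical and deliberately does not reproduce the signature of 'IValueConverterService'; it only illustrates the two directions mentioned in this section, from the lexed string to a typed value and back. Looking converters up by lexer rule name is an assumption made for this sketch.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration; NOT the Xtext IValueConverterService API.
class ValueConverterSketch {

    // Converts between the text matched by a lexer rule and a typed value.
    interface Converter<T> {
        T toValue(String lexed);
        String toText(T value);
    }

    // Converter for the INT rule: digits to an Integer and back.
    static final Converter<Integer> INT = new Converter<Integer>() {
        public Integer toValue(String lexed) { return Integer.valueOf(lexed); }
        public String toText(Integer value) { return value.toString(); }
    };

    // Assumed lookup structure: one converter per lexer rule name.
    static final Map<String, Converter<?>> BY_RULE = new HashMap<>();
    static {
        BY_RULE.put("INT", INT);
    }

    public static void main(String[] args) {
        Integer value = INT.toValue("42");
        System.out.println(value + 1);
        System.out.println(INT.toText(value));
    }
}
```

The reverse direction (`toText`) matters because the serializer has to turn model values back into parseable text.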

Enum Rules

String Rules

Parser Rules

Model Construction

Meta Models

Meta-Model Inference

Importing existing Meta Models

Language Inheritance

Xtext supports language inheritance. By default each language implicitly extends a language called *org.eclipse.xtext.builtin.XtextBuiltin*, which is defined as follows:

 abstract language org.eclipse.xtext.builtin.XtextBuiltIn_Temp
 
 import "http://www.eclipse.org/emf/2002/Ecore" as ecore;
 
 
 lexer ID : 
   "('^')?('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*";
 
 lexer INT returns ecore::EInt : 
   "('0'..'9')+";
 
 lexer STRING : 
   "
   '\"' ( '\\\\' ('b'|'t'|'n'|'f'|'r'|'\\\"'|'\\''|'\\\\') | ~('\\\\'|'\"') )* '\"' | 
   '\\'' ( '\\\\' ('b'|'t'|'n'|'f'|'r'|'\\\"'|'\\''|'\\\\') | ~('\\\\'|'\\'') )* '\\''
   ";
 
 lexer ML_COMMENT : 
   "'/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}";
 
 lexer SL_COMMENT : 
   "'//' ~('\\n'|'\\r')* '\\r'? '\\n' {$channel=HIDDEN;}";
 
 lexer WS : 
   "(' '|'\\t'|'\\r'|'\\n')+ {$channel=HIDDEN;}";
 
 lexer ANY_OTHER : 
   ".";

Just ignore the grammar if you don't yet understand it. It basically provides some commonly used lexer rules which can be used in all grammars.
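
How a language gets access to the rules of its super language can be pictured with the following plain-Java sketch. The class `LanguageInheritanceSketch` is hypothetical and does not show how Xtext implements inheritance. In particular, the lookup order (own rules first, then the super language) is an assumption, since this text does not say whether inherited rules can be overridden.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration of rule lookup along a chain of super languages.
class LanguageInheritanceSketch {

    static class Language {
        final String name;
        final Language superLanguage; // null if there is no super language
        final Map<String, String> lexerRules = new HashMap<>(); // rule name -> rule body

        Language(String name, Language superLanguage) {
            this.name = name;
            this.superLanguage = superLanguage;
        }

        // Looks for the rule in this language first, then in its super languages.
        // Returns null if no language in the chain defines it.
        String findLexerRule(String ruleName) {
            for (Language current = this; current != null; current = current.superLanguage) {
                String body = current.lexerRules.get(ruleName);
                if (body != null) {
                    return body;
                }
            }
            return null;
        }
    }

    public static void main(String[] args) {
        Language builtin = new Language("org.eclipse.xtext.builtin.XtextBuiltin", null);
        builtin.lexerRules.put("INT", "('0'..'9')+");
        Language secret = new Language("SecretCompartments", builtin);
        System.out.println(secret.findLexerRule("INT"));
    }
}
```

In this sketch the SecretCompartments language defines no lexer rules of its own, yet a lookup of INT succeeds because the rule is found in the built-in language.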


Service Framework

Runtime Architecture

Value Converters

Linking

UI Architecture
