Revision as of 10:56, 22 September 2008

What is Xtext?

The TMF Xtext project provides a domain-specific language (the grammar language) for describing textual programming languages and domain-specific languages. It is tightly integrated with the Eclipse Modeling Framework (EMF) and leverages the Eclipse Platform in order to provide language-specific tool support. In contrast to common parser generators, the grammar language is much simpler, yet it is used to derive much more than just a parser and lexer. From a grammar the following is derived:

an incremental, ANTLR 3-based parser and lexer
Ecore-based meta models (optional)
a serializer, used to serialize instances of such meta models back to a parseable textual representation
an implementation of the EMF Resource interface (based on the parser and the serializer)
a full-fledged integration of the language into the Eclipse IDE
- syntax coloring
- navigation (F3, etc.)
- code completion
- outline views
- code templates
- folding, etc.

The generated artifacts are wired up through a dependency injection framework, which makes it easy to exchange certain functionality in a non-invasive manner. For example, if you don't like the default code assistant implementation, you only need to provide an alternative implementation of the corresponding service and configure it via an Eclipse extension point.

The Grammar Language

At the heart of Xtext there is the grammar language. The grammar language is defined in itself, of course. The grammar can be found here [[1]].

The grammar language is a DSL carefully designed for the description of textual languages, based on ANTLR's LL(*) algorithm. The main idea is to let users describe the concrete syntax, and to automatically derive an in-memory model (semantic model) from that.

First an example

To get an idea of how it works we'll start by implementing a [simple example] introduced by Martin Fowler. It's mainly about describing state machines used as the (un)lock mechanism of secret compartments.

One of those state machines could look like this:

 events
  doorClosed  D1CL
  drawOpened  D2OP
  lightOn     L1ON
  doorOpened  D1OP
  panelClosed PNCL
 end
 
 resetEvents
  doorOpened
 end
 
 commands
  unlockPanel PNUL
  lockPanel   PNLK
  lockDoor    D1LK
  unlockDoor  D1UL
 end
 
 state idle
  actions {unlockDoor lockPanel}
  doorClosed => active
 end
 
 state active
  drawOpened => waitingForLight
  lightOn    => waitingForDraw
 end
 
 state waitingForLight
  lightOn => unlockedPanel
 end
 
 state waitingForDraw
  drawOpened => unlockedPanel
 end
 
 state unlockedPanel
  actions {unlockPanel lockDoor}
  panelClosed => idle
 end

So we have a bunch of declared events, commands and states. Within states there are references to declared commands (the actions), which should be executed when entering such a state. There are also transitions, each consisting of a reference to an event and a reference to a state. Please read Martin's description if this is not clear enough.
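To make the behaviour of the example more concrete, here is a minimal hand-written Java sketch of the state machine shown above. It is not what Xtext generates. The class name, the choice of 'idle' as the start state, and the rule that an undeclared event leaves the state unchanged are assumptions made for illustration; reset events and command execution are left out.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a hand-written model of the example state machine.
class SecretCompartmentSketch {
    // state name -> (event name -> target state name)
    private final Map<String, Map<String, String>> transitions = new HashMap<>();
    private String current;

    SecretCompartmentSketch(String initialState) {
        this.current = initialState;
    }

    void addTransition(String from, String event, String to) {
        transitions.computeIfAbsent(from, k -> new HashMap<>()).put(event, to);
    }

    // Follows the transition declared for the event in the current state.
    // Assumption: an event without a declared transition is ignored.
    String fire(String event) {
        Map<String, String> outgoing =
            transitions.getOrDefault(current, Collections.<String, String>emptyMap());
        String target = outgoing.get(event);
        if (target != null) {
            current = target;
        }
        return current;
    }

    // Builds the transitions listed in the example text.
    // Assumption: 'idle' is the start state.
    static SecretCompartmentSketch example() {
        SecretCompartmentSketch m = new SecretCompartmentSketch("idle");
        m.addTransition("idle", "doorClosed", "active");
        m.addTransition("active", "drawOpened", "waitingForLight");
        m.addTransition("active", "lightOn", "waitingForDraw");
        m.addTransition("waitingForLight", "lightOn", "unlockedPanel");
        m.addTransition("waitingForDraw", "drawOpened", "unlockedPanel");
        m.addTransition("unlockedPanel", "panelClosed", "idle");
        return m;
    }

    public static void main(String[] args) {
        SecretCompartmentSketch m = example();
        for (String event : new String[] {"doorClosed", "lightOn", "drawOpened", "panelClosed"}) {
            System.out.println(event + " => " + m.fire(event));
        }
    }
}
```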

In order to implement this language with Xtext you need to write the following grammar:

 language SecretCompartments
 
 generate secretcompartment "http://www.eclipse.org/secretcompartment"
 
 Statemachine :
  'events'
     events+=Event+
  'end'
  ('resetEvents'
     resetEvents+=[Event]+
  'end')?
  'commands'
     commands+=Command+
  'end'
  states+=State+;
 
 Event :
  name=ID code=ID;
 
 Command :
  name=ID code=ID;
 
 State :
  'state' name=ID
     ('actions' '{' actions+=[Command]+ '}')?
     transitions+=Transition*
  'end';
 
 Transition :
  event=[Event] '=>' state=[State];

In the following the different concepts of the grammar language are explained. We refer to this grammar when useful.

Language Declaration

The first line

language SecretCompartments

declares the name of the language. Xtext leverages Java's classpath mechanism. This means that the name can be any valid Java qualifier. The file name needs to correspond to the language name and have the file extension '*.xtext'. So it needs to be "SecretCompartments.xtext" and must be placed in the default package on the Java classpath.

If you want to place it within a package (e.g. 'foo/SecretCompartments.xtext') the first line must read:

language foo.SecretCompartments

The first line can also be used to declare a super language to inherit from (see #Language Inheritance).
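The correspondence between a language name and the location of its grammar file on the classpath can be sketched in a few lines of Java. This is only an illustration of the naming convention described above; the class and method names are hypothetical and not part of Xtext, and 'foo.bar.Example' is a made-up language name.

```java
// Hypothetical sketch of the naming convention: a qualified language name
// maps to a classpath resource with the '.xtext' file extension.
class GrammarLocationSketch {

    // Replaces package separators with path separators and appends the extension.
    static String grammarResourcePath(String languageName) {
        return languageName.replace('.', '/') + ".xtext";
    }

    public static void main(String[] args) {
        System.out.println(grammarResourcePath("SecretCompartments"));
        System.out.println(grammarResourcePath("foo.bar.Example"));
    }
}
```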

EPackage declarations

Xtext parsers instantiate Ecore models (also known as meta models). An Ecore model basically consists of an EPackage containing EClasses, EDataTypes and EEnums. Xtext can infer Ecore models from a grammar (see #Meta-Model Inference), but it is also possible to instantiate existing Ecore models. You can even mix both approaches in one grammar, using multiple existing Ecore models and inferring others.

EPackage generation

The easiest way to get started is to let Xtext infer the meta model from your grammar. This is what is done in the secret compartment example. To do so just state:


generate secretcompartment "http://www.eclipse.org/secretcompartment"

This says: generate an EPackage with the name secretcompartment and the nsURI "http://www.eclipse.org/secretcompartment" (these are the properties needed to create an EPackage).

EPackage import

If you have already created such an EPackage somehow, you can import it:


import "http://www.eclipse.org/secretcompartment"

Using multiple packages

If you want to use multiple EPackages you need to specify aliases like so:


generate secretcompartment "http://www.eclipse.org/secretcompartment"
import "http://www.eclipse.org/anotherPackage" as another

When referring to a type somewhere in the grammar you need to qualify it using that alias (for example "another::CoolType"). We'll see later where such type references occur.

Rules

The parsing is based on ANTLR 3, which is a parser generator framework based on an [LL(*) algorithm]. Basically, parsing can be separated into the following phases:

lexing
parsing
model construction
linking
validation

Lexer Rules

In the first phase a sequence of characters (the text input) is transformed into a sequence of so-called tokens. Each token consists of one or more characters and is matched by a particular lexer rule. The secret compartments example defines no lexer rules explicitly, since it only uses a built-in one (the ID rule).

That rule is defined in the built-in super language (see #Language Inheritance) as follows:


lexer ID : 
  "('^')?('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*";

It says that an ID token starts with an optional '^' character, followed by a letter ('a'..'z'|'A'..'Z') or underscore ('_'), followed by any number of letters, underscores and digits ('0'..'9'). Note that this declaration is a black box in which you use ANTLR syntax directly. Please ignore the optional '^' for the moment.
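To check which inputs the ID rule accepts, the same pattern can be written as a Java regular expression. This is only a sketch for experimentation: the class name is hypothetical, and the regular expression is a manual translation of the ANTLR body, not something Xtext produces.

```java
import java.util.regex.Pattern;

// Hypothetical sketch: the body of the built-in ID rule translated by hand
// into a java.util.regex pattern.
class IdRuleSketch {
    // optional '^', then a letter or underscore, then letters, underscores or digits
    static final Pattern ID = Pattern.compile("\\^?[a-zA-Z_][a-zA-Z_0-9]*");

    // True if the whole text would be matched as a single ID token.
    static boolean isId(String text) {
        return ID.matcher(text).matches();
    }

    public static void main(String[] args) {
        for (String candidate : new String[] {"doorClosed", "D1CL", "^state", "1abc"}) {
            System.out.println(candidate + " : " + isId(candidate));
        }
    }
}
```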

This is the formal definition of lexer rules:


LexerRule :
  'lexer' name=ID ('returns' type=TypeRef)? ':' body=STRING ';'
;

Return types

A lexer rule returns a value, which defaults to a string (type ecore::EString). However, if you want to have a different type you can specify it. For instance, the built-in lexer rule 'INT' is defined like so:


lexer INT returns ecore::EInt : 
  "('0'..'9')+";

This says that the lexer rule INT returns instances of ecore::EInt. It is possible to declare any kind of data type here, as long as it is an instance of ecore::EDataType. In order to tell the parser how to convert the lexed string to a value of the declared data type, you need to provide your own implementation of 'IValueConverterService'.

Have a look at [org/eclipse/xtext/builtin/conversion/XtextBuiltInConverters.java] to see what such an implementation looks like.

The implementation needs to be registered as a service (see #Service Framework).
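The idea behind value conversion can be sketched independently of Xtext. The following Java class is a hypothetical stand-in, not the real 'IValueConverterService' interface (whose signature is not reproduced here): it simply maps the name of a lexer rule to a function that turns the lexed text into a value of the declared data type.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of value conversion. This is NOT Xtext's
// IValueConverterService; all names are chosen for illustration only.
class ValueConverterSketch {
    // lexer rule name -> conversion from lexed text to a value
    private final Map<String, Function<String, Object>> converters = new HashMap<>();

    ValueConverterSketch() {
        // ID returns the default type (a string), so the text is passed through.
        converters.put("ID", text -> text);
        // INT is declared to return ecore::EInt, so the text is parsed as an integer.
        converters.put("INT", text -> Integer.valueOf(text));
    }

    Object toValue(String lexerRule, String text) {
        Function<String, Object> converter = converters.get(lexerRule);
        if (converter == null) {
            throw new IllegalArgumentException("No converter for lexer rule " + lexerRule);
        }
        return converter.apply(text);
    }

    public static void main(String[] args) {
        ValueConverterSketch service = new ValueConverterSketch();
        Object value = service.toValue("INT", "42");
        System.out.println(value + " (" + value.getClass().getSimpleName() + ")");
    }
}
```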

Enum Rules

TODO

String Rules

TODO

Parser Rules

The parser reads in a sequence of tokens produced by the lexer and walks through the parser rules.

Model Construction

Meta-Models

The meta-model of a textual language describes the structure of its abstract syntax trees (AST).

Xtext uses Ecore EPackages to define meta models. Meta models are declared to be either inferred from the grammar (see #EPackage generation) or imported (see #EPackage import). A meta model's declaration can also define an alias name that is used in other places of the grammar to qualify references to its types. Each language inherits the aliases from its super language, but it can also override them. The default meta model does not have an alias in its declaration. It will contain all types that are referred to without a qualifier.

language MyLang

generate MyMetaModel "http://www.mysite.org/myMetaModel" // default meta model
import "http://www.eclipse.org/emf/2002/Ecore" as ecore  // imported meta model with alias

RuleA returns MyType:         // reference to default meta model
  'mytype' name=ID; 
   
RuleB returns ecore::EObject: // reference to imported EPackage
  name=ID;

Meta-Model Inference

By using the generate directive (see #EPackage generation), Xtext derives the meta model from the grammar. The algorithm walks through the grammar and creates

an EPackage
- for each generated package declaration. The name of the EPackage will be set to the first parameter, its nsURI to the second parameter. An optional alias makes it possible to distinguish generated EPackages later. Only one generated package declaration without an alias is allowed per grammar.
an EClass
- for each return type of a parser rule. If a parser rule does not define a return type, an implicit one with the same name is assumed.
- for each type defined in an action or a cross-reference.
an EDataType
- for each return type of a lexer rule.
an EAttribute in each current return type
- of type EBoolean for each feature assignment using the '?=' operator. No further EReferences or EAttributes will be generated from this assignment.
- for each assignment with the '=' or '+=' operator calling a lexer rule. Its type will be the return type of the called rule.
an EReference in each current return type
- for each action. The reference's type will be set to the return type of the current calling rule.
- for each assignment with the '=' or '+=' operator in a parser rule calling a parser rule. The EReference's type will be the return type of the called parser rule.

Each EAttribute or EReference takes its name from the assignment/action that caused it. Multiplicities will be 0..1 for assignments with the '=' operator and 0..* for assignments with the '+=' operator.

All EClasses and EDataTypes are added to the EPackage referred to by the alias provided in the type reference they were created from.

All elements can occur multiple times, but are generated only once. If the generated elements clash (e.g. different types for an EAttribute or EReference, or cycles in the type hierarchy), an error is reported.

While walking through the grammar, the algorithm keeps track of a set of the currently possible return types to add features to.

On entering a parser rule, the set contains only the return type of the rule.
On entering a group in an alternative, the set is reset to the state it was in when entering the first group of this alternative.
On leaving an alternative, the set contains the union of all types at the end of each of its groups.
After a mandatory (non-optional) rule call or mandatory action, the set contains only the return type of the called rule or action.
An optional rule call or optional action does not modify the set.
A rule call or an action is optional if its cardinality is '?' or '*'.
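The bookkeeping rules listed above can be sketched in Java as follows. This is a simplified illustration under stated assumptions, not the actual inference implementation: rule calls and actions are both modelled by a single hypothetical 'call' element carrying a return type and an 'optional' flag, and all class and method names are made up.

```java
import java.util.LinkedHashSet;
import java.util.Set;

// Hypothetical sketch of tracking the set of currently possible return types.
class ReturnTypeTrackingSketch {

    // A grammar element maps the set before the element to the set after it.
    interface Element {
        Set<String> apply(Set<String> current);
    }

    // A rule call or action: a mandatory one replaces the set with its own
    // return type, an optional one leaves the set untouched.
    static Element call(String returnType, boolean optional) {
        return current -> {
            if (optional) {
                return current;
            }
            Set<String> next = new LinkedHashSet<>();
            next.add(returnType);
            return next;
        };
    }

    // A group applies its elements one after another.
    static Element group(Element... elements) {
        return current -> {
            Set<String> state = current;
            for (Element element : elements) {
                state = element.apply(state);
            }
            return state;
        };
    }

    // Every group of an alternative starts from the same state; the result
    // is the union of the states at the end of each group.
    static Element alternatives(Element... groups) {
        return current -> {
            Set<String> union = new LinkedHashSet<>();
            for (Element g : groups) {
                union.addAll(g.apply(new LinkedHashSet<>(current)));
            }
            return union;
        };
    }

    // On entering a parser rule the set contains only the rule's return type.
    static Set<String> typesAtEndOfRule(String ruleReturnType, Element body) {
        Set<String> initial = new LinkedHashSet<>();
        initial.add(ruleReturnType);
        return body.apply(initial);
    }
}
```

For example, for a hypothetical rule with return type 'Model' whose body is an alternative of two mandatory calls returning 'A' and 'B', followed by an optional call returning 'C', typesAtEndOfRule yields the set containing 'A' and 'B'.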

If a type is added to the currently possible return types, the return type of the current parser rule is added to the supertypes of the new return type. You can specify additional common supertypes by means of artificial parser rules, that are never called, e.g.

CommonSuperType:
  SubTypeA | SubTypeB | SubTypeC;

As a last step, we examine all generated EClasses and lift up features to supertypes if they are defined by all subtypes.
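The lifting step can be pictured as a set intersection. The sketch below is a simplification under stated assumptions: features are represented by their names only (a real implementation would presumably also have to compare feature types and multiplicities), and the class and method names are hypothetical.

```java
import java.util.Collection;
import java.util.Iterator;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical sketch: lift features that every subtype defines to the supertype.
class FeatureLiftingSketch {

    // Returns the feature names shared by all subtypes and removes them
    // from the (mutable) feature sets of the subtypes.
    static Set<String> liftCommonFeatures(Collection<Set<String>> subtypeFeatures) {
        Iterator<Set<String>> it = subtypeFeatures.iterator();
        if (!it.hasNext()) {
            return new TreeSet<>();
        }
        // Start with the features of the first subtype and intersect with the rest.
        Set<String> common = new TreeSet<>(it.next());
        while (it.hasNext()) {
            common.retainAll(it.next());
        }
        // The lifted features now live in the supertype only.
        for (Set<String> features : subtypeFeatures) {
            features.removeAll(common);
        }
        return common;
    }
}
```

Given two hypothetical subtypes with the features {name, code} and {name, guard}, the method returns {name} and leaves {code} and {guard} in the subtypes.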

Importing existing Meta Models

Language Inheritance

Xtext supports language inheritance. By default (implicitly), each language extends a language called *org.eclipse.xtext.builtin.XtextBuiltin*, which is defined as follows:

 abstract language org.eclipse.xtext.builtin.XtextBuiltIn_Temp
 
 import "http://www.eclipse.org/emf/2002/Ecore" as ecore;
 
 
 lexer ID : 
   "('^')?('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*";
 
 lexer INT returns ecore::EInt : 
   "('0'..'9')+";
 
 lexer STRING : 
   "
   '\"' ( '\\\\' ('b'|'t'|'n'|'f'|'r'|'\\\"'|'\\''|'\\\\') | ~('\\\\'|'\"') )* '\"' | 
   '\\'' ( '\\\\' ('b'|'t'|'n'|'f'|'r'|'\\\"'|'\\''|'\\\\') | ~('\\\\'|'\\'') )* '\\''
   ";
 
 lexer ML_COMMENT : 
   "'/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}";
 
 lexer SL_COMMENT : 
   "'//' ~('\\n'|'\\r')* '\\r'? '\\n' {$channel=HIDDEN;}";
 
 lexer WS : 
   "(' '|'\\t'|'\\r'|'\\n')+ {$channel=HIDDEN;}";
 
 lexer ANY_OTHER : 
   ".";

Just ignore the grammar if you don't yet understand it. It basically provides some commonly used lexer rules which can be used in all grammars.



Difference between revisions of "Xtext/Documentation"


Contents

What is Xtext?

The Grammar Language

First an example

Language Declaration

EPackage declarations

EPackage generation

EPackage import

Using multiple packages

Rules

Lexer Rules

Return types

Enum Rules

String Rules

Parser Rules

Model Construction

Meta-Models

Meta-Model Inference

Importing existing Meta Models

Language Inheritance

Service Framework

Runtime Architecture

Value Converters

Linking

UI Architecture
