Difference between revisions of "Xtext/Documentation"

From Eclipsepedia

Revision as of 02:08, 26 March 2009

What is Xtext?

The TMF Xtext project provides a domain-specific language (the grammar language) for describing textual programming languages and domain-specific languages. It is tightly integrated with the Eclipse Modeling Framework (EMF) and leverages the Eclipse Platform in order to provide language-specific tool support. In contrast to common parser (syntactic analyzer) generators, the grammar language is much simpler, but it is used to derive much more than just a parser and lexer (lexical analyzer). From a grammar, the following is derived:

  • incremental, Antlr3-based parser and lexer
  • Ecore-based meta models (optional)
  • a serializer, used to serialize instances of such meta models back to a parseable textual representation
  • an implementation of the EMF Resource interface (based on the parser and the serializer)
  • a full-fledged integration of the language into the Eclipse IDE
    • syntax coloring
    • navigation (F3, etc.)
    • code completion
    • outline views
    • code templates

The generated artifacts are wired up through the Google Guice dependency injection framework, which makes it easy to exchange certain functionality in a non-invasive manner. For example, if you don't like the default code assistant implementation, you only need to provide an alternative implementation of the corresponding service and configure it via dependency injection.

The Grammar Language

The grammar language is the cornerstone of Xtext and is, of course, defined in itself.

It is a DSL carefully designed for describing textual languages. It is based on LL(*) parsing, like Antlr3's parsing strategy, which is also supported by packrat parsers. The main idea is to let users describe the concrete syntax and to derive an in-memory model (the semantic model) from it automatically.

First an example

To get an idea of how it works we'll start by implementing a simple example introduced by Martin Fowler. It's mainly about describing state machines used as the (un)lock mechanism of secret compartments. People who have secret compartments control their access in a very old-school way, e.g. by opening the door first and turning on the light afterwards. Then the secret compartment, for instance a panel, opens up.

One of those state machines could look like this:

 events
  doorClosed  D1CL
  drawOpened  D2OP
  lightOn     L1ON
  doorOpened  D1OP
  panelClosed PNCL
 end
 
 resetEvents
  doorOpened
 end
 
 commands
  unlockPanel PNUL
  lockPanel   PNLK
  lockDoor    D1LK
  unlockDoor  D1UL
 end
 
 state idle
  actions {unlockDoor lockPanel}
  doorClosed => active
 end
 
 state active
  drawOpened => waitingForLight
  lightOn    => waitingForDraw
 end
 
 state waitingForLight
  lightOn => unlockedPanel
 end
 
 state waitingForDraw
  drawOpened => unlockedPanel
 end
 
 state unlockedPanel
  actions {unlockPanel lockDoor}
  panelClosed => idle
 end

So, we have a bunch of declared events, commands and states. Within states there are references to declared commands (the actions), which should be executed when entering such a state. There are also transitions, each consisting of a reference to an event and a reference to a target state. Please read Martin's description if this is not clear enough.
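To make the intended behavior concrete, the following plain-Java sketch executes the example machine. It is only an illustration, not code that Xtext generates: the class and its methods (`CompartmentMachine`, `fire`, `sentCommands`) are hypothetical, the event and command codes are ignored, and treating a reset event as a jump back to the first state is an assumption made for this sketch.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical hand-written interpreter for the secret compartment state machine.
final class CompartmentMachine {
    // state name -> commands sent when the state is entered
    private final Map<String, List<String>> actions = new HashMap<>();
    // state name -> (event name -> target state name)
    private final Map<String, Map<String, String>> transitions = new HashMap<>();
    private final Set<String> resetEvents;
    private final String initialState;
    private final List<String> sentCommands = new ArrayList<>();
    private String current;

    CompartmentMachine(String initialState, Set<String> resetEvents) {
        this.initialState = initialState;
        this.resetEvents = resetEvents;
    }

    void state(String name, List<String> entryActions) {
        actions.put(name, entryActions);
        transitions.put(name, new HashMap<>());
    }

    void transition(String from, String event, String to) {
        transitions.get(from).put(event, to);
    }

    // Must be called before the first event.
    void start() { enter(initialState); }

    // Handles one event; events without a matching transition are ignored.
    void fire(String event) {
        if (resetEvents.contains(event)) {
            enter(initialState); // assumption: a reset event returns to the first state
            return;
        }
        String target = transitions.get(current).get(event);
        if (target != null) {
            enter(target);
        }
    }

    private void enter(String state) {
        current = state;
        sentCommands.addAll(actions.get(state));
    }

    String current() { return current; }
    List<String> sentCommands() { return sentCommands; }

    // Builds the machine described by the example program above.
    static CompartmentMachine example() {
        CompartmentMachine m = new CompartmentMachine("idle", Set.of("doorOpened"));
        m.state("idle", List.of("unlockDoor", "lockPanel"));
        m.state("active", List.of());
        m.state("waitingForLight", List.of());
        m.state("waitingForDraw", List.of());
        m.state("unlockedPanel", List.of("unlockPanel", "lockDoor"));
        m.transition("idle", "doorClosed", "active");
        m.transition("active", "drawOpened", "waitingForLight");
        m.transition("active", "lightOn", "waitingForDraw");
        m.transition("waitingForLight", "lightOn", "unlockedPanel");
        m.transition("waitingForDraw", "drawOpened", "unlockedPanel");
        m.transition("unlockedPanel", "panelClosed", "idle");
        return m;
    }
}
```

For example, starting the machine and firing doorClosed, lightOn and drawOpened leaves it in unlockedPanel, having sent unlockDoor, lockPanel, unlockPanel and lockDoor.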

In order to implement this language with Xtext you need to write the following grammar:

 grammar SecretCompartments with org.eclipse.xtext.common.Terminals
 
 generate secretcompartment "http://www.eclipse.org/secretcompartment"
 
 Statemachine :
  'events'
     (events+=Event)+
  'end'
  ('resetEvents'
     (resetEvents+=[Event])+
  'end')?
  'commands'
     (commands+=Command)+
  'end'
  (states+=State)+;
 
 Event :
  name=ID code=ID;
 
 Command :
  name=ID code=ID;
 
 State :
  'state' name=ID
     ('actions' '{' (actions+=[Command])+ '}')?
     (transitions+=Transition)*
  'end';
 
 Transition :
  event=[Event] '=>' state=[State];

The following sections explain the different concepts of the grammar language and refer back to this grammar where useful.

Language Declaration

The first line

grammar SecretCompartments with org.eclipse.xtext.common.Terminals

declares the name of the grammar. Xtext leverages Java's classpath mechanism, which means that the name can be any valid Java qualified name. The file name needs to correspond to it and have the file extension xtext. In this example, the file must therefore be named SecretCompartments.xtext and be placed in the default package on the Java classpath.

If you want to place it within a package (e.g. foo/SecretCompartment.xtext) the first line must read:

grammar foo.SecretCompartment with ...
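Because the grammar name is resolved like a Java class name, the mapping from the declared name to the expected file location is mechanical. The helper below is a hypothetical illustration of that naming convention only; `GrammarLocation` is not part of the Xtext API.

```java
// Hypothetical helper illustrating the convention: grammar name -> classpath resource path.
final class GrammarLocation {
    static String resourcePath(String grammarName) {
        // "foo.SecretCompartment" -> "foo/SecretCompartment.xtext"
        return grammarName.replace('.', '/') + ".xtext";
    }
}
```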

The first line is also used to declare the grammar this language extends (for details of the mechanism, cf. Language Inheritance).

EPackage declarations

Xtext parsers instantiate Ecore models (a.k.a. meta models). An Ecore model basically consists of an EPackage containing EClasses, EDataTypes and EEnums. Xtext can infer Ecore models from a grammar (cf. Meta-Model Inference), but it is also possible to use existing Ecore models. You can even mix both approaches: use multiple existing Ecore models and infer others from the same grammar.

EPackage generation

The easiest way to get started is to let Xtext infer the meta model from your grammar. This is what is done in the secret compartment example. To do so just state:

generate secretcompartment "http://www.eclipse.org/secretcompartment"

Which says: generate an EPackage with name secretcompartment and nsURI "http://www.eclipse.org/secretcompartment" (these are the properties needed to create an EPackage). See Meta-Model Inference for details.

EPackage import

If you have already created such an EPackage, you can import it using either its namespace URI or a resource URI (URIs are an EMF concept):

import "http://www.eclipse.org/secretcompartment"

Note that if you use a namespace URI, the corresponding EPackage needs to be installed into the workbench so that the editor can find it. At runtime (i.e. when starting the generator) you need to make sure that the corresponding EPackage is registered in EPackage.Registry.INSTANCE.

Xtext provides a new resource URI scheme, which is backed by the Java classpath. If you want to refer to an Ecore file MyEcore.ecore located in the package foo.bar, you could write:

import "classpath:/foo/bar/MyEcore.ecore"

Using the classpath scheme is considered the preferred way.
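Conceptually, a classpath: URI is a resource name that is looked up through a class loader. The sketch below shows that idea with the standard `ClassLoader.getResource` method. It is an assumption about the mechanism made for illustration, not Xtext's actual URI handling, and `ClasspathUris` is a hypothetical name.

```java
import java.net.URL;

// Hypothetical illustration of resolving "classpath:/..." URIs through a class loader.
final class ClasspathUris {
    private static final String SCHEME = "classpath:/";

    // "classpath:/foo/bar/MyEcore.ecore" -> "foo/bar/MyEcore.ecore"
    static String resourceName(String uri) {
        if (!uri.startsWith(SCHEME)) {
            throw new IllegalArgumentException("not a classpath URI: " + uri);
        }
        return uri.substring(SCHEME.length());
    }

    // Returns the physical location, or null if the resource is not on the classpath.
    static URL resolve(String uri, ClassLoader loader) {
        return loader.getResource(resourceName(uri));
    }
}
```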

Using multiple packages / meta model aliases

If you want to use multiple EPackages you need to specify aliases like so:

generate secretcompartment "http://www.eclipse.org/secretcompartment"
import "http://www.eclipse.org/anotherPackage" as another

When referring to a type somewhere in the grammar, you need to qualify it using that alias (for example another::CoolType). We'll see later where such type references occur.

It is also possible to put multiple EPackage imports under one alias. This is no problem as long as no two of them contain an EClassifier with the same name; if they do, none of the clashing EClassifiers can be referenced. It is even possible to declare multiple imports and one generate for the same alias. In that case, a reference to an EClassifier is first looked up in the imported EPackages before it is assumed that a type needs to be generated into the to-be-generated package.

Example:

generate toBeGenerated "http://www.eclipse.org/toBeGenerated"
import "http://www.eclipse.org/packContainingClassA"
import "http://www.eclipse.org/packContainingClassB"

With the declaration from above

  1. a reference to type ClassA would be linked to the EClass contained in "http://www.eclipse.org/packContainingClassA",
  2. a reference to type ClassB would be linked to the EClass contained in "http://www.eclipse.org/packContainingClassB",
  3. a reference to type NotYetDefined would be linked to a newly created EClass in "http://www.eclipse.org/toBeGenerated"

Note that using this feature is not recommended, because it might cause problems that are hard to track down. For instance, a reference to classA would also be linked to a newly created EClass, because the corresponding type in "http://www.eclipse.org/packContainingClassA" is spelled with a capital letter.
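The lookup order described above can be sketched in plain Java. This is a hypothetical model: packages are reduced to sets of classifier names, `AliasScope` and `resolve` are illustrative names, and reporting an ambiguous name as an error (rather than falling back to the generated package) is an assumption.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical model of resolving a type reference under one alias.
final class AliasScope {
    // nsURI of each imported EPackage -> names of the EClassifiers it contains
    private final Map<String, Set<String>> imported = new LinkedHashMap<>();
    private final String generatedNsUri;

    AliasScope(String generatedNsUri) { this.generatedNsUri = generatedNsUri; }

    void addImport(String nsUri, Set<String> classifierNames) {
        imported.put(nsUri, classifierNames);
    }

    // Returns the nsURI of the EPackage that will own the referenced type.
    String resolve(String typeName) {
        List<String> hits = new ArrayList<>();
        for (Map.Entry<String, Set<String>> e : imported.entrySet()) {
            if (e.getValue().contains(typeName)) { // case-sensitive, so "classA" != "ClassA"
                hits.add(e.getKey());
            }
        }
        if (hits.size() > 1) {
            throw new IllegalStateException("ambiguous type " + typeName + " in " + hits);
        }
        if (hits.size() == 1) {
            return hits.get(0);
        }
        // Not found in any import: a new EClass would be created in the generated package.
        return generatedNsUri;
    }
}
```

With the three declarations from the example, ClassA and ClassB resolve to their imported packages, while NotYetDefined and the misspelled classA both end up in "http://www.eclipse.org/toBeGenerated".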

Rules

The default parser is a homegrown packrat parser. It can be substituted by an Antlr parser through the Xtext service mechanism. Antlr is a sophisticated parser generator framework based on an LL(*) parsing algorithm, which works quite well for Xtext. At the moment we advise downloading the plug-in de.itemis.xtext.antlr (from its update site [1]) and using the Antlr parser instead of the packrat parser (cf. Xtext Workspace Setup).

Parsing can be separated into the following phases:

  1. lexing
  2. parsing
  3. linking
  4. validation

Terminal Rules

In the first phase, lexing, a sequence of characters (the text input) is transformed into a sequence of so-called tokens. Each token consists of one or more characters, was matched by a particular terminal rule and represents an atomic symbol. In the secret compartments example there are no explicitly defined terminal rules, since it only uses the ID rule, which is inherited from the grammar org.eclipse.xtext.common.Terminals (cf. Language Inheritance). Terminal rules are also referred to as token rules or lexer rules. By an informal naming convention, terminal rule names are all uppercase.

Therein the ID rule is defined as follows:

terminal ID : 
  ('^')?('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*; 

It says that an ID token starts with an optional '^' character (called caret), followed by a letter ('a'..'z'|'A'..'Z') or an underscore ('_'), followed by any number of letters, underscores and digits ('0'..'9').

The caret is used to escape an identifier for cases where there are conflicts with keywords. It is removed during parsing.
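Outside of Xtext, the same token shape can be written as a Java regular expression, which is handy for checking by hand which inputs are IDs. The sketch below is only an approximation for illustration; `IdToken` is a hypothetical name, and the caret removal it performs is, in Xtext, the job of a value converter (cf. value converters).

```java
import java.util.regex.Pattern;

// Approximation of the ID terminal rule from org.eclipse.xtext.common.Terminals.
final class IdToken {
    // optional caret, then a letter or underscore, then letters, underscores or digits
    private static final Pattern ID = Pattern.compile("\\^?[a-zA-Z_][a-zA-Z_0-9]*");

    static boolean matches(String text) {
        return ID.matcher(text).matches();
    }

    // The value of the token: the escaping caret is removed.
    static String value(String text) {
        if (!matches(text)) {
            throw new IllegalArgumentException("not an ID: " + text);
        }
        return text.startsWith("^") ? text.substring(1) : text;
    }
}
```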

This is the formal definition of terminal rules:

TerminalRule :
  'terminal' name=ID ('returns' type=TypeRef)? ':' 
     alternatives=TerminalAlternatives ';'
;

Return types

A terminal rule returns a value, which is a string (type ecore::EString) by default. However, if you want to have a different type you can specify it. For instance, the rule 'INT' is defined as:

terminal INT returns ecore::EInt : 
  ('0'..'9')+;

This means that the terminal rule INT returns instances of ecore::EInt. It is possible to define any kind of data type here, which just needs to be an instance of ecore::EDataType. In order to tell the parser how to convert the parsed string to a value of the declared data type, you need to provide your own implementation of 'IValueConverterService' (cf. value converters).

The implementation needs to be registered as a service (cf. Service Framework). This is also the place where you can remove things like quotes from string literals or the caret ('^') from identifiers.
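What such conversions have to do can be sketched in plain Java. The class below is hypothetical and does not implement Xtext's IValueConverterService (whose methods are not covered here); it only illustrates turning INT text into an int and stripping the quotes and escape sequences from STRING literals, following the escapes allowed by the STRING rule in org.eclipse.xtext.common.Terminals.

```java
// Hypothetical value conversions for the INT and STRING terminal rules.
final class ValueConversions {
    // INT only matches digits, so no sign handling is needed;
    // values beyond Integer.MAX_VALUE throw NumberFormatException.
    static int toInt(String text) {
        return Integer.parseInt(text);
    }

    // Removes the surrounding quotes (double or single) and resolves escape sequences.
    static String toStringValue(String text) {
        StringBuilder out = new StringBuilder();
        for (int i = 1; i < text.length() - 1; i++) {
            char c = text.charAt(i);
            if (c != '\\') {
                out.append(c);
                continue;
            }
            char next = text.charAt(++i); // the character following the backslash
            switch (next) {
                case 'b': out.append('\b'); break;
                case 't': out.append('\t'); break;
                case 'n': out.append('\n'); break;
                case 'f': out.append('\f'); break;
                case 'r': out.append('\r'); break;
                default: out.append(next); // \" \' and \\ stand for the character itself
            }
        }
        return out.toString();
    }
}
```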

Extended Backus-Naur form expressions

Token rules are described using Extended Backus-Naur Form (EBNF) expressions. The different expressions are described in the following. What all of these expressions have in common is the cardinality (quantifier) operator. There are four different cardinalities:

  1. exactly one (the default, no operator)
  2. one or none (operator "?")
  3. any (operator "*")
  4. one or more (operator "+")

Keywords / Characters

Keywords are literals in token rules. The ID rule in org.eclipse.xtext.common.Terminals, for instance, starts with a keyword:

terminal ID : '^'? .... ;

The question mark sets the cardinality to "none or one" (i.e. optional) like explained above.

Note that a keyword can have any length and contain arbitrary characters.

Character Ranges

A character range can be declared using the '..' operator. Example:

terminal INT returns ecore::EInt: ('0'..'9')+ ;

In this case an INT is comprised of one or more (note the '+' operator) characters between (and including) '0' and '9'.

Wildcard

If you want to allow any character, you can simply write a dot. Example:

terminal FOO : 'f' . 'o';

The rule above would allow expressions like 'foo', 'f0o' or even 'f\no'.

Until Token

With the until token it is possible to state that everything should be consumed until a certain token occurs. The multi-line comment rule is implemented using it:

 terminal ML_COMMENT	: '/*' -> '*/' ;

This is the rule for Java-style comments that begin with '/*' and end with '*/'.
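The until token behaves like a non-greedy match: characters are consumed up to and including the first occurrence of the closing keyword. A hand-written equivalent of ML_COMMENT might look like this; `UntilToken` is a hypothetical helper, and returning -1 for an unterminated comment is a choice made for the sketch.

```java
// Sketch of matching ML_COMMENT : '/*' -> '*/' at a given offset.
final class UntilToken {
    // Returns the end offset (exclusive) of the comment, or -1 if there is no match.
    static int matchComment(String input, int start) {
        if (!input.startsWith("/*", start)) {
            return -1;
        }
        // search for the first "*/" after the opening "/*" (the opener's '*' is not reused)
        int close = input.indexOf("*/", start + 2);
        return close < 0 ? -1 : close + 2;
    }
}
```

Note that the comment ends at the first "*/", so "/* a */ b */" yields a comment of seven characters followed by ordinary input.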

Negated Token

All the tokens explained above can be inverted using a preceding exclamation mark:

 terminal ML_COMMENT	: '/*' (!'*/')+ ;

Rule Calls

Rules can refer to other rules. This is done by writing the name of the rule to be called. We refer to this as rule calls. Rule calls in terminal rules can only point to token rules.

Example:

terminal QUALIFIED_NAME : ID ('.' ID)*;

Alternatives

Alternatives allow a rule to match one of several options. For instance, the whitespace rule uses alternatives like so:

terminal WS : (' '|'\t'|'\r'|'\n')+ ;

That is, a WS token consists of one or more whitespace characters (' ', '\t', '\r' or '\n').

Groups

Finally, if you put tokens one after another, the whole sequence is referred to as a group. Example:

terminal FOO : '0x' ('0'..'7') ('0'..'9'|'A'..'F') ;

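That is the hexadecimal code of an ASCII character (0x00 to 0x7F): the prefix '0x' followed by two hexadecimal digits, the first of which lies between '0' and '7'.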
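Groups, alternatives and character ranges map directly onto regular expressions, which makes it easy to check by hand which inputs a terminal rule accepts. The following sketch translates the FOO and WS rules above into java.util.regex patterns; the translation and the class name `TerminalPatterns` are illustrative only and say nothing about how Xtext builds its lexer.

```java
import java.util.regex.Pattern;

// Illustrative regex translations of two terminal rules.
final class TerminalPatterns {
    // terminal FOO : '0x' ('0'..'7') ('0'..'9'|'A'..'F') ;   (a group of three elements)
    static final Pattern FOO = Pattern.compile("0x[0-7][0-9A-F]");
    // terminal WS : (' '|'\t'|'\r'|'\n')+ ;   (alternatives with cardinality +)
    static final Pattern WS = Pattern.compile("[ \\t\\r\\n]+");

    static boolean isFoo(String s) { return FOO.matcher(s).matches(); }
    static boolean isWs(String s) { return WS.matcher(s).matches(); }
}
```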

Datatype rules

Datatype rules are parsing-phase rules that, like terminal rules, create instances of EDataType. The nice thing about datatype rules is that they are actually parser rules and are therefore

  1. context sensitive and
  2. allow for use of hidden tokens

If you, for instance, want to define a rule to consume Java-like qualified names (e.g. "foo.bar.Baz") you could write:

QualifiedName :
  ID ('.' ID)*;

This looks similar to the terminal rule we defined above to explain rule calls. The difference, however, is that it is a parser rule and therefore only valid in certain contexts, so it won't conflict with ID. Had you defined it as a terminal rule, it would overlay the ID rule.

In addition, because it is defined as a datatype rule, hidden tokens (e.g. "/* comment */") are allowed between the IDs and dots (e.g. "foo/* comment */. bar . Baz").

Return types can be specified like in token rules:

QualifiedName returns ecore::EString : ID ('.' ID)*;

Note that if a rule does not call another parser rule and contains neither actions nor assignments (see parser rules), it is considered a datatype rule, and the datatype EString is implied unless declared otherwise.

Again, value converters are responsible for the conversion (cf. value converters).
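The effect of hidden tokens on a datatype rule can be illustrated with a tiny hand-written parser for QualifiedName that works on an already lexed token list. The token kinds and class names are hypothetical, and the sketch assumes that the rule's value is the concatenated text of its non-hidden tokens.

```java
import java.util.List;

// Hypothetical parser for  QualifiedName : ID ('.' ID)* ;  with WS and ML_COMMENT hidden.
final class QualifiedNameParser {
    enum Kind { ID, DOT, WS, ML_COMMENT }

    static final class Token {
        final Kind kind;
        final String text;
        Token(Kind kind, String text) { this.kind = kind; this.text = text; }
    }

    static Token tok(Kind kind, String text) { return new Token(kind, text); }

    private final List<Token> tokens;
    private int pos;

    QualifiedNameParser(List<Token> tokens) { this.tokens = tokens; }

    // Returns the rule's value, built only from the non-hidden tokens.
    String parse() {
        StringBuilder value = new StringBuilder(expect(Kind.ID));
        while (peek() == Kind.DOT) {
            value.append(expect(Kind.DOT));
            value.append(expect(Kind.ID));
        }
        return value.toString();
    }

    private static boolean isHidden(Kind k) { return k == Kind.WS || k == Kind.ML_COMMENT; }

    // Hidden tokens are skipped wherever the parser looks at the next token.
    private Kind peek() {
        while (pos < tokens.size() && isHidden(tokens.get(pos).kind)) pos++;
        return pos < tokens.size() ? tokens.get(pos).kind : null;
    }

    private String expect(Kind kind) {
        if (peek() != kind) throw new IllegalStateException("expected " + kind + " at " + pos);
        return tokens.get(pos++).text;
    }
}
```

Fed the tokens of "foo/* comment */. bar . Baz", the sketch returns "foo.bar.Baz".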

Enum Rules

Enum rules return enumeration literals from strings. They can be seen as a shortcut for datatype rules with specific value converters. The main advantages of enum rules are their simplicity and type safety, which enable nice validation. Furthermore, enums and their respective literals can be inferred during the meta-model transformation.

If you want to define a ChangeKind [org.eclipse.emf.ecore.change/model/Change.ecore] with 'ADD', 'MOVE' and 'REMOVE' you could write:

enum ChangeKind :
  ADD | MOVE | REMOVE;

It is even possible to use alternative literals for your enums or reference an enum value twice:

enum ChangeKind :
  ADD = 'add' | ADD = '+' | 
  MOVE = 'move' | MOVE = '->' | 
  REMOVE = 'remove' | REMOVE = '-';
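On the Java side, the second ChangeKind rule corresponds roughly to an enum with several accepted spellings per literal. The sketch below is a hand-written, hypothetical equivalent (`fromText` is an illustrative name); Xtext itself generates an EMF EEnum rather than this code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical Java counterpart of the enum rule with alternative literals.
enum ChangeKind {
    ADD("add", "+"),
    MOVE("move", "->"),
    REMOVE("remove", "-");

    private final String[] spellings;

    ChangeKind(String... spellings) { this.spellings = spellings; }

    // Filled after all constants exist: every spelling maps to its literal.
    private static final Map<String, ChangeKind> BY_TEXT = new HashMap<>();
    static {
        for (ChangeKind kind : values()) {
            for (String s : kind.spellings) {
                BY_TEXT.put(s, kind);
            }
        }
    }

    // Maps a parsed keyword to its literal; unknown text is rejected.
    static ChangeKind fromText(String text) {
        ChangeKind kind = BY_TEXT.get(text);
        if (kind == null) throw new IllegalArgumentException("unknown change kind: " + text);
        return kind;
    }
}
```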

Parser Rules

The parser reads a sequence of terminals and walks through the parser rules. That is why a parser rule, contrary to a terminal rule, does not produce a single terminal token but a tree of non-terminal and terminal tokens, the so-called parse tree (in Xtext also known as the node model). Furthermore, parser rules serve as a kind of building plan for the EObjects that form the semantic model (the AST). The different constructs, such as actions and assignments, are used to derive types and to initialize the semantic objects accordingly.

Extended Backus-Naur Form expressions

In parser rules (as well as in datatype rules) not all the expressions available for terminal rules can be used. Character ranges, wildcards, the until token and the negation are currently only available for terminal rules. Available in parser rules as well as in terminal rules are

  1. groups,
  2. alternatives,
  3. keywords and
  4. rule calls.

In addition to these elements, there are some expressions used to direct how the AST is constructed, which are listed and explained in the following.

Assignments

Assignments are used to assign parsed information to a feature of the current EClass. The current EClass is specified by the rule's return type or, if none is stated explicitly, it is implied that the type's name equals the rule's name.

Example:

State :
 'state' name=ID
    ('actions' '{' (actions+=[Command])+ '}')?
    (transitions+=Transition)*
 'end';

The syntactic declaration for states in the state machine example starts with a keyword 'state' followed by an assignment :

name=ID

Here the left-hand side refers to a feature of the current EClass (in this case EClass 'State'). The right-hand side can be a rule call, a keyword, a cross-reference (explained later) or even an alternative composed of these. The type of the feature needs to be compatible with the type of the expression on the right. As ID returns an EString, the feature name needs to be of type EString as well.

Assignment Operators

There are three different assignment operators, each with different semantics:

  1. the simple equal sign "=" is the straightforward assignment, used for features that take only one element
  2. the "+=" sign (the add operator) expects a multi-valued feature, i.e. a list, and adds the value on the right-hand side to it
  3. the "?=" sign (the boolean operator) expects a feature of type EBoolean and sets it to true if the right-hand side was consumed (no matter with which value)
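In terms of the resulting object, the three operators act like setting a field, appending to a list and setting a flag. The sketch below hand-parses a space-separated State declaration into a plain-Java stand-in for the generated EClass (Xtext actually generates EMF interfaces and implementations). The class `StateObject` is hypothetical, the `isFinal?='final'` assignment is invented for the example, and actions are kept as plain names instead of cross-references.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

// Hypothetical result object for a rule like
//   State : 'state' name=ID (isFinal?='final')? ('actions' '{' (actions+=[Command])+ '}')? 'end';
final class StateObject {
    String name;
    final List<String> actions = new ArrayList<>();
    boolean isFinal;

    // Parses e.g. "state idle actions { unlockDoor lockPanel } end" (tokens separated by spaces).
    static StateObject parse(String text) {
        Iterator<String> it = Arrays.asList(text.trim().split("\\s+")).iterator();
        expect(it.next(), "state");
        StateObject s = new StateObject();
        s.name = it.next();              // name=ID: '=' sets the single value
        String t = it.next();
        if (t.equals("final")) {
            s.isFinal = true;            // isFinal?='final': set to true because it was consumed
            t = it.next();
        }
        if (t.equals("actions")) {
            expect(it.next(), "{");
            for (t = it.next(); !t.equals("}"); t = it.next()) {
                s.actions.add(t);        // actions+=...: '+=' appends every match
            }
            t = it.next();
        }
        expect(t, "end");
        return s;
    }

    private static void expect(String actual, String keyword) {
        if (!actual.equals(keyword)) {
            throw new IllegalStateException("expected " + keyword + " but got " + actual);
        }
    }
}
```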

Cross References

A unique feature of Xtext is the ability to declare cross-links in the grammar. In traditional compiler construction, cross-links are not established during parsing but in a later linking phase. This is the same in Xtext, but the grammar can specify cross-link information, which is then used during the linking phase. The syntax for cross-links is:

CrossReference :
  '[' type=TypeRef ('|' ^terminal=CrossReferenceableTerminal )? ']'
;

For example, the transition is made up of two cross references, pointing to an event and a state:

Transition :
 event=[Event] '=>' state=[State];

It is important to understand that the text between the square brackets does not refer to another rule, but to a type! This is sometimes confusing, because one usually uses the same name for rules and types. If we had named the type for events differently, as in the following, the cross-references would need to be adapted as well:

Transition :
 event=[MyEvent] '=>' state=[State];

Event returns MyEvent : ....;

Looking at the syntax definition of cross references, there is an optional part starting with a vertical bar followed by 'CrossReferenceableTerminal'. This is the part describing the concrete text, from which the crosslink later should be established. By default (that's why it's optional) it is "|ID".

Have a look at the linking section in order to understand how linking is done.
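The division of labor between parsing and linking can be illustrated with a small sketch: the parser only records the referenced names, and a later pass resolves them against the declared objects of the expected type. The classes are hypothetical and the lookup is a plain name-to-object map, which is an assumption made for the sketch; it does not describe Xtext's actual linking implementation.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical linking pass for  Transition : event=[Event] '=>' state=[State];
final class Linker {
    static final class Event { final String name; Event(String name) { this.name = name; } }
    static final class State { final String name; State(String name) { this.name = name; } }

    // A transition as produced by the parser: only the cross-reference texts are known.
    static final class Transition {
        final String eventName;
        final String stateName;
        Event event; // filled in by the linker
        State state; // filled in by the linker
        Transition(String eventName, String stateName) {
            this.eventName = eventName;
            this.stateName = stateName;
        }
    }

    // Resolves each reference among objects of the referenced type only.
    static void link(List<Event> events, List<State> states, List<Transition> transitions) {
        Map<String, Event> eventsByName = new HashMap<>();
        for (Event e : events) eventsByName.put(e.name, e);
        Map<String, State> statesByName = new HashMap<>();
        for (State s : states) statesByName.put(s.name, s);
        for (Transition t : transitions) {
            t.event = eventsByName.get(t.eventName);
            t.state = statesByName.get(t.stateName);
            if (t.event == null) throw new IllegalStateException("unknown event " + t.eventName);
            if (t.state == null) throw new IllegalStateException("unknown state " + t.stateName);
        }
    }
}
```

Because [Event] is resolved among events only, a transition "idle => active" fails to link even though a state named idle exists.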

Actions

By default, the object to be returned by a parser rule is created lazily on the first assignment. The type of the EObject to be created is then determined from the specified return type (or from the rule name if no explicit return type is specified). With actions, however, the creation of the EObject can be made explicit. There are two kinds of actions: simple actions, described here, and assigned actions (cf. Actions revisited).

If at some point you want to enforce creation of a specific type you can use simple actions. Example :

MyRule returns TypeA :
  "A" name=ID |
  {TypeB} "B" name=ID; 

In this example TypeB must be a subtype of TypeA. If the input "A foo" is parsed, an instance of TypeA is created as soon as the assignment is evaluated. The input "B foo" results in the creation of an instance of TypeB.

Unassigned rule calls

We explained previously that the EObject to be returned is created lazily when the first assignment occurs or when a simple action is evaluated. There is another way to set the EObject to be returned, which we call an "unassigned rule call".

As the name suggests, unassigned rule calls are calls to other parser rules that are not used within an assignment. Since there is no feature the returned value could be assigned to, it is assigned to the "to-be-returned" reference, i.e. it becomes the object returned by the calling rule.

With unassigned rule calls one can, for instance, create rules which just dispatch between several other rules:

AbstractToken :
   TokenA |
   TokenB |
   TokenC;

As AbstractToken could return an instance of TokenA, TokenB or TokenC, its type must be a supertype of these types. It is now also possible to further change the state of the AST element by assigning additional features. Example:

AbstractToken :
  (TokenA |
   TokenB |
   TokenC ) (cardinality=('?'|'+'|'*'))?;

That is, stating the cardinality is optional (note the last question mark), and it can be written as a question mark, a plus or an asterisk.

Actions revisited

LL parsing has some significant advantages over LR algorithms. The most important ones for Xtext are that the generated code is much simpler to understand and debug, and that it is easier to recover from errors; Antlr in particular has a very nice generic error recovery mechanism. This allows an AST to be constructed even if there are syntax errors in the text. Without error recovery, a single small error would disable all the nice IDE features.

However, LL also has some drawbacks. The most important one is that it does not allow left-recursive grammars. For instance, the following is not allowed in LL-based grammars, because "Expression '+' Expression" is left-recursive:

Expression :
  Expression '+' Expression |
  '(' Expression ')' |
  INT;

Instead, such rules have to be rewritten by "left-factoring" them:

 Expression :
   TerminalExpression ('+' TerminalExpression)?;

 TerminalExpression :
   '(' Expression ')' |
   INT;

In practice this is always the same pattern and therefore not problematic. However, by simply applying Xtext's AST construction we know so far like so ...

 Expression :
   {Operation} left=TerminalExpression (op='+' right=TerminalExpression)?;

 TerminalExpression returns Expression:
   '(' Expression ')' |
   {IntLiteral} value=INT;

... one would get unwanted elements in the resulting AST. For instance the expression " ( 42 ) " would result in a tree like this:

Operation {
 left=Operation {
  left=IntLiteral {
   value=42
  }
 }
}

Typically one would only want to have one instance of IntLiteral.

One can solve this problem using a combination of unassigned rule calls and actions:

 Expression :
   TerminalExpression ({Operation.left=current} op='+' right=TerminalExpression)?;

 TerminalExpression returns Expression:
   '(' Expression ')' |
   {IntLiteral} value=INT;
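A hand-written recursive-descent parser shows what the assigned action {Operation.left=current} achieves: the object built so far becomes the left operand of a new Operation only when a '+' actually follows, so "( 42 )" yields a single IntLiteral. The sketch below is a hypothetical plain-Java analogue of the grammar above, not the parser Xtext generates; like the grammar, it accepts at most one '+' per nesting level.

```java
// Hypothetical AST classes mirroring the Expression, Operation and IntLiteral types.
abstract class Expr {}

final class IntLiteral extends Expr {
    final int value;
    IntLiteral(int value) { this.value = value; }
    @Override public String toString() { return "IntLiteral{value=" + value + "}"; }
}

final class Operation extends Expr {
    final Expr left; final String op; final Expr right;
    Operation(Expr left, String op, Expr right) { this.left = left; this.op = op; this.right = right; }
    @Override public String toString() {
        return "Operation{left=" + left + ", op=" + op + ", right=" + right + "}";
    }
}

final class ExpressionParser {
    private final String in;
    private int pos;

    private ExpressionParser(String in) { this.in = in; }

    static Expr parse(String text) {
        ExpressionParser p = new ExpressionParser(text);
        Expr e = p.expression();
        p.skipWs();
        if (p.pos != p.in.length()) throw new IllegalStateException("unexpected input at " + p.pos);
        return e;
    }

    // Expression : TerminalExpression ({Operation.left=current} op='+' right=TerminalExpression)? ;
    private Expr expression() {
        Expr current = terminalExpression();   // unassigned rule call: its result becomes 'current'
        if (peek() == '+') {
            pos++;
            Expr right = terminalExpression();
            current = new Operation(current, "+", right); // assigned action wraps 'current' as left
        }
        return current;
    }

    // TerminalExpression returns Expression : '(' Expression ')' | {IntLiteral} value=INT ;
    private Expr terminalExpression() {
        if (peek() == '(') {
            pos++;
            Expr inner = expression();          // no extra object is created for the parentheses
            if (peek() != ')') throw new IllegalStateException("expected ')' at " + pos);
            pos++;
            return inner;
        }
        skipWs();
        int start = pos;
        while (pos < in.length() && Character.isDigit(in.charAt(pos))) pos++;
        if (start == pos) throw new IllegalStateException("expected INT at " + pos);
        return new IntLiteral(Integer.parseInt(in.substring(start, pos)));
    }

    private void skipWs() {
        while (pos < in.length() && Character.isWhitespace(in.charAt(pos))) pos++;
    }

    private char peek() {
        skipWs();
        return pos < in.length() ? in.charAt(pos) : '\0';
    }
}
```

Parsing "1 + 2" yields Operation{left=IntLiteral{value=1}, op=+, right=IntLiteral{value=2}}, whereas "1 + 2 + 3" is rejected unless it is parenthesized, exactly as with the '?' cardinality in the grammar.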

Hidden terminal symbols

Because parser rules describe not a single token but a sequence of patterns in the input, it is necessary to define which parts of the input are relevant. Xtext introduces the concept of hidden tokens to gracefully handle "noise" such as whitespace and comments in the input sequence. You can define a set of terminal symbols that are hidden from the parser rules and automatically skipped when they are recognized. They are nevertheless transparently woven into the node model, but they are not relevant for the semantic model.

Hidden terminals may (or may not) appear between any other terminals in any cardinality. They can be described per rule or for the whole grammar. The grammar org.eclipse.xtext.common.Terminals comes with a reasonable default and hides all comments and whitespace from the parser rules.

If a rule defines hidden symbols, you can think of a kind of scope that is automatically introduced. Any rule that is called from the declaring rule uses the same hidden terminals as the calling rule, unless it defines other hidden tokens itself.

Person hidden(WS, ML_COMMENT, SL_COMMENT): 
  name=Fullname age=INT ';';
Fullname: 
  (firstname=ID)? lastname=ID;

The sample rule "Person" allows multi-line comments (ML_COMMENT), single-line comments (SL_COMMENT) and whitespace (WS) between its tokens, e.g. between the full name and the age. Because the rule "Fullname" does not introduce another set of hidden terminals, it allows the same symbols to appear between 'firstname' and 'lastname' as the calling rule "Person". Thus, the following input is perfectly valid for the given grammar snippet:

John /* comment */ Smith // line comment
  /* comment */
  42      ;

For a list of all default terminals, such as WS, see Language Inheritance.

Meta-Model inference

The meta model of a textual language describes the structure of its abstract syntax trees (AST).

Xtext uses Ecore EPackages to define meta models. Meta models are declared to be either inferred (generated) from the grammar or imported. By using the 'generate' directive, one tells Xtext to derive an EPackage from the grammar. All elements/types can occur multiple times, but are generated only once.

Type and Package Generation

Xtext creates

an EPackage
  • for each generated package declaration. The directive generate is followed by a list of parameters: the name of the EPackage is set to the first parameter and its nsURI to the second. An optional alias as the third parameter allows generated EPackages to be distinguished later. Only one generated package declaration without an alias is allowed per grammar, and aliases must be unique among all generated EPackages.
an EClass
  • for each return type of a parser rule. If a parser rule does not define a return type, an implicit one with the same name as the rule is assumed. You can specify more than one rule that returns the same type, but only one EClass will be generated.
  • for each type defined in an action or a cross-reference.
an EEnum
  • for each return type of an enum rule.
an EDatatype
  • for each return type of a terminal rule or a datatype rule.

All EClasses, EEnums and EDatatypes are added to the EPackage referred to by the alias provided in the type reference they were created from.

Feature and Type Hierarchy Generation

While walking through the grammar, the algorithm keeps track of a set of the currently possible return types to add features to.

  • Entering a parser rule the set contains only the return type of the rule.
  • Entering a group in an alternative the set is reset to the same state it was in when entering the first group of this alternative.
  • Leaving an alternative the set contains the union of all types at the end of each of its groups.
  • After an optional element, the set is reset to the same state it was before entering it.
  • After a mandatory (non-optional) rule call or mandatory action the set contains only the return type of the called rule or action.
  • An optional rule call does not modify the set.
  • A rule call is optional, if its cardinality is '?' or '*'.

While iterating the parser rules Xtext creates

an EAttribute in each current return type
  • of type EBoolean for each feature assignment using the '?=' operator. No further EReferences or EAttributes will be generated from this assignment.
  • for each assignment with the '=' or '+=' operator calling a terminal rule. Its type will be the return type of the called rule.
an EReference in each current return type
  • for each assignment with the '=' or '+=' operator in a parser rule calling a parser rule. The EReference type will be the return type of the called parser rule.
  • for each assigned action (e.g. {Operation.left=current}). The reference's type will be set to the return type of the current calling rule.

Each EAttribute or EReference takes its name from the assignment or action that caused it. Multiplicities will be 0..1 for assignments with the '=' operator and 0..* for assignments with the '+=' operator.

Furthermore, each type that is added to the currently possible return types automatically inherits from the current return type of the parser rule. You can specify additional common supertypes by means of "artificial" parser rules, that are never called, e.g.

CommonSuperType:
  SubTypeA | SubTypeB | SubTypeC;

Enum Literal Generation

For each alternative defined in an enum rule, the transformer creates an enum literal unless one with the same name already exists. The 'literal' property of the generated enum literal is set to the right-hand side of the declaration. If it is omitted, you get an enum literal with equal 'name' and 'literal' attributes.

enum MyGeneratedEnum:
  NAME = 'literal' | EQUAL_NAME_AND_LITERAL;

Feature Normalization

As a last step, the generator examines all generated EClasses and lifts features up into a supertype if that supertype has more than one subtype and the feature is defined in every one of them. This even works for multiple supertypes.
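The lifting step can be sketched over a simplified model in which each EClass is just a set of feature names. `liftCommonFeatures` is a hypothetical helper; the sketch ignores feature types and cardinalities, which a real implementation would also have to compare, and handles one supertype at a time. For instance, if Event and Command from the example shared a supertype declared through an artificial rule, both name and code would be lifted into it.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class FeatureNormalization {
    // Moves every feature declared by all subtypes into the supertype (only when there
    // are at least two subtypes) and removes it from the subtypes. Returns the lifted names.
    static Set<String> liftCommonFeatures(Set<String> supertype, List<Set<String>> subtypes) {
        Set<String> common = new HashSet<>();
        if (subtypes.size() < 2) {
            return common;
        }
        common.addAll(subtypes.get(0));
        for (Set<String> sub : subtypes) {
            common.retainAll(sub);   // keep only features every subtype defines
        }
        supertype.addAll(common);
        for (Set<String> sub : subtypes) {
            sub.removeAll(common);   // the subtypes now inherit them instead
        }
        return common;
    }
}
```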

Error Conditions

The following conditions cause an error

  • An EAttribute or EReference has two different types or different cardinality.
  • There are an EAttribute and an EReference with the same name in the same EClass.
  • There is a cycle in the type hierarchy.
  • A new EAttribute, EReference or supertype is added to an imported type.
  • An EClass is added to an imported EPackage.
  • An undeclared alias is used.
  • An imported metamodel cannot be loaded.

Importing existing Meta Models

With the import directive in Xtext you can refer to existing Ecore metamodels and reuse the types that are declared in an EPackage. Xtext uses this technique by itself to leverage Ecore datatypes.

import "http://www.eclipse.org/emf/2002/Ecore" as ecore

Specify an explicit return type to reuse such imported types. Note that this even works for terminal rules:

terminal INT returns ecore::EInt : ('0'..'9')+;

Language Inheritance

Xtext supports language inheritance. Grammars that are created via the Xtext wizard extend org.eclipse.xtext.common.Terminals by default.

 grammar org.xtext.example.MyDsl with org.eclipse.xtext.common.Terminals
 
 generate myDsl "http://www.xtext.org/example/MyDsl"
 
 ....

Inheriting from another grammar makes the rules defined in that grammar referable. It is also possible to override rules from the super grammar. Example:

 grammar my.SuperGrammar
 ...
 RuleA : "a" stuff=RuleB;
 RuleB : "{" name=ID "}";

 grammar my.SubGrammar with my.SuperGrammar
 
 Model : (ruleAs+=RuleA)*;
 
 // overrides my.SuperGrammar.RuleB
 RuleB : '[' name=ID ']';

Default tokens

Xtext is shipped with a default set of predefined, reasonable and often required terminal rules. This grammar is defined as follows:

 grammar org.eclipse.xtext.common.Terminals hidden(WS, ML_COMMENT, SL_COMMENT)
 
 import "http://www.eclipse.org/emf/2002/Ecore" as ecore
 
 terminal ID  		: '^'?('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')* ;
 terminal INT returns ecore::EInt: ('0'..'9')+ ;
 terminal STRING	: 
			'"' ( '\\' ('b'|'t'|'n'|'f'|'r'|'"'|"'"|'\\') | !('\\'|'"') )* '"' |
			"'" ( '\\' ('b'|'t'|'n'|'f'|'r'|'"'|"'"|'\\') | !('\\'|"'") )* "'"
			; 
 terminal ML_COMMENT	: '/*' -> '*/' ;
 terminal SL_COMMENT 	: '//' !('\n'|'\r')* ('\r'? '\n')? ;
 
 terminal WS		: (' '|'\t'|'\r'|'\n')+ ;
 
 terminal ANY_OTHER:	. ;

Architectural Overview

TMF Xtext itself and every language infrastructure developed with TMF Xtext are configured and wired up using dependency injection (DI). We use Google Guice as the underlying framework and have not built much on top of it, as it does pretty much what we need. So instead of describing how Google Guice works, we refer to its website, where additional information can be found [2].

Using DI allows anyone to set up and change any component. This does not mean that everything configured using DI (and we use it a lot) is automatically public API. However, we don't forbid the use of non-public API, as we think you should decide whether you want to rely on stable API only or use things which might change (or be further enhanced ;-)) in the future. See Xtext/Documentation/API.

Runtime setup (ISetup)

For each language, an implementation of ISetup is generated. It implements a method called 'doSetup()', which can be called to do the global setup.

This class is intended for runtime use and unit testing only. The setup method returns an Injector, which can then be used to obtain a parser, etc. The setup method also registers the ResourceFactory and the generated EPackage with the respective global registries provided by EMF.

So basically, you can just run the setup and start using the EMF API to load and store models of your language.

Setup within Eclipse / Equinox

Within Eclipse, a generated Activator creates a Guice injector using the modules. In addition, an IExecutableExtensionFactory is generated for each language, which is used to create executable extensions. This means that everything created via extension points is managed by Guice as well, i.e. you can declare dependencies and have them injected upon creation.

The only thing you have to do in order to use this factory is to prefix the class name with the factory name (guice.Aware) followed by a colon:

 <extension
        point="org.eclipse.ui.editors">
     <editor
           class="guice.Aware:org.eclipse.xtext.ui.core.editor.XtextEditor"
           contributorClass="org.eclipse.ui.editors.text.TextEditorActionContributor"
           default="true"
           extensions="ecoredsl"
           id="org.eclipse.xtext.example.EcoreDsl"
           name="EcoreDsl Editor">
     </editor>
 </extension>

Modules

The Guice injector is configured through Modules (also a Guice concept). When first called, the generator provides two modules: one for the runtime ([MyLanguage]RuntimeModule) and one for the UI ([MyLanguage]UiModule). These modules are initially empty and are intended to be edited manually when needed. They are also the modules used directly by the setup methods. By default, these modules extend a fully generated module.

Generated Modules

The fully generated modules (never touch them!) are called Abstract[MyLanguage]RuntimeModule and Abstract[MyLanguage]UiModule, respectively. They contain all components which have been generated specifically for the language at hand, for example the generated parser and serializer or, for the UI, a proposal provider for content assist. What goes into the modules depends on how your generator is configured.

Default Modules

Finally, the fully generated modules extend DefaultRuntimeModule (resp. DefaultUiModule), which contains the default configuration. The default configuration consists of all components for which generic default implementations exist (interpreting rather than generated ones). Examples are all the components used in linking, the outline view, hyperlinking and navigation.

Changing Configuration

We use the primary modules ([MyLanguage]RuntimeModule and [MyLanguage]UiModule) to change the configuration. These classes are initially empty and are generated only to allow for arbitrary customization.

To provide a simple and convenient way of doing this, every module in TMF Xtext extends AbstractXtextModule. This class allows you to write bindings like so:

// the method name must start with "bind"; the rest of the name is arbitrary
public Class<? extends MyInterface> bindMyInterface() {
    return MyInterfaceImpl.class;
}


Such a method is interpreted as a binding from MyInterface to MyInterfaceImpl. Note that you simply have to override a method from a superclass (e.g. from the generated or the default module) in order to change the respective binding. Although this is a convenient and simple way, you of course still have the full power of Guice, i.e. you can override the Guice method void configure(Binder) and do whatever you want.
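The bind-method convention can be illustrated with plain Java reflection. The sketch below is not the code of AbstractXtextModule: the classes DefaultModuleSketch, MyLanguageModuleSketch, Greeter and its implementations are hypothetical, and only public no-argument methods whose names start with "bind" and which return Class<? extends T> are considered. It shows how overriding such a method in a subclass changes the collected binding.

```java
import java.lang.reflect.Method;
import java.lang.reflect.ParameterizedType;
import java.lang.reflect.Type;
import java.lang.reflect.WildcardType;
import java.util.HashMap;
import java.util.Map;

// Hypothetical service interface with two implementations.
interface Greeter { String greet(); }

class DefaultGreeter implements Greeter {
    public String greet() { return "hello"; }
}

class CustomGreeter implements Greeter {
    public String greet() { return "hi"; }
}

// Stands in for a default module that ships a generic binding.
class DefaultModuleSketch {
    public Class<? extends Greeter> bindGreeter() {
        return DefaultGreeter.class;
    }

    // Collects all public no-arg methods named bind* that return
    // Class<? extends T> and maps T to the returned implementation class.
    Map<Class<?>, Class<?>> collectBindings() {
        Map<Class<?>, Class<?>> bindings = new HashMap<>();
        for (Method m : getClass().getMethods()) {
            if (!m.getName().startsWith("bind") || m.getParameterCount() != 0
                    || m.getReturnType() != Class.class) {
                continue;
            }
            Type returnType = m.getGenericReturnType();
            if (!(returnType instanceof ParameterizedType)) {
                continue;
            }
            Type arg = ((ParameterizedType) returnType).getActualTypeArguments()[0];
            if (!(arg instanceof WildcardType)) {
                continue;
            }
            Type bound = ((WildcardType) arg).getUpperBounds()[0];
            if (!(bound instanceof Class)) {
                continue;
            }
            try {
                bindings.put((Class<?>) bound, (Class<?>) m.invoke(this));
            } catch (ReflectiveOperationException e) {
                throw new IllegalStateException(e);
            }
        }
        return bindings;
    }
}

// Stands in for the primary module: overriding the method changes the binding.
class MyLanguageModuleSketch extends DefaultModuleSketch {
    @Override
    public Class<? extends Greeter> bindGreeter() {
        return CustomGreeter.class;
    }
}
```

With this setup, new DefaultModuleSketch().collectBindings() maps Greeter to DefaultGreeter, while new MyLanguageModuleSketch().collectBindings() maps it to CustomGreeter, because Class.getMethods() only reports the overriding method.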

Runtime Architecture

Value Converters

Linking

The linking feature allows for the specification of cross references within an Xtext grammar. Linking requires the following:

  1. declaration of a cross link in the grammar (at least in the meta model)
  2. specification of linking semantics

Declaration of cross links

In the grammar a cross reference is specified using square brackets.

CrossReference :
  '[' ReferencedEClass ('|' terminal=AbstractTerminal)? ']';

Example:

ReferringType :
  'ref' referencedObject=[Entity|(ID|STRING)];

The meta model derivation would create an EClass 'ReferringType' with an EReference 'referencedObject' of type 'Entity' (containment=false). The referenced object would be identified by either an ID or a STRING, together with the surrounding information (see scoping).

Example: While parsing a given input string, say

ref Entity01

Xtext produces an instance of 'ReferringType'. After this parsing step it enters the linking phase and tries to find an instance of Entity using the parsed text 'Entity01'. The input

ref "EntityWithÄÖÜ"

would work analogously. This is not an ID (umlauts are not allowed), but a STRING (as is apparent from the quotation marks).
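To make the ID/STRING distinction concrete, here is a hypothetical helper that mirrors the ID rule and the double-quoted variant of the STRING rule from org.eclipse.xtext.common.Terminals as regular expressions, plus a conversion from STRING token text to the plain name used for the lookup. The class and its methods are assumptions for illustration only; single-quoted strings are not handled.

```java
import java.util.regex.Pattern;

// Hypothetical helper mirroring two terminal rules of org.eclipse.xtext.common.Terminals.
class LinkTextClassifier {
    // terminal ID : '^'?('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*
    private static final Pattern ID = Pattern.compile("\\^?[a-zA-Z_][a-zA-Z_0-9]*");
    // '"' ( '\\' ('b'|'t'|'n'|'f'|'r'|'"'|"'"|'\\') | !('\\'|'"') )* '"'
    private static final Pattern STRING =
            Pattern.compile("\"(\\\\[btnfr\"'\\\\]|[^\\\\\"])*\"");

    static boolean isId(String text) {
        return ID.matcher(text).matches();
    }

    static boolean isString(String text) {
        return STRING.matcher(text).matches();
    }

    // Strips the surrounding quotes and resolves the escape sequences the
    // STRING rule allows; expects text that isString() accepts.
    static String stringValue(String text) {
        StringBuilder sb = new StringBuilder();
        for (int i = 1; i < text.length() - 1; i++) {
            char c = text.charAt(i);
            if (c == '\\') {
                char e = text.charAt(++i);
                switch (e) {
                    case 'b': sb.append('\b'); break;
                    case 't': sb.append('\t'); break;
                    case 'n': sb.append('\n'); break;
                    case 'f': sb.append('\f'); break;
                    case 'r': sb.append('\r'); break;
                    default: sb.append(e); // '"', '\'' and '\\' stand for themselves
                }
            } else {
                sb.append(c);
            }
        }
        return sb.toString();
    }
}
```

For the two inputs above, Entity01 is accepted as an ID, whereas "EntityWithÄÖÜ" is only accepted as a STRING, and stringValue yields EntityWithÄÖÜ as the text to look up.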

Specification of linking semantics

The default ILinker implementation is provided with an instance of ILinkingService. Although the default linking behavior is appropriate in many cases, there might be scenarios where it is not sufficient. For each grammar, a linking service can be implemented and configured, which implements the following interface:

public interface ILinkingService extends ILanguageService {
 
	/**
	 * Returns all EObjects referenced by the given link text in the given context.
	 */
	List<EObject> getLinkedObjects(EObject context, EReference reference, LeafNode text);
 
	/**
	 * Returns the textual representation of a given object as it would be serialized in the given context.
	 * 
	 * @param object
	 * @param reference
	 * @param context
	 * @return the text representation.
	 */
	String getLinkText(EObject object, EReference reference, EObject context);
}

The method getLinkedObjects is directly related to this topic, whereas getLinkText addresses complementary functionality: it is used for serialization.

A simple implementation of the linking service (DefaultLinkingService.java) is shipped with Xtext and is used for any grammar by default. It uses the default implementation of IScopeProvider.

An IScopeProvider is responsible for providing an IScope for a given EObject and one of its EReferences, for which all candidates shall be returned.

public interface IScopeProvider extends ILanguageService {
 
	/**
	 * Returns a scope for the given context.
	 *
	 * @param context - the element from which an element shall be referenced 
	 * @param reference - the reference to be filled.  
	 * @return {@link IScope} representing the inner most {@link IScope} for the passed context and reference
	 */
	public IScope getScope(EObject context, EReference reference);
}

An IScope represents an element of a linked list of scopes. That means that a scope can be nested within an outer scope. For instance, Java has multiple kinds of scopes (object scope, type scope, etc.).

For Java one would create the scope hierarchy as commented in the following example:

// file contents scope
import static my.Constants.STATIC;
 
public class ScopeExample { // class body scope
	private Object field = STATIC;
 
	private void method(String param) { // method body scope
		String localVar = "bar";
		innerBlock: { // block scope
			String innerScopeVar = "foo";
			Object field = innerScopeVar;
			// the scope hierarchy at this point would look like so:
			//blockScope{field,innerScopeVar}->
			//methodScope{localVar,param}->
			//classScope{field}-> ('field' is shadowed by the block's local variable)
			//fileScope{STATIC}->
			//classpathScope{'all qualified names of accessible static fields'} ->
			//NULLSCOPE{}
		}
		// outside the block, 'field' refers to the class field again
		System.out.println(field + localVar);
	}
}

In fact the class path scope should also reflect the order of class path entries. For instance: classpathScope{stuff from bin/} -> classpathScope{stuff from foo.jar/} -> ... -> classpathScope{stuff from JRE System Library} -> NULLSCOPE{}
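The linked-list structure of scopes can be sketched in a few lines of Java. The SimpleScope class and its lookup method below are hypothetical (the methods of IScope are not shown in this document); the point is that lookup starts at the innermost scope and walks outwards, so the block's local 'field' hides the class field, and NULLSCOPE terminates the chain. The class path scopes are left out for brevity.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical scope: the names declared at one level plus a link to the outer scope.
class SimpleScope {
    static final SimpleScope NULLSCOPE = new SimpleScope("NULLSCOPE", null);

    final String label;
    final SimpleScope outer;
    final Map<String, String> elements = new LinkedHashMap<>();

    SimpleScope(String label, SimpleScope outer) {
        this.label = label;
        this.outer = outer;
    }

    SimpleScope declare(String name) {
        elements.put(name, label + "." + name);
        return this;
    }

    // Walks from this scope outwards; the first match wins, so inner
    // declarations shadow outer ones with the same name.
    Optional<String> lookup(String name) {
        for (SimpleScope s = this; s != null; s = s.outer) {
            String element = s.elements.get(name);
            if (element != null) {
                return Optional.of(element);
            }
        }
        return Optional.empty();
    }
}

class ScopeChainDemo {
    public static void main(String[] args) {
        // Rebuilds the chain from the Java example, innermost scope last.
        SimpleScope file = new SimpleScope("fileScope", SimpleScope.NULLSCOPE).declare("STATIC");
        SimpleScope clazz = new SimpleScope("classScope", file).declare("field");
        SimpleScope method = new SimpleScope("methodScope", clazz).declare("localVar").declare("param");
        SimpleScope block = new SimpleScope("blockScope", method).declare("field").declare("innerScopeVar");

        System.out.println(block.lookup("field"));   // found in blockScope
        System.out.println(method.lookup("field"));  // found in classScope
        System.out.println(block.lookup("unknown")); // not found
    }
}
```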

Default linking semantics

The default implementation, used for all languages, looks within the current file for an EObject of the respective type ('Entity') which has a name attribute set to 'Entity01'.

Example: given the grammar

...
Model : (stuff+=(Ref|Entity))*;

Ref : 'ref' referencedObject=[Entity|ID] ';';

Entity : 'entity' name=ID ';';

In the following model

ref Entity01;
entity Entity01;

the ref would be linked to the declared entity ('entity Entity01;').
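A rough model of this default behavior, assuming each element carries a type name and an optional name attribute, is shown below. ModelElement and SimpleLinker are hypothetical classes, and the getLinkText counterpart simply returns the name, which is an assumption about name-based serialization rather than a description of DefaultLinkingService.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an EObject created from the example grammar.
class ModelElement {
    final String type; // e.g. "Entity" or "Ref"
    final String name; // value of the 'name' attribute, or null if absent

    ModelElement(String type, String name) {
        this.type = type;
        this.name = name;
    }
}

class SimpleLinker {
    // Scans the current file for elements of the referenced type whose
    // name attribute equals the link text.
    static List<ModelElement> getLinkedObjects(List<ModelElement> file, String type, String linkText) {
        List<ModelElement> result = new ArrayList<>();
        for (ModelElement e : file) {
            if (e.type.equals(type) && linkText.equals(e.name)) {
                result.add(e);
            }
        }
        return result;
    }

    // Complementary direction used for serialization: the text written for
    // a reference to the given object is its name.
    static String getLinkText(ModelElement target) {
        return target.name;
    }
}
```

For the model above, getLinkedObjects(file, "Entity", "Entity01") returns the single element created from 'entity Entity01;'.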

Default Imports

There is also a default implementation for inter-resource referencing, which likewise relies on conventions:

Example: given the grammar

...
Model : (imports+=Import)* (stuff+=(Ref|Entity))*;

Import : 'import' importURI=STRING ';';

Ref : 'ref' referencedObject=[Entity|ID] ';';

Entity : 'entity' name=ID ';';

With this grammar in place it would be possible to write three files in the new DSL where the first references the other two, like this:

--file model.dsl

import "model1.dsl";
import "model2.dsl";

ref Foo;
entity Bar;

--file model1.dsl

entity Stuff;

--file model2.dsl

entity Foo;

The resulting default scope list is as follows:

Scope (model.dsl) {
 parent : Scope (model1.dsl) {
  parent : Scope (model2.dsl) {}
 }
}

So, the scope of model.dsl is asked for an Entity named 'Foo' first. As it does not contain such a declaration itself, its parent (model1.dsl) is asked, and so on, until the scope of model2.dsl provides 'entity Foo;'. The default implementation of IScopeProvider creates this kind of scope chain.
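The import-based lookup can be approximated as follows. DslFile and ImportScopeResolver are hypothetical; the sketch searches the file itself and then its imports in declaration order, and it deliberately does not follow imports transitively, since the text above does not say whether the default implementation does.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parsed file: the URIs it imports and the entity names it declares.
class DslFile {
    final List<String> imports;
    final List<String> entities;

    DslFile(List<String> imports, List<String> entities) {
        this.imports = imports;
        this.entities = entities;
    }
}

class ImportScopeResolver {
    // Returns the name of the file that declares the entity: the file itself is
    // searched first, then its imports in declaration order. Imports of imported
    // files are not followed in this sketch. Returns null if nothing matches.
    static String resolve(Map<String, DslFile> files, String fileName, String entity) {
        DslFile file = files.get(fileName);
        if (file.entities.contains(entity)) {
            return fileName;
        }
        for (String imported : file.imports) {
            DslFile importedFile = files.get(imported);
            if (importedFile != null && importedFile.entities.contains(entity)) {
                return imported;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Map<String, DslFile> files = new HashMap<>();
        files.put("model.dsl", new DslFile(Arrays.asList("model1.dsl", "model2.dsl"), Arrays.asList("Bar")));
        files.put("model1.dsl", new DslFile(Collections.emptyList(), Arrays.asList("Stuff")));
        files.put("model2.dsl", new DslFile(Collections.emptyList(), Arrays.asList("Foo")));
        System.out.println(resolve(files, "model.dsl", "Foo")); // model2.dsl
    }
}
```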

Fragment Provider - Playing nice with EMF URIs

Although inter-Xtext linking is not done via URIs, you may want to be able to reference your EObjects from non-Xtext models. In those cases URIs are used, which are made up of a part identifying the resource and a part identifying the object within it. Each EObject contained in a Resource can be identified by a so-called 'fragment'. A fragment is this second part of an EMF URI and needs to be unique per resource. The generic XMI resource shipped with EMF provides a generic, path-like computation of fragments. With XMI or other binary-like serializations it is also common to use UUIDs.

However, with a textual concrete syntax we want to be able to compute fragments from the given information. We don't want to force people to use UUIDs (i.e. synthetic identifiers) or relative generic paths (very fragile) to refer to EObjects. Therefore, one can contribute a so-called IFragmentProvider per language.

 public interface IFragmentProvider extends ILanguageService {
 
	/**
	 * Computes the local ID of the given object. 
	 * @param obj
	 *            The EObject to compute the fragment for
	 * @return the fragment, which can be an arbitrary string but must be unique
	 *         within a resource. Return null to use default implementation
	 */
	String getFragment(EObject obj);
 
	/**
	 * Locates an EObject in a resource by its fragment. 
	 * @param resource
	 * @param fragment
	 * @return the EObject 
	 */
	EObject getEObject(Resource resource, String fragment);
 }

However, the currently available default fragment provider does nothing.
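As an illustration of what a language-specific implementation might compute, the following sketch derives fragments from the names along the containment path and resolves them again. NamedNode stands in for EObject and NameFragmentProvider is hypothetical; the approach assumes that names are unique among siblings and contain no '/' character.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in for an EObject with a name, its container and contents.
class NamedNode {
    final String name;
    final NamedNode container;
    final List<NamedNode> children = new ArrayList<>();

    NamedNode(String name, NamedNode container) {
        this.name = name;
        this.container = container;
        if (container != null) {
            container.children.add(this);
        }
    }
}

// Name-based fragments: the '/'-separated names from below the root down to the object.
class NameFragmentProvider {
    String getFragment(NamedNode obj) {
        List<String> segments = new ArrayList<>();
        for (NamedNode n = obj; n.container != null; n = n.container) {
            segments.add(0, n.name);
        }
        return String.join("/", segments);
    }

    // Follows the name segments from the root; returns null if a segment is missing.
    NamedNode getEObject(NamedNode root, String fragment) {
        if (fragment.isEmpty()) {
            return root;
        }
        NamedNode current = root;
        for (String segment : fragment.split("/")) {
            NamedNode next = null;
            for (NamedNode child : current.children) {
                if (segment.equals(child.name)) {
                    next = child;
                    break;
                }
            }
            if (next == null) {
                return null;
            }
            current = next;
        }
        return current;
    }
}
```

Unlike positional paths, such fragments stay valid when sibling elements are inserted or reordered, which is the fragility the paragraph above refers to.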

UI Architecture

In the following, we refer to a concrete example grammar in order to explain certain aspects of the UI more clearly. The example grammar is as follows:

Model :
  "model" intAttribute = INT ( stringDescription = STRING ) ? "{" 
		( rules += AbstractRule | types += CustomTypeParserRule ) * 
  "}" 
;
 
AbstractRule:
  RuleA | RuleB
;
 
RuleA :
	 "RuleA" "(" name = ID ")" ;
 
RuleB :
	 "RuleB" "(" ruleA = [RuleA] ")" ;
 
CustomTypeParserRule returns ReferenceModel::CustomType :
	'type' name=ID;

Content Assist

The Xtext generator, amongst other things, generates the following two content assist (CA) related artefacts:

  • a concrete proposal provider class named [Language]GenProposalProvider generated into the src-gen folder within the 'ui' project
  • a service framework configuration for the related CA interfaces (IProposalProvider, IContentAssistant and IContentAssistProcessor)

First we will investigate the generated [Language]GenProposalProvider which contains the following methods (for the example grammar above):

ProposalProvider

public List<? extends ICompletionProposal> completeModelIntAttribute(Assignment assignment, EObject model, String prefix, IDocument doc,int offset) {
  return Collections.singletonList(createCompletionProposal("1", offset));		
}
 
public List<? extends ICompletionProposal> completeModelStringDescription(Assignment assignment, EObject model, String prefix, IDocument doc,int offset) {
  return Collections.singletonList(createCompletionProposal("\"ModelStringDescription\"", offset));		
}
 
public List<? extends ICompletionProposal> completeModelRules(Assignment assignment, EObject model, String prefix, IDocument doc,int offset) {
  return Collections.emptyList();
}
 
public List<? extends ICompletionProposal> completeModelTypes(Assignment assignment, EObject model, String prefix, IDocument doc,int offset) {
  return Collections.emptyList();
}
 
public List<? extends ICompletionProposal> completeRuleAName(Assignment assignment, EObject model, String prefix, IDocument doc,int offset) {
  return Collections.singletonList(createCompletionProposal("RuleAName", offset));
}
 
public List<? extends ICompletionProposal> completeRuleBRuleA(Assignment assignment, EObject model, String prefix, IDocument doc,int offset) {
  return lookupCrossReference(((CrossReference)assignment.getTerminal()), model, offset);
}
 
public List<? extends ICompletionProposal> completeCustomTypeParserRuleName(Assignment assignment, EObject model, String prefix, IDocument doc,int offset) {
  return Collections.singletonList(createCompletionProposal("CustomTypeParserRuleName", offset));
}
 
public List<? extends ICompletionProposal> completeReferenceModelCustomType(RuleCall ruleCall, EObject model, String prefix,IDocument doc, int offset) {
  return Collections.emptyList();
}
 
@Override
protected String getDefaultImageFilePath() {
  return "icons/editor.gif";
}
 
@Override
protected String getPluginId() {
  return UI_PLUGIN_ID;
}

As you can see from the snippet above, the generated ProposalProvider class contains a 'completion' proposal method for each assignment and for each rule with a custom return type. In addition to the methods declared in the interface IProposalProvider, the framework tries to call these methods for assignments and rules using reflection. The generated 'completion' proposal methods are named according to the following patterns.

for assignments

public List<? extends ICompletionProposal> complete[Typename][FeatureName](Assignment assignment, EObject model, String prefix, IDocument doc, int offset);

for rules with a custom return type

public List<? extends ICompletionProposal> complete[ModelAlias][ReturnType](RuleCall ruleCall, EObject model, String prefix, IDocument doc, int offset);

Note that if you have generated Java classes for your domain model (meta model), you can alternatively declare the second parameter (model) using a specific type.

for assignments in rules with a custom return type (using the specific model type)

public List<? extends ICompletionProposal> completeCustomTypeParserRuleName(Assignment assignment, ReferenceModel.CustomType model, String prefix, IDocument doc, int offset);
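The reflective lookup of these methods can be sketched with java.lang.reflect. In the hypothetical example below, the parameters are reduced to a single String prefix and proposals are plain strings; only the naming scheme complete[Typename][FeatureName] is taken from the text above, and a missing method simply yields no proposals.

```java
import java.lang.reflect.Method;
import java.util.Collections;
import java.util.List;

// Hypothetical provider with one method following the naming pattern
// complete[Typename][FeatureName]; parameters are simplified to a prefix.
class SketchProposalProvider {
    public List<String> completeModelIntAttribute(String prefix) {
        return Collections.singletonList("1");
    }
}

class ProposalDispatcher {
    // Derives the method name from the type and feature name, e.g.
    // ("Model", "intAttribute") -> "completeModelIntAttribute".
    static String methodName(String typeName, String featureName) {
        return "complete" + typeName
                + Character.toUpperCase(featureName.charAt(0)) + featureName.substring(1);
    }

    // Invokes the matching method if the provider declares one;
    // otherwise returns no proposals.
    @SuppressWarnings("unchecked")
    static List<String> complete(Object provider, String typeName, String featureName, String prefix) {
        try {
            Method m = provider.getClass().getMethod(methodName(typeName, featureName), String.class);
            return (List<String>) m.invoke(provider, prefix);
        } catch (NoSuchMethodException e) {
            return Collections.emptyList();
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }
}
```

For the example grammar, methodName("CustomTypeParserRule", "name") yields completeCustomTypeParserRuleName and methodName("RuleB", "ruleA") yields completeRuleBRuleA, matching the generated method names shown earlier.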

Service Configuration

The configuration of the CA related part goes into the generated [Namespace]Gen[Grammar]UiConfig class and includes the following three interfaces.

  • org.eclipse.xtext.ui.common.editor.codecompletion.IProposalProvider
  • org.eclipse.jface.text.contentassist.IContentAssistant ([3])
  • org.eclipse.jface.text.contentassist.IContentAssistProcessor ([4])

TODO: describe/link where to configure a manual implementation of IProposalProvider??

Runtime Examples

Invoking content assist at the position marked >>CA<< in

model 0 >>CA<<

will execute the following call sequence to IProposalProvider implementations:

  1. completeRuleCall 'STRING'
  2. completeModelStringDescription feature 'stringDescription'
  3. completeKeyword '{'

For

model 0 "ModelStringDescriptionSTRING" {
 >>CA<<
}

the call sequence is:

  1. completeRuleCall 'AbstractRule'
  2. completeRuleCall 'RuleA'
  3. completeKeyword 'RuleA'
  4. completeRuleCall 'RuleB'
  5. completeKeyword 'RuleB'
  6. completeModelRules feature 'rules'
  7. completeRuleCall 'CustomTypeParserRule'
  8. completeReferenceModelCustomType 'CustomTypeParserRule'
  9. completeKeyword 'type'
  10. completeModelTypes feature 'types'
  11. completeKeyword '}'

For

model 0 "ModelStringDescriptionSTRING" {
 type >>CA<<
}

the call sequence is:

  1. completeRuleCall 'ID'
  2. completeCustomTypeParserRuleName feature 'name'

For

model 0 "ModelStringDescriptionSTRING" {
	RuleA (RuleAName)
	RuleB (>>CA<<
}

the call sequence is:

  1. completeRuleBRuleA feature 'ruleA', which in turn invokes lookupCrossReference, which delegates to the configured ILinkingCandidatesService#getLinkingCandidates to determine all available 'link' candidates.