Xtext/Documentation

Revision as of 09:15, 13 November 2008 by Peter.friese.itemis.de (Talk | contribs) (EPackage declarations)

What is Xtext?

The TMF Xtext project provides a domain-specific language (the grammar language) for describing textual programming languages and domain-specific languages. It is tightly integrated with the Eclipse Modeling Framework (EMF) and leverages the Eclipse Platform in order to provide language-specific tool support. In contrast to common parser generators, the grammar language is much simpler, yet it is used to derive much more than just a parser and lexer. The following artifacts are derived from a grammar:

  • an incremental, ANTLR 3-based parser and lexer
  • Ecore-based meta models (optional)
  • a serializer, used to serialize instances of such meta models back to a parseable textual representation
  • an implementation of the EMF Resource interface (based on the parser and the serializer)
  • a full-fledged integration of the language into Eclipse IDE
    • syntax coloring
    • navigation (F3, etc.)
    • code completion
    • outline views
    • code templates
    • folding, etc.

The generated artifacts are wired up through a dependency injection framework, which makes it easy to exchange certain functionality in a non-invasive manner. For example, if you don't like the default code assistant implementation, you can provide an alternative implementation of the corresponding service and configure it via an Eclipse extension point.

The Grammar Language

The grammar language is the cornerstone of Xtext and is, of course, defined in itself. It can be found here.

The grammar language is a DSL carefully designed for description of textual languages, based on ANTLR's LL(*) parsing strategy. The main idea is to let users describe the concrete syntax, and to automatically derive an in-memory model (semantic model) from that.

First an example

To get an idea of how it works we'll start by implementing a [simple example] introduced by Martin Fowler. It's mainly about describing state machines used as the (un)lock mechanism of secret compartments.

One of those state machines could look like this:

 events
  doorClosed  D1CL
  drawOpened  D2OP
  lightOn     L1ON
  doorOpened  D1OP
  panelClosed PNCL
 end
 
 resetEvents
  doorOpened
 end
 
 commands
  unlockPanel PNUL
  lockPanel   PNLK
  lockDoor    D1LK
  unlockDoor  D1UL
 end
 
 state idle
  actions {unlockDoor lockPanel}
  doorClosed => active
 end
 
 state active
  drawOpened => waitingForLight
  lightOn    => waitingForDraw
 end
 
 state waitingForLight
  lightOn => unlockedPanel
 end
 
 state waitingForDraw
  drawOpened => unlockedPanel
 end
 
 state unlockedPanel
  actions {unlockPanel lockDoor}
  panelClosed => idle
 end

So we have a bunch of declared events, commands and states. Within states there are references to declared commands (the actions), which should be executed when entering such a state. There are also transitions, each consisting of a reference to an event and a reference to a target state. Please read [Martin's description] if it is not clear enough.
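To make the intended behaviour of such a state machine concrete, here is a minimal Java sketch that interprets the transitions of the example above. It is not part of Xtext and not generated by it. The class `SecretCompartmentSketch` and its methods are hypothetical, and commands, actions and reset events are deliberately left out.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical interpreter for the transitions of the example state machine.
class SecretCompartmentSketch {
    // state name -> (event name -> target state name)
    private final Map<String, Map<String, String>> transitions = new HashMap<>();
    private String current = "idle";

    SecretCompartmentSketch() {
        add("idle", "doorClosed", "active");
        add("active", "drawOpened", "waitingForLight");
        add("active", "lightOn", "waitingForDraw");
        add("waitingForLight", "lightOn", "unlockedPanel");
        add("waitingForDraw", "drawOpened", "unlockedPanel");
        add("unlockedPanel", "panelClosed", "idle");
    }

    private void add(String from, String event, String to) {
        transitions.computeIfAbsent(from, k -> new HashMap<>()).put(event, to);
    }

    // Follows the transition for the event; stays in the current state if none is declared.
    String fire(String event) {
        Map<String, String> outgoing = transitions.get(current);
        if (outgoing != null && outgoing.containsKey(event)) {
            current = outgoing.get(event);
        }
        return current;
    }

    public static void main(String[] args) {
        SecretCompartmentSketch machine = new SecretCompartmentSketch();
        for (String event : new String[] {"doorClosed", "lightOn", "drawOpened", "panelClosed"}) {
            System.out.println(event + " -> " + machine.fire(event));
        }
    }
}
```

In this sketch an event that the current state does not mention is simply ignored, which is an assumption and not something the example specifies.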

In order to implement this language with Xtext you need to write the following grammar:

 language SecretCompartments
 
 generate secretcompartment "http://www.eclipse.org/secretcompartment"
 
 Statemachine :
  'events'
     (events+=Event)+
  'end'
  ('resetEvents'
     (resetEvents+=[Event])+
  'end')?
  'commands'
     (commands+=Command)+
  'end'
  (states+=State)+;
 
 Event :
  name=ID code=ID;
 
 Command :
  name=ID code=ID;
 
 State :
  'state' name=ID
     ('actions' '{' (actions+=[Command])+ '}')?
     (transitions+=Transition)*
  'end';
 
 Transition :
  event=[Event] '=>' state=[State];

The following sections explain the different concepts of the grammar language. We refer to this grammar where useful.

Language Declaration

The first line

language SecretCompartments

declares the name of the language. Xtext leverages Java's classpath mechanism, so the name can be any valid Java qualified name. The file name needs to correspond to the language name and have the file extension '.xtext'. In this case the file must be named "SecretCompartments.xtext" and must be placed in the default package on the Java classpath.

If you want to place it within a package (e.g. 'foo/SecretCompartment.xtext') the first line must read:

language foo.SecretCompartment

The first line can also be used to declare a super language to inherit from. This mechanism is described here.

EPackage declarations

Xtext parsers instantiate Ecore models (also known as meta models). An Ecore model basically consists of an EPackage containing EClasses, EDataTypes and EEnums. Xtext can infer Ecore models from a grammar (see Meta-Model Inference), but it is also possible to instantiate existing Ecore models. You can even mix both approaches in one grammar, using several existing Ecore models and inferring others.

EPackage generation

The easiest way to get started is to let Xtext infer the meta model from your grammar. This is what is done in the secret compartment example. To do so just state:

generate secretcompartment "http://www.eclipse.org/secretcompartment"

This says: generate an EPackage with the name secretcompartment and the nsURI "http://www.eclipse.org/secretcompartment" (these are the properties needed to create an EPackage).

EPackage import

If you have already created such an EPackage, you can import it using either the namespace URI or a resource URI (URIs are an EMF concept):

import "http://www.eclipse.org/secretcompartment"

Using multiple packages

If you want to use multiple EPackages you need to specify aliases like so:

generate secretcompartment "http://www.eclipse.org/secretcompartment"
import "http://www.eclipse.org/anotherPackage" as another

When referring to a type somewhere in the grammar you need to qualify it using that alias (for example "another::CoolType"). We'll see later where such type references occur.

Rules

Parsing is based on ANTLR 3, which is a parser generator framework based on an [LL(*) algorithm]. Basically, parsing can be separated into the following phases:

  1. lexing
  2. parsing
  3. linking
  4. validation

Lexer Rules

In the first phase a sequence of characters (the text input) is transformed into a sequence of so-called tokens. Each token consists of one or more characters and is matched by a particular lexer rule. The secret compartments example defines no lexer rules explicitly. It only uses the ID rule, which is inherited from the default super language org.eclipse.xtext.builtin.XtextBuiltin (see #Language Inheritance).

Therein the ID rule is defined as follows:

lexer ID : 
  "('^')?('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*"; 

It says that an ID token starts with an optional '^' character, followed by a letter ('a'..'z'|'A'..'Z') or underscore ('_'), followed by any number of letters, underscores and digits ('0'..'9'). Note that this declaration is just a string literal which is passed to the generated ANTLR grammar as is.

The optional '^' is used to escape an identifier in cases where it conflicts with a keyword. It is removed during parsing.
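The shape of the ID rule can be tried out with an equivalent regular expression. The following Java sketch only illustrates which strings the rule accepts. It uses `java.util.regex` rather than the ANTLR lexer that Xtext actually generates, and the class and method names are made up for this example.

```java
import java.util.regex.Pattern;

// Hypothetical helper mirroring the ID lexer rule with java.util.regex.
class IdRuleSketch {
    // Optional '^', then a letter or underscore, then letters, underscores or digits.
    static final Pattern ID = Pattern.compile("\\^?[a-zA-Z_][a-zA-Z_0-9]*");

    // True if the whole text would be matched by the ID rule.
    static boolean isId(String text) {
        return ID.matcher(text).matches();
    }

    // Drops the leading escape character of an escaped identifier.
    static String unescape(String id) {
        return id.startsWith("^") ? id.substring(1) : id;
    }

    public static void main(String[] args) {
        System.out.println(isId("doorClosed"));
        System.out.println(isId("^state"));
        System.out.println(isId("1abc"));
        System.out.println(unescape("^state"));
    }
}
```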

This is the formal definition of lexer rules:

LexerRule :
  'lexer' name=ID ('returns' type=TypeRef)? ':' body=STRING ';'
;

Return types

A lexer rule returns a value, which defaults to a string (type ecore::EString). However, if you want to have a different type you can specify it. For instance, the built-in lexer rule 'INT' is defined like so:

lexer INT returns ecore::EInt : 
  "('0'..'9')+";

This says that the lexer rule INT returns instances of ecore::EInt. Any data type can be used here, as long as it is an instance of ecore::EDataType. In order to tell the parser how to convert the parsed string to a value of the declared data type, you need to provide your own implementation of 'IValueConverterService'.

Have a look at [org/eclipse/xtext/builtin/conversion/XtextBuiltInConverters.java] to find out what such an implementation looks like.

The implementation needs to be registered as a service (see #Service Framework). This is also the point where you can remove things like quotes from string literals or the '^' from identifiers.
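The kind of conversion described here can be sketched in plain Java. The following example does not use the real 'IValueConverterService' interface, whose methods are not shown on this page. `ValueConversionSketch` and `toValue` are hypothetical names, and escape sequences inside string literals are not handled.

```java
// Hypothetical stand-in for a value converter; not the real IValueConverterService API.
class ValueConversionSketch {
    // Converts the raw token text to a Java value, keyed by the name of the lexer rule.
    static Object toValue(String lexerRule, String text) {
        switch (lexerRule) {
            case "INT":
                // corresponds to the declared return type ecore::EInt
                return Integer.valueOf(text);
            case "STRING":
                // strip the surrounding quotes
                return text.substring(1, text.length() - 1);
            case "ID":
                // strip the '^' used to escape keywords
                return text.startsWith("^") ? text.substring(1) : text;
            default:
                // default return type ecore::EString: keep the text unchanged
                return text;
        }
    }

    public static void main(String[] args) {
        System.out.println(toValue("INT", "42"));
        System.out.println(toValue("STRING", "'hello'"));
        System.out.println(toValue("ID", "^end"));
    }
}
```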

Enum Rules

TODO Not yet implemented

String Rules

TODO Not yet implemented

Parser Rules

The parser reads in a sequence of tokens produced by the lexer and walks through the parser rules.

to be continued.

Model Construction

Meta-Models

The meta-model of a textual language describes the structure of its abstract syntax trees (AST).

Xtext uses Ecore EPackages to define meta models. Meta models are declared to be either inferred from the grammar (see #EPackage generation) or imported (see #EPackage import). A meta model's declaration can also define an alias, which is used in other places of the grammar to qualify references to its types. Each language inherits the aliases from its super language, but it can also override them. The default meta model does not have an alias in its declaration. It contains all types that are referred to without a qualifier.

language MyLang

generate MyMetaModel "http://www.mysite.org/myMetaModel" // default meta model
import "http://www.eclipse.org/emf/2002/Ecore" as ecore  // imported meta model with alias

RuleA returns MyType:         // reference to default meta model
  'mytype' name=ID; 
   
RuleB returns ecore::EObject: // reference to imported EPackage
  name=ID;


Meta-Model Inference

By using the generate directive (see #EPackage generation), Xtext derives one or more meta models from the grammar. All elements/types can occur multiple times, but are generated only once.

Type and Package Generation

Xtext creates

an EPackage
  • for each generated package declaration. The name of the EPackage will be set to the first parameter, its nsURI to the second parameter. An optional alias makes it possible to distinguish generated EPackages later. Only one generated package declaration without an alias is allowed per grammar.
an EClass
  • for each return type of a parser rule. If a parser rule does not define a return type, an implicit one with the same name is assumed. You can specify more than one rule that returns the same type, but only one EClass will be generated.
  • for each type defined in an action or a cross-reference.
an EDatatype
  • for each return type of a lexer rule.

All EClasses and EDatatypes are added to the EPackage referred to by the alias provided in the type reference they were created from.

Feature and Type Hierarchy Generation

While walking through the grammar, the algorithm keeps track of a set of the currently possible return types to add features to.

  • Entering a parser rule the set contains only the return type of the rule.
  • Entering a group in an alternative the set is reset to the same state it was in when entering the first group of this alternative.
  • Leaving an alternative the set contains the union of all types at the end of each of its groups.
  • After an optional element, the set is reset to the same state it was before entering it.
  • After a mandatory (non-optional) rule call or mandatory action the set contains only the return type of the called rule or action.
  • An optional rule call or optional action does not modify the set.
  • A rule call or an action is optional if its cardinality is '?' or '*'.

While iterating over the parser rules, Xtext creates

an EAttribute in each current return type
  • of type EBoolean for each feature assignment using the '?=' operator. No further EReferences or EAttributes will be generated from this assignment.
  • for each assignment with the '=' or '+=' operator calling a lexer rule. Its type will be the return type of the called rule.
an EReference in each current return type
  • for each assignment with the '=' or '+=' operator in a parser rule calling a parser rule. The EReference's type will be the return type of the called parser rule.
  • for each action. The reference's type will be set to the return type of the current calling rule.

Each EAttribute or EReference takes its name from the assignment/action that caused it. Multiplicities will be 0..1 for assignments with the '=' operator and 0..* for assignments with the '+=' operator.
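The rules for assignments can be summarised in a few lines of Java. This is only a sketch of the mapping stated above, with hypothetical names. It uses -1 for the unbounded upper bound, which is how EMF represents 0..*, and it covers only the '=' and '+=' operators for multiplicities because the text does not mention others.

```java
// Hypothetical helper that mirrors the feature and multiplicity rules stated above.
class FeatureInferenceSketch {
    // EMF represents an unbounded upper bound as -1.
    static final int UNBOUNDED = -1;

    // Kind of feature created for an assignment, depending on operator and called rule.
    static String kind(String operator, boolean callsLexerRule) {
        if (operator.equals("?=")) {
            return "EAttribute of type EBoolean";
        }
        return callsLexerRule ? "EAttribute" : "EReference";
    }

    // Upper bound for '=' and '+='; the lower bound is 0 in both cases.
    static int upperBound(String operator) {
        switch (operator) {
            case "=":
                return 1;
            case "+=":
                return UNBOUNDED;
            default:
                throw new IllegalArgumentException("not covered by the text: " + operator);
        }
    }

    public static void main(String[] args) {
        // events+=Event calls a parser rule, name=ID calls a lexer rule.
        System.out.println(kind("+=", false) + ", upper bound " + upperBound("+="));
        System.out.println(kind("=", true) + ", upper bound " + upperBound("="));
    }
}
```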

Furthermore, each type that is added to the currently possible return types automatically inherits from the current return type of the parser rule. You can specify additional common supertypes by means of "artificial" parser rules, that are never called, e.g.

CommonSuperType:
  SubTypeA | SubTypeB | SubTypeC;

Feature Normalization

As a last step, the generator examines all generated EClasses and lifts similar features up to a supertype if there is more than one subtype and the feature is defined in every subtype. This even works for multiple supertypes.
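The lifting step might look roughly like the following Java sketch. It works on a simplified, hypothetical `TypeInfo` class instead of real EClasses, handles a single supertype only, and assumes that "similar" means same name and same type. The actual generator may apply different criteria.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical, simplified model of an inferred EClass.
class FeatureLiftSketch {
    static class TypeInfo {
        final String name;
        final Map<String, String> features = new HashMap<>(); // feature name -> type name
        final List<TypeInfo> subtypes = new ArrayList<>();

        TypeInfo(String name) {
            this.name = name;
        }
    }

    // Moves every feature that all subtypes declare with the same type up into the supertype.
    static void liftCommonFeatures(TypeInfo superType) {
        if (superType.subtypes.size() < 2) {
            return; // lifting requires more than one subtype
        }
        TypeInfo first = superType.subtypes.get(0);
        for (String featureName : new ArrayList<>(first.features.keySet())) {
            String type = first.features.get(featureName);
            boolean inAll = true;
            for (TypeInfo sub : superType.subtypes) {
                if (!type.equals(sub.features.get(featureName))) {
                    inAll = false;
                    break;
                }
            }
            if (inAll) {
                superType.features.put(featureName, type);
                for (TypeInfo sub : superType.subtypes) {
                    sub.features.remove(featureName);
                }
            }
        }
    }

    public static void main(String[] args) {
        TypeInfo common = new TypeInfo("CommonSuperType");
        TypeInfo a = new TypeInfo("SubTypeA");
        TypeInfo b = new TypeInfo("SubTypeB");
        a.features.put("name", "EString");
        a.features.put("size", "EInt");
        b.features.put("name", "EString");
        common.subtypes.add(a);
        common.subtypes.add(b);
        liftCommonFeatures(common);
        System.out.println(common.features + " " + a.features + " " + b.features);
    }
}
```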

Error Conditions

The following conditions cause an error

  • An EAttribute or EReference has two different types or different cardinalities.
  • There are an EAttribute and an EReference with the same name in the same EClass.
  • There is a cycle in the type hierarchy.
  • A new EAttribute, EReference or supertype is added to an imported type.
  • An EClass is added to an imported EPackage.
  • An undeclared alias is used.
  • An imported metamodel cannot be loaded.
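As an illustration of one of these checks, a cycle in the type hierarchy can be detected with a depth-first search over the supertype relation. The Java sketch below is not Xtext's implementation. It uses a hypothetical map from type names to the names of their direct supertypes.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical cycle check on a type hierarchy given as: type name -> direct supertype names.
class HierarchyCycleSketch {
    static boolean hasCycle(Map<String, List<String>> supertypes) {
        Set<String> done = new HashSet<>();
        for (String type : supertypes.keySet()) {
            if (visit(type, supertypes, new HashSet<>(), done)) {
                return true;
            }
        }
        return false;
    }

    // Depth-first search; a type that is reached again while still on the current path closes a cycle.
    private static boolean visit(String type, Map<String, List<String>> supertypes,
                                 Set<String> onPath, Set<String> done) {
        if (onPath.contains(type)) {
            return true;
        }
        if (done.contains(type)) {
            return false;
        }
        onPath.add(type);
        for (String sup : supertypes.getOrDefault(type, Collections.<String>emptyList())) {
            if (visit(sup, supertypes, onPath, done)) {
                return true;
            }
        }
        onPath.remove(type);
        done.add(type);
        return false;
    }

    public static void main(String[] args) {
        Map<String, List<String>> hierarchy = new HashMap<>();
        hierarchy.put("SubTypeA", Arrays.asList("CommonSuperType"));
        hierarchy.put("CommonSuperType", Arrays.asList("SubTypeA"));
        System.out.println(hasCycle(hierarchy));
    }
}
```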

Importing existing Meta Models

With the import directive in Xtext you can refer to existing Ecore meta models and reuse the types that are declared in an EPackage. Xtext itself uses this technique to leverage Ecore data types.

import "http://www.eclipse.org/emf/2002/Ecore" as ecore;

Specify an explicit return type to reuse such imported types. Note that this even works for lexer rules.

lexer INT returns ecore::EInt : "('0'..'9')+";

Language Inheritance

This concept is about to be changed in the future. Since simple inheritance is too restrictive, we'll come up with something more mixin-like.

Xtext supports language inheritance. By default (implicitly) each language extends a language called org.eclipse.xtext.builtin.XtextBuiltin, which is defined as follows:

 abstract language org.eclipse.xtext.builtin.XtextBuiltIn_Temp
 
 import "http://www.eclipse.org/emf/2002/Ecore" as ecore;
 
 
 lexer ID : 
   "('^')?('a'..'z'|'A'..'Z'|'_') ('a'..'z'|'A'..'Z'|'_'|'0'..'9')*";
 
 lexer INT returns ecore::EInt : 
   "('0'..'9')+";
 
 lexer STRING : 
   "
   '\"' ( '\\\\' ('b'|'t'|'n'|'f'|'r'|'\\\"'|'\\''|'\\\\') | ~('\\\\'|'\"') )* '\"' | 
   '\\'' ( '\\\\' ('b'|'t'|'n'|'f'|'r'|'\\\"'|'\\''|'\\\\') | ~('\\\\'|'\\'') )* '\\''
   ";
 
 lexer ML_COMMENT : 
   "'/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}";
 
 lexer SL_COMMENT : 
   "'//' ~('\\n'|'\\r')* '\\r'? '\\n' {$channel=HIDDEN;}";
 
 lexer WS : 
   "(' '|'\\t'|'\\r'|'\\n')+ {$channel=HIDDEN;}";
 
 lexer ANY_OTHER : 
   ".";

Just ignore the grammar if you don't yet understand it. It basically provides some commonly used lexer rules which can be used in all grammars.

Service Framework

Runtime Architecture

Value Converters

Linking

The linking feature allows for specification of cross references within an Xtext grammar.

Linking involves several things:

  1. The derived/referenced Ecore model contains a respective cross reference (containment=false).
  2. The syntax of a cross reference is expressed by a lexer rule, usually an identifier or a fully qualified name.
  3. There is a linking phase right after parsing.
  4. Linking semantics are provided for a specific link.

In the grammar a cross reference is specified using square brackets.

CrossReference :
  '[' ReferencedEClass ('|' lexerRuleName)? ']'

Example:

ReferringType :
  'ref' referencedObject=[Entity|ID];

That results in a type 'ReferringType' with an EReference 'referencedObject' of type 'Entity'. The referenced object will be identified by an ID (the lexer rule name can be omitted).

At run-time while parsing a given input string, say

ref Entity01

Xtext produces an instance of 'ReferringType'. After this parsing step it enters the linking phase and tries to link the ID token 'Entity01'. To do this it searches for the corresponding EObject that is compatible with the type 'Entity' and sets the reference 'referencedObject' of the 'ReferringType' accordingly.
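The idea of this linking step can be sketched with plain Java objects in place of EObjects. Everything in the following example is hypothetical, including the classes `Entity` and `ReferringType`, the `link` method and the lookup by a `name` field. The real implementation works on EMF resources and URIs.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration of the linking phase using plain Java objects.
class LinkingSketch {
    static class Entity {
        final String name;

        Entity(String name) {
            this.name = name;
        }
    }

    static class ReferringType {
        final String linkText;      // text of the ID token, e.g. "Entity01"
        Entity referencedObject;    // stays null until the link is resolved

        ReferringType(String linkText) {
            this.linkText = linkText;
        }
    }

    // Resolves the link text against all objects in scope that have the expected type.
    static void link(ReferringType ref, List<Object> scope) {
        for (Object candidate : scope) {
            if (candidate instanceof Entity && ((Entity) candidate).name.equals(ref.linkText)) {
                ref.referencedObject = (Entity) candidate;
                return;
            }
        }
    }

    public static void main(String[] args) {
        List<Object> scope = new ArrayList<>();
        scope.add("Entity01");             // same text, but not an Entity
        scope.add(new Entity("Entity01"));
        ReferringType ref = new ReferringType("Entity01");
        link(ref, scope);
        System.out.println(ref.referencedObject != null);
    }
}
```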

Although the default linking behavior is appropriate in most cases, there might be scenarios where it is not sufficient. For each grammar a linking service can be implemented/configured, which implements the following interface:

 public interface ILinkingService extends ILanguageService {
 
 	/**
 	 * Returns the URIs of all EObjects referenced by the given link text in the
 	 * given context.
 	 */
 	List<URI> getLinkedObjects(EObject context, CrossReference ref, LeafNode text);
 
 	/**
 	 * Returns a link text of a referenced object. If more than one textual
 	 * representation is possible (e.g. relative vs. absolute), try to provide
 	 * the unambiguous one (here: absolute)
 	 */
 	String getLinkAsText(EObject context, URI referencedObject);
 
 	/**
 	 * Returns all possible link matches of a partially provided link text. This
 	 * could be the starting of a link text or in case of nested namespaces the
 	 * fragment.
 	 */
 	List<Pair<String, URI>> getLinkCandidates(EObject context, CrossReference ref, String textFragment);
 }

The method getLinkedObjects is directly related to this topic whereas the other two methods address complementary functionality. The method getLinkAsText is used for Serialization, getLinkCandidates is used for Code Assist.

A built-in linking service ships with Xtext and is used for any grammar by default. This built-in service translates from and to textual representations using an IFragmentProvider (see next section) as well as an ILinkingScopeService. Using the service registry one can either replace the linking service or configure it with a special fragment provider/scope service.

The ILinkingScopeService will be changed in the future. See bug 250439 for more details.

 public interface ILinkingScopeService extends ILanguageService {
 
 	/**
 	 * Provides all EObjects that can be accessed from a given context.
 	 * @param context meta model element that defines the scope 
 	 * @return List of EObjects in the given scope
 	 */
 	public List<EObject> getObjectsInScope(EObject context);
 }

To resolve a given textual representation the shipped XtextBuiltinLinkingService asks every EObject in the scope of a given context (here: every EObject in every resource of the resource set) whether its URI fragment matches the passed representation. The built-in also ensures that the EObject's type is compatible with the cross reference (see method signature of getLinkedObjects).

Identification of Elements (Fragment provider)

The way of identifying elements in the context of linking might be extended in the future. See bug 254995 for more details.

Each EObject contained in a Resource can be identified by a so called 'fragment'. A fragment is a part of an EMF URI and needs to be unique per resource. The generic XMI resource shipped with EMF provides a generic path-like computation of fragments. It is also common and possible to use UUIDs.

However, with a textual concrete syntax we want to be able to compute fragments out of the given information. We don't want to force people to use UUIDs or relative generic paths in order to do cross-referencing. Therefore one can contribute a so-called IFragmentProvider per language.

 interface IFragmentProvider extends ILanguageService {
     String getFragment(EObject obj);
 }

There is a default provider registered for the Xtext builtin language, which can easily be overridden. The default implementation just returns the "name" value if it exists and is unique; otherwise it delegates to the path-like fragment computation known from XMI resources.
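The default strategy can be illustrated with a small Java sketch. It operates on a hypothetical `Node` tree instead of EObjects, and the index-based path format ("/0/1/") is a simplification chosen for this example, not the fragment syntax that XMI resources actually produce.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical sketch of "use the name if unique, otherwise fall back to a path".
class FragmentSketch {
    static class Node {
        final String name; // may be null
        final List<Node> children = new ArrayList<>();

        Node(String name, Node... kids) {
            this.name = name;
            children.addAll(Arrays.asList(kids));
        }
    }

    // Returns the name if it is set and unique below root, else an index path from the root.
    static String getFragment(Node root, Node target) {
        if (target.name != null && count(root, target.name) == 1) {
            return target.name;
        }
        return path(root, target, "/");
    }

    // Counts how many nodes in the tree carry the given name.
    private static int count(Node node, String name) {
        int n = name.equals(node.name) ? 1 : 0;
        for (Node child : node.children) {
            n += count(child, name);
        }
        return n;
    }

    // Returns the index path from node down to target, or null if target is not in this subtree.
    private static String path(Node node, Node target, String prefix) {
        if (node == target) {
            return prefix;
        }
        for (int i = 0; i < node.children.size(); i++) {
            String found = path(node.children.get(i), target, prefix + i + "/");
            if (found != null) {
                return found;
            }
        }
        return null;
    }

    public static void main(String[] args) {
        Node unique = new Node("idle");
        Node duplicate = new Node("active");
        Node root = new Node("machine", unique, new Node("active"), duplicate);
        System.out.println(getFragment(root, unique));
        System.out.println(getFragment(root, duplicate));
    }
}
```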

UI Architecture

For the following part we will refer to a concrete example grammar in order to explain certain aspects of the UI more clearly. The example grammar is as follows:

Model :
  "model" intAttribute = INT ( stringDescription = STRING ) ? "{" 
		( rules += AbstractRule | types += CustomTypeParserRule ) * 
  "}" 
;
 
AbstractRule:
  RuleA | RuleB
;
 
RuleA :
	 "RuleA" "(" name = ID ")" ;
 
RuleB :
	 "RuleB" "(" ruleA = [RuleA] ")" ;
 
CustomTypeParserRule returns ReferenceModel::CustomType :
	'type' name=ID;

Content Assist

The Xtext generator, amongst other things, generates the following two artefacts related to content assist (CA):

  • a concrete proposal provider class named [Language]GenProposalProvider generated into the src-gen folder within the 'ui' project
  • a service framework configuration for the related CA interfaces (IProposalProvider,IContentAssistant and IContentAssistProcessor)

First we will investigate the generated [Language]GenProposalProvider which contains the following methods (for the example grammar above):

ProposalProvider

public List<? extends ICompletionProposal> completeModelIntAttribute(Assignment assignment, EObject model, String prefix, IDocument doc,int offset) {
  return Collections.singletonList(createCompletionProposal("1", offset));		
}
 
public List<? extends ICompletionProposal> completeModelStringDescription(Assignment assignment, EObject model, String prefix, IDocument doc,int offset) {
  return Collections.singletonList(createCompletionProposal("\"ModelStringDescription\"", offset));		
}
 
public List<? extends ICompletionProposal> completeModelRules(Assignment assignment, EObject model, String prefix, IDocument doc,int offset) {
  return Collections.emptyList();
}
 
public List<? extends ICompletionProposal> completeModelTypes(Assignment assignment, EObject model, String prefix, IDocument doc,int offset) {
  return Collections.emptyList();
}
 
public List<? extends ICompletionProposal> completeRuleAName(Assignment assignment, EObject model, String prefix, IDocument doc,int offset) {
  return Collections.singletonList(createCompletionProposal("RuleAName", offset));
}
 
public List<? extends ICompletionProposal> completeRuleBRuleA(Assignment assignment, EObject model, String prefix, IDocument doc,int offset) {
  return lookupCrossReference(((CrossReference)assignment.getTerminal()), model, offset);
}
 
public List<? extends ICompletionProposal> completeCustomTypeParserRuleName(Assignment assignment, EObject model, String prefix, IDocument doc,int offset) {
  return Collections.singletonList(createCompletionProposal("CustomTypeParserRuleName", offset));
}
 
public List<? extends ICompletionProposal> completeReferenceModelCustomType(RuleCall ruleCall, EObject model, String prefix,IDocument doc, int offset) {
  return Collections.emptyList();
}
 
@Override
protected String getDefaultImageFilePath() {
  return "icons/editor.gif";
}
 
@Override
protected String getPluginId() {
  return UI_PLUGIN_ID;
}

As you can see from the snippet above, the generated ProposalProvider class contains a 'completion' proposal method for each assignment and for each rule with a custom return type. In addition to the methods declared in the interface IProposalProvider, the framework tries to call these methods for assignments and rules using reflection. The generated 'completion' proposal methods are named after the following pattern.

for assignments

public List<ICompletionProposal> complete[Typename][featureName](Assignment ele, EObject model, String prefix, int offset);

for rules with a custom return type

public List<? extends ICompletionProposal> complete[ModelAlias][ReturnType](RuleCall ruleCall, EObject model, String prefix,IDocument doc, int offset);

Note that if you have generated Java classes for your domain model (meta model) you can alternatively declare the second parameter (model) using a specific type.

for assignments with a custom return type

public List<ICompletionProposal> completeCustomTypeParserRuleName(Assignment ele, ReferenceModel.CustomType model, String prefix, int offset);
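The reflective dispatch by naming pattern can be sketched as follows. This is not the framework's code. The classes are hypothetical, and the method signature is reduced to a single String parameter to keep the example self-contained, whereas the generated methods take the parameters shown above.

```java
import java.lang.reflect.Method;

// Hypothetical sketch of dispatching to complete[Typename][featureName] via reflection.
class CompletionDispatchSketch {
    // Builds the method name following the pattern complete[Typename][featureName].
    static String methodName(String typeName, String featureName) {
        return "complete" + typeName
                + Character.toUpperCase(featureName.charAt(0)) + featureName.substring(1);
    }

    // Hypothetical provider with one method that follows the naming pattern.
    static class ExampleProvider {
        public String completeModelIntAttribute(String prefix) {
            return "1";
        }
    }

    // Looks the method up by name and calls it; returns null if the provider does not declare it.
    static Object invoke(Object provider, String typeName, String featureName, String prefix) {
        try {
            Method method = provider.getClass()
                    .getMethod(methodName(typeName, featureName), String.class);
            return method.invoke(provider, prefix);
        } catch (NoSuchMethodException e) {
            return null;
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(methodName("RuleB", "ruleA"));
        System.out.println(invoke(new ExampleProvider(), "Model", "intAttribute", ""));
    }
}
```

Looking methods up by name keeps the provider free of a large interface, at the cost of losing compile-time checks when a grammar rule or feature is renamed.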

Service Configuration

The configuration of the CA related part goes into the generated [Namespace]Gen[Grammar]UiConfig class and includes the following three interfaces.

  • org.eclipse.xtext.ui.common.editor.codecompletion.IProposalProvider
  • org.eclipse.jface.text.contentassist.IContentAssistant ([[1]])
  • org.eclipse.jface.text.contentassist.IContentAssistProcessor ([[2]])

TODO: describe/link where to configure a manual implementation of IProposalProvider??

Runtime Examples

model 0 >>CA<<

will execute the following call sequence to IProposalProvider implementations:

  1. completeRuleCall 'STRING'
  2. completeModelStringDescription feature 'stringDescription'
  3. completeKeyword '{'

model 0 "ModelStringDescriptionSTRING" {
 >>CA<<
}

  1. completeRuleCall 'AbstractRule'
  2. completeRuleCall 'RuleA'
  3. completeKeyword 'RuleA'
  4. completeRuleCall 'RuleB'
  5. completeKeyword 'RuleB'
  6. completeModelRules feature 'rules'
  7. completeRuleCall 'CustomTypeParserRule'
  8. completeReferenceModelCustomType 'CustomTypeParserRule'
  9. completeKeyword 'type'
  10. completeModelTypes feature 'types'
  11. completeKeyword '}'

model 0 "ModelStringDescriptionSTRING" {
 type >>CA<<
}

  1. completeRuleCall 'ID'
  2. completeCustomTypeParserRuleName feature 'name'

model 0 "ModelStringDescriptionSTRING" {
	RuleA (RuleAName)
	RuleB (>>CA<<
}

  1. completeRuleBRuleA feature 'ruleA', which in turn invokes lookupCrossReference, which delegates to the configured ILinkingService#getLinkCandidates to determine all available 'link' candidates.
