

Xtext Grammar Considerations

Reduce Terminal Rules to an Absolute Minimum

Avoid terminal rules unless they are really necessary; prefer parser rules instead. The lexer splits the input stream into tokens according to the terminal rules plus the keywords (all quoted text appearing in parser rules). If the terminal rules are badly designed, your parser rules may fail because the actual tokenization differs from what you expected. If in doubt, you can read this blog post and use the code given there to test your grammar (it is a little outdated and requires minor modifications). The test code lets you check whether the tokenization works as expected.

Use Parser Rules Plus Validation

Consider the following example. You want to parse decimal numbers with an exponent, e.g. 2.3e3.

With Xtext's standard terminal definitions this is tokenized as

INT(2) '.' INT(3) ID(e3)

If you tried to parse this with the straightforward rule

DecimalPosExp: ('+'|'-')? INT '.' INT 'e' INT;

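you would fail, because the lexer turns the trailing e3 into a single ID token rather than the keyword 'e' followed by an INT.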
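The effect can be reproduced outside Xtext with a small longest-match tokenizer. The following is only a sketch: the class name TokenSketch and the simplified patterns for ID, INT and the keywords are assumptions made for illustration, and the real generated lexer is more elaborate. Hidden tokens such as white space are not modelled, so such input raises an exception.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical, Xtext-independent sketch of how a lexer splits its input.
final class TokenSketch {
    // Simplified stand-ins (assumptions) for the keywords of the rule and
    // for the standard ID and INT terminals. Keywords come first so that
    // they win when two candidates have the same length.
    private static final String[][] RULES = {
        {"KEYWORD", "[+\\-.]|e"},
        {"ID", "[a-zA-Z_][a-zA-Z_0-9]*"},
        {"INT", "[0-9]+"},
    };

    // Splits the input by always taking the longest candidate at the
    // current offset; only a strictly longer match replaces an earlier one.
    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < input.length()) {
            String bestName = null;
            int bestEnd = pos;
            for (String[] rule : RULES) {
                Matcher m = Pattern.compile(rule[1]).matcher(input);
                m.region(pos, input.length());
                if (m.lookingAt() && m.end() > bestEnd) {
                    bestName = rule[0];
                    bestEnd = m.end();
                }
            }
            if (bestName == null) {
                throw new IllegalArgumentException("No token at offset " + pos);
            }
            tokens.add(bestName + "(" + input.substring(pos, bestEnd) + ")");
            pos = bestEnd;
        }
        return tokens;
    }
}
```

Calling TokenSketch.tokenize("2.3e3") yields INT(2), KEYWORD(.), INT(3), ID(e3): the keyword 'e' never appears as a token of its own because the longer ID match e3 wins.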

Given this tokenization, our rule should instead read

DecimalPosExp hidden(): ('+'|'-')? INT '.' INT ID;

We leave it to validation to ensure that the parsed ID is just an 'e' followed by an integer.
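The check itself is small. Below is a minimal sketch, assuming a hypothetical helper class ExponentCheck; in an Xtext project such a test would typically be called from a validator method annotated with @Check, which then reports an error on the offending element. The sketch accepts an upper-case 'E' as well, in line with the comments in the example grammar on this page.

```java
import java.util.regex.Pattern;

// Hypothetical helper: verifies the text of the ID token that ends a
// DecimalPosExp, i.e. an 'e' or 'E' followed by at least one digit.
final class ExponentCheck {
    private static final Pattern EXPONENT_ID = Pattern.compile("[eE][0-9]+");

    // Returns true only if the whole ID text has the form e<digits> or E<digits>.
    static boolean isValidExponentId(String idText) {
        return idText != null && EXPONENT_ID.matcher(idText).matches();
    }
}
```

With this helper, e3 and E10 pass, while inputs the parser rule alone would accept, such as 2.3foo or 2.3e, are rejected because foo and e fail the check.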

Note the hidden() directive in the rule, which allows no hidden symbols (such as white space and comments) inside this rule. Without it, the hidden(WS, ML_COMMENT, SL_COMMENT) declaration implied by the standard terminals grammar would apply, and input such as

2 . 2/* comment */e3

would be parsed by the rule.

Example Grammar for Primitive Type Literals

Here is a grammar fragment that is based on the standard Xtext terminals.

	Integer | Real

Integer returns ecore::EInt:
	SignedInteger | Hexadecimal;
SignedInteger hidden(): ('+'|'-')? INT;
Hexadecimal hidden(): HEX;

Real returns ecore::EDouble:
	Decimal | DotDecimal | DecimalDot | DecimalExp | DecimalPosExp;
Decimal hidden(): ('+'|'-')? INT '.' INT;
DotDecimal hidden(): ('+'|'-')? '.' INT;
DecimalDot hidden(): ('+'|'-')? INT '.';
DecimalExp hidden(): ('+'|'-')? INT '.' INT ID ('+'|'-')? INT; // has to be checked by validation since ID may only be e or E
DecimalPosExp hidden(): ('+'|'-')? INT '.' INT ID; // has to be checked by validation since ID may only be e INT or E INT

terminal HEX: ('0x'|'0X') ('0'..'9'|'a'..'f'|'A'..'F')+;

Based on this, the rules for the literals can be written.

	BooleanLiteral |
	RealLiteral |
	IntLiteral |
	StringLiteral;

BooleanLiteral :
	{BooleanLiteral} ('false' | isTrue?='true');

RealLiteral :
	{RealLiteral} value=Real;

IntLiteral :
	{IntLiteral} value=Integer;

StringLiteral :
	{StringLiteral} value=STRING;

These rules must of course be complemented by validation. The data type rules are written so that they return Ecore primitive types such as EInt, so you also have to add the corresponding value converters.
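The conversion logic such converters need might look like the following sketch. The class LiteralConversion and its method names are assumptions for illustration. In an Xtext project this logic would usually live in value converters registered for the Integer and Real rules, for example in a subclass of DefaultTerminalConverters; here it is kept free of Xtext dependencies so it can be tried in isolation.

```java
// Hypothetical helper with the string-to-value logic for the Integer and
// Real data type rules of the example grammar.
final class LiteralConversion {
    // Converts text matched by SignedInteger or Hexadecimal to an int.
    // Values outside the int range cause a NumberFormatException.
    static int toInt(String text) {
        if (text.startsWith("0x") || text.startsWith("0X")) {
            // Strip the prefix and parse the remaining digits as base 16.
            return Integer.parseInt(text.substring(2), 16);
        }
        // Integer.parseInt accepts an optional leading '+' or '-'.
        return Integer.parseInt(text);
    }

    // Converts text matched by one of the Real variants to a double.
    // Double.parseDouble understands forms such as 2.5, .5, 2. and 2.3e3.
    static double toDouble(String text) {
        return Double.parseDouble(text);
    }
}
```

For example, toInt("0x1F") returns 31 and toDouble("2.3e3") returns 2300.0. Because rules such as DecimalPosExp accept any ID after the fraction, the converter should only be trusted on input that validation has accepted; otherwise it throws a NumberFormatException for most malformed text.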
