Xtext Grammar Considerations

Reduce Terminal Rules to an Absolute Minimum

Avoid terminal rules unless they are really necessary, and prefer parser rules instead. The lexer splits the input stream into tokens according to the terminal rules plus the keywords (all quoted text appearing in parser rules). If the terminal rules are badly designed, your parser rules may fail because the actual tokenization differs from what you expected. If in doubt, you can read this blog post and use the code given there to test your grammar (it is somewhat outdated and requires minor modifications). The test code lets you check whether the tokenization works as expected.

Use Parser Rules Plus Validation

Consider the following example: you want to parse decimal numbers with an exponent, e.g. 2.3e3.

With Xtext's standard terminal definitions this is tokenized as

INT '.' INT ID

Surprised?

If you tried to parse this with the straightforward rule

DecimalPosExp: ('+'|'-')? INT '.' INT 'e' INT;

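you would fail: the lexer delivers e3 as a single ID token, so the keyword 'e' followed by an INT never reaches the parser.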
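The tokenization can be reproduced with a toy longest-match lexer. The following is only a sketch in plain Java, not the lexer that Xtext generates. The class name ToyLexer is hypothetical, and the character classes are an assumption modelled on the standard INT and ID terminals (the optional '^' prefix of ID is left out).

```java
import java.util.ArrayList;
import java.util.List;

// Toy longest-match lexer for illustration only; NOT the lexer Xtext generates.
class ToyLexer {

    static boolean isDigit(char c) {
        return c >= '0' && c <= '9';
    }

    // Characters that may start an ID (assumed: letters and underscore).
    static boolean isIdStart(char c) {
        return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z') || c == '_';
    }

    static List<String> tokenize(String input) {
        List<String> tokens = new ArrayList<>();
        int i = 0;
        while (i < input.length()) {
            char c = input.charAt(i);
            int start = i;
            if (isDigit(c)) {
                // INT: consume as many digits as possible.
                while (i < input.length() && isDigit(input.charAt(i))) {
                    i++;
                }
                tokens.add("INT(" + input.substring(start, i) + ")");
            } else if (isIdStart(c)) {
                // ID: consume letters, digits and underscores greedily,
                // which is why "e3" ends up as ONE token.
                while (i < input.length()
                        && (isIdStart(input.charAt(i)) || isDigit(input.charAt(i)))) {
                    i++;
                }
                tokens.add("ID(" + input.substring(start, i) + ")");
            } else if (Character.isWhitespace(c)) {
                i++; // white space is skipped in this toy version
            } else {
                // Everything else is reported as a single-character keyword.
                tokens.add("'" + c + "'");
                i++;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        System.out.println(String.join(" ", tokenize("2.3e3")));
        System.out.println(String.join(" ", tokenize("2.3e-3")));
    }
}
```

For 2.3e3 the sketch yields INT, '.', INT and a single ID carrying e3. For 2.3e-3 the ID is just e, followed by '-' and an INT, which is the shape that the DecimalExp rule in the grammar fragment further down relies on.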

Given this tokenization, our rule should read

DecimalPosExp hidden(): ('+'|'-')? INT '.' INT ID;

We leave it to validation to make sure that the parsed ID is just an 'e' (or 'E') followed by an integer.
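What such a check could look like is sketched below in plain Java. The class and method names are hypothetical, and where the check is called from (a validator or a value converter) is left open here, since that depends on how the project is set up.

```java
import java.util.regex.Pattern;

// Hypothetical helper; the names are illustrative and not part of Xtext.
class ExponentCheck {

    // An 'e' or 'E' followed by at least one digit, e.g. "e3" or "E10".
    private static final Pattern EXPONENT_WITH_DIGITS = Pattern.compile("[eE][0-9]+");

    // Check for the ID at the end of DecimalPosExp (e.g. the "e3" of 2.3e3).
    static boolean isValidPosExponent(String idText) {
        return idText != null && EXPONENT_WITH_DIGITS.matcher(idText).matches();
    }

    // Check for the ID in DecimalExp, where sign and digits are separate tokens.
    static boolean isValidExponentMarker(String idText) {
        return "e".equals(idText) || "E".equals(idText);
    }
}
```

With this sketch, e3 and E10 pass the first check, while e, x3 and e3x are rejected, even though all of them are perfectly legal ID tokens for the parser.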

Note the hidden() directive, which allows no hidden tokens (such as white space and comments) inside this rule. Without it, the hidden(WS, ML_COMMENT, SL_COMMENT) declaration implied by the standard terminals grammar would apply, and

2 . 2/* comment */e3

would be accepted by the rule.

Example Grammar for Primitive Type Literals

Here is a grammar fragment based on the standard Xtext terminals.

IntOrReal:
	Integer | Real
;

Integer returns ecore::EInt:
	SignedInteger | Hexadecimal
;
SignedInteger hidden(): ('+'|'-')? INT;
Hexadecimal hidden(): HEX;

Real returns ecore::EDouble:
	Decimal | DotDecimal | DecimalDot | DecimalExp | DecimalPosExp
;
Decimal hidden(): ('+'|'-')? INT '.' INT;
DotDecimal hidden(): ('+'|'-')? '.' INT;
DecimalDot hidden(): ('+'|'-')? INT '.';
DecimalExp hidden(): ('+'|'-')? INT '.' INT ID ('+'|'-')? INT; // validation must check that the ID is exactly e or E
DecimalPosExp hidden(): ('+'|'-')? INT '.' INT ID; // validation must check that the ID is e or E followed by digits only

terminal HEX: ('0x'|'0X') ('0'..'9'|'a'..'f'|'A'..'F')+;

Based on this, the rules for the literals can be written as follows.

Literal:
	BooleanLiteral |
	RealLiteral |
	IntLiteral |
	StringLiteral
;

BooleanLiteral:
	{BooleanLiteral} ('false' | isTrue?='true');

RealLiteral:
	{RealLiteral} value=Real;

IntLiteral:
	{IntLiteral} value=Integer;

StringLiteral:
	{StringLiteral} value=STRING;

These rules must, of course, be complemented by validation. They are written so that they return Ecore primitive types such as EInt, so you also have to add the corresponding value converters.
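The conversion logic itself can be sketched in plain Java as shown below. This is only an illustration under assumptions: the class and method names are hypothetical, the wiring into Xtext's value converter mechanism is omitted, and the regular expression is a restatement of the Real alternatives from the grammar fragment above.

```java
import java.util.regex.Pattern;

// Hypothetical conversion helpers; the Xtext wiring is intentionally left out.
class LiteralConversion {

    // Shapes accepted by Real: INT.INT with optional exponent, .INT, or INT.
    // (assumed restatement of the grammar fragment, each with an optional sign)
    private static final Pattern REAL = Pattern.compile(
            "[+-]?([0-9]+\\.[0-9]+([eE][+-]?[0-9]+)?|\\.[0-9]+|[0-9]+\\.)");

    // Text matched by Integer (SignedInteger | Hexadecimal) to an int.
    static int toInt(String text) {
        if (text.startsWith("0x") || text.startsWith("0X")) {
            // Strip the prefix and parse the rest as base 16.
            return Integer.parseInt(text.substring(2), 16);
        }
        // Integer.parseInt accepts an optional leading '+' or '-'.
        return Integer.parseInt(text);
    }

    // Text matched by Real to a double.
    static double toReal(String text) {
        // Double.parseDouble alone is too lenient: it also accepts a trailing
        // 'f' or 'd', and "2.3f" gets through the parser because f is an ID.
        if (!REAL.matcher(text).matches()) {
            throw new NumberFormatException("Not a valid real literal: " + text);
        }
        return Double.parseDouble(text);
    }
}
```

In this sketch the converter doubles as the validation for the exponent: an input such as 2.3f or 2.3x3 is parsed by the grammar, but rejected here with a NumberFormatException. Hexadecimal values above 0x7FFFFFFF are also rejected, because Integer.parseInt does not wrap around; whether that is the desired behaviour is a design decision for the language.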

Notice: this Wiki will be going read only early in 2024 and edits will no longer be possible. Please see: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/wikis/Wiki-shutdown-plan for the plan.

ETrice/Development/Grammar
