M2TBackend

From Eclipsepedia

Common platform for M2T languages

This page describes both the ideas behind the common M2T backend and its implementation.

The backend originated as part of the evolution of the Xpand and Xtend languages, and the packages are currently named accordingly. It is however intended as a runtime environment for all M2T languages, supporting common performance optimizations, interoperability and potential reuse of other code. If it proves to be useful this way, it should probably be moved to a non-xtend namespace to reflect its common nature.

Comments are very welcome and indeed necessary to incorporate the requirements of other languages than Xpand and Xtend and make the backend useful to them.

The backend code is currently located in the modeling CVS at org.eclipse.m2t/org.eclipse.xpand/plugins/org.eclipse.xtend.backend* with related code at org.eclipse.m2t/org.eclipse.xpand/plugins/org.eclipse.xtend.middleend*. This distinction will be explained in the remainder of this document.


Overview

Design goals

The design of the backend was driven by the following forces:

  • Performance. In large projects, generator speed is an issue, and the backend is designed with performance in mind. This requirement is what actually sparked its development in the first place.
  • Compiler. For performance and/or obfuscation reasons, the backend will serve as a basis for compilation into Java classes.
  • Language independence. Concrete languages evolve, concrete convenience syntax is added, and anyway different languages have different concrete syntaxes. In order to leverage the performance tuning effort, the backend is designed to be largely independent of the concrete syntax of languages built on top of it. The developer skills and mindset required for frontend and backend development are quite different, and the separation gives performance and universality efforts of the backend a more stable basis. This is probably a point where some implicit assumptions will prove less general than desirable, and feedback from other m2t language development teams is necessary.
  • Independent of parse tree. There is a strict and complete separation between the data structures used by the backend and those used by the frontends. The previous item explained how this separation is useful for the backend, but the development of frontend tooling also benefits. Since the parse tree of the frontend need not directly serve as a basis for execution, it becomes simpler to implement features like fault tolerant parsing.
  • Language interoperability. The common backend is intended to facilitate interoperability of languages, i.e. making it as simple as possible to have code in one language call code written in another.
  • Reuse of Tooling. The backend will incorporate support for tooling that requires runtime support - debugging, profiling etc. - in such a way as to minimize implementation effort for the different languages that wish to support them.


Layers

The backend serves as the runtime environment, and its data structures are independent of the concrete syntax of a given language. The frontend tooling on the other hand is intended to use its own AST that should be free of runtime concerns.

Therefore a translation layer is introduced, called middle end for want of a different term. It is specific for every concrete language, and its purpose is to transform the AST of the frontend into the data structures required by the backend. This involves mainly the following transformations which will be explained in more detail in subsequent sections:

  • functions. The data structures representing code are structured around the key abstraction of function in the backend. A function is a piece of code that can be called using parameters and that returns an object.
  • primitive operations. The code inside a function is represented by a tree of expressions. Since stability is one of the key design requirements of the backend, the middle ends must map the specific functionality of that language onto the given set of expression nodes available in the backend.
  • types. Since m2t languages - just like many other languages - operate on data, data types have a representation in the backend. So every middle end must transform data types from the language specific representation into the common representation of the backend.


Execution sequence

In line with the performance goal of the backend, the backend is as static in its execution as possible. Everything that can be evaluated by the middle end has no support in the backend.

One prominent aspect that is affected by this decision is parsing of source code, which is left entirely to the middle end. The backend is designed so as to never parse any resources. This decision deeply affects the function resolution strategy and other implementation aspects of the backend, and therefore it should be reviewed especially carefully for its implications at an early stage if possible!

From an execution perspective, several steps must be performed in order to execute a program:

  1. Call the middle end to transform the program into the backend representation. This requires the frontend AST as an input, so the middle end will call the corresponding front end parser. The output of this step is an initialized backend data structure.
    There is a conscious design decision at this point. At first glance, it would also be possible to pass the front end AST to the middle end instead of having the middle end call the front end parser. This approach would work well for single source files, but it would be difficult to maintain if one source file referenced another, potentially even written in a different language. Therefore the decision was made to have the middle end call the front end parser.
  2. Initialize the runtime data structure for the backend. This step can be implicitly performed by a facade, but it allows detailed control over reuse of runtime data across several invocations (see below for details).
  3. Actually call the backend to "execute" the data structures returned by the middle end.


Middle End

The middle end in general, and the MiddleEnd class in particular, form the single point of access and integration for the backend. The middle end is the interface between users of the backend and providers contributing functionality. Let's look at these two perspectives separately.

Using the middle end

An instance of the MiddleEnd class can be seen conceptually as a wrapper around an ExecutionContext instance, combined with access to functions by resource name.

There are three main methods in the MiddleEnd class.

    public FunctionDefContext getFunctions (String resourceName);
    public void applyAdvice (String resourceName);
    public ExecutionContext getExecutionContext ();

The first of the methods retrieves the FunctionDefContext representing a resource, i.e. typically a source file. It delegates this task to the registered language specific providers.

The second of the methods retrieves all advice that is defined in a given resource, and registers it with the ExecutionContext instance that it keeps.

The third of the methods provides access to the ExecutionContext stored with the MiddleEnd.

These three methods provide the entire API that is required to load and invoke arbitrary programs. And while the methods are somewhat technical in nature - they are after all typically hidden behind IDE tooling or workflow components - they are simple enough to use.

Contributing to the middle end

The generic middle end as an integration layer knows nothing of specific programming languages, or how to transform them into backend representations. It relies on specific contributions to do that.

This is done through an extension point with the ID org.eclipse.xtend.middleend.MiddleEnd, where classes that implement the interface LanguageSpecificMiddleEndFactory can be registered. But before we look at this interface, a discussion on the use of the extension point mechanism is warranted.

Using extension points means that the middle end will always require both an OSGi container and the Eclipse registry plugin. That is a significant increase in footprint, and it adds to configuration management complexity. The rationale that led to the decision is that the next version of the workflow engine will be built on top of OSGi anyway, and the Eclipse registry is not a big increase beyond that. On the other hand, using extension points here significantly simplifies the programming model.

Now about the LanguageSpecificMiddleEndFactory interface. It contains three methods:

public interface LanguageSpecificMiddleEndFactory {
    String getName ();
    int getPriority ();
    LanguageSpecificMiddleEnd create (MiddleEnd middleEnd, Object specificData);
}

getName returns a human readable name used for logging.

getPriority provides a means to decide the order in which implementations are asked to process a given resource in the rare cases where several contributors feel responsible for the same resources.

create finally is the actual factory implementation and returns the instance that can actually handle resources and contribute functions and advice.

This create method has two parameters. The first of these is the MiddleEnd instance. It is useful if the implementation wants to reference other resources - all such references should be resolved through the middle end to achieve independence of the concrete programming language being referenced.

The second parameter is intended as a generic tunnel for specific data required by the implementation. The Xpand middle end for example requires an instance of XpandExecutionContext, a frontend construct.

These generic parameters must be passed to the MiddleEnd constructor in a map:

    public MiddleEnd (BackendTypesystem ts, Map <Class<?>, Object> specificParams) {
        ...

Obviously, if a specific middle end contribution requires specific initialization in order to function, it will not work unless it receives this initialization. So in order to be able to use a specific implementation, one must call the MiddleEnd constructor with correct and sufficient initialization. This breaks the middle end encapsulation to a certain degree, but it is unavoidable.

If a LanguageSpecificMiddleEnd implementation feels it has insufficient specificData to operate, the corresponding create method in the factory may throw an IllegalArgumentException. This is the official and specified way for it to remove itself from the list of available handlers for this MiddleEnd instance.

This dynamic behavior where a handler can be present for one MiddleEnd instance and absent for another is one reason for the two-step initialization with the separation between factory and actual implementation.

Another reason is that the actual implementations are encouraged to use aggressive caching of whatever intermediate results they have. Using a new MiddleEnd instance is the way to re-initialize these caches.

Oh, and finally let us take a look at the LanguageSpecificMiddleEnd interface itself:

public interface LanguageSpecificMiddleEnd {
    String getName ();
    boolean canHandle (String resourceName);
    FunctionDefContext getContributedFunctions (String resourceName);
    List<AroundAdvice> getContributedAdvice (String resourceName);
}

No surprises here. getName returns a human readable name for logging. canHandle is called to determine whether this implementation feels responsible for a given resource. The registered implementations are asked in sequence, and the first one to return true gets the resource.

getContributedFunctions and getContributedAdvice retrieve the functions and advice defined in a resource, respectively.
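The selection of a handler described in this section can be sketched as follows. This is only an illustration under stated assumptions, not the actual MiddleEnd code: the Sketch* interfaces are hypothetical, trimmed-down stand-ins for the interfaces above, the specific data is passed as a plain Object, and it is assumed that a higher priority value means "asked earlier" (the text does not specify the direction).

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Hypothetical, trimmed-down stand-ins for the two interfaces shown above.
interface SketchLanguageMiddleEnd {
    String getName();
    boolean canHandle(String resourceName);
}

interface SketchLanguageMiddleEndFactory {
    String getName();
    int getPriority();
    // May throw IllegalArgumentException to opt out of this MiddleEnd instance.
    SketchLanguageMiddleEnd create(Object specificData);
}

final class HandlerSelectionSketch {
    private final List<SketchLanguageMiddleEnd> handlers = new ArrayList<>();

    HandlerSelectionSketch(List<SketchLanguageMiddleEndFactory> factories, Object specificData) {
        List<SketchLanguageMiddleEndFactory> sorted = new ArrayList<>(factories);
        // Assumption: a higher priority value means "asked earlier".
        sorted.sort(Comparator.comparingInt(SketchLanguageMiddleEndFactory::getPriority).reversed());
        for (SketchLanguageMiddleEndFactory factory : sorted) {
            try {
                handlers.add(factory.create(specificData));
            } catch (IllegalArgumentException e) {
                // The factory has removed itself from the list of handlers.
            }
        }
    }

    // Returns the name of the first handler that claims the resource.
    String findHandlerName(String resourceName) {
        for (SketchLanguageMiddleEnd handler : handlers) {
            if (handler.canHandle(resourceName)) {
                return handler.getName();
            }
        }
        throw new IllegalArgumentException("no handler for " + resourceName);
    }

    // Helper building a factory whose handler claims resources by file suffix.
    static SketchLanguageMiddleEndFactory factory(
            final String name, final int priority, final String suffix, final boolean needsData) {
        return new SketchLanguageMiddleEndFactory() {
            public String getName() { return name; }
            public int getPriority() { return priority; }
            public SketchLanguageMiddleEnd create(Object specificData) {
                if (needsData && specificData == null) {
                    throw new IllegalArgumentException(name + " needs specific data");
                }
                return new SketchLanguageMiddleEnd() {
                    public String getName() { return name; }
                    public boolean canHandle(String resourceName) {
                        return resourceName.endsWith(suffix);
                    }
                };
            }
        };
    }
}
```

In this sketch, a factory that throws from create is simply absent for that instance, so the same resource can end up with a different handler depending on the initialization data that was supplied.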

Functions

Key abstractions

Functions are the backend abstraction for any code that can be called. They are represented by the Function interface (see below for details), they can be passed parameters and they always return an object (see the section on usage patterns for a description of how e.g. the implementation of a template language is intended). The Function interface has a method to invoke it (ignore the ExecutionContext for now. It is a data structure containing all runtime relevant data and will be explained later in this document) and one to provide information about its parameter types (for details about types, see the section about types):

public interface Function {
    public Object invoke (ExecutionContext ctx, Object[] params);
    List<? extends BackendType> getParameterTypes ();  
    
    ... // other methods that will be explained later
}

There is no distinction between "stand-alone functions" and "methods defined on a type" - they are all represented by functions. If a concrete language supports "method" style calls, these must be mapped to functions with an additional first parameter of the "class" type by the respective middle end.

So, for example

MyType.doSomething (int a, String b)

would be represented by

doSomething (MyType this, int a, String b)


Functions have no name per se, they are like function pointers in C in this regard. This supports a functional style of programming, Closures etc. In contexts where functions require a name, they are represented by the NamedFunction class that is basically a container for a function and a string representing its name.


Polymorphism

It is possible - even desirable - for several functions to have the same name, as long as they differ in their parameter types. Every function invocation made by the backend is done polymorphically, i.e. if there are several functions with the right name and number of parameters, the best fit is picked and invoked.

This choice is done based on the actual object types of the parameters and not on the "reference type" (a concept that the backend has no notion of) as in Java. So the resolution is done at runtime rather than compile time, and if there is no matching function (or no single "best match"), a runtime error occurs.

Let us for example assume we have the following functions ("List" and "Set" being subtypes of "Collection"):

  f(Collection c, Object o);
  f(List       c, Object o);
  f(Object o, Collection c);

The following table describes the backend behavior based on the types of the parameters passed to the function:

first parameter type | second parameter type | function being invoked
List                 | String                | The second function is invoked. Both the first and second function match, but the second is more specific.
Set                  | String                | The first function is invoked because it is the only match.
String               | String                | None of the function signatures match --> runtime error
List                 | List                  | All three functions match. The second function is a specialization of the first, but the second is no specialization of the third, nor the other way round --> runtime error
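The resolution rule behind this table can be sketched in a few lines. This is a minimal illustration and not the backend's implementation: it uses java.lang.Class in place of BackendType, represents each function only by its parameter types, and all names (PolymorphicDispatchSketch, resolve, EXAMPLE) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;

final class PolymorphicDispatchSketch {

    // The three signatures of f from the example, in declaration order.
    static final List<Class<?>[]> EXAMPLE = Arrays.asList(
            new Class<?>[] {Collection.class, Object.class},
            new Class<?>[] {List.class, Object.class},
            new Class<?>[] {Object.class, Collection.class});

    // True if every argument is an instance of the corresponding parameter type.
    static boolean matches(Class<?>[] paramTypes, Object[] args) {
        if (paramTypes.length != args.length) {
            return false;
        }
        for (int i = 0; i < args.length; i++) {
            if (!paramTypes[i].isInstance(args[i])) {
                return false;
            }
        }
        return true;
    }

    // True if signature a is at least as specific as signature b in every position.
    static boolean isSpecializationOf(Class<?>[] a, Class<?>[] b) {
        for (int i = 0; i < a.length; i++) {
            if (!b[i].isAssignableFrom(a[i])) {
                return false;
            }
        }
        return true;
    }

    // Returns the index of the best-fitting signature, or throws if there is none.
    static int resolve(List<Class<?>[]> signatures, Object... args) {
        List<Integer> candidates = new ArrayList<>();
        for (int i = 0; i < signatures.size(); i++) {
            if (matches(signatures.get(i), args)) {
                candidates.add(i);
            }
        }
        if (candidates.isEmpty()) {
            throw new IllegalStateException("no matching function");
        }
        for (int best : candidates) {
            boolean beatsAll = true;
            for (int other : candidates) {
                if (!isSpecializationOf(signatures.get(best), signatures.get(other))) {
                    beatsAll = false;
                    break;
                }
            }
            if (beatsAll) {
                return best;
            }
        }
        throw new IllegalStateException("no single best match");
    }
}
```

The matching is done on the runtime classes of the arguments, which mirrors the statement above that resolution happens at runtime rather than on a declared reference type.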


Linking and Function Scope

In order for polymorphic resolution to work, the backend has the concept of the scope of functions. It is part of the definition of every function to know which other functions are visible from it. It is part of the static linking nature of the backend that the middle end must determine this list of all named functions visible from within a given function. It is stored in a data structure called FunctionDefinitionScope; for details on how it is stored and initialized, see the section on data structures.

For languages that support referencing functions from other compilation units, e.g. using fully qualified names like Xpand, this requires the middle end to analyze which functions are potentially referenced from a given compilation unit and initialize the FunctionDefinitionScope accordingly.

There are some standard functions available in the so-called system library (syslib). These functions are of a general nature and are available to all languages implicitly. They comprise things like string concatenation, basic collection operations or streaming data to a file. Since the backend supports several concrete languages that may each have their own concrete syntax for these functions, the middle end will in general have to perform a mapping from the concrete syntax to the syslib names of the corresponding functions.


Advanced concepts

Backend functions support advanced functionality that may or may not be required by a given frontend language. They are represented by methods in the Function interface, and they are designed not to get in the way if they are not required.

Guards

It is possible for a function to be associated with a so-called guard expression, or guard. The Function interface has a getter method for it:

public interface Function {
    ExpressionBase getGuard();
    ... // other methods
}


Such a guard is a predicate (boolean expression) that is evaluated for all candidate functions before polymorphic resolution is performed. Only functions for which the guard evaluates to true are actual candidates for execution.

If a function has no guard, the getGuard() method can simply return null.
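The filtering step can be illustrated with a small sketch. It is an assumption-laden simplification: the guard is modelled as a java.util.function.Predicate over the parameters instead of an ExpressionBase evaluated in an ExecutionContext, and GuardSketch and GuardedFunction are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Predicate;

final class GuardSketch {

    // Hypothetical stand-in for a function with an optional guard.
    static final class GuardedFunction {
        final String label;
        final Predicate<Object[]> guard; // null means "no guard"

        GuardedFunction(String label, Predicate<Object[]> guard) {
            this.label = label;
            this.guard = guard;
        }
    }

    // Keeps only the functions whose guard is absent or evaluates to true.
    // Polymorphic resolution would then run on the returned candidates.
    static List<GuardedFunction> candidates(List<GuardedFunction> all, Object... params) {
        List<GuardedFunction> result = new ArrayList<>();
        for (GuardedFunction f : all) {
            if (f.guard == null || f.guard.test(params)) {
                result.add(f);
            }
        }
        return result;
    }
}
```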


Caching

The backend supports caching of function invocations. To indicate this, the Function interface has a method

boolean isCached ();

If a function is cached, the backend executes it only once for a given set of parameters, remembering the result and returning it from its cache for subsequent invocations. The implementation takes care of direct or indirect recursion.

This feature is useful for two scenarios, performance optimization and functions with side effects.

  • performance optimization. A language can provide a feature to annotate expensive functions at the source level so that their results need not be computed more than once.
  • functions with side effects. This feature is invaluable for functions with side effects that should be executed only once, especially during the creation of object graphs. For this purpose, a given language may or may not choose to expose the feature in its syntax, but it will likely benefit from using it internally.
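The basic "execute once per set of parameters" behavior might look like the following sketch. It is not the backend's cache: it assumes that parameters are compared with equals, it does not address the handling of recursion mentioned above, and CachedInvocationSketch is a hypothetical name.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

final class CachedInvocationSketch {
    private final Function<Object[], Object> target;
    private final Map<List<Object>, Object> cache = new HashMap<>();
    int executions = 0; // counts how often the underlying function really ran

    CachedInvocationSketch(Function<Object[], Object> target) {
        this.target = target;
    }

    Object invoke(Object... params) {
        // Assumption: two parameter lists are "the same" if they are equal element-wise.
        List<Object> key = Arrays.asList(params.clone());
        if (cache.containsKey(key)) {
            return cache.get(key); // containsKey also covers cached null results
        }
        executions++;
        Object result = target.apply(params);
        cache.put(key, result);
        return result;
    }
}
```

For a function with side effects, the executions counter in this sketch corresponds to the number of times the side effect actually happens.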

AOP and Advice

The system supports AOP in the form of "around" advice. This section describes the concepts and how they are implemented.

Overview

AOP in this context means that from *outside* a function, one or more other functions can be declaratively wrapped around this function. The outermost of these wrapped-around advice is executed whenever a call to the actual function is made. It has the choice of either providing a result directly, or making a call to "proceed" (for details see below), calling the next advice in the chain or the actual function, if it is the innermost advice.

One of the strengths of AOP is that a single advice can be applied to a large number of functions. This is specified using wildcard matching, and it is described in the subsection on pointcuts.

The backend implementation of AOP goes beyond other common approaches in that advice can be registered dynamically during program execution. This is intended for external customization of cartridges, e.g. selective overwriting of the function used to generate attribute names (i.e. naming conventions). This can be scoped, i.e. the backend supports the going out-of-scope of such dynamically registered advice. It is obviously up to the programming languages built on top of the backend to decide in what way and to what degree they want to make this feature available.

Pointcuts

A pointcut defines where an advice is to be applied. Currently ExecutionPointcut is the only kind of pointcut that is supported. It selects the functions around which the advice is to be wrapped. This is done based on two criteria:

  • function name. The function name is given as a string which can contain one or more '*' wildcards to denote any number of characters.
  • parameter types. The parameter types can be given either as an explicit list, or as a 'varargs' parameter, meaning that any number of subsequent parameters will match the pointcut, or as a mixture of the two: a list of explicitly given parameter types, followed by a variable number of parameters. For each of the parameters, a type must be given. Together with this type, there is a flag that specifies whether the type must be an exact match or may be a subtype of the type given.

NB: The parameter types are matched against the declared types of the functions to decide which function matches and which does not. That does not mean that the actual parameters passed to the function match the same criteria! If this confuses you, I suggest reading a tutorial on AspectJ or some other AOP framework.

As a convenience, every pointcut is internally complemented with an implicit "&& ! within <any advice>" pointcut to avoid endless recursion in situations where advice calls a function to which the advice itself is applied. This is a well-known AOP idiom, so I will not go into it here. If this confuses you, you probably do not need to understand this - it is probably the behavior you expect to see ;-)
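The function name criterion, with '*' standing for any number of characters, can be sketched by translating the pattern into a regular expression. This is one possible approach and not necessarily how the backend matches names; NamePatternSketch and its methods are hypothetical, and '*' is assumed to also match zero characters.

```java
import java.util.regex.Pattern;

final class NamePatternSketch {

    // Translates a name pattern with '*' wildcards into a regular expression.
    static Pattern compile(String namePattern) {
        StringBuilder regex = new StringBuilder();
        // Limit -1 keeps trailing empty parts, so a trailing '*' is not lost.
        String[] literals = namePattern.split("\\*", -1);
        for (int i = 0; i < literals.length; i++) {
            if (i > 0) {
                regex.append(".*"); // each '*' matches any number of characters
            }
            regex.append(Pattern.quote(literals[i])); // everything else is literal
        }
        return Pattern.compile(regex.toString());
    }

    static boolean matches(String namePattern, String functionName) {
        return compile(namePattern).matcher(functionName).matches();
    }
}
```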

Advice

Advice implementations consist of an expression. This expression is executed in its own FunctionDefContext which must be supplied when the AroundAdvice instance is created.

Exactly two local variables are bound to the advice implementation: thisJoinPointStaticPart and thisJoinPoint, containing instances of ThisJoinPointStaticPart and ThisJoinPoint, respectively.

thisJoinPointStaticPart gives access to the function object that was matched and its name, i.e. all information that is independent of the current runtime situation.

thisJoinPoint gives access to the actual parameters that were passed in, and to the stack trace of the current execution (including local variables). NB: Keeping track of the stack trace is costly, so there is a global switch in the ExecutionContext to switch it on and off. If it is switched off, the stack trace provided here will always be empty.

thisJoinPoint also has the proceed operations mentioned above. There are two of these.

The first proceed operation takes all parameters as a list. This allows the advice to pass different parameters than it received, if it wishes to do so.

The second proceed operation takes no parameters and is a shortcut for "proceed (thisJoinPoint.parameters)". This is useful for the common case that advice wants to pass the original parameters on.
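The relation between the two proceed operations can be sketched as follows. The types are hypothetical stand-ins (JoinPoint for ThisJoinPoint, a java.util.function.Function for the next advice or the actual function), and the stack trace and static part are left out.

```java
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class ProceedSketch {

    // Hypothetical stand-in for ThisJoinPoint.
    static final class JoinPoint {
        final List<Object> parameters;
        private final Function<List<Object>, Object> next;

        JoinPoint(List<Object> parameters, Function<List<Object>, Object> next) {
            this.parameters = parameters;
            this.next = next;
        }

        // Calls the next element in the chain with explicitly given parameters.
        Object proceed(List<Object> newParameters) {
            return next.apply(newParameters);
        }

        // Shortcut that passes the original parameters on unchanged.
        Object proceed() {
            return proceed(parameters);
        }
    }

    // Hypothetical stand-in for an advice body.
    interface Advice {
        Object around(JoinPoint thisJoinPoint);
    }

    static Object invokeAdvised(
            Advice advice, Function<List<Object>, Object> function, Object... params) {
        return advice.around(new JoinPoint(Arrays.asList(params), function));
    }
}
```

An advice body in this sketch can forward the call unchanged, forward it with different parameters, or return a result without calling proceed at all.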

Execution

All information regarding advice is stored in an AdviceContext instance which is part of the ExecutionContext.

Advice is applied non-locally, which is part of the AOP concept. Advice can be added before running a program - or in some other part of the program - and be applied on functions that are written in languages that do not even have the concept of AOP.

Advice is applied at runtime in the order it was declared. So advice declared first forms the outermost layer, with advice declared later forming layers inside it.

Caching

Functions can be cached, and sometimes it is desirable for advice to be cached too. If for example the advice performs some kind of logging or tracing, it should be executed only when the underlying function is executed, and not when a cached result of the underlying function is returned.

In other cases however, advice should always be executed regardless of the caching behavior of the underlying function, e.g. if a function in a cartridge is dynamically overwritten by "advice" that does not call proceed.

To reflect this range of application scenarios, AroundAdvice has a cacheable flag. This flag causes advice to be cached if and only if the matched function is cached and all advice "wrapped inside" the advice is also cacheable.

Since the advice that is declared last is applied furthest inside, adding new advice invalidates all advice caching, making it a costly operation.
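The rule for the cacheable flag can be written down as a small computation over the advice chain. The sketch below only illustrates that rule; AdviceCachingSketch and effectiveCaching are hypothetical names, and the flags are given in declaration order, i.e. outermost advice first.

```java
final class AdviceCachingSketch {

    // cacheableFlags[i] is the cacheable flag of the i-th declared advice,
    // with index 0 being the outermost layer. The result tells, per layer,
    // whether that advice is effectively cached.
    static boolean[] effectiveCaching(boolean functionCached, boolean[] cacheableFlags) {
        boolean[] result = new boolean[cacheableFlags.length];
        boolean everythingInsideCached = functionCached;
        // Walk from the innermost advice (declared last) outwards.
        for (int i = cacheableFlags.length - 1; i >= 0; i--) {
            everythingInsideCached = everythingInsideCached && cacheableFlags[i];
            result[i] = everythingInsideCached;
        }
        return result;
    }
}
```

Because each layer depends on everything inside it, appending one non-cacheable advice at the end of the chain switches off caching for all layers in this sketch.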

Types

The type system is in a way the blood that courses through the backend's veins. It is vitally important, yet it is hard to localize, and it connects many different parts of the backend.

The backend proper knows nothing whatsoever of types. Any data element is just a java.lang.Object to it, and all interaction with data elements is through the type system.

Every type is represented by an instance of an implementation of the BackendType interface. And the typical way to retrieve such a type is to go through the BackendTypesystem that serves as a factory for BackendTypes.

And while even developers integrating a new language with the backend will have to deal with the typesystem only superficially, it is essential in order to understand the inner workings of the backend and is therefore described here.


BackendTypesystem

The type system of the backend is designed to be pluggable and extensible. To this end, BackendTypesystem is an interface, and users can contribute arbitrary implementations (e.g. based on XML DOMs). The only requirement is that their implementations meet the contracts of BackendTypesystem and BackendType for the types returned by the BackendTypesystem implementation.

The actual typesystem of the backend is a hierarchy. At its root, there is an implementation that knows about the built-in types like String or Collection. Below this root there is a list of contributed implementations that know about specific kinds of complex types, e.g. EMF, UML2 or Java Beans. Whenever the type system is asked for a type, it asks all contributed type systems until one of them provides a BackendType.

The BackendTypesystem interface essentially consists of two factory methods:

public interface BackendTypesystem {
    BackendType findType (Object o);
    BackendType findType (Class<?> cls);
}

The first of these is the more commonly used. It identifies the type of a given Object and is for example used by the backend to identify the types of the parameters if a function is called. If a BackendTypesystem feels "responsible" for the object, it returns the type it assigns to it, otherwise it returns null.

The second of the methods is used to analyze Java methods for their types. While this may sound like a very special case, Java is a prominent language for providing implementations of functions, and therefore the decision was made that every type system must be able to analyze types based on a Java Class as well as based on an instance.

There are two more methods in the BackendTypesystem interface, a getter and a setter for the root typesystem. These are housekeeping methods to deal with the hierarchical structure of the type system in a generic fashion.
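The lookup through the hierarchy can be sketched as a simple chain of responsibility. The sketch makes several assumptions that the text does not confirm: the root with the built-in types is asked before the contributions, contributions are asked in list order, and null is returned if nobody feels responsible. SketchType, SketchTypesystem and Composite are hypothetical stand-ins for BackendType, BackendTypesystem and the actual composite implementation.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;

final class CompositeTypesystemSketch {

    // Hypothetical stand-in for BackendType.
    interface SketchType {
        String getName();
    }

    // Hypothetical stand-in for BackendTypesystem; returns null if not responsible.
    interface SketchTypesystem {
        SketchType findType(Object o);
    }

    static SketchType named(String name) {
        return () -> name;
    }

    static final class Composite implements SketchTypesystem {
        private final SketchTypesystem builtins;
        private final List<SketchTypesystem> contributions;

        Composite(SketchTypesystem builtins, List<SketchTypesystem> contributions) {
            this.builtins = builtins;
            this.contributions = contributions;
        }

        public SketchType findType(Object o) {
            // Assumption: built-in types take precedence over contributions.
            SketchType builtin = builtins.findType(o);
            if (builtin != null) {
                return builtin;
            }
            // The first contribution that feels responsible wins.
            for (SketchTypesystem ts : contributions) {
                SketchType type = ts.findType(o);
                if (type != null) {
                    return type;
                }
            }
            return null;
        }
    }

    // Small example configuration with two competing contributions.
    static Composite example() {
        SketchTypesystem builtins = o -> o instanceof CharSequence ? named("String") : null;
        SketchTypesystem first = o -> o instanceof Map ? named("FirstMapType") : null;
        SketchTypesystem second = o -> o instanceof Map ? named("SecondMapType") : null;
        return new Composite(builtins, Arrays.asList(first, second));
    }
}
```

The example configuration shows why the order of contributions carries meaning: both contributions accept a Map, and only the position in the list decides which type is assigned.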


BackendType

BackendType is the interface through which the backend performs all interaction with model data. And since the data types are transformed in the middle end anyway, it contains no more functionality than is strictly required by the backend. This decision is based on the assumption that the front end requires a whole different set of functionality for dealing with types (e.g. identifying the type by name, based on the import statements in a compilation unit) that is specific to the different languages, so including it in the backend would make things more complex without adding any real benefit.

The interface contains the following methods:

public interface BackendType {
    Object create ();
    
    boolean isAssignableFrom (BackendType other);
    
    Object getProperty (ExecutionContext ctx, Object o, String name);
    void setProperty (ExecutionContext ctx, Object o, String name, Object value);

    Collection<? extends NamedFunction> getBuiltinOperations ();
    
    // stuff required for reflection / meta programming
    String getName ();
    Collection<? extends BackendType> getSuperTypes ();
    Map<String, ? extends Property> getProperties ();
    Map<String, ? extends StaticProperty> getStaticProperties ();
}

The methods provide the following functionality:

  • create. This method creates an instance of the type at runtime. This method is optional, i.e. it is permissible for a type to throw a runtime exception when it is invoked. And it is obviously up to every language if it provides features that are mapped to object creation expressions in the middle end.
  • get/setProperty. These methods serve to access properties of the data element. If a property does not exist (or is read-only while setProperty is called for it), a runtime exception is thrown. The design idea is that it is the middle end's responsibility to statically ensure correct property access.
  • getBuiltinOperations. This method returns the "methods" directly associated with the type. In the Xpand / Xtend world it has become more and more established best practice to move most methods to extensions rather than to include them in the metamodel. Nonetheless, built-in operations are necessary, and it is a matter of taste in what ways and to what degree languages expose and encourage the use of this feature.
  • getName. This method exists only to make the name of a type available via reflection.
  • isAssignableFrom and getSuperTypes. These methods expose the type hierarchy. The single root of the inheritance hierarchy is ObjectType (see the section on built-in types). Although these methods provide the same functionality, they are both included in the interface to allow performance optimizations.
  • getProperties. This method allows reflective access to all properties of the type.
  • getStaticProperties. This method allows reflective access to all static properties of the type, i.e. constants and enum values. The reader may have noticed that there is no "getStaticProperty" method to retrieve a static property by name. The reason is that there is no need for such a lookup at runtime - it can be done during startup and therefore falls into the responsibility of the middle end.


Points of interaction with the type system

Different parts interact with the type system in different ways, and this section gives an overview of these.


Inside the backend

Different kinds of expression nodes are based on types, typically expecting them as a parameter in their constructors. The CreateCachedExpression is a typical example of this.

All code associated with choosing and invoking functions also interacts with the type system. So does all code dealing with Java functions.

For details see the section about internals of the backend.


Configuration of type system contributions

The type system relies on contributed implementations in order to deal with complex data types, e.g. for EMF and Java Beans. These contributions must be configured.

The mechanism for this configuration will probably be based on a configuration file of some kind. Using extension points was considered, but a conscious design decision was made against them for two reasons. Firstly, the ordering of the contributed type systems carries semantic significance since it determines which type system is asked first for a given object. Secondly, it must be possible to have different type system configurations in the same work space.


In the middle end

The middle end must translate type information from the specific front end representation to the corresponding BackendTypes. In order to do this, it must know about the detailed semantics of string representations in the language it deals with, and create BackendType instances accordingly, using them to initialize the Expression nodes it creates.

There is no restriction as to how the middle end acquires its BackendType instances. It is perfectly permissible for it to go through the BackendTypesystem as a factory, and there will likely be generic cases where that is the only feasible way. In other contexts it can for example access built-in types through their flyweight means (i.e. public static final fields with the sole instance - see below in the section on built-in types).

And if a language specifically deals only with EMF types, its middle end can of course create EmfType instances directly, although this would deprive it of the support for pluggable type systems.


Built-in types

Flyweight properties

None of the built-in types has variable properties. Each type is represented by a separate class, and a single instance of each of these classes suffices. Therefore they all have private constructors and a public static final field "INSTANCE" that holds the sole instance.
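
The pattern can be sketched as follows. SketchLongType is an illustrative stand-in and not the backend's actual LongType class; only the private constructor and the INSTANCE field are taken from the description above.

```java
/** Hypothetical sketch of the flyweight pattern used for built-in types. */
final class FlyweightTypeSketch {
    /** Illustrative stand-in, not the backend's actual LongType class. */
    static final class SketchLongType {
        /** The sole instance; the type has no state that could vary. */
        public static final SketchLongType INSTANCE = new SketchLongType();

        private SketchLongType() {
            // private: no further instances can be created from outside
        }

        String getName() {
            return "Long";
        }
    }

    public static void main(String[] args) {
        // With a single instance, identity comparison is enough to compare types.
        System.out.println(SketchLongType.INSTANCE == SketchLongType.INSTANCE);
        System.out.println(SketchLongType.INSTANCE.getName());
    }
}
```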


canonical and Java representations

Every built-in type has a canonical representation inside the backend. That means that there is exactly one Java type for which it is guaranteed that this Java type will be assignable from every instance of the BackendType.

For example, instances of LongType have the type java.lang.Long, and instances of BooleanType have the type java.lang.Boolean. This canonical type can be an interface, e.g. java.util.Set and java.util.List for SetType and ListType, respectively.

The canonical representation for StringType is java.lang.CharSequence, i.e. every subtype of CharSequence is treated as a string. That allows for String, StringBuffer, StringBuilder etc., but also for EfficientLazyString (see next subsection).

Java however has a richer system of "built-in" types than the backend provides. The reduction of the built-in types to those actually supported was a conscious design decision that should be reviewed thoroughly!

To simplify interaction with Java methods, a variety of implicit type conversions is performed by the backend when interfacing with Java code. They are documented in the class JavaBuiltinConverter. Parameters are implicitly converted to a variety of Java types, e.g. long to int or a List to an array. The reverse transformations are performed for return values.
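
A rough sketch of such conversions is shown below. It is not the code of JavaBuiltinConverter: the method names and the exact set of handled types are assumptions, and only the long-to-int and List-to-array cases are taken from the text above.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical sketch of implicit conversions at the Java boundary. */
class ConversionSketch {
    /** Converts a backend value to the Java parameter type a method expects. */
    static Object toJava(Object value, Class<?> target) {
        if (value instanceof Long && (target == int.class || target == Integer.class)) {
            return Math.toIntExact((Long) value); // throws on overflow instead of truncating
        }
        if (value instanceof List && target == Object[].class) {
            return ((List<?>) value).toArray();
        }
        if (value instanceof CharSequence && target == String.class) {
            return value.toString(); // assumption: any CharSequence may be passed as a String
        }
        return value; // no conversion needed
    }

    /** Converts a Java return value back to the canonical backend form. */
    static Object fromJava(Object value) {
        if (value instanceof Integer) {
            return ((Integer) value).longValue();
        }
        if (value instanceof Object[]) {
            return new ArrayList<>(Arrays.asList((Object[]) value));
        }
        return value;
    }

    public static void main(String[] args) {
        System.out.println(toJava(42L, int.class).getClass().getSimpleName());
        System.out.println(fromJava(new Object[] {"a", "b"}));
    }
}
```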


EfficientLazyString

The backend specifically focuses on template languages, and string concatenation is both frequent and potentially expensive. One common approach to deal with this is to build streaming output directly into the languages, and that works well as far as it goes. Experience has shown however that in non-trivial generators there tends to be supporting "logic" code that performs operations, building and returning strings that are then further processed and only later actually written to the generated file. Concatenating a fully qualified name using recursive descent is a simple example of this.

To address this dilemma, the backend contains a specialized CharSequence implementation, EfficientLazyString. It internally stores the segments it consists of as a tree without actually concatenating them, saving the overhead associated with concatenation.

If it is written to a stream, it does that using recursive descent, i.e. without creating an intermediate string representation in memory. But it can also be passed as a String parameter, e.g. to a Java method (or any other function), and only then will a string representation be created in memory.
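
The idea can be sketched with a small rope-like CharSequence. This is not the backend's EfficientLazyString; the class name, the binary tree layout and the writeTo method are assumptions made for illustration only.

```java
import java.io.IOException;

/** Hypothetical rope-like CharSequence; not the backend's EfficientLazyString. */
final class LazyStringSketch implements CharSequence {
    private final CharSequence left;
    private final CharSequence right;
    private String flattened; // created only when a real String is needed

    LazyStringSketch(CharSequence left, CharSequence right) {
        this.left = left;
        this.right = right;
    }

    /** Streams all segments by recursive descent without flattening. */
    void writeTo(Appendable out) throws IOException {
        write(left, out);
        write(right, out);
    }

    private static void write(CharSequence segment, Appendable out) throws IOException {
        if (segment instanceof LazyStringSketch) {
            ((LazyStringSketch) segment).writeTo(out);
        } else {
            out.append(segment);
        }
    }

    boolean isFlattened() {
        return flattened != null;
    }

    /** Only here is an in-memory string representation created. */
    @Override
    public String toString() {
        if (flattened == null) {
            StringBuilder sb = new StringBuilder();
            try {
                writeTo(sb);
            } catch (IOException e) {
                throw new IllegalStateException(e); // StringBuilder does not throw
            }
            flattened = sb.toString();
        }
        return flattened;
    }

    @Override
    public int length() {
        return left.length() + right.length();
    }

    @Override
    public char charAt(int index) {
        return toString().charAt(index);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        return toString().subSequence(start, end);
    }
}
```

In this sketch, writing to an Appendable leaves the tree untouched, whereas any call that needs random access, such as charAt, forces the flattened form.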

This is one of the major performance enhancements in the backend, and it requires diligent use. A debug log statement logging the contents of a file would for example incur significant overhead because it would prevent direct streaming to the file.

Kinds of Expressions

The internal representation of a function in the backend is as a tree of expressions, as is typical of language representations. The backend representation is however further removed from the concrete syntax of the language than is usually the case for an interpreter, so it warrants a section of its own in this documentation.

These expressions are the building blocks from which the functionality supported by the backend is made, and they should therefore be reviewed particularly thoroughly. Please check the requirements of the other M2T languages against this set of functionality!

Currently, there are expressions to handle common basic functionality. There will likely be additions here, both initially and over time, especially as requirements of new languages become apparent. One area being enhanced as part of the work on "Xtend++" is support for functional programming, e.g. currying.

The following subsections describe them in groups of related functionality.


Operators and operations

Most operators are implemented in the system library and therefore have no specific node representations. The binary boolean operators are an exception because of their shortcut evaluation semantics:

  • AndExpression
  • OrExpression
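
A minimal sketch of why these two need their own nodes: a library function would receive both operands already evaluated, while a dedicated node can skip the right operand. The Expr interface and the factory methods below are hypothetical and not the backend's expression API.

```java
/** Hypothetical sketch of shortcut evaluation in expression nodes. */
class ShortCircuitSketch {
    /** Stand-in for an expression node; not the backend's interface. */
    interface Expr {
        Object evaluate();
    }

    /** The right operand is evaluated only if the left one is true. */
    static Expr and(Expr left, Expr right) {
        return () -> Boolean.TRUE.equals(left.evaluate()) && Boolean.TRUE.equals(right.evaluate());
    }

    /** The right operand is evaluated only if the left one is not true. */
    static Expr or(Expr left, Expr right) {
        return () -> Boolean.TRUE.equals(left.evaluate()) || Boolean.TRUE.equals(right.evaluate());
    }

    public static void main(String[] args) {
        Expr noisy = () -> {
            System.out.println("right operand evaluated");
            return true;
        };
        // Prints only the result; the right operand is never evaluated.
        System.out.println(and(() -> false, noisy).evaluate());
    }
}
```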

String concatenation is another exception to avoid ambiguity. Concatenation is sometimes represented using the '+' operator, requiring polymorphic resolution at runtime to decide whether numbers are to be added or strings concatenated. In other contexts and/or languages, no such ambiguity exists, and the backend should know that concatenation is the intended operation:

  • ConcatExpression

Function invocation carries special semantics if it is performed on a collection, and so there are three different kinds of nodes to enable the middle end to optimize.

  • InvocationOnWhateverExpression. This is the most general of the invocation expressions. It decides its semantics at runtime - if the first parameter is not a collection, the function is invoked "as is", and if the first parameter is a collection, it invokes the function of the same name without the first parameter on every element of the collection, returning a collection with the results. Actually the situation is a little more complex - if the function is defined with a collection as a first parameter, it is executed on the collection in the normal fashion rather than elementwise.
  • InvocationOnCollectionExpression. This expression performs the invocation on all elements of a collection as described in the previous item, but it assumes without checking that the first parameter is a collection. So if the middle end can be sure that the first parameter is a collection and the elementwise invocation is desired, it can create this kind of expression as an optimization.
  • InvocationOnObjectExpression. This is the "normal" invocation - again allowing the middle end optimizations.
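
The three variants can be sketched as follows, using java.util.function.Function as a stand-in for a backend function with a single parameter. The method names are hypothetical, and the sketch deliberately leaves out the additional rule for functions that are themselves defined with a collection as first parameter.

```java
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;
import java.util.function.Function;

/** Hypothetical sketch of the three invocation variants. */
class InvocationSketch {
    /** "On object": the function is invoked as is. */
    static Object onObject(Function<Object, Object> f, Object target) {
        return f.apply(target);
    }

    /** "On collection": assumes a collection without checking and maps over it. */
    static List<Object> onCollection(Function<Object, Object> f, Object target) {
        List<Object> results = new ArrayList<>();
        for (Object element : (Collection<?>) target) {
            results.add(f.apply(element));
        }
        return results;
    }

    /** "On whatever": pays for a runtime check to pick one of the two. */
    static Object onWhatever(Function<Object, Object> f, Object target) {
        return target instanceof Collection ? onCollection(f, target) : onObject(f, target);
    }

    public static void main(String[] args) {
        Function<Object, Object> length = o -> o.toString().length();
        System.out.println(onWhatever(length, "abc"));
        System.out.println(onWhatever(length, java.util.Arrays.asList("a", "bc")));
    }
}
```

A middle end that knows the static type of the first parameter can emit one of the two specialized forms and skip the instanceof check.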


Properties

The backend supports the same special semantics for property access as it does for function invocation. This is reflected in the same three kinds of expressions:

  • PropertyOnWhateverExpression
  • PropertyOnCollectionExpression
  • PropertyOnObjectExpression

There is also a kind of expression to modify a property of an object:

  • SetPropertyExpression


Literals

Wherever a value can be determined statically by the middle end, it is passed to the backend in a LiteralExpression:

  • LiteralExpression

There is a special kind of expression for a list literal, i.e. a list that is defined by specifying all its values. Such a ListLiteral is different from a "normal" literal in that it contains a list of expressions, one for each of its elements, that must be evaluated at runtime.

  • ListLiteralExpression

Finally, there is a special kind of expression representing the definition of a "function literal", i.e. a closure. The code with its body is static, but it requires special evaluation at runtime to permanently bind the local variables to its scope.

  • InitClosureExpression


Variables

Local variables come in many concrete syntactical flavors, but most languages have them. Since the middle end can determine statically whether a local variable hides another variable of the same name or has a new name, there are two different kinds of expression to avoid the unnecessary overhead at runtime.

  • NewLocalVarDefExpression
  • HidingLocalVarDefExpression

Local variables are only really useful if they can be accessed, so there is an expression for that:

  • LocalVarEvalExpression
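
The overhead that the distinction avoids can be sketched with a map-based variable context. The class and method names are hypothetical; the sketch only assumes, as described for LocalVarContext, that local variables live in a map from names to values.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch of new versus hiding local variable definitions. */
class LocalVarSketch {
    private final Map<String, Object> vars = new HashMap<>();

    /** The name is known to be new: nothing has to be saved or restored. */
    Object withNewVar(String name, Object value, Supplier<Object> body) {
        vars.put(name, value);
        try {
            return body.get();
        } finally {
            vars.remove(name);
        }
    }

    /** The name hides an outer variable: its value is saved and restored. */
    Object withHidingVar(String name, Object value, Supplier<Object> body) {
        Object outer = vars.put(name, value);
        try {
            return body.get();
        } finally {
            vars.put(name, outer);
        }
    }

    /** Corresponds to reading a local variable. */
    Object eval(String name) {
        return vars.get(name);
    }
}
```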

And there is the concept of "global variables", a convenient way to pass constant values from outside to arbitrary places in the code. We are considering the removal of this feature - any opinions?

  • GlobalVarExpression


Control flow

The simplest kind of control flow is the subsequent execution of several expressions in the fashion of statements. This is roughly analogous to the comma operator in C.

  • SequenceExpression

And there is support for the classical control structures:

  • IfExpression
  • SwitchExpression

Note that there is no special kind of expression for loops of any kind. The reason is that the semantics of loops depend somewhat on the concrete language: is there an iterator, and if so, what is its functionality? Does the loop implicitly concatenate the result of each operation, or is it about side effects? These considerations led to the design decision to move loop functionality into the system library - please comment on this and describe your needs for loop support in the system library!

Some of the use cases for loops are by the way dealt with by the collection operations in the system library, such as "collect" or "select".


Object creation

Some languages may require object creation. For this purpose, there are two kinds of expressions:

  • CreateUncachedExpression. This kind of expression just creates a new instance of a given type.
  • CreateCachedExpression. This kind of expression remembers the newly created instance based on a - potentially composite - key. For the same value of the key, it performs the actual creation only once and after that always returns the object created previously.
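
The difference between the two can be sketched as follows. The class is hypothetical, and representing the composite key as a list of its parts is an assumption; the backend's CreationCache may organize its keys differently.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

/** Hypothetical sketch of uncached and cached object creation. */
class CreationCacheSketch {
    private final Map<List<Object>, Object> cache = new HashMap<>();

    /** Uncached: every call creates a new instance. */
    static <T> T createUncached(Supplier<T> factory) {
        return factory.get();
    }

    /** Cached: the factory runs only once per composite key. */
    Object createCached(Supplier<?> factory, Object... keyParts) {
        // A list compares element by element, so it can serve as a composite key.
        List<Object> key = Arrays.asList(keyParts.clone());
        return cache.computeIfAbsent(key, k -> factory.get());
    }

    public static void main(String[] args) {
        CreationCacheSketch creations = new CreationCacheSketch();
        Object first = creations.createCached(StringBuilder::new, "Person", 1);
        Object second = creations.createCached(StringBuilder::new, "Person", 1);
        System.out.println(first == second);
    }
}
```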

Usage patterns

This section is intended as a pool for usage patterns, both intended and mined in practical usage. It is intended to grow as we gain experience with the backend.


Templates and File I/O

The backend has no built-in support for I/O operations in general, or for streaming to an output in particular. This raises the question of how to map an m2t language to the backend, since writing data to a file is one of the key functions of a generator language.

The intended usage pattern is to have the generator code return an instance of EfficientLazyString and pass that to a syslib function (see class "FileOperations") that deals with the file output.

Internals and implementation

This section finally takes a closer look at the source tree and documents some internals of the implementation. It is intended to be a rough overview of the source tree and the design decisions rather than a replacement for browsing the source and source documentation.


plugin org.eclipse.xtend.backend

This plugin contains the core of the backend and all its generic functionality.


org.eclipse.xtend.backend

This package contains only one public class, "BackendFacade". As its name suggests, this class serves as a facade for the backend.

It provides functionality to create a fully initialized ExecutionContext (see package org.eclipse.xtend.backend.common), and to invoke a function on a given ExecutionContext.


org.eclipse.xtend.backend.common

This package contains all the abstractions - mostly Java interfaces - that the rest of the backend relies on.

For this overview, the ExecutionContext interface is most relevant. It serves as a container for the entire state that the backend requires to execute a program. Some of this state is "quasi-static", being set once at initialization time (e.g. the function definitions), while other parts of this state are highly volatile (e.g. the local variables currently in scope).

In detail, the ExecutionContext consists of the following parts:

  • LocalVarContext. This is where the backend stores the local variables that are currently in scope during evaluation. It is basically a Map with the variable names as keys and their values as values.
  • GlobalVarContext. This part contains the "global variables" that can be read everywhere in the code but are initialized only up front.
  • BackendTypesystem. Since the typesystem in use can vary from invocation to invocation, it is also stored in the ExecutionContext.
  • FunctionDefContext. This is a collection with all functions that are currently "in scope", i.e. visible. It is used as the basis for polymorphic resolution.
    The FunctionDefContext varies for different parts of a program, potentially every function, more typically every compilation unit. This allows the implementation of different "visibilities" of functions, include mechanisms etc. It is the responsibility of the middle end to initialize this properly.
  • FunctionInvoker. This is the single point through which all function invocations go. It takes care of the caching functionality (see above in the section on Functions). It is easy to mix this up with the FunctionDefContext, but the two have different functionality, potentially different life cycles and different scope.
    The FunctionDefContext provides the list of (named) functions that are available in the current scope - and it is replaced in the ExecutionContext when the scope changes, e.g. because an invocation went into a different compilation unit. The FunctionInvoker on the other hand remains the same for the entire duration of an invocation, caching calls to the same function regardless of where they originated.
    All FunctionDefContexts together are the representation of the program: they contain the definitions of all functions the program consists of. For a given program, they never change and can be reused for multiple subsequent calls without impact on their semantics. The FunctionInvoker on the other hand collects cache data of previously invoked functions. For subsequent invocations, it is possible to either use a new FunctionInvoker instance or reuse the old one, causing different behavior (either of which can be intended, depending on the situation).
  • CreationCache. This is where the backend stores the cache of newly created objects (see section on Kinds of Expressions). The same lifecycle considerations as for FunctionInvoker apply here.
  • ContributionStateContext. This is a pass-through black box where libraries etc. can store data they wish to preserve for the duration of an invocation.
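
The different life cycles of FunctionDefContext and FunctionInvoker can be illustrated with a small sketch. All names are hypothetical, functions are reduced to one parameter, and the sketch caches every call, which is an assumption that the real FunctionInvoker need not share.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

/** Hypothetical sketch; names and caching policy are assumptions. */
class ScopeAndCacheSketch {
    /** Visible functions; replaced whenever execution enters another scope. */
    private Map<String, Function<Object, Object>> functionDefContext;

    /** Result cache; kept for the whole invocation, across scope changes. */
    private final Map<List<Object>, Object> invokerCache = new HashMap<>();

    /** Counts real evaluations so the effect of caching can be observed. */
    int evaluations;

    ScopeAndCacheSketch(Map<String, Function<Object, Object>> initialScope) {
        this.functionDefContext = initialScope;
    }

    void enterScope(Map<String, Function<Object, Object>> scope) {
        this.functionDefContext = scope;
    }

    Object invoke(String name, Object arg) {
        Function<Object, Object> f = functionDefContext.get(name);
        if (f == null) {
            throw new IllegalArgumentException("not in scope: " + name);
        }
        // The key uses the function object itself, so the same function is
        // recognized even when it is reached from a different scope.
        return invokerCache.computeIfAbsent(Arrays.<Object>asList(f, arg), key -> {
            evaluations++;
            return f.apply(arg);
        });
    }
}
```

Creating a fresh ScopeAndCacheSketch for a later run corresponds to using a new FunctionInvoker, while keeping the instance corresponds to reusing the old one together with its cache.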

There is also the method "logNullDeRef". The rationale behind this method is that code execution in the backend should be robust with regard to null values - models that are wrongly or incompletely initialized should not cause the execution to abort with a NullPointerException. On the other hand, in some contexts it is desirable for a developer to debug a faulty generator chain and be informed about such null dereferencing. Therefore the backend logs all occurrences of a dereferenced null pointer through this method, using a special logger, but continues program execution without throwing an exception.
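
This behavior can be sketched as a null-safe property access. The class, the logger name and the method signature are hypothetical and do not reflect the actual signature of logNullDeRef; the sketch also assumes that a dereferenced null simply evaluates to null.

```java
import java.util.function.Function;
import java.util.logging.Logger;

/** Hypothetical sketch of null-tolerant property access with logging. */
class NullSafeSketch {
    /** A dedicated logger, so that these messages can be enabled separately. */
    private static final Logger NULL_LOG = Logger.getLogger("sketch.nullderef");

    /** Counts logged dereferences; only here to make the effect observable. */
    int nullDerefs;

    <T, R> R property(T target, Function<T, R> getter, String description) {
        if (target == null) {
            nullDerefs++;
            NULL_LOG.warning("dereferenced null: " + description);
            return null; // execution continues instead of throwing
        }
        return getter.apply(target);
    }

    public static void main(String[] args) {
        NullSafeSketch access = new NullSafeSketch();
        String missing = null;
        System.out.println(access.property(missing, String::length, "missing.length"));
    }
}
```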


org.eclipse.xtend.backend.expr

This package contains all the implementation classes for the different kinds of expressions.


org.eclipse.xtend.backend.functions

This package contains all code related to functions - the Function interface, Closure, PolymorphicResolver etc.


org.eclipse.xtend.backend.functions.java

This package knows all about how to treat Java methods as functions. It was originally part of the java type system plugin but was moved here because its functionality is frequently required without explicitly using Java Beans.

All conversions from and to the canonical, internal type representations are implemented here.

The class "JavaDefinedFunction" is of particular interest because it contains most of the logic for calling Java code from the backend. It has a factory method to extract all public methods from a given Java class and wrap them in Functions. Static methods are obviously called without an instance associated to them, but if there are non-static methods, there is a guarantee that for every ExecutionContext, there will be exactly one instance of this class, and all calls will be made on it.

If a Java class implements the marker interface ExecutionContextAware, the backend will always inject the current ExecutionContext prior to any invocation.

The annotation M2tHidden is also defined here. A method annotated with it will be ignored by the factory method in JavaDefinedFunction.
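
The factory behavior described above can be approximated with plain Java reflection. This is a sketch and not the code of JavaDefinedFunction: the Hidden annotation stands in for M2tHidden, Lib is an invented example class, and overloaded methods are ignored because the result is keyed by method name only.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.lang.reflect.Modifier;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical sketch of wrapping public Java methods as functions. */
class JavaFunctionSketch {
    /** Stand-in for the M2tHidden annotation. */
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Hidden {
    }

    /** Invented example library with a static, an instance and a hidden method. */
    public static class Lib {
        public static String shout(String s) {
            return s.toUpperCase();
        }

        public String greet(String s) {
            return "hello " + s;
        }

        @Hidden
        public String internal() {
            return "not exposed";
        }
    }

    /** Collects the declared public methods that are not marked as hidden. */
    static Map<String, Method> functionsOf(Class<?> cls) {
        Map<String, Method> result = new TreeMap<>();
        for (Method m : cls.getDeclaredMethods()) {
            if (Modifier.isPublic(m.getModifiers()) && !m.isAnnotationPresent(Hidden.class)) {
                result.put(m.getName(), m);
            }
        }
        return result;
    }

    /** Static methods get no receiver; all others share the given instance. */
    static Object call(Method m, Object sharedInstance, Object... args)
            throws ReflectiveOperationException {
        Object receiver = Modifier.isStatic(m.getModifiers()) ? null : sharedInstance;
        return m.invoke(receiver, args);
    }
}
```

Passing the same sharedInstance for every call mirrors the guarantee that all non-static calls within one ExecutionContext are made on a single instance.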

org.eclipse.xtend.backend.iface

This package only contains the interface BackendContributor. It is intended as a common abstraction to facilitate integration of different languages, but that part is very much work in progress.


org.eclipse.xtend.backend.types

This package contains convenience and default implementations that make implementing types and type systems easier.


org.eclipse.xtend.backend.types.builtin

This package contains all the built-in types.


org.eclipse.xtend.backend.util

This package contains utility classes that are of a general nature, e.g. data structures.


plugin org.eclipse.xtend.backend.emftypes

This plugin contains the EMF implementation of BackendTypesystem.


plugin org.eclipse.xtend.backend.javatypes

This plugin contains the Java Beans implementation of BackendTypesystem.

plugin org.eclipse.xtend.syslib

This is where the system library is implemented. The class SysLibContributor serves as a factory to access the completely initialized system library.

This library contains only functionality that is meaningful to a variety of languages. If a language requires specific functionality to support its semantics, it should provide its own, specific "library" (see ...xtendlib). These specific libraries are of course not imported implicitly, and it is the middle end's responsibility to take care of that.

plugin org.eclipse.xtend.middleend

This plugin contains commonly useful code for all middle ends.


plugin org.eclipse.xtend.middleend.old

This plugin - while maybe not ideally named - contains the middle end implementation for the current versions of Xpand and Xtend. They are largely finished but still work in progress, and they are not well tested. They are however conceptually sound and should serve to illustrate the ideas and concepts.


plugin org.eclipse.xtend.middleend.old.test

This plugin is located in the test branch of the CVS repository. It contains a rudimentary main method to exercise the middleend implementation.

To Do