Difference between revisions of "M2TBackend"

Revision as of 14:07, 22 January 2008

Common platform for M2T languages

This page describes both the ideas behind the common M2T backend and its implementation.

The backend originated as part of the evolution of the Xpand and Xtend languages, and the packages are currently named accordingly. It is however intended as a runtime environment for all M2T languages, supporting common performance optimizations, interoperability and potential reuse of other code. If it proves to be useful this way, it should probably be moved to a non-xtend namespace to reflect its common nature.

Comments are very welcome, and indeed necessary, to incorporate the requirements of languages other than Xpand and Xtend and to make the backend useful to them.

The backend code is currently located in the modeling CVS at org.eclipse.m2t/org.eclipse.xpand/plugins/org.eclipse.xtend.backend* with related code at org.eclipse.m2t/org.eclipse.xpand/plugins/org.eclipse.xtend.middleend*. This distinction will be explained in the remainder of this document.

Overview

Design goals

The design of the backend was driven by the following forces:

Performance. In large projects, generator speed is an issue, and the backend is designed with performance in mind. This requirement is what actually sparked its development in the first place.
Compiler. For performance and/or obfuscation reasons, the backend will serve as a basis for compilation into Java classes.
Language independence. Concrete languages evolve and gain convenience syntax, and different languages have different concrete syntaxes in any case. So that the performance tuning effort benefits all of them, the backend is designed to be largely independent of the concrete syntax of the languages built on top of it. The developer skills and mindset required for frontend and backend development are quite different, and the separation gives the backend's performance and generality efforts a more stable basis. This is probably a point where some implicit assumptions will prove less general than desirable, and feedback from other M2T language development teams is necessary.
Independent of parse tree. There is a strict and complete separation between the data structures used by the backend and those used by the frontends. The previous item explained how this separation is useful for the backend, but the development of frontend tooling also benefits. Since the parse tree of the frontend need not directly serve as a basis for execution, it becomes simpler to implement features like fault tolerant parsing.
Language interoperability. The common backend is intended to facilitate interoperability of languages, i.e. making it as simple as possible to have code in one language call code written in another.
Reuse of Tooling. The backend will incorporate support for tooling that requires runtime support - debugging, profiling etc. - in such a way as to minimize implementation effort for the different languages that wish to support them.

Layers

The backend serves as the runtime environment, and its data structures are independent of the concrete syntax of a given language. The frontend tooling on the other hand is intended to use its own AST that should be free of runtime concerns.

Therefore a translation layer is introduced, called middle end for want of a different term. It is specific for every concrete language, and its purpose is to transform the AST of the frontend into the data structures required by the backend. This involves mainly the following transformations which will be explained in more detail in subsequent sections:

functions. The data structures representing code are structured around the key abstraction of function in the backend. A function is a piece of code that can be called using parameters and that returns an object.
primitive operations. The code inside a function is represented by a tree of expressions. Since stability is one of the key design requirements of the backend, each middle end must map the specific functionality of its language onto the fixed set of expression nodes available in the backend.
types. Since M2T languages, like many other languages, operate on data, data types have a representation in the backend. Every middle end must therefore transform data types from the language-specific representation into the common representation of the backend.

Execution sequence

In line with its performance goal, the backend is as static in its execution as possible. Anything that the middle end can evaluate ahead of time has no support in the backend.

One prominent aspect that is affected by this decision is parsing of source code, which is left entirely to the middle end. The backend is designed so as to never parse any resources. This decision deeply affects the function resolution strategy and other implementation aspects of the backend, and therefore it should be reviewed especially carefully for its implications at an early stage if possible!

From an execution perspective, several steps must be performed in order to execute a program:

Call the middle end to transform the program into the backend representation. This requires the frontend AST as an input, so the middle end will call the corresponding front end parser. The output of this step is an initialized backend data structure.
There is a conscious design decision at this point. At first glance, it would also be possible to pass the front end AST to the middle end instead of having the middle end call the front end parser. This approach would work well for single source files, but it would be difficult to maintain if one source file referenced another, potentially even one written in a different language. Therefore the decision was made to have the middle end call the front end parser.
Initialize the runtime data structure for the backend. This step can be implicitly performed by a facade, but it allows detailed control over reuse of runtime data across several invocations (see below for details).
Actually call the backend to "execute" the data structures returned by the middle end.
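
The three steps above can be sketched in Java as follows. This is only an illustration under assumptions: the MiddleEnd interface, its translate method, the empty ExecutionContext and the ExecutionSequenceSketch class are hypothetical names invented here, and only the shape of Function.invoke is taken from this page.

```java
// Hypothetical stand-in: the real ExecutionContext carries all runtime data.
interface ExecutionContext {}

// Reduced to the invoke method shown on this page.
interface Function {
    Object invoke(ExecutionContext ctx, Object[] params);
}

// Hypothetical: a language-specific middle end that calls its own front end
// parser and returns an initialized backend structure (step 1).
interface MiddleEnd {
    Function translate(String resourceName);
}

class ExecutionSequenceSketch {
    static Object run(MiddleEnd middleEnd, String resourceName, Object... params) {
        // Step 1: the middle end parses and translates; the backend never parses.
        Function entryPoint = middleEnd.translate(resourceName);
        // Step 2: create the runtime data (a facade could do this implicitly).
        ExecutionContext ctx = new ExecutionContext() {};
        // Step 3: execute the backend data structure.
        return entryPoint.invoke(ctx, params);
    }

    public static void main(String[] args) {
        // A toy middle end that "translates" every resource into one function.
        MiddleEnd toy = name -> (ctx, p) -> "generated from " + name + " for " + p[0];
        System.out.println(run(toy, "demo.xpt", "Model"));
    }
}
```

Keeping step 2 separate from step 3 is what would allow a caller to reuse one ExecutionContext across several invocations.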

Functions

Key abstractions

Functions are the backend abstraction for any code that can be called. They are represented by the Function interface (see below for details), they can be passed parameters, and they always return an object (see the section on usage patterns for a description of how, for example, a template language is intended to be implemented). The ExecutionContext that appears in the interface can be ignored for now; it is a data structure containing all runtime-relevant data and will be explained later in this document. The Function interface has a method to invoke the function and one to provide information about its parameter types (for details about types, see the section about types):

public interface Function {
    Object invoke (ExecutionContext ctx, Object[] params);
    List<? extends BackendType> getParameterTypes ();

    ... // other methods that will be explained later
}

There is no distinction between "stand-alone functions" and "methods defined on a type" - they are all represented by functions. If a concrete language supports "method" style calls, these must be mapped to functions with an additional first parameter of the "class" type by the respective middle end.

So, for example

MyType.doSomething (int a, String b)

would be represented by

doSomething (MyType this, int a, String b)
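
A minimal sketch of what a middle end might produce for this mapping. It assumes Java classes as stand-ins for BackendType, and the DoSomethingFunction class and the body of invoke are invented purely for illustration.

```java
import java.util.Arrays;
import java.util.List;

interface ExecutionContext {}

// Simplified: Class<?> stands in for BackendType (an assumption of this sketch).
interface Function {
    Object invoke(ExecutionContext ctx, Object[] params);
    List<Class<?>> getParameterTypes();
}

class MyType {
    final String label;

    MyType(String label) {
        this.label = label;
    }
}

// Backend form of MyType.doSomething (int a, String b):
// the receiver becomes an ordinary first parameter.
class DoSomethingFunction implements Function {
    @Override
    public Object invoke(ExecutionContext ctx, Object[] params) {
        MyType self = (MyType) params[0]; // the former "this"
        int a = (Integer) params[1];
        String b = (String) params[2];
        return self.label + ":" + a + ":" + b; // illustrative body only
    }

    @Override
    public List<Class<?>> getParameterTypes() {
        return Arrays.<Class<?>>asList(MyType.class, Integer.class, String.class);
    }
}
```

With this representation the backend needs no separate notion of a method: a call written as x.doSomething(1, "b") in a front end is simply an invocation with x as the first parameter.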

Functions have no name per se; in this regard they are like function pointers in C. This supports a functional style of programming, closures, etc. In contexts where functions require a name, they are represented by the NamedFunction class, which is basically a container for a function and a string representing its name.

Polymorphism

It is possible - even desirable - for several functions to have the same name, as long as they differ in their parameter types. Every function invocation made by the backend is done polymorphically, i.e. if there are several functions with the right name and number of parameters, the best fit is picked and invoked.

This choice is based on the actual runtime types of the parameter objects and not, as in Java, on the "reference type" (a concept of which the backend has no notion). Resolution therefore happens at runtime rather than at compile time, and if there is no matching function (or no single "best match"), a runtime error occurs.

Let us for example assume we have the following functions ("List" and "Set" being subtypes of "Collection"):

  f(Collection c, Object o);
  f(List       c, Object o);
  f(Object o, Collection c);

The following table describes the backend behavior based on the types of the parameters passed to the function:

First parameter type	Second parameter type	Function invoked
List	String	The second function. Both the first and the second function match, but the second is more specific.
Set	String	The first function, because it is the only match.
String	String	None. No function signature matches --> runtime error.
List	List	None. All three functions match. The second function is a specialization of the first, but neither the second nor the third is a specialization of the other --> runtime error.
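
The behavior in the table can be reproduced with a small resolver. The following is a sketch under assumptions, not the backend's actual algorithm: Java classes stand in for backend types, and the Signature and PolymorphicResolver classes are hypothetical names introduced here.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical: a function signature with Java classes as parameter types.
class Signature {
    final String label;
    final Class<?>[] paramTypes;

    Signature(String label, Class<?>... paramTypes) {
        this.label = label;
        this.paramTypes = paramTypes;
    }

    // True if every actual argument is an instance of the declared type.
    boolean matches(Object[] args) {
        if (args.length != paramTypes.length) {
            return false;
        }
        for (int i = 0; i < args.length; i++) {
            if (!paramTypes[i].isInstance(args[i])) {
                return false;
            }
        }
        return true;
    }

    // True if each parameter type here equals, or is a subtype of, the other's.
    // Only called for signatures of equal length (both matched the same args).
    boolean isAtLeastAsSpecificAs(Signature other) {
        for (int i = 0; i < paramTypes.length; i++) {
            if (!other.paramTypes[i].isAssignableFrom(paramTypes[i])) {
                return false;
            }
        }
        return true;
    }
}

class PolymorphicResolver {
    // Picks the single best match based on the runtime types of the arguments.
    static Signature resolve(List<Signature> candidates, Object... args) {
        List<Signature> matching = new ArrayList<>();
        for (Signature s : candidates) {
            if (s.matches(args)) {
                matching.add(s);
            }
        }
        if (matching.isEmpty()) {
            throw new IllegalStateException("no matching function");
        }
        for (Signature best : matching) {
            boolean dominatesAll = true;
            for (Signature other : matching) {
                if (!best.isAtLeastAsSpecificAs(other)) {
                    dominatesAll = false;
                    break;
                }
            }
            if (dominatesAll) {
                return best;
            }
        }
        throw new IllegalStateException("ambiguous: no single best match");
    }
}
```

A candidate wins only if it is at least as specific as every other matching candidate in every parameter position, which is why the List/List row ends in an error: the second and third functions each win in one position and lose in the other.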

Linking and Function Scope

In order for polymorphic resolution to work, the backend has the concept of function scope: the definition of every function includes the set of other functions that are visible from it. In keeping with the static linking nature of the backend, the middle end must determine this list of all named functions visible from within a given function. The list is stored in a data structure called FunctionDefinitionScope; for details on how it is stored and initialized, see the section on data structures.

For languages that support referencing functions from other compilation units, e.g. using fully qualified names like Xpand, this requires the middle end to analyze which functions are potentially referenced from a given compilation unit and initialize the FunctionDefinitionScope accordingly.
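
As a rough illustration of this linking step, a scope can be pictured as a lookup from a name to all functions visible under that name. The class names NamedFunction and FunctionDefinitionScope come from this page, but every method and field below (register, candidates, link, and so on) is a hypothetical stand-in; the actual layout is described in the section on data structures.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

interface ExecutionContext {}

interface Function {
    Object invoke(ExecutionContext ctx, Object[] params);
}

// A function together with the name under which it is visible.
class NamedFunction {
    private final String name;
    private final Function function;

    NamedFunction(String name, Function function) {
        this.name = name;
        this.function = function;
    }

    String getName() { return name; }
    Function getFunction() { return function; }
}

// Hypothetical API: maps a name to every visible function of that name,
// so that polymorphic resolution can choose among the overloads.
class FunctionDefinitionScope {
    private final Map<String, List<NamedFunction>> byName = new HashMap<>();

    void register(NamedFunction f) {
        byName.computeIfAbsent(f.getName(), k -> new ArrayList<>()).add(f);
    }

    List<NamedFunction> candidates(String name) {
        List<NamedFunction> found = byName.get(name);
        return found == null
                ? Collections.<NamedFunction>emptyList()
                : Collections.unmodifiableList(found);
    }
}

class ScopeSketch {
    // What a middle end might do for one compilation unit: register its own
    // functions plus those it found to be referenced from other units.
    static FunctionDefinitionScope link(List<NamedFunction> local,
                                        List<NamedFunction> referenced) {
        FunctionDefinitionScope scope = new FunctionDefinitionScope();
        for (NamedFunction f : local) {
            scope.register(f);
        }
        for (NamedFunction f : referenced) {
            scope.register(f);
        }
        return scope;
    }
}
```

Because the scope is filled completely before execution starts, the backend never has to locate or parse another resource in order to resolve a call.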

There are some standard functions available in the so-called system library (syslib). These functions are of a general nature and are available to all languages implicitly. They comprise things like string concatenation, basic collection operations or streaming data to a file. Since the backend supports several concrete languages that may each have their own concrete syntax for these functions, the middle end will in general have to perform a mapping from the concrete syntax to the syslib names of the corresponding functions.

Advanced concepts

Backend functions support advanced functionality that may or may not be required by a given frontend language. These features are represented by methods in the Function interface, and they are designed not to get in the way if they are not required.

Guards

It is possible for a function to be associated with a so-called guard expression, or guard. The Function interface has a getter method for it:

public interface Function {
    ExpressionBase getGuard();
    ... // other methods
}

Such a guard is a predicate (boolean expression) that is evaluated for all candidate functions before polymorphic resolution is performed. Only functions for which the guard evaluates to true are actual candidates for execution.

If a function has no guard, the getGuard() method can simply return null.
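
The filtering step can be sketched as follows. The evaluate method on ExpressionBase is an assumption (this page names the type but not its methods), as are the GuardFilter and SimpleFunction classes; the sketch also assumes that the guard reads whatever it needs, including the call parameters, from the ExecutionContext.

```java
import java.util.ArrayList;
import java.util.List;

interface ExecutionContext {}

// Hypothetical method: the real ExpressionBase API is not shown on this page.
interface ExpressionBase {
    Object evaluate(ExecutionContext ctx);
}

interface Function {
    Object invoke(ExecutionContext ctx, Object[] params);
    ExpressionBase getGuard();
}

// Trivial implementation used only to exercise the filter.
class SimpleFunction implements Function {
    private final Object result;
    private final ExpressionBase guard;

    SimpleFunction(Object result, ExpressionBase guard) {
        this.result = result;
        this.guard = guard;
    }

    @Override
    public Object invoke(ExecutionContext ctx, Object[] params) {
        return result;
    }

    @Override
    public ExpressionBase getGuard() {
        return guard;
    }
}

class GuardFilter {
    // Keeps functions without a guard and those whose guard evaluates to true.
    // Polymorphic resolution would then run on the returned list only.
    static List<Function> applicable(List<Function> candidates, ExecutionContext ctx) {
        List<Function> result = new ArrayList<>();
        for (Function f : candidates) {
            ExpressionBase guard = f.getGuard();
            if (guard == null || Boolean.TRUE.equals(guard.evaluate(ctx))) {
                result.add(f);
            }
        }
        return result;
    }
}
```

Treating a null guard as "always applicable" matches the convention that functions without a guard simply return null from getGuard().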

Caching

The backend supports caching of function invocations. To indicate this, the Function interface has a method

boolean isCached ();

If a function is cached, the backend executes it only once for a given set of parameters, remembering the result and returning it from the cache for subsequent invocations. The implementation takes care of direct or indirect recursion.

This feature is useful for two scenarios: performance optimization and functions with side effects.

performance optimization. A language can provide a feature to annotate expensive functions at the source level so that their results need not be computed more than once.
functions with side effects. This feature is invaluable for functions with side effects that should be executed only once, especially during the creation of object graphs. For this purpose, a given language may or may not choose to expose the feature in its syntax, but it will likely benefit from using it internally.
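
The caching behavior described above can be approximated by a memoizing invoker. This is a sketch under assumptions rather than the backend's implementation: the CachingInvoker class is hypothetical, and it keys the cache on the equals/hashCode of the parameter values.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

interface ExecutionContext {}

interface Function {
    Object invoke(ExecutionContext ctx, Object[] params);
    boolean isCached();
}

class CachingInvoker {
    // One result table per function, keyed by the list of parameter values.
    private final Map<Function, Map<List<Object>, Object>> cache = new HashMap<>();

    Object invoke(Function f, ExecutionContext ctx, Object... params) {
        if (!f.isCached()) {
            return f.invoke(ctx, params);
        }
        Map<List<Object>, Object> results = cache.computeIfAbsent(f, k -> new HashMap<>());
        // Copy the array so that later changes by the caller cannot alter the key.
        List<Object> key = Arrays.asList(params.clone());
        if (results.containsKey(key)) {
            return results.get(key);
        }
        // Plain get/put rather than computeIfAbsent on the result table: the
        // function may call back into this invoker while it is being evaluated.
        Object result = f.invoke(ctx, params);
        results.put(key, result);
        return result;
    }
}
```

For the side-effect scenario, this means a function that creates an object for a given model element returns the same instance on every later call with that element, which is what makes it usable for building object graphs.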


Notice: this Wiki will be going read only early in 2024 and edits will no longer be possible. Please see: https://gitlab.eclipse.org/eclipsefdn/helpdesk/-/wikis/Wiki-shutdown-plan for the plan.


