Difference between revisions of "JDT Core Programmer Guide/ECJ"

Revision as of 08:04, 30 June 2020

A Hitchhiker's Guide to ECJ

What IS the Compiler / ECJ?

Strangely enough, this question does not have a single true answer.

Project organization

The following locations contribute to the compiler:

  • org.eclipse.jdt.core/compiler
  • org.eclipse.jdt.core/batch
  • org.eclipse.jdt.compiler.tool
  • org.eclipse.jdt.compiler.apt

Since the compiler does not directly correspond to any single project / plug-in, the following measures are relevant:

  • Classes in source folders compiler and batch are not allowed to access classes in other source folders of org.eclipse.jdt.core. To avoid any violations, a secondary project has been created: org.eclipse.jdt.core.ecj.validation. This project should be imported into the workspace before working on the compiler. It contains only links to the two mentioned source folders and will signal errors if any class outside this scope is used. The project is not intended for editing.
  • During production builds, class files from different projects need to be merged into the single ecj.jar (this jar file is created as org.eclipse.jdt.core-*-SNAPSHOT-batch-compiler.jar and renamed to ecj.jar afterwards). Search for "batch-compiler" in the pom files of the projects mentioned above to see how the compiler is assembled.
  • Additionally, an ant script exists, org.eclipse.jdt.core/scripts/export-ecj.xml, that should allow manually creating ecj.jar from within Eclipse. This script is also executed when building org.eclipse.jdt.core using PDE/Build, which probably also happens when interactively exporting org.eclipse.jdt.core as a deployable plug-in using the export wizard.

The ant adapter

For using ecj with ant, jdtCompilerAdapter.jar is created from org.eclipse.jdt.core/antadapter. The same class files are also added to ecj.jar.

Interfacing with other components

  • Name Environments: To interface with its environment, the compiler needs an instance of org.eclipse.jdt.internal.compiler.env.INameEnvironment. During batch compilation, class org.eclipse.jdt.internal.compiler.batch.FileSystem is used. By supplying different implementations of this interface, other components like the builder can provide the required classes to the compiler.
  • IBinaryType: Different use cases use different implementations to represent existing .class files to which Java sources being compiled can refer.
  • ITypeRequestor: Whenever a new type is found from the name environment, it is first passed to methods of the type requestor. Normally, the Compiler itself acts as the type requestor, which will add a representation of the discovered type to the internal data structures of the compiler, but code assist, type hierarchy, search and indexing each have their own implementation of these hooks into the compiler.
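
The interplay between name environment and type requestor described above can be sketched roughly as follows. This is only an illustration: SketchNameEnvironment, SketchTypeRequestor, MapNameEnvironment, RecordingRequestor and SketchLookup are hypothetical, heavily simplified stand-ins invented for this example, and they do not reproduce the signatures of INameEnvironment or ITypeRequestor.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical stand-in for the role of a name environment (NOT the ECJ interface).
interface SketchNameEnvironment {
    /** Returns an opaque description of the type, or null if the type is unknown. */
    String findType(String qualifiedName);
}

// Hypothetical stand-in for the role of a type requestor (NOT the ECJ interface).
interface SketchTypeRequestor {
    void accept(String qualifiedName, String description);
}

/** In-memory environment; a batch compiler would consult the file system instead. */
class MapNameEnvironment implements SketchNameEnvironment {
    private final Map<String, String> types = new HashMap<>();

    MapNameEnvironment add(String qualifiedName, String description) {
        types.put(qualifiedName, description);
        return this;
    }

    @Override
    public String findType(String qualifiedName) {
        return types.get(qualifiedName);
    }
}

/** A requestor that merely records what it is given. */
class RecordingRequestor implements SketchTypeRequestor {
    final List<String> accepted = new ArrayList<>();

    @Override
    public void accept(String qualifiedName, String description) {
        accepted.add(qualifiedName + ":" + description);
    }
}

/** Asks the environment for a type and passes each newly found type to the requestor. */
class SketchLookup {
    private final SketchNameEnvironment environment;
    private final SketchTypeRequestor requestor;
    private final Set<String> handedOver = new HashSet<>();

    SketchLookup(SketchNameEnvironment environment, SketchTypeRequestor requestor) {
        this.environment = environment;
        this.requestor = requestor;
    }

    boolean askForType(String qualifiedName) {
        if (handedOver.contains(qualifiedName)) {
            return true; // already known, the requestor is not notified again
        }
        String found = environment.findType(qualifiedName);
        if (found == null) {
            return false; // the environment cannot provide this type
        }
        handedOver.add(qualifiedName);
        requestor.accept(qualifiedName, found);
        return true;
    }
}
```

Swapping in a different SketchNameEnvironment or SketchTypeRequestor changes where types come from and what happens to them, without touching the lookup logic. That is the kind of flexibility the real interfaces are meant to offer to the builder, code assist, search and indexing.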

Variants of the compiler

The central class is org.eclipse.jdt.internal.compiler.Compiler, which is used as-is in some use cases. A few subclasses also exist, which are variants of the compiler with purposes other than generating .class files. In other use cases, it is not the Compiler class but org.eclipse.jdt.internal.compiler.parser.Parser that is subclassed to achieve different functionality. The latter strategy is used notably for code select and code complete functionality.


As is standard in compiler technology, ecj operates on Java files in several phases, which are roughly outlined as:

  • Scan and parse, i.e., transform a character stream first into a stream of tokens, then into the abstract syntax tree (AST)
  • Build and connect type bindings, i.e., overlay the syntactic tree structure (AST) with a semantic graph of bindings.
  • Verify methods: analyse inheritance, overriding and overloading of methods
  • Resolve: interpret identifiers and link them to the bindings which they represent
  • Analyse: perform flow analysis in order to detect errors like variables read before assigned, final variables re-assigned, and also analysis of (potential) null pointers and resource leaks. This phase may also detect a few more errors that need the AST to be fully resolved.
  • Generate: Allocate positions to variables (as used in load and store operations of the byte code), then generate the byte code, in the steps shown below. Note that some errors may still be detected and reported during code generation.
    • Generate the general class file structure with relevant byte code attributes
    • Generate the Code attributes containing the actual byte code instructions for methods, constructors and initializers.
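
As a small illustration of what the Analyse phase looks for, the following sketch shows code that passes definite-assignment checking, with comments marking the edits that would turn it into an error. The class FlowAnalysisSketch and its methods are made up for this example; the rules themselves come from the Java language, not from anything specific to ecj.

```java
class FlowAnalysisSketch {

    static int definitelyAssigned(boolean flag) {
        int value;
        if (flag) {
            value = 1;
        } else {
            value = 2;
        }
        // Every path to this point assigns 'value', so the read is accepted.
        return value;
    }

    static int onTheEdge(boolean flag) {
        // Removing "= 0" would make the read in the return statement an error,
        // because the path with flag == false would leave 'value' unassigned.
        int value = 0;
        if (flag) {
            value = 1;
        }
        final int limit = 10;
        // limit = 11; // if uncommented: error, a final local variable is re-assigned
        return value + limit;
    }
}
```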

Looking at class Compiler the phases are written slightly differently:

  • beginToCompile/internalBeginToCompile:
    • parse or dietParse
    • buildTypeBindings
    • completeTypeBindings, here bindings are linked / connected with each other, which requires all bindings to already exist.
  • optionally: processAnnotations
  • processCompilationUnits / process -- at this point a separate compilation thread may be spawned: ProcessTaskManager
    • getMethodBodies: the initial parse may have skipped method bodies, parse them now, perhaps only selectively
    • faultInTypes: ensure that all bindings are properly created and initialized
    • verifyMethods
    • resolve
    • analyseCode
    • generateCode
    • finalizeProblems: before errors and warnings are actually reported to the user, they are filtered by any @SuppressWarnings annotations found in the source.
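
The ordering in this list can be sketched as a toy driver that only records which step runs when. PhaseSketch is a hypothetical class written for this page, not an excerpt from Compiler, and the per-unit interleaving of parse and buildTypeBindings is an assumption of the sketch. The point it illustrates is that completeTypeBindings runs once, after bindings for all units exist, and that the remaining steps then run unit by unit.

```java
import java.util.ArrayList;
import java.util.List;

class PhaseSketch {

    /** Steps applied to each unit in turn; the names mirror the list above. */
    static final List<String> PER_UNIT_STEPS = List.of(
            "getMethodBodies", "faultInTypes", "verifyMethods",
            "resolve", "analyseCode", "generateCode", "finalizeProblems");

    /** Returns a log of the steps in the order this sketch would run them. */
    static List<String> compile(List<String> units, boolean annotationProcessing) {
        List<String> log = new ArrayList<>();
        // "begin to compile": create bindings for every unit first
        // (assumption: parse and buildTypeBindings alternate per unit)
        for (String unit : units) {
            log.add("parse " + unit);
            log.add("buildTypeBindings " + unit);
        }
        // connecting bindings requires that all of them already exist
        log.add("completeTypeBindings");
        if (annotationProcessing) {
            log.add("processAnnotations");
        }
        // "process": every remaining step, one unit after the other
        for (String unit : units) {
            for (String step : PER_UNIT_STEPS) {
                log.add(step + " " + unit);
            }
        }
        return log;
    }
}
```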

Subpages dedicated to these phases

  • /Parse

Demand-driven Computations

In addition to the sequential process outlined by the phases above, some computations will be triggered on demand.

JDT has had several bugs that were caused by demand-driven computations happening at unexpected times. The question of which computation can safely be invoked at which point during compilation doesn't have a simple answer. Some assumptions have never been made explicit.

As an example, invoking ReferenceBinding.getAnnotationTagBits() can cause subtle errors when the receiver is a SourceTypeBinding. Then a typical effect is re-entrance of a compilation step that is not prepared for re-entrance.

Other candidates that have caused misunderstandings in the past are methods unResolvedMethods and unResolvedFields (declared in ReferenceBinding), which may be doing more than what their names suggest.

Throughout the compiler implementation, many fields are public and are directly accessed all over the place. Obviously, direct field access has no effect on compilation order, so in terms of processing order this is OK. If, however, a field only has an accessor method, or if the field has documentation saying it should not be accessed directly, this is a first hint that more may be happening than just reading a field, which is the first step towards influencing the order of compilation steps.
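
The difference between reading a field and calling an accessor can be illustrated with a minimal sketch. LazyBindingSketch and ReentrantBindingSketch are hypothetical classes invented for this example and are not modelled on the actual implementation of ReferenceBinding or SourceTypeBinding. The sketch shows a lazily computed value and a guard that makes accidental re-entrance visible instead of letting it corrupt state silently.

```java
class LazyBindingSketch {
    long tagBits;             // reading this field directly never triggers any work
    int computations;         // counts how often the computation actually ran
    private boolean computed;
    private boolean computing;

    /** Accessor: the first call triggers the computation on demand. */
    long getTagBits() {
        if (!computed) {
            if (computing) {
                // we were called again while the computation is still in progress
                throw new IllegalStateException("re-entrant computation of tagBits");
            }
            computing = true;
            try {
                tagBits = computeTagBits();
                computed = true;
            } finally {
                computing = false;
            }
        }
        return tagBits;
    }

    /** The demand-driven step; the value is a placeholder for this sketch. */
    long computeTagBits() {
        computations++;
        return 42L;
    }
}

/** A variant whose computation calls back into the accessor, i.e. re-enters. */
class ReentrantBindingSketch extends LazyBindingSketch {
    @Override
    long computeTagBits() {
        return getTagBits() + 1;
    }
}
```

Code that reads tagBits directly before any call to getTagBits() sees the default value and has no influence on what is computed when. Code that calls getTagBits() decides, perhaps unknowingly, when the computation happens.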
