Skip to main content
Jump to: navigation, search

Difference between revisions of "JDT Core/Null Analysis/Options"

Line 1: Line 1:
{{Note|Disclaimer|This page is currently subject to discussion and does not necessarily describe the actual implementation in the JDT}}
+
{{Note|Disclaimer|This page is currently subject to discussion and does not describe the actual implementation in the JDT}}
  
The analysis of possible null pointer exceptions performed by the Eclipse Java Compiler is powerful, but the full power may result in a large number of errors and warnings. Not all projects can afford addressing all these issues.
+
The analysis of possible null pointer exceptions performed by the Eclipse Java Compiler is powerful, but the full power may result in a large number of errors and warnings. Not all projects can afford addressing all these issues (at once).
  
One potential solution might be to make the compiler "smarter" by adding heuristics to "guess" which problems are probably uninteresting, and which ones are the real bugs.
+
This page discusses the problem from two different directions:
 +
* On the road towards a fully checked, NPE-free program, what are the '''risks''' needing to be addressed before the solution can be considered complete, i.e., fully safe?
 +
* Should a more '''lenient''' analysis be applied to reduce the number of warnings perceived as uninteresting?
 +
** What are the risks that more relaxed rules would introduce?
 +
** What are the patterns of coding that (some) users would like to see accepted without a warning? What assumptions need to be maintained to render these patterns safe?
  
This page describes the opposite approach: let the compiler soberly perform its analysis without any bias to any coding style. However, empower the user to stepwise enable this analysis for gradually achieving more and more safety.
+
==Risks of NPE despite annotation based null analysis==
 +
If annotation based null analysis applies strict pessimistic rules, a full guarantee of absence of NPE is only threatened by two risks:
 +
* specification of nullness is either incomplete or incompletely checked
 +
* checked null specifications are bypassed at runtime
  
The following sections each address one potential reason why null pointer problems might remain undetected, pointing at strategies for avoiding each particular kind of risk.
+
Aside from these an approach that strictly views null annotations as an extension of the type system is fully safe.
  
==Incomplete Specification==
+
===Incomplete Specification===
  
 
If you start with an existing project and only turn on annotation based null analysis, you will not notice any difference. This is because the analysis implicitly supports ''three'' kinds of reference types:
 
If you start with an existing project and only turn on annotation based null analysis, you will not notice any difference. This is because the analysis implicitly supports ''three'' kinds of reference types:
Line 17: Line 24:
 
So before you actually add annotations to your methods and fields, all types will be considered as legacy types and so the unsafe semantics of Java apply where both assigning null ''and'' dereferencing are legal.
 
So before you actually add annotations to your methods and fields, all types will be considered as legacy types and so the unsafe semantics of Java apply where both assigning null ''and'' dereferencing are legal.
  
===Adding individual annotations===
+
====Adding individual annotations====
 
This path is obvious: add just those annotations where you are certain about the design intent and after each annotation added address the problems reported by the compiler.
 
This path is obvious: add just those annotations where you are certain about the design intent and after each annotation added address the problems reported by the compiler.
  
===Making nonnull the default===
+
====Making nonnull the default====
 
Instead of going in tiny little steps you may want to do jumps of some size (cf. {{bug|331647}}):
 
Instead of going in tiny little steps you may want to do jumps of some size (cf. {{bug|331647}}):
 
* annotate a method as @NonNullByDefault to affect all its parameters and its return
 
* annotate a method as @NonNullByDefault to affect all its parameters and its return
Line 31: Line 38:
 
For this situation a parameter has been added to the default annotation. Declaring a type that inherits from a legacy type with @NonNullByDefault(false) cancels the applicable default.
 
For this situation a parameter has been added to the default annotation. Declaring a type that inherits from a legacy type with @NonNullByDefault(false) cancels the applicable default.
  
===Third-party code===
+
====Third-party code====
 
Every project depends on third party code, which it doesn't control, so adding null annotations is not directly possible.
 
Every project depends on third party code, which it doesn't control, so adding null annotations is not directly possible.
  
Line 38: Line 45:
 
Support for this feature is planned for version 3.9, see {{bug|331651}}.
 
Support for this feature is planned for version 3.9, see {{bug|331651}}.
  
===Avoiding the risk of incompleteness===
+
====Avoiding the risk of incompleteness====
 
In order to ensure that no unchecked legacy types are used in an application, these things must be ensured:
 
In order to ensure that no unchecked legacy types are used in an application, these things must be ensured:
 
* @NonNullByDefault is the globally enforced default
 
* @NonNullByDefault is the globally enforced default
Line 50: Line 57:
 
'''TODO''': Add a note to {{bug|331651}} for checking complete coverage of referenced libraries by available nullity profiles.
 
'''TODO''': Add a note to {{bug|331651}} for checking complete coverage of referenced libraries by available nullity profiles.
  
==Side effects==
+
----
The simplest problem with analysing the nullness of fields results from side effects in methods:
+
===Bypassing compile time checks===
 +
Any guarantees given by the analysis can still be bypassed at runtime if one of the following is applied:
 +
* runtime reflection beyond pure introspection, i.e, whenever reflective field access or method invocation is involved.
 +
* bytecode manipulation, i.e., whenever the bytes being executed by the VM are not identical to the bytes produced at the time when null analysis was performed.
 +
 
 +
These risks are typically considered outside the scope of linguistic means, developers know about these issues at a general level and don't expect a static analysis to consider these.
 +
 
 +
==Lenient analysis==
 +
Some warnings/errors flagged by a strict analysis may seem counter intuitive. Some may actually be irrelevant in specific contexts.
 +
 
 +
Practical investigations showed that many of those "unwanted" warnings/errors relate to field access.
 +
When focusing on these, the most obvious approach at distinguishing interesting from uninteresting warnings is to apply some kind of flow analysis for fields (which the pessimistic approach strictly avoids).
 +
 
 +
===Risks of flow analysis for fields===
 +
When flow analysis is applied to fields the following reasons may cause unexpected NPE:
 +
* side effects
 +
* aliasing
 +
* concurrency
 +
 
 +
Null analysis for the following kinds of fields is essentially unaffected by  these risks:
 +
* final fields
 +
* @NonNull fields
 +
For both kinds of fields only object initialization may produce unexpected results (yes, even Java's definite assignment rule can be circumvented). Once initialized their null status is either known or amenable to flow analysis.
 +
 
 +
''TODO: more precise distinction for final fields: is static relevant? does initialization occur directly as part of the field declaration?''
 +
 
 +
====Side effects====
 +
The simplest problem with flow analysis for fields results from side effects in methods:
 
<source lang="java">
 
<source lang="java">
   if (this.f != null)
+
class X {
    System.out.println(this.toString() + this.f.toString());
+
   @Nullable Object f;
 +
  void test() {
 +
      if (this.f != null) {
 +
          foo();
 +
          this.f.bar();
 +
      }
 +
  }
 +
  void foo() { /* arbitrary code here */ }
 +
}
 
</source>
 
</source>
In contrived situations the execution of <code>this.toString()</code> could potentially assign null to <code>f</code> thus invalidating the above null check.
+
The execution of <code>foo()</code> could potentially assign null to <code>f</code> thus invalidating the above null check.
 
+
===Avoiding risks of side effects regarding null analysis===
+
The most common model for avoiding that side effects spoil null analysis is to discard any nullness information for fields whenever a method call is seen. In the above example this would imply that the call <code>this.f.toString()</code> is ''not'' considered as safe.
+
  
==Aliasing==
+
====Aliasing====
When analysing nullness of fields the following snippet demonstrates how aliasing threatens the validity:
+
When applying flow analysis for checking nullness of fields the following snippet demonstrates how aliasing threatens the validity:
 
<source lang="java">
 
<source lang="java">
 
class X {
 
class X {
     Object f;
+
     @Nullable Y f;
     void foo(X other) {
+
     void test(X other) {
 
         if (this.f != null) {
 
         if (this.f != null) {
 
             other.f = null;
 
             other.f = null;
             System.out.println(this.f.toString()); // potential NPE
+
             this.f.bar(); // potential NPE
 
         }
 
         }
 
     }
 
     }
 
     void breakIt() {
 
     void breakIt() {
         foo(this); // definitely triggers the NPE
+
         test(this); // definitely triggers the NPE
 
     }
 
     }
 
}
 
}
Line 79: Line 118:
 
Without further annotations an intra-procedural analysis cannot see that the assignment to <code>other.f</code> affects the value of <code>this.f</code>.
 
Without further annotations an intra-procedural analysis cannot see that the assignment to <code>other.f</code> affects the value of <code>this.f</code>.
  
A potential remedy would be to discard all null information for a field once an assignment to the same field is seen, regardless of the receiver. This would put the analysis in a state where at the toString() invocation we have no information about f's null status.
+
====Concurrency====
 +
Also concurrency can invalidate the null related guarantees of flow analysis for fields. In a concurrent application not even this snippet is safe:
 +
<source lang="java">
 +
  if (this.f != null)
 +
      this.f.bar();
 +
</source>
  
Another remedy is to restrict analysis of fields by a simplistic implied ownership model: consider that each non-static field is ''owned'' by the enclosing object and that only references via <code>this</code> have permission to update the field. Conversely, only for read access via <code>this</code> can we assume that our analysis has the information of the field's null status. This analysis would falsely consider the above example as safe. If additionally assignment to foreign fields like <code>other.f</code> would be prohibited the bug could be detected, regaining safety.
+
Thus concurrency adds to the problems of side effects and aliasing, because null information from one expression may already be invalid when evaluating the very next expression.
  
More strategies could be thought of. Without ownership annotations none of these are fully satisfactory. Usefulness highly depends on the coding style: if objects strictly encapsulate their fields, the poor-men's ownership model would be suitable. If access to foreign fields is common other rules are a better fit.
+
===Drawing a line: what intermediate code to admit?===
 +
Given the above risks, we can admit certain styles of code if a certain risk is considered irrelevant in a given context. The discussion will use this basic question:
 +
<source lang="java">
 +
class X {
 +
  @Nullable Object f;
 +
  void test() {
 +
      if (this.f != null) {
 +
          // what code is allowed here?
 +
          this.f.bar(); // want to be sure there's no NPE here
 +
      }
 +
  }
 +
}
 +
</source>
 +
What code can we admit at the designated location? What assumptions must a developer ensure to render that code valid?
  
===Avoiding the risk of aliasing regarding null analysis===
+
====Admitting method calls====
Null analysis for the following kinds of fields is actually unaffected by  aliasing:
+
If the code between check and dereference contains any method call, we have the fragment used for demonstrating side effects:
* final fields
+
<source lang="java">
* @NonNull fields
+
class X {
For both kinds of fields only object initialization may produce unexpected results (yes, even Java's definite assignment rule can be circumvented). Once initialized their null status is either known or amenable to flow analysis.
+
  @Nullable Object f;
 +
  void test() {
 +
      if (this.f != null) {
 +
          foo();
 +
          this.f.bar();
 +
      }
 +
  }
 +
}
 +
</source>
 +
It turns out that all three risks mentioned above are effective in this situations.
  
For non-final @Nullable fields the simplest strategy for avoiding nullness related risks due to aliasing is to pessimistically assume potential null at ''every'' read. This means for strict checking no flow analysis should be applied to @Nullable fields.
+
This requires to confidently state the assumption that one's application is free from side effects, aliasing and concurrency. In typical object-oriented code that assumption, however, is unfounded. Thus admitting any method call in this location lacks theoretical nor practical foundation.
  
While this appears to be a very drastic restriction, the remedy is quite easy: before dereferencing a @Nullable field it has to be assigned to a local variable. Flow analysis is then safely applied to the local variable with no risk of aliasing, since local variables are truly owned.
+
====Admitting field assignments====
 
+
==Concurrency==
+
Also concurrency can invalidate the guarantees of flow analysis. In a concurrent application not even this snippet is safe:
+
 
<source lang="java">
 
<source lang="java">
   if (this.f != null)
+
class X {
      System.out.println(this.f.toString());
+
   @Nullable Object f;
 +
  void test(X other) {
 +
      if (this.f != null) {
 +
          other.f = null;
 +
          this.f.bar();
 +
      }
 +
  }
 +
}
 
</source>
 
</source>
  
Thus concurrency adds to the problems of side effects and aliasing, because null information from one expression may already be invalid when evaluating the very next expression.
+
When demonstrating the effects of aliasing we discussed that the above fragment is not OK.
  
A pragmatic remedy is to let nullness information "age" quickly. However, it is unclear, how "quick" would be quick enough, given the pessimistic observation above.
+
Still similar patterns could be accepted, if both of the following hold:
 +
* no other thread concurrently accesses the field f, ''and''
 +
* we can statically determine that a field f being assigned is ''different'' from the field being dereferenced afterwards.
 +
 
 +
For the latter assumption we apply one of several strategies:
 +
* full alias analysis (not realistic for the JDT)
 +
* static approximation: ignore that different objects hold different incarnations of the same field, treat any assignment to a field as invalidating the analysis results for all objects of that class.
 +
 
 +
Using static approximation the above fragment would be considered as unsafe, because the assignment refers to the same field, although at runtime different incarnations (of different receiver instances 'this' and 'other') may be involved. This static approximation will thus flag a few locations that might be OK given more knowledge about the instances 'this' and 'others', but the guarantees given by this approach can only be broken by concurrency.
 +
 
 +
===Drawing a line: what kinds of field references to include?===
 +
 
 +
''TODO''
 +
 
 +
...
  
===Avoiding the risks of concurrency regarding null analysis===
 
Again the same drastic restriction helps, that already helped for aliasing: don't rely on flow analysis for fields. Local variables are not subject to concurrent access (with the only exception of local variables shared by local classes, but then the local variable must be final and thus cannot change its null status).
 
  
 
==Summary==
 
==Summary==
Any flow analysis performs much better for local variables than for fields. Shifting all serious computation to local variables makes for much safer code. Whether or not a project can afford to fully enforce this strategy depends on many factors.
+
''TODO: clean-up this section and incorporate findings from team discussions''
  
Two exceptions exist:
+
For non-final @Nullable fields the simplest strategy for avoiding any nullness related risks is to pessimistically assume potential null at ''every'' read. This means for strict checking no flow analysis should be applied to @Nullable fields.
  
* @NonNull fields don't require any flow analysis (expect for the phase of object initialization).
+
While this appears to be a very drastic restriction, the remedy is quite easy: before dereferencing a @Nullable field it has to be assigned to a local variable. Flow analysis is then safely applied to the local variable with no risk of side effects, aliasing nor concurrency, since local variables are not shared with any code locations that would be outside the scope of the analysis. I.e., the flow analysis can see everything it needs to consider regarding local variables.
* Flow analysis for unannotated final fields is useful to find out their null status. Once determined, this status cannot change (except again for initialization issues).
+
 
 +
Any flow analysis performs much better for local variables than for fields. Shifting all serious computation to local variables makes for much safer code. Whether or not a project can afford to fully enforce this strategy depends on many factors.
  
 
This lets me conclude that these questions deserve further investigation:
 
This lets me conclude that these questions deserve further investigation:

Revision as of 10:30, 16 February 2012

Note.png
Disclaimer
This page is currently subject to discussion and does not describe the actual implementation in the JDT


The analysis of possible null pointer exceptions performed by the Eclipse Java Compiler is powerful, but the full power may result in a large number of errors and warnings. Not all projects can afford addressing all these issues (at once).

This page discusses the problem from two different directions:

  • On the road towards a fully checked, NPE-free program, what are the risks needing to be addressed before the solution can be considered complete, i.e., fully safe?
  • Should a more lenient analysis be applied to reduce the number of warnings perceived as uninteresting?
    • What are the risks that more relaxed rules would introduce?
    • What are the patterns of coding that (some) users would like to see accepted without a warning? What assumptions need to be maintained to render these patterns safe?

Risks of NPE despite annotation based null analysis

If annotation based null analysis applies strict pessimistic rules, a full guarantee of absence of NPE is only threatened by two risks:

  • specification of nullness is either incomplete or incompletely checked
  • checked null specifications are bypassed at runtime

Aside from these an approach that strictly views null annotations as an extension of the type system is fully safe.

Incomplete Specification

If you start with an existing project and only turn on annotation based null analysis, you will not notice any difference. This is because the analysis implicitly supports three kinds of reference types:

  • @NonNull types
  • @Nullable types and
  • types with unspecified nullness (legacy types).

So before you actually add annotations to your methods and fields, all types will be considered as legacy types and so the unsafe semantics of Java apply where both assigning null and dereferencing are legal.

Adding individual annotations

This path is obvious: add just those annotations where you are certain about the design intent and after each annotation added address the problems reported by the compiler.

Making nonnull the default

Instead of going in tiny little steps you may want to do jumps of some size (cf. bug 331647):

  • annotate a method as @NonNullByDefault to affect all its parameters and its return
  • annotate a type as @NonNullByDefault to affect all its methods and fieds
  • annotate a package as @NonNullByDefault (using package-info.java) to affect all its types
  • define a global policy that all packages should be @NonNullByDefault
    Unfortunately, a global default cannot directly be established but only via the indirection of package level defaults (see bug 366063 for background).

A particular problem arises when a type affected by @NonNullByDefault is a subtype of a legacy type with legacy signatures. It is illegal to override a method with legacy parameters by a method with @NonNull parameters. Specifying @Nullable for all parameters in such an overriding method may be too pessimistic, forcing the method body to do more null checks than actually useful.

For this situation a parameter has been added to the default annotation. Declaring a type that inherits from a legacy type with @NonNullByDefault(false) cancels the applicable default.

Third-party code

Every project depends on third party code, which it doesn't control, so adding null annotations is not directly possible.

To address this issue, the Eclipse Java Compiler should support nullity profiles, aka external annotations: separate files that capture the factual null contracts of all API methods and fields contained in a given library.

Support for this feature is planned for version 3.9, see bug 331651.

Avoiding the risk of incompleteness

In order to ensure that no unchecked legacy types are used in an application, these things must be ensured:

  • @NonNullByDefault is the globally enforced default
  • @NonNullByDefault(false) is never used
  • all libraries come with API-complete nullity profiles

Each of these steps brings a project closer to the safety guarantees of complete analysis.

TODO: Issue a configurable warning when @NonNullByDefault(false) is used.

TODO: Add a note to bug 331651 for checking complete coverage of referenced libraries by available nullity profiles.


Bypassing compile time checks

Any guarantees given by the analysis can still be bypassed at runtime if one of the following is applied:

  • runtime reflection beyond pure introspection, i.e, whenever reflective field access or method invocation is involved.
  • bytecode manipulation, i.e., whenever the bytes being executed by the VM are not identical to the bytes produced at the time when null analysis was performed.

These risks are typically considered outside the scope of linguistic means, developers know about these issues at a general level and don't expect a static analysis to consider these.

Lenient analysis

Some warnings/errors flagged by a strict analysis may seem counter intuitive. Some may actually be irrelevant in specific contexts.

Practical investigations showed that many of those "unwanted" warnings/errors relate to field access. When focusing on these, the most obvious approach at distinguishing interesting from uninteresting warnings is to apply some kind of flow analysis for fields (which the pessimistic approach strictly avoids).

Risks of flow analysis for fields

When flow analysis is applied to fields the following reasons may cause unexpected NPE:

  • side effects
  • aliasing
  • concurrency

Null analysis for the following kinds of fields is essentially unaffected by these risks:

  • final fields
  • @NonNull fields

For both kinds of fields only object initialization may produce unexpected results (yes, even Java's definite assignment rule can be circumvented). Once initialized their null status is either known or amenable to flow analysis.

TODO: more precise distinction for final fields: is static relevant? does initialization occur directly as part of the field declaration?

Side effects

The simplest problem with flow analysis for fields results from side effects in methods:

class X {
   @Nullable Object f;
   void test() {
       if (this.f != null) {
           foo();
           this.f.bar();
       }
   }
   void foo() { /* arbitrary code here */ }
}

The execution of foo() could potentially assign null to f thus invalidating the above null check.

Aliasing

When applying flow analysis for checking nullness of fields the following snippet demonstrates how aliasing threatens the validity:

class X {
    @Nullable Y f;
    void test(X other) {
        if (this.f != null) {
            other.f = null;
            this.f.bar(); // potential NPE
        }
    }
    void breakIt() {
        test(this); // definitely triggers the NPE
    }
}

Without further annotations an intra-procedural analysis cannot see that the assignment to other.f affects the value of this.f.

Concurrency

Also concurrency can invalidate the null related guarantees of flow analysis for fields. In a concurrent application not even this snippet is safe:

   if (this.f != null)
       this.f.bar();

Thus concurrency adds to the problems of side effects and aliasing, because null information from one expression may already be invalid when evaluating the very next expression.

Drawing a line: what intermediate code to admit?

Given the above risks, we can admit certain styles of code if a certain risk is considered irrelevant in a given context. The discussion will use this basic question:

class X {
   @Nullable Object f;
   void test() {
       if (this.f != null) {
           // what code is allowed here?
           this.f.bar(); // want to be sure there's no NPE here
       }
   }
}

What code can we admit at the designated location? What assumptions must a developer ensure to render that code valid?

Admitting method calls

If the code between check and dereference contains any method call, we have the fragment used for demonstrating side effects:

class X {
   @Nullable Object f;
   void test() {
       if (this.f != null) {
           foo();
           this.f.bar();
       }
   }
}

It turns out that all three risks mentioned above are effective in this situations.

This requires to confidently state the assumption that one's application is free from side effects, aliasing and concurrency. In typical object-oriented code that assumption, however, is unfounded. Thus admitting any method call in this location lacks theoretical nor practical foundation.

Admitting field assignments

class X {
   @Nullable Object f;
   void test(X other) {
       if (this.f != null) {
           other.f = null;
           this.f.bar();
       }
   }
}

When demonstrating the effects of aliasing we discussed that the above fragment is not OK.

Still similar patterns could be accepted, if both of the following hold:

  • no other thread concurrently accesses the field f, and
  • we can statically determine that a field f being assigned is different from the field being dereferenced afterwards.

For the latter assumption we apply one of several strategies:

  • full alias analysis (not realistic for the JDT)
  • static approximation: ignore that different objects hold different incarnations of the same field, treat any assignment to a field as invalidating the analysis results for all objects of that class.

Using static approximation the above fragment would be considered as unsafe, because the assignment refers to the same field, although at runtime different incarnations (of different receiver instances 'this' and 'other') may be involved. This static approximation will thus flag a few locations that might be OK given more knowledge about the instances 'this' and 'others', but the guarantees given by this approach can only be broken by concurrency.

Drawing a line: what kinds of field references to include?

TODO

...


Summary

TODO: clean-up this section and incorporate findings from team discussions

For non-final @Nullable fields the simplest strategy for avoiding any nullness related risks is to pessimistically assume potential null at every read. This means for strict checking no flow analysis should be applied to @Nullable fields.

While this appears to be a very drastic restriction, the remedy is quite easy: before dereferencing a @Nullable field it has to be assigned to a local variable. Flow analysis is then safely applied to the local variable with no risk of side effects, aliasing nor concurrency, since local variables are not shared with any code locations that would be outside the scope of the analysis. I.e., the flow analysis can see everything it needs to consider regarding local variables.

Any flow analysis performs much better for local variables than for fields. Shifting all serious computation to local variables makes for much safer code. Whether or not a project can afford to fully enforce this strategy depends on many factors.

This lets me conclude that these questions deserve further investigation:

  • How can gradual migration to a fully safe coding style fully based on local variables be supported. I.e., what intermediate levels are useful, what configuration options are needed and what warning messages best convey the vagueness/certainty of each issue?
  • How can the above options be communicated to users? Is it OK to offer options like "perform flow analysis for fields", or should options be labeled as "consider aliasing", or "pessimistically consider concurrency for null analysis"?
  • How can initialization be made safer? How relevant are these issues in practice? Which annotation / set of annotations is the best buy? Is support for several styles required?

Copyright © Eclipse Foundation, Inc. All Rights Reserved.