While timing how long Jira 5.0.1 takes to reach a steady state for GC and code compilation on JDK 1.6.0_26 for a GC tuning guide, I noticed a log message that I’d never seen before:

[cc lang=’text’]Java HotSpot(TM) 64-Bit Server VM warning: CodeCache is full. Compiler has been disabled.
Java HotSpot(TM) 64-Bit Server VM warning: Try increasing the code cache size using -XX:ReservedCodeCacheSize=[/cc]

This cache is a memory area separate from the JVM heap. It holds the native code that the JIT compiler produces from a method’s bytecode; each such block of compiled code is called an nmethod [1]. On server-class 64-bit VMs, the reserved cache size is 48 MB (it used to be 1 GB, see Bug 6245770).
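To see how full the code cache is from inside a running JVM, the standard `java.lang.management` API can be queried. The following is a minimal sketch, not something from the Jira investigation: it assumes the relevant non-heap pools have names starting with "Code" ("Code Cache" on older HotSpot releases, "CodeCache" or "CodeHeap ..." on newer ones), and the class and method names are made up for illustration.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.MemoryPoolMXBean;
import java.lang.management.MemoryType;
import java.lang.management.MemoryUsage;
import java.util.ArrayList;
import java.util.List;

final class CodeCacheUsage {

    /** Assumption: code cache pools are the non-heap pools whose name starts with "Code". */
    static boolean isCodeCachePoolName(String name) {
        return name.startsWith("Code");
    }

    /** Collects the non-heap memory pools that appear to hold compiled code. */
    static List<MemoryPoolMXBean> codeCachePools() {
        List<MemoryPoolMXBean> pools = new ArrayList<>();
        for (MemoryPoolMXBean pool : ManagementFactory.getMemoryPoolMXBeans()) {
            if (pool.getType() == MemoryType.NON_HEAP && isCodeCachePoolName(pool.getName())) {
                pools.add(pool);
            }
        }
        return pools;
    }

    public static void main(String[] args) {
        for (MemoryPoolMXBean pool : codeCachePools()) {
            MemoryUsage usage = pool.getUsage();
            if (usage == null) {
                continue; // the pool is no longer valid
            }
            // getMax() reports -1 when the maximum is undefined.
            System.out.printf("%s: used=%d committed=%d max=%d (bytes)%n",
                    pool.getName(), usage.getUsed(), usage.getCommitted(), usage.getMax());
        }
    }
}
```

Polling this periodically gives a rough picture of how quickly the cache fills up, which is useful context when deciding on a value for -XX:ReservedCodeCacheSize=.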

Once the compiler is switched off because the Code Cache is full, it does not switch back on. Existing nmethods continue to be used until they are flushed from the cache. As this was the first time I’d seen this message, I decided to do some digging, since I hadn’t encountered the VM code cache in my previous research into HotSpot garbage collection.

The impact of the Code Cache being full is that JIT compilation is off for the remainder of the JVM’s life. We don’t have any statistics on how this affects long-running performance; my impression is that CPU utilisation and request service time would increase on long-running VMs for code that could otherwise have been optimised. On a system that is over-committed on CPU resources, application throughput could decrease noticeably.

My suspicion is that if I give the Code Cache some fatpants by increasing -XX:ReservedCodeCacheSize=, I’d just be delaying the problem. So I dug into the HotSpot code to see if there was a way for the JVM to manage this for me. It turns out there is: the HotSpot source for jdk6 defines the following flag, available in product releases, -XX:+UseCodeCacheFlushing:

[cc lang=’java’]product(bool, UseCodeCacheFlushing, false, "Attempt to clean the code cache before shutting off compiler")[/cc]

It is not on by default. So I turned this on, and I no longer observed the log message about the code cache being exhausted with the particular environment being used. I wanted to confirm that this was working as advertised, so I read through the HotSpot code to determine how the Code Cache flushes nmethods during default operation and when -XX:+UseCodeCacheFlushing is used.
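To confirm what a running JVM actually has set for these flags, HotSpot exposes them through `com.sun.management.HotSpotDiagnosticMXBean`. The sketch below is an illustration under a few assumptions: it needs a HotSpot-based JVM on Java 7 or later (for `getPlatformMXBean`), the `VmFlags` class is hypothetical, and a flag that the JVM does not know about is simply reported as `null`.

```java
import com.sun.management.HotSpotDiagnosticMXBean;
import java.lang.management.ManagementFactory;

final class VmFlags {

    /** Current value of a HotSpot -XX flag, or null if this JVM does not expose it. */
    static String valueOf(String flagName) {
        try {
            HotSpotDiagnosticMXBean bean =
                    ManagementFactory.getPlatformMXBean(HotSpotDiagnosticMXBean.class);
            return bean == null ? null : bean.getVMOption(flagName).getValue();
        } catch (RuntimeException e) {
            // getVMOption throws IllegalArgumentException for flags the VM does not know.
            return null;
        }
    }

    public static void main(String[] args) {
        for (String flag : new String[] {"UseCodeCacheFlushing", "ReservedCodeCacheSize"}) {
            System.out.println(flag + " = " + valueOf(flag));
        }
    }
}
```

Running it with and without -XX:+UseCodeCacheFlushing on the command line should show whether the option took effect on the JVM in question.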

There is a way to infer when nmethods are flushed by interpreting the log messages printed with -XX:+PrintCompilation. The important lines are the ones where an nmethod has been made a zombie:
[cc lang=’text’]552287 4320 made zombie org.ofbiz.core.entity.EntityExpr::makeWhereString (557 bytes)[/cc]

With -XX:+UseCodeCacheFlushing, look for a large group of these messages in quick succession; this generally indicates that older nmethods are being processed for flushing from the code cache, as opposed to normal flushing due to class unloading or deoptimisation.

This doesn’t give the whole story: an nmethod that is made a zombie is not immediately flushed from the cache, the state progression of nmethods may not be linear, and there is a race between cache flushing and code compilation. Nor does it explain what causes the significant code cache churn in a long-running Jira instance.
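Caveats aside, spotting a burst of "made zombie" lines in a -XX:+PrintCompilation log can be automated. The sketch below is only an illustration: it assumes lines are shaped like the sample above, that the first column is a timestamp in milliseconds that never decreases through the log, and that the second column is the compile id. The `ZombieBursts` class and the window size are hypothetical choices.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class ZombieBursts {

    // Assumed line shape: "<timestamp> <compile id> made zombie <method> (<n> bytes)".
    private static final Pattern ZOMBIE =
            Pattern.compile("^\\s*(\\d+)\\s+(\\d+)\\s+.*made zombie\\s+(\\S+)");

    /** Timestamp of a "made zombie" line, or -1 if the line is something else. */
    static long zombieTimestamp(String line) {
        Matcher m = ZOMBIE.matcher(line);
        return m.find() ? Long.parseLong(m.group(1)) : -1L;
    }

    /** Largest number of "made zombie" events seen within any window of windowMillis. */
    static int largestBurst(List<String> lines, long windowMillis) {
        List<Long> stamps = new ArrayList<>();
        for (String line : lines) {
            long t = zombieTimestamp(line);
            if (t >= 0) {
                stamps.add(t);
            }
        }
        int best = 0;
        int start = 0;
        // Sliding window over the (assumed non-decreasing) timestamps.
        for (int end = 0; end < stamps.size(); end++) {
            while (stamps.get(end) - stamps.get(start) > windowMillis) {
                start++;
            }
            best = Math.max(best, end - start + 1);
        }
        return best;
    }
}
```

A large value for a short window suggests a speculative flush; a trickle of isolated zombies is more likely the normal class unloading or deoptimisation path.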

Code Cache Flushing in a nutshell

Code Cache flushing is achieved via two mechanisms: a scanning mechanism, where nmethods are stamped with when they were last seen on an execution stack, and a sweeping mechanism, where nmethods progress through multiple stages in successive sweeps so that they can be flushed safely.

It is a race between compilation and flushing, and the compiler is disabled when flushing loses.

With -XX:+UseCodeCacheFlushing, when the code cache is close to exhaustion, the oldest half of the code cache is speculatively flushed. The speculatively flushed nmethods are disconnected. If they are determined to still be of use within a period of time, they are reconnected; otherwise they are flushed. Details are below.

Default code cache flushing

With default code cache flushing, nmethods are marked when they are no longer needed, from events such as class unloading or when the nmethod has to be deoptimised. To ensure safe flushing of an nmethod from the cache, there are two main marks that are used: a not entrant mark and a zombie mark.

An nmethod is marked as not entrant when it begins the flushing process, because there may still be references to it in an execution stack frame. Once these references are cleared, the nmethod may be marked as a zombie, at which point it can be flushed.

Scanning is done during a safepoint [2] cleanup. Execution stack frames are checked for not entrant nmethods, and any that are found are stamped with the current value of a counter that tracks the number of stack traversals during the life of the VM. This stamp is used during sweeping to determine whether it is safe to mark a not entrant nmethod as a zombie.

Comparing the stack traversal stamp with the current stack traversal count ensures that live not entrant nmethods aren’t flushed. Essentially, if the stamp is lower than the current count by a safe margin (2, to be precise), the not entrant nmethod may be converted to a zombie.
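The stamp comparison can be written down in a couple of lines. This is a sketch of the rule as described here, not HotSpot’s actual code: the names are invented, and it reads "lower by a margin of 2" as "at least two traversals have happened since the nmethod was last seen on a stack".

```java
final class StackTraversalCheck {

    /** Number of stack traversals that must pass before a not entrant nmethod is considered safe. */
    static final long SAFE_MARGIN = 2;

    /**
     * @param stackTraversalStamp   traversal count recorded when the nmethod was last seen on a stack
     * @param currentTraversalCount traversal count at the time of the sweep
     * @return true if the not entrant nmethod may be converted to a zombie
     */
    static boolean mayBecomeZombie(long stackTraversalStamp, long currentTraversalCount) {
        return currentTraversalCount - stackTraversalStamp >= SAFE_MARGIN;
    }
}
```

For example, an nmethod stamped at traversal 10 is left alone at traversal 11 and becomes eligible at traversal 12, provided it has not been seen on a stack (and re-stamped) in between.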

Sweeping is broken up into fractions and operates outside of safepoints, as it is performed by a compiler thread after a stack scan. It is also complicated by the possibility of multiple compiler threads. The benefit of multiple compiler threads is that one may be performing a sweep fraction whilst the others service compilation jobs. Of course, this assumes that the race of flushing vs. compiling is going in favor of flushing.

Inside the main loop of the compiler thread, before obtaining any new jobs from the compile queue, a sweep fraction is potentially performed. The compiler thread first attempts to get an atomic lock, to ensure that only one compiler thread performs code cache sweeping at any given time. The number of nmethods to sweep is then estimated, based on which invocation out of -XX:NmethodSweepFraction=<n> (default 4) the sweeper is at. The final sweep invocation visits all remaining nmethods.
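The two ideas in this paragraph, a single sweeper at a time and a per-invocation share of the work, can be modelled roughly as follows. This is a sketch under assumptions: the class is hypothetical, the "atomic lock" is stood in for by an `AtomicBoolean`, and the estimate is assumed to be an even split of whatever is left over the invocations that remain.

```java
import java.util.concurrent.atomic.AtomicBoolean;

final class SweepFractions {

    private final AtomicBoolean sweepInProgress = new AtomicBoolean(false);

    /** Only the thread that flips the flag from false to true gets to sweep. */
    boolean tryStartSweep() {
        return sweepInProgress.compareAndSet(false, true);
    }

    /** Releases the flag so that another compiler thread may sweep later. */
    void finishSweep() {
        sweepInProgress.set(false);
    }

    /**
     * Assumed estimate: split the nmethods not yet visited evenly over the
     * invocations left; the final invocation takes everything that remains.
     */
    static int nmethodsToVisit(int remaining, int invocationsLeft) {
        if (invocationsLeft <= 1) {
            return remaining;
        }
        return remaining / invocationsLeft;
    }
}
```

With the default of 4 fractions and 100 nmethods, this model visits 25 per invocation, and the last invocation picks up any remainder.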

For each nmethod visited during a sweep fraction:

  • If the nmethod is in use by the VM:
    • If the nmethod is entrant, clean up any inline caches that point to not entrant or zombie nmethods.
  • Else if the nmethod is a zombie:
    • If the zombie is marked for reclamation, flush it from the code cache.
    • Else mark the zombie nmethod for reclamation. This ensures that, by the next time the sweeper sees the zombie, all inline caches that reference the nmethod have been cleaned up.
  • Else if the nmethod is not entrant:
    • If the nmethod is no longer in an execution stack frame, mark it as a zombie.
    • Else clean up inline caches.
  • Else if the nmethod is for unloaded code:
    • If this is an On Stack Replacement (OSR) nmethod, flush it immediately.
    • Else mark the nmethod as a zombie.
  • Else, clean up inline caches.
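The decision list above maps naturally onto a small pure function. The following is a simplified model for illustration only: the enum names, the boolean parameters and the `NOTHING` action (for an nmethod that is in use by the VM but not entrant) are my own choices and are not taken from the HotSpot source.

```java
final class SweepDecision {

    enum State { ENTRANT, NOT_ENTRANT, ZOMBIE, UNLOADED }

    enum Action { CLEAN_INLINE_CACHES, FLUSH, MARK_FOR_RECLAMATION, MAKE_ZOMBIE, NOTHING }

    /** Picks the action for one nmethod visited during a sweep fraction. */
    static Action decide(State state, boolean inUseByVm, boolean markedForReclamation,
                         boolean onStack, boolean osr) {
        if (inUseByVm) {
            // Only entrant nmethods get their inline caches cleaned while the VM holds them.
            return state == State.ENTRANT ? Action.CLEAN_INLINE_CACHES : Action.NOTHING;
        }
        switch (state) {
            case ZOMBIE:
                // A zombie is flushed on the second visit, after being marked on the first.
                return markedForReclamation ? Action.FLUSH : Action.MARK_FOR_RECLAMATION;
            case NOT_ENTRANT:
                return onStack ? Action.CLEAN_INLINE_CACHES : Action.MAKE_ZOMBIE;
            case UNLOADED:
                return osr ? Action.FLUSH : Action.MAKE_ZOMBIE;
            default:
                return Action.CLEAN_INLINE_CACHES;
        }
    }
}
```

Read this way, a not entrant nmethod needs at least three sweeper visits before its space is reclaimed: one to become a zombie, one to be marked for reclamation, and one to be flushed.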

Speculative flushing, -XX:+UseCodeCacheFlushing

When -XX:+UseCodeCacheFlushing is enabled, pre-emptive code cache flushing may start if the unallocated code cache becomes less than -XX:CodeCacheMinimumFreeSpace=<n>[g|m|k] (default 500K), or if the amount of free code cache drops below -XX:CodeCacheFlushingMinimumFreeSpace=<n>[g|m|k] (default 1500K). In the former case compilation is disabled; in the latter, compilation stays enabled but compiles are delayed until the free code cache is greater than -XX:CodeCacheFlushingMinimumFreeSpace=<n>. Speculative flushing is used in both scenarios.
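As a rough model of the two thresholds, the sketch below classifies the state of the code cache from two numbers. It is an illustration only: the class, the enum and the idea of passing "unallocated" and "free" as plain byte counts are assumptions, and the constants simply mirror the defaults quoted above.

```java
final class FlushingThresholds {

    enum Mode { NORMAL, COMPILES_DELAYED, COMPILER_DISABLED }

    /** Default of -XX:CodeCacheMinimumFreeSpace (500K). */
    static final long CODE_CACHE_MINIMUM_FREE_SPACE = 500L * 1024;

    /** Default of -XX:CodeCacheFlushingMinimumFreeSpace (1500K). */
    static final long CODE_CACHE_FLUSHING_MINIMUM_FREE_SPACE = 1500L * 1024;

    /** Classifies the cache state; the stricter condition is checked first. */
    static Mode classify(long unallocatedBytes, long freeBytes) {
        if (unallocatedBytes < CODE_CACHE_MINIMUM_FREE_SPACE) {
            return Mode.COMPILER_DISABLED;
        }
        if (freeBytes < CODE_CACHE_FLUSHING_MINIMUM_FREE_SPACE) {
            return Mode.COMPILES_DELAYED;
        }
        return Mode.NORMAL;
    }

    /** Speculative flushing is a candidate in both non-normal modes. */
    static boolean speculativeFlushingMayStart(Mode mode) {
        return mode != Mode.NORMAL;
    }
}
```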

When speculative flushing begins, the older half of the nmethods, based on compile id, are saved into a list named old and disconnected from their respective methodOop.

Scanning differs from the default behavior in that, if the code cache no longer needs flushing, the compiler is allowed to run new jobs again. This is determined by checking that:

  • Sufficient space has been freed to satisfy -XX:CodeCacheFlushingMinimumFreeSpace=<n>.
  • The interval since the code cache was last full is longer than the number of seconds specified by -XX:MinCodeCacheFlushingInterval=<n> (default 30).
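Both checks have to pass before the compiler is allowed to take new jobs. A minimal sketch of that condition, with hypothetical names and with times passed in as plain millisecond values so that it is easy to exercise:

```java
final class CompilerRestart {

    /**
     * @param freeBytes            free code cache at the time of the scan
     * @param flushingMinFreeBytes value of -XX:CodeCacheFlushingMinimumFreeSpace in bytes
     * @param nowMillis            current time in milliseconds
     * @param lastFullMillis       time at which the code cache was last full
     * @param minIntervalSeconds   value of -XX:MinCodeCacheFlushingInterval
     * @return true if the compiler may be allowed to run new jobs again
     */
    static boolean mayRestartCompiler(long freeBytes, long flushingMinFreeBytes,
                                      long nowMillis, long lastFullMillis,
                                      long minIntervalSeconds) {
        boolean enoughSpace = freeBytes >= flushingMinFreeBytes;
        boolean longEnough = nowMillis - lastFullMillis > minIntervalSeconds * 1000L;
        return enoughSpace && longEnough;
    }
}
```

The interval check acts as a damper: even if a flush frees plenty of space straight away, the compiler stays off until the configured number of seconds has passed since the cache was last full.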

Sweeping is more aggressive than in the default behavior:

  • After the first potential sweep fraction:
    • While there are no compile tasks:
      • Wait for -XX:NmethodSweepCheckInterval=<n> seconds.
      • If there was no safepoint before waking up, possibly sweep again.
  • When performing a potential sweep, before the inline cache clean-up in the last step of the default sweep steps above:
    • If the nmethod was saved in the old list and has not been reconnected after two sweep scans, make the nmethod not entrant.

When speculative flushing is active and a method comes up for JIT compilation, the speculatively disconnected list is checked for a saved nmethod for that method. If one exists and is still entrant, it is removed from the old list and reconnected to the method.
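The disconnect and reconnect cycle can be illustrated with a toy model. Everything here is an assumption made for the sake of the example: methods are identified by a string, the methodOop link is a map entry, and "older half" is taken to mean half of the connected nmethods by count when ordered by compile id, which may not match how HotSpot picks its cut-off.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class SpeculativeDisconnect {

    static final class Nmethod {
        final int compileId;
        final String method;
        boolean entrant = true;

        Nmethod(int compileId, String method) {
            this.compileId = compileId;
            this.method = method;
        }
    }

    /** Method name to its connected nmethod (stand-in for the methodOop link). */
    final Map<String, Nmethod> connected = new HashMap<>();

    /** Method name to its speculatively disconnected nmethod (the "old" list). */
    final Map<String, Nmethod> old = new HashMap<>();

    void install(Nmethod nm) {
        connected.put(nm.method, nm);
    }

    /** Moves the half with the lowest compile ids from connected to old. */
    void disconnectOlderHalf() {
        List<Nmethod> byAge = new ArrayList<>(connected.values());
        byAge.sort(Comparator.comparingInt(nm -> nm.compileId));
        for (Nmethod nm : byAge.subList(0, byAge.size() / 2)) {
            connected.remove(nm.method);
            old.put(nm.method, nm);
        }
    }

    /** On a compile request: reuse a saved, still entrant nmethod instead of recompiling. */
    boolean reconnectIfSaved(String method) {
        Nmethod saved = old.get(method);
        if (saved == null || !saved.entrant) {
            return false;
        }
        old.remove(method);
        connected.put(method, saved);
        return true;
    }
}
```

In this model, anything left in `old` that the sweeper has since made not entrant is no longer eligible for reconnection, which matches the "use it or lose it" behaviour described above.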

References:

  1. From the HotSpot glossary:

    nmethod
    A block of executable code which implements some Java bytecodes. It may be a complete Java method, or an ‘OSR’ method. It routinely includes object code for additional methods inlined by the compiler.

  2. From the HotSpot glossary:

    safepoint
    A point during program execution at which all GC roots are known and all heap object contents are consistent. From a global point of view, all threads must block at a safepoint before the GC can run. (As a special case, threads running JNI code can continue to run, because they use only handles. During a safepoint they must block instead of loading the contents of the handle.) From a local point of view, a safepoint is a distinguished point in a block of code where the executing thread may block for the GC. Most call sites qualify as safepoints. There are strong invariants which hold true at every safepoint, which may be disregarded at non-safepoints. Both compiled Java code and C/C++ code be optimized between safepoints, but less so across safepoints. The JIT compiler emits a GC map at each safepoint. C/C++ code in the VM uses stylized macro-based conventions (e.g., TRAPS) to mark potential safepoints.

