Distilling Mirah: Type Inference

Recently, I’ve been watching the work of a handful of developers on a new programming language: Mirah. As a fan of the Ruby programming language and a slave to the Java overlords, Mirah offers instant appeal to me, as it borrows the core Ruby syntax almost verbatim, but creates true Java class files runnable on any Java VM. This decision makes perfect sense, considering the originator of Mirah is Charlie Nutter, one of the key developers of JRuby, a project heavily invested in both Ruby and Java. (Mirah actually reuses the JRuby parser at the time of this writing, if that gives you any indication of how similar the syntax is.)

Because of my interest in the development of Mirah, I’ve decided to begin spelunking into the implementation as it stands today, sharing with you what is going on internally. Many of you are probably familiar with my “Distilling JRuby” series, and while these articles will likely read similarly, I suspect they will be briefer and more hand-wavy. This is partially out of a desire to cover more topics over a short period of time, but also because the implementation of Mirah is very fluid and likely to change, rendering these articles invalid or at least outdated.

Without further ado - let’s kick this pig. On to Mirah’s type-inferencing!

Article Series

  1. Distilling Mirah: Type Inference

Mirah Overview

There are a few key concepts that need to be discussed regarding Mirah before we get started:

  • Mirah is not Ruby! Mirah looks like Ruby at first glance, but the resemblance is only superficial. We will see why over the next series of topics.
  • Unlike JRuby, Mirah is not implemented in Java (well, mostly not). It is actually implemented in Ruby - this is going to make the way we traverse the code in these articles very different from the JRuby series.
  • While I say that Mirah borrows the Ruby syntax, it has to modify and add certain structures to fit the mold which has been carved for it. So while it is possible to write some programs that are almost true Ruby syntax, most Mirah programs will have a few extra sprinkles.
  • Mirah is statically typed, and compiles to standard Java bytecode. This is one of the key reasons that Mirah is not 100% Ruby-grammar compatible.
  • Mirah is designed from the ground up to be a language specification that can be implemented on several platforms (.NET is a perfect example). This introduces indirection in the code that may, at first, seem confusing.
  • One of the key principles of Mirah is to avoid runtime encumbrances if at all possible. What this means is that all features in Mirah as it currently stands are implemented by either a compiler plug-in, or by using existing core libraries of the underlying platform (or a combination, of course). The goal is to avoid the 3-5 MB ball-and-chain that many languages (e.g. Scala, Clojure, JRuby) hang around your neck to run deployed code. The idea is that, if you want runtime features, you can bring Mirah modules in per your own decision, but if you want to fit a Mirah program on a micro-controller that can run Java bytecode (or Dalvik cough), you should be able to by forgoing some of those features that require large runtime libraries.

The Mirah site can be found at http://www.mirah.org, and the official Mirah ‘master’ repo is available at github: http://github.com/mirah/mirah. Feel free to check out and follow along, although one last disclaimer - the code is changing quickly, so my articles are bound to fall out of date.

I’d suggest before proceeding you familiarize yourself with the language syntax - I don’t plan to stop along the way.

A Background on Type Inference

Most JVM language replacements garnering attention right now avoid explicit typing in the syntax to some degree. Any language that compiles to normal bytecode while allowing some degree of implicit typing must involve some form of type inference: the process of statically analyzing code to determine the runtime types in use by inferring from the use of variables and parameters. Statically compiled languages on the VM must do this, because Java bytecode (and the VM) expects to work with types - and if the compiler can’t figure them out, it can’t compile the bytecode.

Consider this Java statement:

HashMap<String,String> myMap = new HashMap<String,String>();

There is really no reason you need to define the left-hand side (the declaration) so explicitly, considering that the right-hand side (the assignment) has already told you exactly what the variable is. Surely this should be sufficient:

var myMap = new HashMap<String,String>();

Anyone familiar with C# will likely recognize this syntax convenience. Of course, this is a simple example, because you only have to examine this one line to infer the variable type. Things get much more complex when there are control structures, methods, and other language features in the way.
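
The single-line case can be sketched as a toy inferencer. This is purely illustrative (the class names are invented, not Mirah’s): the declared type of a variable is simply whatever type the right-hand side reports.

```ruby
# Toy local inference: the type of a variable is the type inferred for
# its initializer expression. Names here are illustrative, not Mirah's.
Literal = Struct.new(:value) do
  def infer
    value.class.name  # e.g. "Integer" or "String"
  end
end

Assignment = Struct.new(:name, :value) do
  def infer
    value.infer  # the variable takes the type of its right-hand side
  end
end

node = Assignment.new("my_map", Literal.new(42))
node.infer  # => "Integer"
```

This works because the whole chain of evidence sits on one line; the rest of the article is about what happens when it doesn’t.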

That being said, type inferencing is a well-trodden path - it’s certainly not unique to JVM languages; far from it. There are different levels of type inference, with the most complete often using something like Hindley-Milner to deduce types recursively (excellent description of Hindley-Milner by Daniel Spiewak on his blog).

Mirah’s Type Inferencing

As it stands today, Mirah implements a variant of type inference somewhere between true “local” type inference and fully recursive type inference like Hindley-Milner. Mirah’s inference uses a multi-pass infer process: the first phase does simple local (line-by-line) inference, and subsequent passes then look for new type resolutions from those higher constructs. For example, consider these two Mirah methods:

def method_a()
  return method_b(5) + method_b(6)
end

def method_b(x:int)
  return x * -1
end

In this case, ‘method_a’ is obviously dependent upon ‘method_b’ - but if ‘method_a’ is inferred first, it has no way to know what its return type is, because ‘method_b’ hasn’t been inferred yet. So ‘method_a’ is ‘deferred’ for a later inference pass. Shortly thereafter, ‘method_b’ will be processed, and since it can be completely analyzed through local inference, it will resolve to return an int. At that point, ‘method_a’ can look at the two invocations involved in its return statement, and can in turn determine that it should also return an int.

The Algorithm

From an implementation standpoint, Mirah does this inference by utilizing the ASTs generated from the source. Each AST knows individually how to infer itself based on its recursive contents - this is something we’ll investigate in more detail shortly.

Mirah defines a namespace and class called Typer that is used to orchestrate this entire process. The Typer is asked to analyze each AST tree parsed by Mirah individually, and then to iteratively resolve:

typer = Typer.new
asts.each { |ast| typer.infer(ast) }
typer.resolve

The infer method for an individual AST node is pretty straightforward:

class Typer
  def infer(node)
    node.infer(self)
    # error handling business
  end
end

Notice that the typer passes itself into the node - this allows the nodes to callback into the typer for a variety of reasons. For example, each node has to decide for itself whether or not it has enough information to infer. If it doesn’t, it will tell the typer that it needs to be ‘deferred’, meaning it doesn’t yet have enough information. All this effectively does is record the node for later:

class Typer
  def defer(node)
    @deferred_nodes << node
  end
end

So the typer calls infer on the top level AST node, at which point the AST hierarchy will recurse, inferring and deferring nodes as appropriate. After the first recursive inference pass, the typer is then asked to resolve AST nodes iteratively until all nodes are inferred, or until no progress is made:

class Typer
  def resolve
    while true
      old_len = @deferred_nodes.length
      @deferred_nodes.dup.each do |node|
        type = infer(node)
        if type != nil
          @deferred_nodes.delete(node)
        end
      end

      if @deferred_nodes.length == 0
        break
      elsif old_len == @deferred_nodes.length
        raise # can't infer error!
      end
    end
  end
end
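
The resolve loop above can be exercised with a toy model. This is a simplified, self-contained sketch (not Mirah’s real classes): a node whose rule can’t yet produce a type defers, and a later pass picks it up once its dependency has resolved.

```ruby
# Minimal model of Mirah's multi-pass resolution (illustrative only).
class ToyTyper
  def initialize
    @deferred = []
  end

  def infer(node)
    type = node.infer(self)
    @deferred << node if type.nil? && !@deferred.include?(node)
    type
  end

  def resolve
    loop do
      before = @deferred.length
      @deferred.reject! { |node| node.infer(self) }  # drop resolved nodes
      break if @deferred.empty?
      raise "can't infer" if @deferred.length == before  # no progress
    end
  end
end

# A node whose type comes from a rule; the rule returns nil until its
# dependencies have resolved.
class ToyNode
  attr_reader :type

  def initialize(&rule)
    @rule = rule
  end

  def infer(_typer)
    @type ||= @rule.call
  end
end

method_b = ToyNode.new { "int" }          # locally inferable
method_a = ToyNode.new { method_b.type }  # depends on method_b

typer = ToyTyper.new
typer.infer(method_a)  # method_b not yet inferred, so method_a defers
typer.infer(method_b)  # resolves locally to "int"
typer.resolve          # a later pass resolves method_a
method_a.type          # => "int"
```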

AST Working Together

Understanding the concept of the AST recursively inferring is the key component to understanding the typer. Consider, for example, the statement x = method_b(5) - this is represented by a tree of AST nodes. For those of you with experience in parsers, or experience with my previous JRuby articles, it probably won’t be too hard to derive the types of nodes involved - it’s basically this:

LocalDeclaration
|
.-- LocalAssignment (type_node)
    |
    .-- FunctionalCall (value)
        |
        .-- Fixnum (parameters)
            |
            .-- "5" (literal)

The idea is that the declaration asks the assignment, which in turn asks the call being made (with the parameter types in play), which then reports the return type of the resolved method. Here is a sketch of the various infer methods for these nodes:

class LocalDeclaration
  def infer(typer)
    type = @type_node.infer(typer)  # type_node is the local assignment
    if !type
      typer.defer(self)
    end
    return type
  end
end

class LocalAssignment
  def infer(typer)
    type = @value.infer(typer)  # value is the "functional" call
    if !type
      typer.defer(self)
    end
    return type
  end
end

class FunctionalCall
  def infer(typer)
    @parameters.each { |param| param.infer(typer) }
    if # all parameters inferred, and method with params and scope is known
      return typer.method_type(@method_name, method_scope, @parameters)
    else
      typer.defer(self)
      return nil
    end
  end
end

class Fixnum
  def infer(typer)
    return typer.fixnum_type(@literal)  # literal is '5'
  end
end

A few things to note here:

  • This is totally pseudo code - the actual code has all kinds of branches for caching and other good bits.
  • The one literal we have, Fixnum, calls back into the typer to get the actual fixnum type - we’ll see this come into play momentarily.
  • The typer has the ability to look up a method type by signature - when methods are scanned during type inference, they record themselves in the typer for other nodes (like this one) to use when inferring. This is one case of node “hopping”, where one AST node can be linked to another by reference.
  • We’re dodging how the functional call determines things like ‘method scope’ for now.
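
To make the signature lookup concrete, here is a hypothetical sketch of such a table (Mirah’s real typer tracks scopes and signatures with far more machinery):

```ruby
# Illustrative signature table: a method registers its return type under
# (scope, name, parameter types); calls look up the same key.
class MethodTable
  def initialize
    @table = {}
  end

  def learn(scope, name, param_types, return_type)
    @table[[scope, name, param_types]] = return_type
  end

  def method_type(scope, name, param_types)
    @table[[scope, name, param_types]]  # nil means: defer the calling node
  end
end

table = MethodTable.new
table.learn("Foo", "method_b", ["int"], "int")
table.method_type("Foo", "method_b", ["int"])   # => "int"
table.method_type("Foo", "method_b", ["long"])  # => nil (no match: defer)
```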

Resolving Literals

As noted above, the Fixnum node asks the typer to give it back a fixnum type. This is done for all of the literal types, so that the platform implementation (in this case, Java) can plug in a particular type. The Java implementation, in the JVM::Types module, provides a FixnumLiteral that looks at the provided value and determines where in the hierarchy it belongs (for you Java folks, that’s byte, short, int, long, etc.). When asked to compile, these AST nodes know how to generate the ultra-fast JVM bytecode ops for primitives.
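
A rough sketch of that decision follows. The ranges are the JVM’s primitive bounds; whether Mirah actually narrows a small literal all the way to byte is an assumption here, so treat this as illustrative rather than FixnumLiteral’s real logic.

```ruby
# Pick the narrowest JVM integer primitive that can hold a literal value
# (simplified sketch; not Mirah's actual FixnumLiteral implementation).
def jvm_int_type(value)
  case value
  when -128..127                      then "byte"
  when -32_768..32_767                then "short"
  when -2_147_483_648..2_147_483_647 then "int"
  else                                     "long"
  end
end

jvm_int_type(5)             # => "byte"
jvm_int_type(40_000)        # => "int"
jvm_int_type(3_000_000_000) # => "long"
```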

Type Annotations

As seen in one of the snippets above, Mirah supports type definitions for places where typing is either required (due to a lack of inference) or desired (widening a type, for example). Setting aside for a moment that this is a contrived example, consider this method:

import java.util.Map
import java.util.HashMap
class SomeClass
  def singleton_map(a:string, b:string):Map
    map = HashMap.new
    map.put(a,b)
    return map
  end
end

Here we are declaring both variable types so we can control inputs, and then we are declaring the return type. The reason you might want to declare a return type like this is so that the compiled method doesn’t expose too narrow an implementation. Remember, we’re compiling to Java class files here - so if the compiled method inferred that it returned a HashMap, that is a constraint we may never be able to change in the future. By changing it to ‘Map’, we can adjust the API like we would in the Java world to avoid tying ourselves to an implementation. To see this in action, here’s the output from mirahc when asked to generate Java code for this with and without the return type:

// With:
public class SomeClass extends java.lang.Object {
  public java.util.Map singleton_map(java.lang.String a, java.lang.String b) {
    java.util.HashMap map = new java.util.HashMap();
    map.put(a, b);
    return map;
  }
}

// Without:
public class SomeClass extends java.lang.Object {
  public java.util.HashMap singleton_map(java.lang.String a, java.lang.String b) {
    java.util.HashMap map = new java.util.HashMap();
    map.put(a, b);
    return map;
  }
}

Individual AST nodes know about these definitions (sometimes known as forced types), and will respect those over the corresponding inferred types. That’s not to say that it will just take them for granted; the type inference still occurs. In the example above, the method body is still inferred to ensure it returns a type that can be widened to ‘java.util.Map’ - otherwise the code will cause runtime errors in the VM. Here’s a snippet of the method definition AST analysis:

class MethodDefinition
  def infer(typer)
    forced_type = @return_type
    inferred_type = @body.infer(typer)
    actual_type = if forced_type.nil?
      inferred_type
    else
      forced_type
    end

    if !actual_type.is_parent(inferred_type)
      raise "inference error"
    end
    return actual_type
  end
end

The return_type field will be set by the parser if provided, and takes precedence so long as it’s still able to be used in place of the actual inferred type of the method body.
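
Using plain Ruby ancestry as a stand-in for the is_parent check, the widening rule looks roughly like this (illustrative, not Mirah’s code):

```ruby
# Stand-in for the is_parent check: a forced return type is legal only
# if the inferred type can be widened to it, i.e. the forced type is an
# ancestor of the inferred one.
def can_widen?(forced, inferred)
  inferred.ancestors.include?(forced)
end

can_widen?(Numeric, Integer)  # => true  (Integer can widen to Numeric)
can_widen?(Integer, Numeric)  # => false (a narrowing, so rejected)
```

In the Map/HashMap example above, the same check is what verifies that the inferred HashMap body can legally be exposed as java.util.Map.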

Uncovered Topics

So this was a quick spin through Mirah-land, but even for the inference engine, a lot was left on the table if you’d like to explore from here:

  • Finding “native” types (in this case, calls into and returning Java types)
  • Tracking class/method scope when inferring
  • Inferring against intrinsics (such as ‘+’ and ‘||’)
  • Dealing with multi-node inference - several nodes, like ‘MethodDefinition’ are expected to infer several parts, including arguments, return type, throws types, etc. This increases the complexity of the implementation, but doesn’t have much impact on concept.
  • Superclasses, invocation of ‘super’, overriding, overloading, etc.
  • Framework vs. Implementation (i.e. JVM) Responsibilities

Stay tuned as the Mirah story unfolds!

JRuby and Sinatra in 2 Minutes

While at RubyMidwest I decided to explore Sinatra in more detail. I’ve spent a lot of time with Rails, and while I love it, there is something alluring about the simplicity of Sinatra (and, well… ooh shiny). Being a recovering Java developer (Hi, I’m R.J., and I haven’t developed in Java for 18 hours) I have a server that runs Java, and would like to be able to use Sinatra to build my fancy-awesome web-apps. Along those lines, I want all of the shiny benefits of JRuby’s multi-threading awesome-ness, as opposed to just trying to use WEBrick, which does not a powerful server make. So here is a 2-minute (well, depending on the performance of your computer, and how fast you type) startup with Sinatra, JRuby, Bundler, and Glassfish.

I’m cheating already by assuming you already have JRuby installed as your default Ruby installation. No? Go get it!

Next step is to get bundler:

gem install bundler

Now we need to make a home for our application, and prep it for Bundler:

mkdir testapp
cd testapp
edit Gemfile

Here I’m creating a new file in testapp called ‘Gemfile’ in your favorite editor. This is where we will sketch out our dependencies for Bundler to do all the hard work for us - here are the contents for this example:

source :rubygems
gem "sinatra"
gem "glassfish"

Frankly, that’s it. We tell Bundler to look for gems in RubyGems core repo, and then we ask it to make sure we have Sinatra and Glassfish. Now we can create the program - create the file ‘hello.rb’, and use these contents:

require "rubygems"
require "bundler"
Bundler.setup

require "sinatra"

get '/hi' do
  "Hello World!"
end

So what’s special for JRuby? Absolutely nothing. We do have special sauce for Bundler (calling Bundler.setup prior to requiring ‘sinatra’), but trust me - you’ll be happy you used it. You’ll also make @wycats happy.

And - that’s it! Now, if you were to start this file the standard (well, bundler-standard) way, you’ll see this:

realjenius$ bundle exec hello.rb
== Sinatra/1.0 has taken the stage on 4567 for development with backup from WEBrick
[2010-07-17 11:24:46] INFO  WEBrick 1.3.1
[2010-07-17 11:24:46] INFO  ruby 1.8.7 (2010-06-06) [java]
[2010-07-17 11:24:46] INFO  WEBrick::HTTPServer#start: pid=44490 port=4567

…and we can visit this URL: http://localhost:4567/hi. But, recall that our goal was to work with Glassfish, not WEBrick. All that has to change (and for folks who have done Glassfish/Rails before, this won’t be a surprise) is to run this startup instead:

realjenius$ bundle exec glassfish
Log file /Users/realjenius/Projects/testapp/log/development.log does not exist. Creating a new one...
Starting GlassFish server at: 0.0.0.0:3000 in development environment...
Writing log messages to: /Users/realjenius/Projects/testapp/log/development.log.
Press Ctrl+C to stop.

Running sinatra

This time, we’ll visit this URL: http://localhost:3000/hi, and if all worked as desired, Sinatra will be crooning away. Boom goes the dynamite.

Distilling JRuby: Frames and Backtraces


Welcome back JRuby fans. I took a poll on twitter about what distilling article to do next, and frames and backtraces was the clear winner - so here we are! (three months later).

In previous “distilling” articles, I discussed how methods are dispatched, and then how the scope of variables in each method and block is managed. The scope and dispatch rules are only part of the big picture, however. Ruby, as a programming language, must gather rich information about the execution of the program, and must be able to share this with the developer when errors occur. Furthermore, Ruby itself provides a number of kernel-level methods for accessing and manipulating the current invocation stack (such as Kernel.caller).

This article is all about how JRuby implements those concepts.

Overview

A frame in JRuby parlance is a representation of a method call, block call, eval, etc. kept for presentation to the developer. A backtrace is a representation of the active method stack at any point in time - in other words, it’s a stack of frames. In Java, this would typically be referred to as a ‘stack trace’ - at least, that’s the most direct counterpart.

It can be difficult when juggling a language implementation around in your head to realize that the trace we’re talking about is specific to the method calls in Ruby itself. JRuby may execute a number of “native” methods (code written in Java) that do not show up as part of this backtrace - the code that must run in between steps of the Ruby code executing is implementation-specific to JRuby; the Ruby developer shouldn’t care what internal magic JRuby had to do to get a method to invoke (nor would they know what to do with that knowledge if they did have it).

While it may not seem incredibly important initially, JRuby goes to great pains to be as compatible as possible with MRI in terms of what backtraces are generated (This ‘compatibility mode’ incurs a certain cost, and it may be preferable to turn this off to give JRuby an opportunity to bypass this internal bookkeeping if, as a developer, you don’t need a backtrace to match MRI; but we’ll get into those experimental optimizations later). Backtrace information turns out to be quite important, as it is the first set of information a developer typically uses to trace execution issues in their own code; if it isn’t accurate (or at least traversable) it could easily make a small problem a big one.

Tracking Frames

I would like to mention at this point that it would be in your best interests to read the earlier Distilling JRuby articles, if you haven’t already. Method dispatching, scope, and the JIT compiler are all intertwined with the concept of frames, and I will be talking about these various relationships throughout this article.

You may recall that during the article regarding tracking variable scope, I mentioned that the ThreadContext is consulted on a number of occasions to find data in the variable table. At the time, I was talking about how variables are managed; but that same context class acts as the main source for tracking the frames of method invocation. We saw previously that when a Ruby method is dispatched, a variant of method named preMethod{...}() would be called on the ThreadContext class, and that would in turn tell the ThreadContext to create another DynamicScope object and put it on the top of the stack. It turns out this is exactly where the frame is managed as well. Here is a block of code I showed from the JRuby codebase in that previous article:

public void preMethodFrameAndScope(RubyModule clazz, String name, IRubyObject self, Block block, StaticScope staticScope) {
    RubyModule implementationClass = staticScope.getModule();
    pushCallFrame(clazz, name, self, block); // <-- What we care about this time
    pushScope(DynamicScope.newDynamicScope(staticScope));
    pushRubyClass(implementationClass);
}

Note how this method not only creates a new scope to represent the method’s static scope, but also calls pushCallFrame(...). This is where the new frame is created to represent the method that is being invoked. This frame is represented by an ‘org.jruby.runtime.Frame’ object, which is put on the top of the frame stack.

By most accounts, the Frame object in JRuby is a simple mutable Java bean. The class is relatively simple, and carries a few key pieces of information:

  • The object that owns the code being invoked
  • The name of the method (or block or eval) being invoked
  • The visibility of the method
  • The name of the file where the invocation of the frame occurred.
  • The line number in the calling file where the invocation of the frame occurred.

… and that’s it! This is basically all that is required to produce a single line in a backtrace. The entire stack of frames then, in turn, represents the entire backtrace.
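
In other words, a frame stack plus a formatter is all a backtrace needs. Here is a sketch (the field and method names are illustrative, not JRuby’s Frame class):

```ruby
# Render an MRI-style backtrace from a stack of frames, topmost first.
# A frame with no method name (top-level code) prints just file:line.
Frame = Struct.new(:file, :line, :name)

def render_backtrace(frames)
  frames.map do |f|
    f.name ? "#{f.file}:#{f.line}:in `#{f.name}'" : "#{f.file}:#{f.line}"
  end
end

stack = [
  Frame.new("./another.rb", 7, "do_something_else"),
  Frame.new("./another.rb", 3, "do_something"),
  Frame.new("test.rb", 5, "run"),
  Frame.new("test.rb", 9, nil),
]
render_backtrace(stack).first  # => "./another.rb:7:in `do_something_else'"
```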

The Magic Line Number

While the program is executing, the line number is constantly changing. The frame has some idea of this line number, but only as of the point where the method was called in the enclosing code - it’s not a live representation. However, when you think about a running program, the number at the top of the trace is constantly changing - and on top of that, the frame at the top of the trace is constantly changing as well - so who is keeping track of this magic number?

It turns out it’s the ThreadContext again, where an integer is kept to track the most current line number (and actually the most current file, as well). In the basic (interpreted) mode of JRuby, the various AST nodes (control statements like if and while, blocks, methods, etc.) all have their line number baked into them. When they are interpreted, they update the line number on the thread context. For example, here is the top part of the ‘interpret’ method on org.jruby.ast.IfNode:

@Override
public IRubyObject interpret(Ruby runtime, ThreadContext context, IRubyObject self, Block aBlock) {
    ISourcePosition position = getPosition();
    context.setFile(position.getFile());
    context.setLine(position.getStartLine());
    // ...

For JIT compiler fans, note that it also manages the line number like the interpreted nodes, but as usual is a little more obscure. Two things are done for the JIT compiler: first - code is generated that will call into the ThreadContext to update the line number information like above (See ASTCompiler#compileNewLine). Additionally, however, the line numbers are also actually written into the generated Java bytecode using the standard label/line-number bytecode structures (This will provide distinct advantages in generating backtraces, as we will see later).

As the various code is invoked, this number is constantly being changed to represent the position from the original source. When a new method or block is invoked, that value is copied onto the frame and preserved. This allows the frame to keep track of when it lost control of the execution, while the thread context keeps track of the live line number.
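
That handoff can be sketched as follows (names are illustrative, not JRuby’s actual API): the context holds the live file/line, and pushing a frame snapshots the caller’s position so the live values can keep moving.

```ruby
# The context tracks the live position; pushing a frame freezes the
# caller's position into the frame while the live values keep changing.
class ToyContext
  attr_accessor :file, :line
  attr_reader :frames

  def initialize
    @frames = []
  end

  def push_frame(method_name)
    @frames.push({ name: method_name, file: file, line: line })
  end
end

ctx = ToyContext.new
ctx.file, ctx.line = "test.rb", 5    # interpreter updates the position...
ctx.push_frame("run")                # ...the call snapshots it into a frame
ctx.file, ctx.line = "another.rb", 3 # the live position moves on
ctx.frames.last[:line]               # => 5 (where the caller lost control)
```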

Let’s take a look at a sample backtrace for a specific example:

1./another.rb:7:in `do_something_else': undefined method `call' for nil:NilClass (NoMethodError)
2from ./another.rb:3:in `do_something'
3from test.rb:5:in `run'
4from test.rb:9

So what this tells us is that in the method ‘do_something_else’ in another.rb, on line 7, we had a NoMethodError trying to call the method ‘call’ on a nil variable. Additionally, we know the three method calls it took to get to this point. Here is a diagram that shows what the frame stack looks like in the runtime at the moment this occurs (as usual, I’ve done some hand-wavy magic here to simplify a few less-important details…):

Note the line number mis-match


As you can see, the line number stored on the frame correlates to the position in the previous call where the invocation occurred. Also notice that the thread context carries the currently active file and line - but the method name is inferred from the top frame. This mixed relationship, while effective for the way that frames are recorded, can be confusing at first.

Managing Frames at Runtime

JRuby tries to avoid creating a huge volume of frame objects during execution; in general, the expectation is that a program is going to invoke a lot of methods during execution. If each method call were represented by a new frame, that would mean a lot of frame objects. To combat this, the frame objects are pre-allocated on the frame stack, and reused. Since programs are generally going to repeatedly traverse up and down the frame stack, hovering around the same depth of execution, this is one place in Java code where pooling of objects probably makes good sense. Rather than JRuby creating thousands and thousands of frame objects, it will only create enough for the deepest level of execution per thread.

Internally speaking, the ThreadContext class keeps a growable Frame[], but in the process it also ensures that each slot is pre-filled with a ready-to-use frame object. If the allocation needs to grow, the array is increased in capacity, the existing frame objects are moved to the new array, and the new empty slots are filled with additional Frame allocations.
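
A sketch of that pooling scheme follows (illustrative; the initial size and doubling growth factor here are assumptions, not JRuby’s actual policy):

```ruby
# Pre-allocated, growable frame pool: every slot is filled with a
# reusable frame object up front; growing keeps old slots and fills
# the new ones with fresh allocations.
class FramePool
  INITIAL = 4
  attr_reader :stack, :depth

  def initialize
    @stack = Array.new(INITIAL) { {} }  # hashes stand in for Frame objects
    @depth = 0
  end

  def push
    grow if @depth == @stack.length
    frame = @stack[@depth]  # reuse the pre-allocated slot, no new object
    @depth += 1
    frame
  end

  def pop
    @depth -= 1
  end

  private

  def grow
    @stack.concat(Array.new(@stack.length) { {} })  # double the capacity
  end
end

pool = FramePool.new
6.times { pool.push }  # forces one grow: 4 slots -> 8 slots
pool.stack.length      # => 8
```

A real push would then call something like updateFrame on the returned slot, as shown in the next snippet.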

When a Frame object is set up for use, some variant of the #updateFrame method is called, which basically captures all of the invocation information - it effectively behaves as a constructor:

public void updateFrame(RubyModule klazz, IRubyObject self, String name,
        Block block, String fileName, int line, int jumpTarget) {
    this.self = self;
    this.name = name;
    this.klazz = klazz;
    this.fileName = fileName;
    this.line = line;
    this.block = block;
    this.visibility = Visibility.PUBLIC;
    this.isBindingFrame = false;
    this.jumpTarget = jumpTarget;
}

Dispatching Options

In all of the past three articles, I have brushed by the CallConfiguration enumeration. This enum is a significant linchpin in the dispatching and execution of program flow, as it decides a number of things about method and block invocation. Each method may be dispatched using a number of possible call configurations, based on the state of the running program and the needs of the code block being executed. The CallConfiguration decides not only whether a Frame object is required for the call, but also whether a Scope is required. This abstraction is very useful, as both the interpreter and the JIT-compiled code dispatch using this configuration strategy.

Just as certain methods may not require an explicit scope (no variables are mutated), some methods also don’t require frames. With both scope and frame, the primary reason for skipping their use is performance. In the case of scope, the code which manages it is entirely transparent; you don’t care how your variables are managed, as long as they are managed.

However, with frames it’s not that simple. We’ve already discussed how method invocations can be compiled into Java code in JRuby - effectively avoiding the overhead of making a series of reflection calls in favor of a generated block of Java code that properly preps the Ruby context and invokes the method through the call site. In this process, a number of possible invocations can be generated - some that set up the frame constructs, and some that don’t. If you tell JRuby it can optimize away as much as possible through flags, it will generate these method connectors to be ultra-super-cool-fast.

Strictly speaking, to turn off frames in the compilation process, you simply need to set ‘jruby.compile.frameless’ to true - although to get the most speed, you could instead set ‘jruby.compile.fastest’ (which implies a number of other settings as well).

It should be noted that, by default, these settings are turned off, and both are marked as experimental. By leaving them disabled by default, it ensures that JRuby, out of the box, is compatible with MRI Ruby as it pertains to generated backtraces and frame manipulation, and is as stable as possible. Turning them on can easily break certain frameworks and libraries that expect the frame or backtrace to be consistent, and manageable. In many cases, however, your application won’t need that kind of control over the frame, and you may not care that the backtrace be exact.

Here are some general rules followed when dispatching to a method or block:

  • A full frame will always be used if compatibility mode is enabled (jruby.compile.frameless is set to false).
  • Certain system-level invocations (such as the ‘eval’ method) get a frame no matter what, as they are frame aware.
  • In all other cases, at most a backtrace-only frame (a “lite” frame) will be used.
  • If jruby.compile.fastest is set to true, then frames will not be used at all unless the running program requires one to exist. This obviously has some impact on the readability of the execution.

As mentioned above, readability of backtraces is an issue with jruby.compile.fastest - less accurate information will be available. For example, the trace used above looks like this when run with fastest-compilation on:

./another.rb:7:in `do_something_else': undefined method `call' for nil:NilClass (NoMethodError)
from ./another.rb:3:in `do_something'
from :1:in `run'
from :1

Note that all frames but the top are completely unaware of the execution details - in many cases this may be sufficient information to fix a problem, but it is certainly less than what is available in compatibility mode.

The ‘lite’ backtrace-only frame I mentioned is basically a trimmed-down representation that doesn’t hold on to references to the owning object. While this can reduce the usability of the frame (particularly for methods like ‘eval’ that may need to interact with the caller), it’s a significant optimization, as it takes several objects off the object graph, preventing long-lived references to live objects from the program flow (such as the object being invoked against). This allows the GC to handle these objects sooner than may otherwise be possible.
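
One way to picture the difference between a full and a ‘lite’ frame (illustrative structures, not JRuby’s classes):

```ruby
# A full frame pins the receiver (self) and block in memory until the
# frame is popped; a backtrace-only "lite" frame keeps just what a
# trace line needs, so the GC can reclaim the receiver sooner.
FullFrame = Struct.new(:self_obj, :block, :name, :file, :line)
LiteFrame = Struct.new(:name, :file, :line)

def to_lite(frame)
  LiteFrame.new(frame.name, frame.file, frame.line)  # drops object refs
end

full = FullFrame.new(Object.new, nil, "do_something", "another.rb", 3)
lite = to_lite(full)
lite.respond_to?(:self_obj)  # => false: no path back to the receiver
```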

Execution Flow

When an error occurs, the execution needs to stop and unwind from the exception to the first point it is properly handled (with a rescue, or all the way out of the program). This can make following the JRuby code complicated, as Java exception flow is used as the back-bone for the Ruby exception flow, and so the two intermingle and must be kept separate in your mind.

The primary class that represents an exception in Ruby is org.jruby.RubyException, which is the JRuby native implementation of the Exception class in Ruby. There are a number of subclasses that are constructed (such as ArgumentError, as we will see below) that let Ruby code handle errors in a typed way, but effectively everything extends this ‘Exception’ class. Now, while this is called ‘Exception’, it’s not actually a Java exception. It extends RubyObject (like all JRuby native peers), and is a representation of a Ruby exception for the runtime, but has no effect on Java as anything but a standard object.

However, a RubyException may be raised at any point during execution, and something has to interrupt the executing Java code when that happens. As an example, the ‘to_sym’ method on RubyString is implemented natively in Java, and that method, by contract, should throw an exception if the string is empty.

    $ ruby -e '"".to_sym'
    -e:1:in 'to_sym': interning empty string (ArgumentError)
    from -e:1
    $ jruby -e '"".to_sym'
    -e:1: interning empty string (ArgumentError)

As it turns out, the easiest way to interrupt Java code like this is to use a Java exception. For this, JRuby uses the class org.jruby.RaiseException, which is, in fact, a real Java exception. As the name hints, it represents the execution of a ‘raise’ keyword in Ruby (which is roughly analogous to a Java throw, but is actually a method on the Kernel module). RaiseException contains the RubyException representing the error in Ruby code.

When Ruby code invokes ‘raise’, this method will delegate through org.jruby.RubyKernel#raise, which for the most part will end up throwing a new RaiseException. Now, this is where it gets tricky to distinguish the two. Keep in mind that the RaiseException simply exists so JRuby can back up the Java code to find the right Ruby code to handle the error. On the other side of the equation, the code in JRuby follows a pattern roughly like this:

public void interpret() {
    try {
        runBodyRubyCode(...);
    }
    catch(RaiseException e) {
        runRescueRubyCode(e.getRubyException());
    }
    finally {
        runEnsureRubyCode(...);
    }
}

This is pseudo-code mixed from the AST RescueNode and EnsureNode, but it captures the idea. First, the code is run - then, if a RaiseException occurs, the exception is sent into a rescue block of code. Keep in mind that when the rescue code is run, the Ruby exception is unboxed so it’s directly accessible to that code block (as it always is in Ruby). The ensure code is actually handled by a separate AST node (since it may be included independently of rescue), but the concept is the same as seen above.
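To make the mapping concrete, here is a minimal Ruby snippet of the construct being dispatched: the begin body runs inside the Java try, the rescue clause corresponds to the catch(RaiseException) block (with the RubyException unboxed and bound to the local), and ensure maps to finally.

```ruby
log = []
begin
  log << :body
  raise ArgumentError, "boom"  # surfaces in Java land as a RaiseException
rescue ArgumentError => e      # the wrapped RubyException is unboxed here
  log << e.message
ensure                         # handled by the EnsureNode / Java finally
  log << :ensure
end
puts log.inspect  # [:body, "boom", :ensure]
```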

The JIT obviously changes how this code is actually invoked (via generated Java code), but the same general logic applies.

If an exception isn’t handled via a mechanism like rescue, then the RaiseException itself is handled by the Java bootstrap (such as the executable). This has some special consequences when it comes to embedded code, and we’ll get into that shortly.

Additionally, there is a special version of RaiseException called NativeException (this also exists in MRI) - this is a special wrapper for exceptions that occur in Java code called from JRuby code. When this happens, the stack trace for those native parts is actually preserved in the Ruby stack up to the point the Ruby code invoked the Java code. Here is an example of a backtrace that was created by an exception occurring in some Java code:

java/lang/NumberFormatException.java:48:in 'forInputString': java.lang.NumberFormatException: For input string: "15123sdfs" (NativeException)
from java/lang/Long.java:419:in 'parseLong'
from java/lang/Long.java:468:in 'parseLong'
from ./another.rb:7:in 'do_something_else'
from ./another.rb:5:in 'do_something'
from [script]:5:in 'run'
from [script]:9

Constructing Backtraces

Throughout this article, we’ve seen examples of backtraces that were (seemingly) generated off of the frame stack. To create a proper backtrace, the currently active frame stack must be copied and turned into a point-in-time snapshot of backtrace information. When an error occurs, the backtrace is captured with participation between the RaiseException, RubyException, and the ThreadContext.

When the RubyException is constructed, it asks the ThreadContext to create a backtrace, which then iterates over the current frame stack, creating a RubyStackTraceElement array. This array is then bound to the RubyException. Here is a sample of the loop that creates the backtrace array (I’ve trimmed some unnecessary details):

public static IRubyObject createBacktraceFromFrames(Ruby runtime, RubyStackTraceElement[] backtraceFrames) {
    RubyArray backtrace = runtime.newArray();
    if (backtraceFrames == null || backtraceFrames.length <= 0) return backtrace;
    int traceSize = backtraceFrames.length;
    for (int i = 0; i < traceSize - 1; i++) {
        RubyStackTraceElement frame = backtraceFrames[i];
        addBackTraceElement(runtime, backtrace, frame, backtraceFrames[i + 1]);
    }
    return backtrace;
}

In the normal “framed backtrace” workflow, that’s all there is to it. That array can then be used to emit to the console, or whatever else needs to occur.

Interestingly, there are a number of other “super secret” ways the backtrace can be generated. As best as I can tell, these are entirely undocumented on the JRuby site; they are simply custom values for “jruby.backtrace.style”, and include:

  • “raw” - This version provides a very explicit output of what happened, including all of the internal JRuby stack - very useful for JRuby development:

    from java/lang/Long.java:419:in `parseLong'
    from java/lang/Long.java:468:in `parseLong'
    from Thread.java:1460:in `getStackTrace'
    from RubyException.java:143:in `setBacktraceFrames'
    from RaiseException.java:177:in `setException'
    from RaiseException.java:119:in `<init>'
    from RaiseException.java:101:in `createNativeRaiseException'
    from JavaSupport.java:188:in `createRaiseException'
    from JavaSupport.java:184:in `handleNativeException'
    from JavaCallable.java:170:in `handleInvocationTargetEx'
    (... removed the rest for brevity ...)

  • “raw_filtered” - Just like ‘raw’, but it omits any Java classes starting with ‘org.jruby’. This is handy if you have code-flows that go from Ruby -> Java -> Ruby -> Java, etc., and need to see the Java code intermixed. I’ve used this when coding in Swing and SWT where event hooks may go into Java, and back into Ruby.
  • “ruby_framed” (the default) - This uses the internal Ruby frame stack to generate an MRI-friendly backtrace. “rubinius” is currently compatible with this version. Depending on the settings you have enabled, this can return different values (as described above).
  • “ruby_compiled” - This uses the Java stack trace, and parses the compiled class names. When JRuby generates compiled invokers for methods, they will have mangled names that can be re-parsed (looking for sentinels like $RUBY$). Additionally, remember earlier how I said that the line numbers were actually compiled into the Java code straight from the Ruby code? That means the Java stack trace will automatically have the correct line numbers in it, so building the Ruby backtrace is truly just a matter of parsing the Java StackTraceElement[]. Because the Java VM captures this information from the bytecode itself, when running with jruby.compile.fastest set to true, this mode can actually return more accurate information than ruby_framed will. Note that if a method isn’t compiled, it will not show up in the Java stack, and as such the stack will only contain Java methods that were invoked (of which there may be none).
  • “ruby_hybrid” (currently disabled) - This version is meant to munge compiled and interpreted information together into a mega-stack-trace, allowing compiled and interpreted methods to show up in the same trace, using the Java stack to (auspiciously) improve performance where possible - I’m assuming it’s commented out due to some flaw in the implementation.

Embedding JRuby Programs in Java

When embedding JRuby in Java programs, errors that occur can potentially leave the Ruby runtime altogether. When this happens, Java code is in total control. To make this transition as seamless as possible, JRuby performs some nifty tricks with traditional Java stack-traces.

Our old friend RaiseException actually generates the object-graph for a backtrace like above, and then creates a pseudo-Ruby stack in the Java code that lets a Java programmer see where in the Ruby code the error occurred. Here is the example from way up above as generated in Java code:

Exception in thread "main" javax.script.ScriptException: org.jruby.exceptions.RaiseException: undefined method 'call' for nil:NilClass
 at org.jruby.embed.jsr223.JRubyEngine.wrapException(JRubyEngine.java:112)
 at org.jruby.embed.jsr223.JRubyEngine.eval(JRubyEngine.java:173)
 at realjenius.SampleProgram.main(SampleProgram.java:13)
Caused by: org.jruby.exceptions.RaiseException: undefined method 'call' for nil:NilClass
 at Kernel.call(./another.rb:7)
 at Another.do_something_else(./another.rb:3)
 at Another.do_something([script]:5)
 at MyClass.run([script]:9)
 at (unknown).(unknown)(:1)

Other conversions for the backtrace (such as the fancy NativeException stuff) work naturally with this code as well, allowing diversions in Ruby code to show up naturally in the Java stack.

Frame Peeking with Ruby Programs

I previously mentioned Kernel#caller, which is a method for peeking at the goings-on in the Ruby trace. Now that we understand the structure of the frames, it is probably pretty easy to see how they will be used. The implementation of org.jruby.RubyKernel#caller simply calls ThreadContext#createCallerBacktrace, which is much like all of the other code we looked at, but it creates a RubyArray containing strings representing the state of the frames in the context at that time.

public IRubyObject createCallerBacktrace(Ruby runtime, int level) {
    int traceSize = frameIndex - level + 1;
    RubyArray backtrace = runtime.newArray(traceSize);

    for (int i = traceSize - 1; i > 0; i--) {
        addBackTraceElement(runtime, backtrace, frameStack[i], frameStack[i - 1]);
    }

    return backtrace;
}

It’s probably also clear by now why optimizations like ‘jruby.compile.fastest’ can break these methods; the frames aren’t there for the ThreadContext to report against.
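For reference, this is the Ruby-side view of that machinery; each element Kernel#caller returns is one of those frame strings, innermost caller first:

```ruby
# caller returns the current call stack as "file:line:in `name'" strings,
# built (in JRuby) from the frame stack snapshot shown above.
def inner
  caller
end

def outer
  inner
end

trace = outer
puts trace.first  # the frame that called inner, i.e. a line inside outer
```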

Conclusion

While the frame concepts in JRuby in and of themselves aren’t that complicated, you have to have a strong foundational knowledge of how Ruby works and how method dispatching in JRuby works to understand the code flows. I hope I’ve been able to condense the concepts into an easy enough walkthrough.

I’m by no means done with these JRuby articles – I took a little hiatus for work and personal reasons, but hope to have more coming out of the gates real soon. Here is a peek at some possible subjects:

  • The Library Load Service
  • Continuations (Kernel#callcc)
  • Java Proxying and Support
  • The New Kid on the Block: Duby

As usual, votes are welcome: Contact Me.

Stay Tuned!

JRuby "IO.foreach" Performance


I’ve been spending some time dipping my toes in patch contribution for JRuby recently. I started with a few easy, isolated, spec issues, and have since been working my way into more entrenched problems. The past few weeks I spent a good bit of time toying with solutions to JRUBY-2810: “IO foreach performance is slower than MRI”. The exercise was interesting enough that I thought it might be worth posting here. This isn’t meant to be a study of the JRuby code in particular, but more so of the thought process of diagnosing a performance problem in foreign code.

Proof is in the Benchmark

Performance is a very multi-faceted thing - there are so many measuring sticks (CPU, memory, I/O, startup time, scalability, ‘warm up’ time, etc). This makes quantifying a performance problem hard.

Furthermore, improvements for most performance problems typically involve making some kind of trade-off (unless you’re just dealing with bad code). The goal is to trade off a largely-available resource for a sparse one (cache more in memory to save the CPU, or use non-blocking IO to use more CPU rather than waiting on the disk, etc).

JRuby always has a few open, standing performance bugs. It’s the nature of the beast that it is compared to MRI (the “reference” implementation), and anywhere it performs less favorably is going to be considered a bug (fast enough be damned). The performance measurement is up to the beholder, but CPU timings are generally the most popular.

JRUBY-2810 is an interesting case. IO line looping was proving to be slower than MRI Ruby; in some cases much slower. In this particular case, CPU was the closely-watched resource.

The first step I took to analyzing the problem was reproducing it. With Ruby this is usually pretty easy, as arbitrary scripts can just be picked up and executed, as opposed to Java, where all too often you have to build a special harness or test class just to expose the problem. Scripts are very natural for this, and in this particular case, the user had already provided one in the benchmarks folder that ships with the JRuby source.

Having run that file, I quickly saw the performance discrepancy reported in the bug. At this point in my experimenting, I was running inside an Ubuntu VM through VirtualBox on my Windows machine; suspecting that level of indirection exacerbated the numbers, I checked my Macbook Pro as well. In both cases, the differences were significant: on Ubuntu, MRI Ruby was running the code in under 10 seconds, where JRuby was taking 30 seconds to a minute; the Macbook was still twice as slow in JRuby (12 seconds) as compared to MRI (6.5 seconds).

When faced with a big gap like this, I generally start by profiling. Running the entire process under analysis will generally grab some hotspots that need some tuning. I’m enamored with how low the barrier to entry on profiling has become on modern Java VMs (something that I think is actually a big selling point for JRuby as compared to other Ruby implementations; but I digress). To do my work here, I simply ran the benchmark, and popped open VisualVM. From there, I simply connected and performed CPU profiling (which automagically connects and injects profiling code into the running system).

In this particular case, the first problem was quickly uncovered:

Great Odin's Raven!

Clearly, a very large amount of time is being spent in ByteList.grow. I felt fairly fortunate at this point, as rarely is it this straightforward; having a performance problem reported with this singular of a hot-spot. When nearly 80% of the processing time is spent in a single method, it brings up several questions: What is ByteList? Why does IO.foreach use it? Why must it keeping ‘growing’? Did I leave the iron on? To answer these questions (most of them, anyway) you simply have to get your feet wet in the code.

Coding for Crackers

At its heart, IO.foreach (and its close counterpart, each/each_line) is simply a line iterator that hands each line of text off to a receiving block - there are a number of caveats and subtleties built into that idea, but at its core, it allows developers to write code like this:

io = #...
io.each_line do |line|
  puts line
end

Deceptively simple, isn’t it? It turns out that a lot of wrangling has to occur to make this so simple - much of it having to do with how files are encoded, and the variety of line separators that may exist. Thankfully, the good folks at JRuby have cemented this in the code fairly decently - for my part, I mostly had to draw boxes around the complex encoding and line termination algorithms, and focus on the loop and data-reading itself. Most of this was occurring in a single method (for the code-eager, this was in RubyIO#getline and its derivatives). This method is used in a number of scenarios: the traditional looping algorithms, the 1.9 inverted enumerator stuff (handing the ownership of “next” off to the caller), as well as basic calls to ‘IO.gets’. Internally, each call to getline allocates a new ByteList and copies data from the input stream into it.
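As a small illustration of the separator handling buried in that routine, each_line accepts a custom line separator; StringIO shares the interface, so it makes for a self-contained example:

```ruby
require 'stringio'

# Splitting on '|' instead of newlines; each yielded "line" keeps its
# separator, and the final fragment comes through without one.
io = StringIO.new("a|b|c")
lines = io.each_line("|").to_a
puts lines.inspect  # ["a|", "b|", "c"]
```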

This is where the high-CPU numbers started. ByteList is simply an easier-to-use wrapper around a byte[]. It backs several JRuby data structures - the most notable probably being RubyString (the Java peer for String objects in JRuby). In fact, the ByteList allocated in this routine is eventually given to a String object, and returned at the end of the call. The ‘grow’ method on ByteList (the offending code-point) is the automatic capacity increase mechanism, and does this via an array-allocation and copy (much like ArrayList); this method uses a fairly standard 1.5x grow factor.
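To make the pattern concrete, here is a toy sketch of a grow-on-append buffer with the same 1.5x factor (hypothetical class, not JRuby's actual ByteList): every grow is an allocate-and-copy, which is exactly what the profiler flagged.

```ruby
# A toy growable byte buffer: append copies bytes in, and grows the
# backing storage by 1.5x (allocate a bigger array, copy the old one)
# whenever capacity is exceeded.
class TinyByteList
  attr_reader :size, :capacity, :grow_count

  def initialize(capacity = 4)
    @bytes = Array.new(capacity, 0)
    @capacity = capacity
    @size = 0
    @grow_count = 0
  end

  def append(byte_array)
    needed = @size + byte_array.length
    grow(needed) if needed > @capacity
    byte_array.each_with_index { |b, i| @bytes[@size + i] = b }
    @size = needed
  end

  private

  # The 1.5x growth factor described above; each call is a full copy of
  # everything accumulated so far.
  def grow(minimum)
    new_capacity = [@capacity * 3 / 2 + 1, minimum].max
    new_bytes = Array.new(new_capacity, 0)
    @size.times { |i| new_bytes[i] = @bytes[i] }
    @bytes = new_bytes
    @capacity = new_capacity
    @grow_count += 1
  end
end

list = TinyByteList.new
10.times { list.append([0] * 10) }  # data arrives in chunks, like 4k reads
puts list.grow_count                # several grow/copy rounds for one "line"
```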

It’s easy to see how ByteList would be central to the benchmark since it represents the primary data structure holding the bytes from the input source, but it seemed suspicious that ‘grow’ was the offending hotspot. I would expect it to be one of the copy methods, like ‘append’, which is really where the algorithm should be spending its time (that, and ‘read’ from the input source). To understand why ‘grow’ was so cranky, I had to look more closely at the code I was invoking: the benchmark.

Understanding the Benchmark

The original benchmark used to test ‘foreach’ performance when JRUBY-2810 was first opened performed something like 10,000 line iterations on a file with relatively short lines. Halfway through the life of this bug, those values were adjusted in a way that exposed a shortcoming in the JRuby line-read routine - by generating only 10 lines that were very, very long instead.

For any Ruby implementation, reading a file with particularly long lines using foreach is prohibitively expensive, as the entire line has to be read into memory as a single string object that is then shared with the code block. Normally, you wouldn’t want to read data this way if you knew that the file was structured so wide, and should probably consider a streamed-read instead. That being said, MRI Ruby performed much more admirably in this scenario, so it was something to be analyzed.

The root of the problem was this: JRuby was starting with an empty ByteList, and was then creating subsequently larger byte[]s indirectly (via ByteList.grow) - the 1.5x factor wasn’t enough, as the chunks were being read 4k at a time, and these files were significantly wider than 4k. For that reason alone, the ByteList was having to grow a number of times for each line, and when we’re talking about a byte[] several kilobytes in size, array copies are simply going to be expensive - all those together combine to make this an unfriendly performance proposition.
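To put rough numbers on that (illustrative arithmetic only, not JRuby's exact buffer sizes): growing a 4 KB buffer by 1.5x until it can hold a 40 KB line takes several allocate-and-copy rounds, each one copying everything accumulated so far.

```ruby
# Count the grow/copy rounds for one 40 KB line, starting from a 4 KB
# buffer with a 1.5x growth factor.
capacity = 4 * 1024
grows = 0
bytes_copied = 0
while capacity < 40 * 1024
  bytes_copied += capacity   # each grow copies the current contents
  capacity = capacity * 3 / 2
  grows += 1
end
puts "#{grows} grows, ~#{bytes_copied / 1024} KB copied for a single line"
```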

As I mentioned previously, the benchmark used to be a very different performance model. I decided at this point it was good to split the benchmark so that both could be run side by side, and I could see both the ‘wide’ scenario and the ‘tall’ scenario at the same time. It turned out via profiling that the tall file was experiencing pains from ‘grow’, but not nearly so badly. Even at 10,000 lines the amount of adverse memory allocation and churn was much smaller, as a single 4k allocation on each line was more than sufficient.

For reference, here is what the ‘tall’ benchmark looks like:

require 'benchmark'

MAX  = 1000
BLOCKSIZE = 16 * 1024
LINE_SIZE = 10
LINES = 10000
FILE = 'io_test_bench_file.txt'

File.open(FILE, 'w'){ |fh|
  LINES.times{ |n|
    LINE_SIZE.times { |t|
      fh.print "This is time: #{t} "
    }
    fh.puts
  }
}

stat = File.stat(FILE)
(ARGV[0] || 5).to_i.times do
  Benchmark.bm(30) do |x|
    x.report('IO.foreach(file)'){
      MAX.times{ IO.foreach(FILE){} }
    }
  end
end

File.delete(FILE) if File.exists?(FILE)

The only difference in the wide benchmark is the tuning parameters:

LINE_SIZE = 10000
LINES = 10

So ‘tall’ can be read as ‘10000 lines, 10 sentences long’, and ‘wide’ can be read as ‘10 lines, 10000 sentences long’.

Also for reference, here is what it looks like to run a benchmark using this framework - 5 iterations are run (as defined in the file), and the various aspects of CPU usage are measured. Generally, the most important number is the ‘real’ column when measuring performance between Ruby and JRuby, as the two report user/system CPU usage very differently.

# Running with JRuby
realjenius:~/projects/jruby/bench$ jruby --server bench_io_foreach_wide.rb
                                     user     system      total         real
IO.foreach(file)                63.970000   0.000000  63.970000 ( 63.764000)
                                     user     system      total         real
IO.foreach(file)                30.212000   0.000000  30.212000 ( 30.212000)
                                     user     system      total         real
IO.foreach(file)                30.973000   0.000000  30.973000 ( 30.973000)
                                     user     system      total         real
IO.foreach(file)                30.768000   0.000000  30.768000 ( 30.767000)
                                     user     system      total         real
IO.foreach(file)                32.813000   0.000000  32.813000 ( 32.813000)

# Running with MRI Ruby
realjenius:~/projects/jruby/bench$ ruby bench_io_foreach_wide.rb
                                     user     system      total         real
IO.foreach(file)                 0.200000   9.500000   9.700000 (  9.982682)
                                     user     system      total         real
IO.foreach(file)                 0.230000   9.430000   9.660000 (  9.889992)
                                     user     system      total         real
IO.foreach(file)                 0.560000   9.340000   9.900000 ( 10.232858)
                                     user     system      total         real
IO.foreach(file)                 0.520000   9.270000   9.790000 ( 10.054699)
                                     user     system      total         real
IO.foreach(file)                 0.600000   9.350000   9.950000 ( 10.348258)

After splitting the benchmarks, here is a breakdown of my two configurations:

Environment    ‘wide’ MRI   ‘wide’ JRuby   ‘tall’ MRI   ‘tall’ JRuby
Ubuntu VM      10 seconds   30 seconds     6 seconds    11 seconds
Macbook Pro    6.5 seconds  12 seconds     8 seconds    15 seconds

Keep in mind I’m just rounding here; not really trying to be exact for this blog post. Check the bugs for more exact numbers.

A Solution Lurks

So, we have performance problems on tall files, and a whole lot more performance problems on wide files, particularly depending on the environment. Because of the environmental discrepancies, I spent some more time comparing the two test environments. It turned out that the Macbook Pro was simply working with a more resource-rich environment, and as such wasn’t hitting the wall as badly when allocating the new immense byte[]s. The implementation in JRuby was not degrading as gracefully on older (or more restricted) hardware as MRI.

(It’s probably good to note here the value of testing in multiple environments, and from multiple angles)

My first pass at a solution to this problem was to consider a byte[] loan algorithm. Basically, at the start of foreach, I effectively allocated a single ByteList (byte[] container), and for each iteration of the loop, I just reused the same ByteList – eventually the byte[] being used internally would be sufficient to contain the data for each line, and would not need to grow any more (yay recycling!).

I encapsulated most of this ‘unsafe’ byte[] wrangling and copying into a small inner class called ByteListCache. At the start of the loop, the ByteListCache is created, and it is then shared by each iteration, being passed down into ‘getline’ as an optional parameter. The side effect is that the first call to ‘getline’ manually allocates a large byte[] (just like it did pre-patch), and each subsequent call can simply reuse the previously allocated byte[], which is already quite large. If the need arises to grow it more, it can, but that becomes increasingly less likely with each line.

Once the iteration is completed, the ByteListCache is dropped out of scope, ready for garbage collection. The number of calls to ‘grow’ drops dramatically with this implementation, and so did the impacts to the performance:

Environment    ‘wide’ MRI   ‘wide’ JRuby   ‘wide’ JRuby (v1)   ‘tall’ MRI   ‘tall’ JRuby   ‘tall’ JRuby (v1)
Ubuntu VM      10 seconds   30 seconds     7 seconds           6 seconds    11 seconds     8 seconds
Macbook Pro    6.5 seconds  12 seconds     7 seconds           8 seconds    15 seconds     9 seconds

Unfortunately, they were only this fast because the implementation was now thoroughly broken.

Stop Breaking Crap

Okay, so I had amazing performance numbers. Except. Now over 50 ruby spec tests were failing. Oh yeah, that might be a problem. Needless to say the problem was obvious the minute I realized what I had done (I actually woke up at 6:00am realizing this, which if you know me, is a bad sign). Remember how earlier I said that the ByteList was used as a backing store for the String? Well, at the time I implemented this, that point had eluded me. I was (accidentally) creating strings with my shared bytelist, so you can probably see where that would end up creating some significant issues with data integrity.

To fix this, the solution was simple - at the end of the line-read, create a perfectly-sized ByteList of exactly the size needed for the String, copy into it from the shared bytelist, and then pass it to the String constructor. Obviously this cut into my performance numbers by a percentage on each, but it also fixed the data corruption, which is nice.
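The corrected pattern can be sketched in a few lines of Ruby (hypothetical names, not JRuby's internals): the shared buffer still absorbs the reads, but a right-sized copy is taken before anything the caller can see is constructed.

```ruby
require 'stringio'

# One buffer is allocated up front and reused for every line; the dup at
# the end is the "perfectly-sized copy" that prevents later iterations
# from corrupting strings already handed to the block.
class SharedLineBuffer
  def initialize
    @buf = String.new(capacity: 4096)  # allocated once, reused per line
  end

  def each_line(io)
    while (line = io.gets)
      @buf.clear      # reuse the same backing storage across iterations
      @buf << line
      yield @buf.dup  # caller gets its own right-sized copy
    end
  end
end

lines = []
SharedLineBuffer.new.each_line(StringIO.new("a\nb\n")) { |l| lines << l }
puts lines.inspect  # ["a\n", "b\n"]
```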

Environment    ‘wide’ MRI   ‘wide’ JRuby   ‘wide’ JRuby (v2)   ‘tall’ MRI   ‘tall’ JRuby   ‘tall’ JRuby (v2)
Ubuntu VM      10 seconds   30 seconds     14 seconds          6 seconds    11 seconds     10 seconds
Macbook Pro    6.5 seconds  12 seconds     10 seconds          8 seconds    15 seconds     13 seconds

The lesson learned here, obviously, is that you need to run a variety of tests (a full suite of specs if you have them) when considering bug fixes. For JRuby, that means (at a minimum) running the specs, which is easy with the Ant script:

ant spec # or ant spec-short to just run interpreted tests

A Word on Limited Application

Note that I isolated the use of this construct to the foreach and each_line algorithms, as these had deterministic, single-threaded behavior, and the benefit outweighed the overhead of dealing with this additional object. The new Ruby 1.9 enumerator stuff does not use it, as there is no guarantee of single-threaded usage of the enumerator, so we can’t reuse a single byte list. Similarly, individual calls to ‘gets’ do not currently use it, for the same general reason.
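For reference, the 1.9-style enumerator hands ownership of iteration to the caller, which is why the implementation can't assume anything about when (or from which thread) the next line is pulled; StringIO is used here only to keep the example self-contained:

```ruby
require 'stringio'

# Calling each_line with no block returns an external Enumerator; the
# caller drives iteration one 'next' at a time, so a shared internal
# buffer could be observed mid-mutation.
e = StringIO.new("a\nb\n").each_line
puts e.next  # "a\n"
puts e.next  # "b\n"
```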

Changes could be made to make this byte[] re-use more long-lasting/global - but the changes required felt a little too overwhelming for a first pass, even if they did offer potentially larger benefits.

Rinse and Repeat

Now that I had two tests, and I had seen some improvements (but not quite in the range of MRI), it was time to revisit. Re-running the benchmarks, it was fascinating to see a new prime offender: incrementlineno. It turned out that a global variable holding a fixnum representing the current line number in the file had to be updated through a very indirect routine, and all of this heavy-weight variable updating (going through call-sites and arg file lookups) was very expensive in comparison to the rest of the iteration.
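The global in question is (as I read the code) Ruby's $., the line number of the most recent line read, which every gets/foreach iteration must keep current:

```ruby
require 'tempfile'

# $. tracks the line number of the last line read from an IO, so the
# runtime has to touch it on every single line of the loop.
lineno = Tempfile.create('lines') do |f|
  f.write("one\ntwo\nthree\n")
  f.rewind
  f.gets
  f.gets
  $.
end
puts lineno  # 2
```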

At this point, I’d normally spend a lot of time explaining how I improved the performance of this little gem; truth be told, however, once I hit this, I simply had to inform the powers-that-be and back up. You see, I couldn’t figure out (for the life of me) why this method was doing what it was doing; why it was so important for this line number to be set. This is one of the perils of consuming foreign code-bases that I have discussed with folks: you can’t assume the secret sauce is a bad thing. I had to assume it was there for a reason, even if I didn’t know what it was.

It turns out, the JRuby folks didn’t know the reason either. Well, that’s not exactly true; it didn’t take long for Charles Nutter to figure out why it was there, but it was clear it was only for rare file execution scenarios, and not appropriate for the more general looping scenarios I was debugging. To follow his efforts on how he optimized that code path, you can reference his commit here: JRUBY-4117.

After his optimizations, the numbers boosted again:

Environment    ‘wide’ MRI   ‘wide’ JRuby   ‘wide’ JRuby (v3)   ‘tall’ MRI   ‘tall’ JRuby   ‘tall’ JRuby (v3)
Ubuntu VM      10 seconds   30 seconds     11 seconds          6 seconds    11 seconds     8.5 seconds
Macbook Pro    6.5 seconds  12 seconds     6.3 seconds         8 seconds    15 seconds     9.5 seconds

Summary

I think it’s fascinating how varied the numbers are depending on the platform. This is a complicated benchmark, and as Charles Nutter mentioned to me, one problem we’ll continue to face is that we have no control element in this benchmark. You can get consistency through repetition, but there are simply too many variables to predict exactly what the outcome will be on any given platform. I find it interesting how well the Macbook handles the wide files compared to the Ubuntu VM, which just dies a slow death in comparison - this has to be a side-effect of resource starvation in the VM; but whatever the case, it’s an interesting dichotomy.

On average, the new numbers are much more competitive with MRI, even if they don’t beat it in most cases. As I learned from working with others on this bug, your mileage may vary significantly, but it’s clear from the implementation that we’re causing a lot less resource churn for very little cost (the trade off here is retained memory), and that’s generally a good sign things are going in the right direction. Certainly, profiling has shown that the effort is much more focused on reading from the input channel.

That being said, I’m sure there is more performance to be found - MRI is just a hop-skip-and-jump away!


Distilling JRuby: The JIT Compiler


The JIT compiler is a relatively new creation for JRuby. Of course, the initial approach taken for the JRuby platform was the most straightforward: parse and interpret the code incrementally as the program executes, traversing the AST. As time went on, the JRuby team took a bunch of steps to improve the performance of JRuby, most of which have involved shortcutting the consistent, but also slow and indirect, interpreter model. The JIT compiler is probably one of the most aggressive and technically complex steps taken to date.

JIT compilers are a novel idea: take some code in an “intermediate form”, and, given some heuristics, compile it into a “more native” representation, with the expectation that the more native version will perform faster, and allow for more optimizations by the underlying platform. If some optimizations can be thrown into the more native form in the process, all the better.

Java, as an example, has had a JIT compiler for several years now. In fact, Java was, for many developers, the first time they heard the term JIT; so much so that many developers I know think the “J” in JIT stands for “Java”. Actually, JIT stands for Just-In-Time. Smalltalkers may recognize the term “Dynamic Translation” instead.

Anyway, when the Java runtime determines code is eligible for native compilation (frequency of execution is one of the primary parameters), it diverts the execution of the code, so it can perform some fancy hoop-jumping to turn the Java bytecode into platform-specific native instructions, thereby removing any cost of bytecode interpretation, and also throwing some nifty optimizations into the compiled code. From that point forward, the native code will be used for execution, unless and until it has been invalidated by changing assumptions.

JRuby’s JIT compiler is similar in nature, the primary difference being the source and destination formats. In Java the source format is Java bytecode, and the destination format is native machine instructions. Conversely, in JRuby, the source format is the JRuby AST, and the destination format is Java bytecode. Perhaps the most fascinating aspect of the JRuby JIT compiler is that it benefits from both the initial compilation into Java bytecode, and then later, when Java may attempt to translate the JRuby-generated bytecode into native machine instructions. So effectively, it is possible to get a double-JIT on your executing code.

Generating Java bytecode from interpreted Ruby code is no small feat, however; so, without further ado, let’s start the tour of JRuby’s JIT!

There’s Compilers, and then there’s Compilers

Before we go too deep, we should do a quick overview of what is really meant by “compiling” in JRuby. Some of this discussion isn’t specific to the actual concept of “JIT” compilation; there are a few different code-paths for getting to compiled code in JRuby. One of those is definitely Just-In-Time method compilation, however JRuby can compile entire scripts as well (with modules and classes, and other fancy-shmancy stuff). So as this article proceeds, I will try to make this distinction clear.

When I refer to the JIT Compiler, I’m really referring to the component that is responsible for tracking method invocations and compiling them when appropriate. On the flip side, the term “Compiler” by itself belongs to the component in JRuby that can compile any Ruby script into an executable chunk of Java.

There are several tuning parameters in JRuby for the compiler as well; setting the maximum number of methods to compile, how many method calls before the compiler should be invoked, the maximum lines to compile into a single string of bytecode, etc. These settings affect different parts of the infrastructure, but for the most part, you shouldn’t have to care whether it is JIT-specific or not; in most cases it doesn’t matter.

Stay Classy, Java

So, JRuby is compiling Ruby into executable chunks of Java - that much we know; but what are they, actually? Java, as you may or may not be aware, prevents the modification of any loaded classes in the VM (security or some such nonsense; dang buzzkills). This rule prohibits JRuby from using a single class file to represent a live class in Ruby. Intuitively, it might seem logical for JRuby to group all of the methods for a single Ruby class into a single Java class; however, because of the aforementioned restriction in Java, and the facts that a.) individual methods are only compiled by the JIT at certain thresholds (meaning some of a class may be compile-eligible, while other parts aren’t), and b.) Ruby allows for runtime method manipulation, this single-class paradigm isn’t possible. So instead, the JRuby-to-Java compiler infrastructure is designed to turn any given JRuby AST into a generated Java class. In other words, any time a hierarchy of AST nodes that represents a Ruby script is compiled in JRuby, a new Java class is derived to match, built, loaded, and otherwise made awesome. Unlike the actual meta-class in Ruby, the AST is as static as the block of code from which it was built. If another script comes along with more AST that modifies the same class, JRuby doesn’t care - it will simply have compiled method hooks pointing to two entirely different Java classes.

The classes generated by the JRuby compiler (note I didn’t mention the word JIT here), implement the JRuby interface ‘org.jruby.ast.executable.Script’. We’ll see later how this is actually used.

That’s Some Crazy JIT!

In the first Distilling-JRuby article I detailed the class ‘DynamicMethod’. This class represents the “implementation” of a method in JRuby. As I discussed in that article, there are a ton of implementations; Figure 1 is the class-hierarchy again for those of you who’d rather not dig back to the old article. One of the arrows I have drawn on the diagram is pointing to an innocuous little class called ‘JittedMethod’. What this really represents is the cusp of the rabbit hole we are about to go down. Before we go too far down that hole, however, let’s do a quick recap.

Hello, Again Old Friend

When JRuby wants to invoke a method, it has to find an implementation of that method. I talked all about that in part 1. When it finally does find that implementation, it is going to be one of these DynamicMethod objects. One of them that I mentioned previously is the “DefaultMethod”, which I referred to as the ‘shell game’ method implementation. DefaultMethod is a consistent implementation for callers that internally delegates to one of two implementations: InterpretedMethod (ol’faithful) and JittedMethod (a primary patient for today).

Method invocation is the point at which code is JIT’ed in JRuby (but not the only way it gets compiled). DefaultMethod keeps an internal counter representing the call-count for that particular method. This number will be used by JRuby to determine whether or not the method is “JIT eligible”. There are somewhere between ten and four million overloaded variants (I got tired of counting) of the ‘call’ method on DefaultMethod, but needless to say they all do something like this:

@Override
public IRubyObject call(ThreadContext context, IRubyObject self, RubyModule clazz, String name, [...]) {
  if (box.callCount >= 0) {
    return tryJitReturnMethod(context, name).call(context, self, clazz, name, [...]);
  }
  return box.actualMethod.call(context, self, clazz, name, [...]);
}

(In this case, “box” is just a container for the delegate method and the call count integer pair, and the […] denotes where the various “Arity”-supporting implementations diverge.)

As long as the call count is non-negative, the “tryJitReturnMethod” call is made, which will check in with the JIT subsystem, and will attempt to build a new method implementation. When that call completes, it returns a method implementation which this method can use, and that implementation is then invoked. If the call count is negative, however, the method cached on this receiver is simply called.

The call-count integer serves multiple purposes. By simply being set negative, it effectively turns off the attempts to JIT-compile this particular method, but it also represents a counter for the number of invocations this method receives prior to JIT compilation actually being attempted. Call count is the primary metric JRuby uses to throttle JIT compilation.
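This counter-then-swap pattern can be sketched in plain Ruby. This is a simulation of the pattern, not JRuby’s actual code; the class name, the lambda-based “implementations”, and the threshold value are all illustrative:

```ruby
# Simulation of DefaultMethod's call-count throttling (illustrative only).
class SimulatedDefaultMethod
  JIT_THRESHOLD = 50  # hypothetical; the real threshold is configurable

  def initialize(interpreted_impl, compiler)
    @impl = interpreted_impl
    @compiler = compiler
    @call_count = 0
  end

  def call(*args)
    if @call_count >= 0
      @call_count += 1
      try_jit if @call_count >= JIT_THRESHOLD
    end
    @impl.call(*args)
  end

  private

  # On success, swap in the compiled implementation and set the counter
  # negative to disable any further JIT attempts for this method.
  def try_jit
    jitted = @compiler.call(@impl)
    @impl = jitted if jitted
    @call_count = -1
  end
end
```

The real implementation also consults the JIT subsystem’s global limits (cache size, exclusions, and so on) before compiling, as described below.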

The “tryJitReturnMethod” simply looks up the JITCompiler implementation using a call to Ruby#getJITCompiler(). From here, we are entering the JIT universe.

“You are in a maze of twisty passages, all alike.”

The actual org.jruby.compiler.JITCompiler class represents the heart of the entire JIT process. This class is the manager for the primary JIT compilation efforts (It is also one of JRuby’s registered MBeans - a topic I plan to discuss soon). The primary invocation point is a method called “tryJIT”, which takes the DefaultMethod instance we were just working with, as well as the method name and the current ThreadContext (there’s that context showing up again). From here, a number of contextual checks are made to see if this particular invocation is eligible for JIT:

  • Is the JIT compiler enabled?
  • Is the method call count above the threshold for the JIT?
  • Is the JIT method-cache full?
  • Has the method been excluded from JIT’ing?

After all of this, if the method still checks out, it’s time to begin the real fun of compiling (or at least, trying to). At this point I’m going to (temporarily) do some hand-waving, and let us all pretend that we’ve passed a bunch of JRuby AST to our magic fire-breathing compiler monster, and out the back-side we were given a bunch of Java bytecode that achieves the same result as the AST. To help with the transition, I have made a high-level diagram for this portion of the code.

/distilling/jruby/jruby_compiler_monster.png

Assuming you are still on board, remember that the goal at this point is to have a Ruby method be converted into a loadable chunk of Java code, so the bytecode we were given (which represents a Java class) needs to be loaded into a Java class at runtime. JRuby needs an efficient and ‘appropriate’ way to load the Script sub-classes into Java, and keep a handle on them so we can create new instances of our ‘method’-powering class.

The component ‘org.jruby.ClassCache’ performs both of these tasks, and is responsible for respecting the cache configuration parameters specified at startup. The method ‘cacheClassByKey(…)’ is called, and is given the bytecode from our fire-breathing monster. To load the class into the runtime, a small class called OneShotClassLoader is used, which, as the name implies, is only used once. The classloader is a child of the primary JRuby class-loader (in the normal case, anyway), which means that the code generated in the script has visibility to all of the JRuby classes and dependencies (this will be important later). At the same time, it means that the class is isolated from the other scripts in its own classloading domain.

Interestingly, this class-cache does not hand out references to the class after the fact. It returns the class from the initial cache call, but then simply retains references to the classes in a weak-reference form, so that if a method goes out of scope (like if a call to ‘remove_method’ was made on a class), the reference will be dropped, and the cache will shrink by one. In other words, the primary goal of the cache is to act as a throttle, as well as an endpoint for JMX monitoring.
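A rough sketch of that idea in Ruby, using the standard library’s WeakRef. The real cache lives on the Java side and holds generated Class objects; SketchClassCache and its method names are made up here for illustration:

```ruby
require 'weakref'

# Sketch of a weak-valued, size-throttled cache (illustrative; not
# JRuby's org.jruby.ClassCache, which does this on the Java side).
class SketchClassCache
  def initialize(max_size)
    @max_size = max_size
    @cache = {}
  end

  # Returns the cached object, or nil if the cache is full (throttled).
  def cache_by_key(key, klass)
    prune
    return nil if @cache.size >= @max_size
    @cache[key] = WeakRef.new(klass)
    klass
  end

  def size
    prune
    @cache.size
  end

  private

  # Drop entries whose referent has been garbage-collected.
  def prune
    @cache.delete_if { |_, ref| !ref.weakref_alive? }
  end
end
```

Note how the cache only ever returns the object from the initial call; afterward it merely tracks the entry weakly, exactly the “throttle plus monitoring endpoint” role described above.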

To create a key that distinctly represents the class in the cache, S-expressions are used. A class called SexpMaker is given the various AST elements representing the method, and it in turn generates an S-expression that represents the method. If you ever want to get a good feel for the methods stored in the ClassCache during a Ruby program execution, putting a break-point and looking at the S-expressions in the cache can be enlightening. As an example, I created a very simple Ruby class that looked like this, and made a call to it in my script elsewhere:

class MyClass
  def do_something
    c = 3
    d = c
  end
end

# ...

obj = MyClass.new
obj.do_something

I then set up the JIT compiler to run in such a way that this method would be JIT’ed. Here is the S-expression generated for that method (formatting mine):

(
  method do_something (argsnoarg) (
    block
    (newline (localasgn c (fixnum 3)))
    (newline (localasgn d (localvar c)))
  )
)

As you can see, it’s a short-hand representation of the actual AST, and it is unique for that AST structure.
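The key-generation step itself is simple to model. Here is a toy version of the SexpMaker idea in Ruby: flatten a nested-array stand-in for the AST into the S-expression string used as a cache key. (The real SexpMaker walks actual JRuby AST nodes; the array encoding below is made up for illustration.)

```ruby
# Toy SexpMaker: atoms print as themselves, arrays become
# parenthesized, space-separated lists, recursively.
def to_sexp(node)
  return node.to_s unless node.is_a?(Array)
  "(" + node.map { |child| to_sexp(child) }.join(" ") + ")"
end

# The AST shape for the do_something method from the example above:
DO_SOMETHING_AST = [:method, :do_something, [:argsnoarg],
                    [:block,
                     [:newline, [:localasgn, :c, [:fixnum, 3]]],
                     [:newline, [:localasgn, :d, [:localvar, :c]]]]]
```

Calling to_sexp(DO_SOMETHING_AST) produces a single-line version of the S-expression shown above.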

Method Swapping

So we’ve made it through the journey of the compiler, and our JIT’ed code is now contained in a nice handy-dandy Script class. The JIT compiler class proceeds to create a new instance of our Script, and along with some various bookkeeping, calls DefaultMethod.switchToJitted(Script, CallConfiguration), passing in the Script object that represents the logic for this method.

This method assigns a new JittedMethod object to the default method container, and sets the call-count to a negative value, disabling subsequent JIT attempts.

Assuming the compilation has worked correctly, the actual invocation of the script is fairly straightforward. There are a variety of ‘Arity’ implementations on the Script API to line up with the DynamicMethod call methods, but for the most part they all do the same thing:

try {
  pre(context, self, name, block, 1);
  return jitCompiledScript.__file__(context, self, [...]);
// catch blocks for exception handling omitted.
} finally {
  post(runtime, context, name);
}

Effectively, the Script API is analogous to a general ‘Runnable’ representing a block of code; it has been specialized to handle a number of call configurations, but for the most part, it is simply a logic-containing object.

From this point forward, that method is now JIT’ed unless/until it is removed from scope.

I’d Like a Side of Bytecode With My Bytecode

Okay, time for the good stuff. We’ve done enough hand-waving; it’s time to explore the compiler. Before I go any further, I should mention the --bytecode switch. If you ever want to see the bytecode JRuby generates for a chunk of Ruby code, you can, simply by invoking JRuby like so:

    jruby --bytecode my_script.rb

It was immeasurably helpful to me in writing this article.

Compiler Components

There are several pieces and parts that all participate in the compilation process (and incidentally, many have the word ‘compiler’ in the name). That makes it a fairly complex creature to understand. If the compiler is a big furry fire-breathing monster, we’re about to dissect it and poke at the internal organs. So, let’s start with a quick break-down:

  • org.jruby.compiler.ASTInspector - The ASTInspector class is effectively a statistics-gathering tool for the AST; a detective looking for certain conditions. The AST is given to this class at the start of compilation, and it looks for certain conditions in the AST that influence the overall behavior of the resulting code. One of those conditions that is scanned for is the concept of scope (which we talked about last time). Scope becomes very important, because if the code in question doesn’t need an explicit scope, the compiled code can be made much simpler; likewise, if it does have some intense scoping, the compiled code has to make sure it respects that so variables aren’t leaking all over the place.

  • org.jruby.internal.runtime.methods.CallConfiguration - This is an enumeration representing the type of method invocation involved at a certain level, and is calculated and returned by the ASTInspector, depending on the structures it finds. The call configuration isn’t unique to the compiler process; in fact it really is part of the scope management; but was a bit too detailed for the previous discussion. This enumeration is the actual object that performs the ‘pre/post’ invocations on the ThreadContext to setup any scope that is necessary; different work is done depending on the requirements of method being invoked. Some example call configurations are FrameFullScopeFull (meaning it needs a frame and a scope) and FrameNoneScopeNone (meaning it needs neither). We haven’t discussed the concept of ‘frame’, however it basically represents the invocation of a method in a certain context: the call frame. It keeps track of information that allows Ruby to manage the call stack beyond the scope, which we previously discussed.

  • org.jruby.compiler.ASTCompiler - The ASTCompiler knows specifically how to traverse the AST, and how to then consult with other objects to translate it into an alternate representation. To handle the actual bytecode generation, the ASTCompiler hands responsibility off to the bytecode generating parts. The ASMCompiler handles the busy work of setting up the compiler hierarchy when traversing method entry/exit, closures, etc.

  • org.jruby.compiler.ScriptCompiler - The ScriptCompiler interface defines the high-level hooks into the underlying bytecode generation process used by the ASTCompiler. The sole implementation of this API currently is StandardASMCompiler, which, as you could guess, is backed by the ASM bytecode library. This class will create “sub-compilers” that know how to deal with the recursive nature of the compilation process.

  • org.jruby.compiler.impl.BodyCompiler - BodyCompilers are the ‘sub-compilers’ I just mentioned. Specifically, each BodyCompiler deals with blocks of code that may carry their own scope/set-of-variables (good thing we already discussed scope!). Here is the current class-hierarchy of body compilers:

    /img/articles/distilling/jruby/body_compilers.png

    The two primary categories are “root-scoped” and “child-scoped”. In our scope discussion, we called these two scenarios “local” and “block” respectively. Here is the root-scope javadoc:

    /**
     * Behaviors common to all "root-scoped" bodies are encapsulated in this class.
     * "Root-scoped" refers to any method body which does not inherit a containing
     * variable scope. This includes method bodies and class bodies.
     */
    
  • org.jruby.compiler.VariableCompiler - The variable compiler API knows how to generate bytecode for loading and storing variable values from the Ruby runtime. Body compilers create variable compilers appropriate for their scope. Certain scope rules allow for certain optimizations which will be described below.

  • org.jruby.compiler.InvocationCompiler - The invocation compiler is a component that knows how to generate bytecode for invoking methods on Ruby objects in the Java memory space. All body compilers have these. This will be described in more detail below.

  • org.jruby.compiler.CacheCompiler - Compiler that can translate certain references in the Java bytecode into quick lookups on the AbstractScript.RuntimeCache object, rather than having to do full lookups into the Ruby runtime (for methods, fixnums, etc). This allows the JVM more opportunities for optimization via inlining and other good stuff.

  • org.jruby.compiler.impl.SkinnyMethodAdapter - Delegating implementation of the ASM MethodVisitor interface that has several convenience methods for making typed bytecode calls; this makes the code that is generating bytecode much easier to trace through than it otherwise would be (not as easy as Bitescript, but we can’t always be idealists).

Here is another view of these elements, roughly categorized by responsibility, showing some interactions:

/img/articles/distilling/jruby/compiler_components.png

(This is only one of many ways to show the interactions at this level, however it should give you some idea of the segregation of these components by responsibility)

The AST analysis components work with the bytecode generation libraries to recursively build up a representation of the algorithm. In the process, the bytecode generation libraries “wire in” calls to several JRuby runtime classes, including many we’ve already seen, such as the ThreadContext, the scope objects, and a bunch of other helper libraries that are implemented as statics everywhere.
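To make that recursion concrete, here is a toy “ASTCompiler” in Ruby that walks a nested-array stand-in for the AST and emits a flat list of pseudo-instructions. The node names mirror the S-expression output shown earlier; the instruction names, and everything else, are invented for illustration:

```ruby
# Toy recursive compile step: each node type knows how to emit its own
# pseudo-instructions, recursing into children first so values land on
# the conceptual stack before they are consumed.
def compile_node(node, instructions = [])
  case node.first
  when :block
    node[1..-1].each { |child| compile_node(child, instructions) }
  when :newline
    compile_node(node[1], instructions)
  when :localasgn
    _, name, value = node
    compile_node(value, instructions)   # leave the value on the "stack"
    instructions << [:store_local, name]
  when :localvar
    instructions << [:load_local, node[1]]
  when :fixnum
    instructions << [:push_fixnum, node[1]]
  else
    raise "unknown node type: #{node.first}"
  end
  instructions
end

# The body of do_something from the earlier example:
METHOD_BODY_AST = [:block,
                   [:newline, [:localasgn, :c, [:fixnum, 3]]],
                   [:newline, [:localasgn, :d, [:localvar, :c]]]]
```

compile_node(METHOD_BODY_AST) yields [[:push_fixnum, 3], [:store_local, :c], [:load_local, :c], [:store_local, :d]] - the same push/store shape the real compiler emits as bytecode through its sub-compilers.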

The Nature of the Generated Code

As with everything in JRuby, there are all kinds of special fast-path hooks built into place that make the code hard to scan quickly, so let’s talk about the “general” case first, and work our way into the special cases. In our previous code walkthroughs, we saw that the interpreted AST code performed invocation using a combination of recursive node traversal and calls into the ThreadContext to manage and look up scoped variables (creating scopes, storing values, etc). The generated code often stores and retrieves variable values the same way as we have already discussed - via collaborative lookups on the current dynamic scope. In other words, let’s say we have this code:

a = 5
b = a

At this point, it’s going to be easier for me to write pseudo-Java-code to express what the generated code could do to solve certain problems. Remember that JRuby is generating Java bytecode to effectively do the same thing I’m doing here.

The pseudo-Java for this particular example could look something like this:

ThreadContext.getCurrentScope().setValue("a", new RubyFixnum(5));
ThreadContext.getCurrentScope().setValue("b", ThreadContext.getCurrentScope().getValue("a"));

Now, I took several liberties with this code to make it easier to read (this isn’t valid code), but it should look a lot like the API we discussed previously in the scope article. In fact, this is the general idea of the code that is generated in many cases by JRuby. It gets more complicated than this, obviously, but this shows the most naive implementation of the JIT compiler - basically: turn the steps taken at runtime by Ruby into hard-wired Java.

If we start poking through the various compilers, we’ll see that they often make extensive use of the Ruby libraries that the AST uses (as we would expect). Here is an example from the HeapBasedVariableCompiler.assignLocalVariable(int index) method (we’ll analyze the variable compilers more deeply below):

method.aload(methodCompiler.getDynamicScopeIndex());
method.swap();
method.invokevirtual(p(DynamicScope.class), "setValueZeroDepthZero", sig(IRubyObject.class, params(IRubyObject.class)));

In this case, “method” is a SkinnyMethodAdapter object as described above. This method is telling the current enclosing Java method to load the DynamicScope object from the local variable table, swap the “value to set” back onto the top of the stack, and then invoke the “setValueZeroDepthZero” method on it (which is a hardwired method for setting the variable at position 0, depth 0 with a value). That means that the DynamicScope variable has to be somewhere on the local variable stack (we’ll see how below), but beyond that it’s fairly straightforward (as these things go, anyway).
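The (offset, depth) addressing behind helpers like setValueZeroDepthZero can be modeled in a few lines of Ruby. This is a toy model, not JRuby’s DynamicScope: depth walks up parent scopes, offset picks the variable slot, and the “ZeroDepthZero” variant is just a hardwired fast path for slot 0 in the current scope:

```ruby
# Toy model of DynamicScope's (offset, depth) variable addressing.
class ToyDynamicScope
  def initialize(size, parent = nil)
    @values = Array.new(size)
    @parent = parent
  end

  def set_value(offset, value, depth)
    scope_at(depth).store(offset, value)
  end

  def get_value(offset, depth)
    scope_at(depth).fetch(offset)
  end

  # The hardwired fast path: slot 0, current scope, no walking.
  def set_value_zero_depth_zero(value)
    @values[0] = value
  end

  protected

  def store(offset, value)
    @values[offset] = value
  end

  def fetch(offset)
    @values[offset]
  end

  # Walk `depth` parents up the scope chain.
  def scope_at(depth)
    depth.zero? ? self : @parent.scope_at(depth - 1)
  end
end
```

A block scope whose parent is a method scope can then read the method’s variables with depth 1, while its own variables use depth 0 - the same two numbers the generated bytecode bakes into its calls.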

Stack vs. Heap

/img/articles/distilling/jruby/variable_compilers.png

All of this talk about DynamicScope and looking up variables is a bit over-simplified. The JRuby JIT compiler has support hard-wired into it for dealing with variables exclusively on the local stack. This is provided via the StackBasedVariableCompiler (you can see the variable compiler hierarchy here). The various body compilers (which, if you recall, represent methods, classes, blocks, and so forth) check with the associated ASTInspector. If the inspector says that there is no special use of closures or scope-aware method calls, the body compiler will then create a special stack-based variable compiler.

This stack-based variable management is a pretty significant optimization. Java is much more efficient when dealing with stack-based variables as opposed to heap references; stack memory access is bound to be faster. In interpreted JRuby execution mode there is no such thing as a stack-based Ruby variable; all of the variable tracking is synthesized by the JRuby runtime using the Java heap and the scope objects we previously discussed.

Here is what the local-variable assignment looks like for the stack-based variable compiler:

method.astore(baseVariableIndex + index);

We’ve dropped the DynamicScope lookup, not to mention the invokevirtual on the dynamic scope. In this particular scenario, we have bypassed the JRuby libraries altogether, and are simply using the native Java stack to track the Ruby object/value instead. JRuby has managed to convert an operation into its direct Java counterpart, with no compromises. That is, of course, lightning-fast variable assignment (much faster than several hops through methods on the DynamicScope, doing in-heap array lookups, etc).

One of the things that the VariableCompilers are responsible for is setting up variables at the start of a method or closure. There are a number of methods on the VariableCompiler API that are consulted at the introduction of these top-level elements - here are three of those elements:

public void beginMethod(CompilerCallback argsCallback, StaticScope scope);
public void beginClass(CompilerCallback bodyPrep, StaticScope scope);
public void beginClosure(CompilerCallback argsCallback, StaticScope scope);

The HeapBasedVariableCompiler is going to get all of the lookup objects it needs (most notably, the DynamicScope) and stuff them into the method as local variables. More concretely, at the top of the heap-generated methods in JRuby, the bytecode generated by ‘beginMethod’ looks something like this:

public IRubyObject __file__(ThreadContext ctx, IRubyObject self, [...]) {
  DynamicScope scope = ctx.getCurrentScope();
  // actual algorithm goes here
}

(I feel the need to re-stress that this Java code is my extrapolation of the compiled bytecode - JRuby does not currently actually generate Java source, and instead has the “luxury” of dealing with Java bytecode variable indexes and the like. If you were to decompile the bytecode, it might look a little something like this, however.)

There are, of course, other things that the heap-based variable compiler may choose to do at this point to optimize itself, but those specifics aren’t particularly relevant to the idea. If you go back to our “a=5; b=a” example, you’ll see that the pseudo-Java can then simply reference ‘scope’, as opposed to calling ctx.getCurrentScope() every time.

On the other side of the fence, the StackBasedVariableCompiler (which knows it can deal with variables entirely on the Java stack), loops over all of the variables and declares them at the top of the method, assigning them to JRuby ‘nil’. In this example, if the method is going to deal with three variables (we’ll call them ‘a’, ‘b’, and ‘c’), then the stack-based compiler would start the method like this:

public IRubyObject __file__(ThreadContext ctx, IRubyObject self, [...]) {
  Object a = ruby.getNil();
  Object b = ruby.getNil();
  Object c = ruby.getNil();
  // actual algorithm goes here
}

This allows the subsequent code to just deal with variable declaration implicitly, just like Ruby does - since the variables are already on the Java stack, they can simply be blindly assigned a value by the rest of the native bytecode - so going back to our ‘a=5;b=a’ example, it is not unreasonable to consider JRuby generating another set of lines that looks like this:

a = new RubyFixnum(5);
b = a;

Not too shabby! In reality, JRuby can optimize even further than this (like dealing with fixnums smarter than that), but that’s beyond this discussion.

Invocation Compiler

The InvocationCompiler is responsible for creating bytecode that can invoke another method. In the current JDK world, that means loading the appropriate variables onto the stack and then invoking the method “IRubyObject.callMethod(…)” on the “receiver” object. In other words, if the Ruby code was:

a.do_something

The pseudo-Java code we want to generate would look something like this:

IRubyObject a = // ... (from somewhere)
IRubyObject result = new NormalCachingCallSite("do_something").call(a);

Of course, this is a major simplification of the actual call-site management provided by the invocation compiler, and all of this is actually implemented in scary bytecode, but you get the idea. From here, the Ruby runtime takes over the method invocation, which goes through the DynamicMethod dispatching we’ve previously discussed. In that process, it may compile more code and create more class-files, but they are effectively disconnected from each other via the call-site indirection through the Ruby runtime method dispatching.
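The call-site idea itself can be illustrated with a small Ruby sketch: resolve the method once, cache it, and re-resolve only when the receiver’s class changes. JRuby’s NormalCachingCallSite does this on the Java side with cheap class-token checks; this toy version just compares classes directly:

```ruby
# Toy caching call site: a stand-in for the monomorphic inline cache
# that JRuby's call-site objects implement in Java.
class ToyCachingCallSite
  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_method = nil
  end

  def call(receiver, *args)
    # Cache miss: the receiver's class changed (or this is the first
    # call), so look the method up again and remember it.
    if @cached_class != receiver.class
      @cached_class = receiver.class
      @cached_method = receiver.class.instance_method(@name)
    end
    @cached_method.bind(receiver).call(*args)
  end
end
```

As long as the same class keeps flowing through the site, the lookup cost is paid once, which is what makes the indirection cheap enough to JIT around.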

In the “amazing future” we have the concept of “invokedynamic”, which is a Java 7 bytecode instruction that allows for invocations to dynamically-generated methods in the Java space, allowing a language like Ruby to hook into the JVM’s method dispatching, in turn allowing method dispatching in the JIT-compiled code to be fully optimized. Charlie Nutter has described this far better than I can, and his article ties in nicely with the method dispatching I previously discussed.

How JIT Participates

The JITCompiler class effectively has to ‘jump in the middle’ of this compilation routine - taking a single Ruby method and translating it into an entire JIT-compiled class. The class JITClassGenerator (an inner-class to JITCompiler) is responsible for handling this special scenario.

Remember, we already have all of these concepts built for compiling arbitrary scripts - the ASTCompiler and ScriptCompiler are fully capable of handling the introduction of a method. Therefore, most of the work actually happens via the ScriptCompiler. Whereas the general compilation will make calls to “compileRoot” or “startRoot”, the JIT process makes calls to “startFileMethod”.

The primary distinction between the two invocations is described in the ScriptCompiler Javadocs:

/**
 * Begin compilation for the root of a script named __file__.
 *
 * @param args Arguments to the script, as passed via jitted wrappers
 * @param scope The StaticScope for the script
 * @param inspector The ASTInspector for the nodes for the script
 * @return A new BodyCompiler for the body of the script
 */
public BodyCompiler startFileMethod(CompilerCallback args, StaticScope scope, ASTInspector inspector);

/**
 * Begin compilation for the root of a script. This differs from method compilation
 * in that it doesn't do specific-arity logic, nor does it require argument processing.
 *
 * @param javaName The outward user-readable name of the method. A unique name will be generated based on this.
 * @param arity The arity of the method's argument list
 * @param localVarCount The number of local variables that will be used by the method.
 * @return An Object that represents the method within this compiler. Used in calls to
 * endMethod once compilation for this method is completed.
 */
public BodyCompiler startRoot(String rubyName, String javaName, StaticScope scope, ASTInspector inspector);

Basically, startFileMethod performs a special pre-configuration on the generated class to make it self-sufficient for this single method, and entirely invokable through the ‘Script.run’ API. The JITClassGenerator is going to exercise the ASTInspector on the top-level of the method (just like the ASTCompiler would when it hits a method), and is going to then pass all of that information into the ScriptCompiler for construction.

From that point forward, the method is digested as it normally would be by the recursive inspection/compilation process.

Compiler Cookbook

At this point in the article, the general concepts have been thoroughly hashed out, but we’re simply missing examples of the Java code that is actually built when certain scenarios are hit. So let’s walk through some. I’m going to use the same scheme as above: describing at a high level the code that JRuby generates, and then possibly using pseudo-Java examples.

General AST Class

We’ve talked about it quite a bit already, but to cement things - if you have a ‘test.rb’ script and you compile it, you’re going to get back a class representing that script. The most natural Java code representation looks something like this:

public class test extends org.jruby.ast.executable.AbstractScript { // implements Script indirectly
  public test() {
    // some basic initialization here
  }
  public IRubyObject __file__(...) {
    // top-level script code here
  }
}

There is, of course, other stuff that goes on. But that’s the general ‘framework’ in which the rest of the generated code works.

Creating a Method

Generally speaking, Ruby methods are translated into Java methods. However, whereas Ruby methods are typically associated with some parent class, in Java they are simply on the Script-class for which they are compiled. When a method is hit, JRuby combines all of the stuff we previously discussed to create that method. Let’s consider this Ruby script:

def my_method
  a = 5
  b = a
end

my_method

JRuby is going to declare the method, embed the logic, and then invoke it. In this particular case, JRuby will be able to use stack-based variables. Let’s build on our previous class (trimming some of the irrelevant bulk):

class test extends AbstractScript {
  @JRubyMethod(name="my_method", frame=true, ...)
  // not the actual generated method name, but close.
  public IRubyObject RUBY$my_method(ThreadContext ctx, IRubyObject self) {
    Ruby r = ctx.getRuntime();
    Object a = r.getNil();
    Object b = r.getNil();
    a = RubyFixnum.five;
    b = a;
    return b;
  }

  public IRubyObject __file__(ThreadContext ctx, IRubyObject self) {
    RuntimeHelpers.def(ctx, self, this, "my_method", "RUBY$my_method", new String[] { "a", "b" }, ...);
    // in reality, this would be cached by the CacheCompiler, but for ease-of-reading, I'm creating a new one here.
    return new VariableCachingCallSite("my_method").call(ctx, self, self);
  }
}

Wow, a lot going on here - let’s break it down.

  • First, we create a Java method as a peer to a Ruby method. We also assign an annotation to it so it can be tracked in a number of ways by the running system. The method name is generated; I’ve simplified it slightly here, but you get the idea.
  • Then, using the logic we’ve discussed previously regarding variable management, our method body is compiled, and roughly mirrors what we did with ‘a’ and ‘b’ in the Ruby implementation.
  • The ‘__file__’ method (which represents the invokable part of the script) is created, and is invokable through the Script API.
  • ‘__file__’ declares our method first. It uses a static call to RuntimeHelpers to do this, which basically looks up the class, and stuffs a new DynamicMethod object into the RubyClass definition that can call into this script. (Keep in mind that in the JIT scenario, this ‘declaration’ is skipped; instead, the body of the method is compiled into the ‘__file__’ method, which is then referenced inside a JittedMethod object.)
  • Next, we get a handle on a CallSite object, and then call invoke on it. From here, Ruby is going to wind back through the call-process to our compiled method object we just put in the RubyClass.

Naively, this invocation can use reflection to make the call. However, remember from the first discussion that call-site objects often result in pre-compiled “mini-classes” that have a single purpose - calling the method on an object. These mini-classes can often be jitted out of the execution, making our indirect call-site invocation nearly as fast as a direct method call - cool!
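To make the idea concrete, here is a toy Ruby sketch of what a caching call site does conceptually. The class name and guard here are hypothetical inventions of mine, and far simpler than JRuby’s real CachingCallSite hierarchy, but the shape is the same: look the method up once, guard on the receiver’s class, and reuse the cached lookup on every subsequent call.

```ruby
# Toy sketch of a caching call site (hypothetical, not JRuby's actual API).
class ToyCallSite
  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_method = nil
  end

  def call(receiver, *args)
    # Guard: only redo the (slow) lookup if the receiver's class changed.
    unless receiver.class == @cached_class
      @cached_class = receiver.class
      @cached_method = receiver.class.instance_method(@name)
    end
    @cached_method.bind(receiver).call(*args)
  end
end

site = ToyCallSite.new(:upcase)
puts site.call("jruby")  # JRUBY (slow path: lookup + cache)
puts site.call("again")  # AGAIN (fast path: cache hit)
```

The real implementation goes further, generating those guards as bytecode so the JIT can inline them away, but the cache-and-guard structure is the essence of the trick.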

Creating a Class

Class creation in Ruby scripts is not necessarily creation (it could be augmentation/modification). As such, creating a Java Class is not actually what needs to be done in this case. Besides, even if the class is created in a compiled script, it still needs to be in the JRuby class namespace so that non-compiled code can reference it. Therefore, a method is created that can perform the class construction. Here’s a basic modification of our example:

class MyClass
  def my_method
    a = 5
    b = a
  end
end

x = MyClass.new
x.my_method

Our example grows again in this case:

class test extends AbstractScript {
  public RubyClass RUBY$MyClass(ThreadContext ctx, IRubyObject self) {
    RubyModule m = RuntimeHelpers.prepareClassNamespace(ctx, self);
    RubyClass cls = m.defineOrGetClassUnder("MyClass", null);
    LocalStaticScope scope = new LocalStaticScope(...); // for the class
    ctx.preCompiledClass(scope, ...);
    try {
      RuntimeHelpers.def(ctx, self, this, "my_method", "RUBY$my_method", new String[] {"a", "b"});
    }
    finally {
      ctx.postCompiledClass();
    }
    return cls;
  }

  @JRubyMethod(name="my_method", frame=true, ...)
  // not the actual generated method name, but close.
  public IRubyObject RUBY$my_method(ThreadContext ctx, IRubyObject self) {
    Ruby r = ctx.getRuntime();
    Object a = r.getNil();
    Object b = r.getNil();
    a = RubyFixnum.five;
    b = a;
    return b;
  }

  public IRubyObject __file__(ThreadContext ctx, IRubyObject self) {
    DynamicScope scope = ctx.getCurrentScope();
    RubyClass cls = RUBY$MyClass(ctx, self); // invoke the class-creation method.
    // in reality, this would be cached by the CacheCompiler, but for ease-of-reading, I'm creating a new one here.
    scope.setValue("x", new NormalCachingCallSite("new").call(ctx, self, cls));
    return new NormalCachingCallSite("my_method").call(ctx, self, scope.getValue("x"));
  }
}

So what’s changed since our basic method-only example? Well, the generated “my_method” is identical. However, we now have a method to create the class as well. Let’s analyze this method:

  • First, given the current context, find the appropriate RubyModule.
  • Ask the module to get or define the class with the given name (remember, we could be updating an existing class).
  • Adjust the ThreadContext to have the class as the primary scope.
  • Define the method (which will be defined on the current class in scope).
  • Reset the ThreadContext.

If we look at our ‘__file__’ method, you’ll see the first thing it does this time is call this “class constructor”. Note also that ‘__file__’ is now using heap-based variable management, because the AST has classes mixed in, causing the inspector to throw out stack-based management due to the risk of leaky variables.

After calling the construct method, the method then invokes setValue for the current scope, passing in the result of calling ‘new’ on the new class. Then, the invocation of “my_method” is performed on the object held in “x” on the scope.

Other scenarios are just extensions of all of these same ideas.

Future Plans

As fancy as this current compiler architecture is, the JRuby team is not sitting still. Currently in JRuby master, there is a significant effort underway to reimagine the underlying compilation process. One of the problems with the current approach is that the AST does not lend itself well to analysis and optimization. The current work underway derives a new intermediate representation from the AST, providing an entirely separate object hierarchy. This new representation can then be shuffled and reorganized; patterns can be derived and compressed, and all sorts of other gooey goodness can be found. The IR is designed to be analyzed by the compiler; something that could never be said for the AST.

From a component standpoint, the plan (as best as I have derived) is to replace the true “compiler” component as described above, while keeping the higher-level constructs that embed that compiler into the JIT process, leaving the rest of the infrastructure around it (like the JITCompiler class) mostly unchanged.

In other words, much of the existing ASTInspector/ASTCompiler goes away, in favor of something that is more compiler-friendly - an IR scanner.

This new IR code structure will bring the JRuby JIT one step closer to what is possible in the Java JIT, where optimizations can be made by analyzing the code structure; entire branches of code (and the associated conditions) can be eliminated, loops can be optimized, etc. Of course, the options for what can be optimized in a Ruby program are often different than Java - some of the language differences could be advantageous for optimization, and some may cause serious limitations - time will tell, but I have confidence it will be interesting to see the results.

Nevertheless, this work is fairly new, and I’d hate to spend too much time analyzing and documenting a work in progress, but it is something to revisit in a few weeks/months. At a minimum, I suggest following the JRuby committer blogs (for a start: Charlie Nutter, Tom Enebo, Nick Sieger).

Conclusion

This article was by far the longest yet, coming in well over 6000 words (which means that if you’ve read this far, you are a patient, patient individual). Short of making this a “two-parter”, I’m not sure how I could have shortened it and truly hit the hotspots of the compilation process in JRuby; however, I do think my next exploration might be a touch more ‘focused’.

I haven’t entirely decided yet what’s on the chopping block next for these JRuby internals articles, but I have a couple ideas; and of course, I’m always taking input at My Contact Page.

Stay tuned.

Distilling JRuby: Tracking Scope


One of the things that is always going on in any programming language is managing the scope of variables. Scope is central to both how we code, as well as how a program executes. Even just in methods, how variables are scoped can be a point of great contention (particularly in code reviews).

When it comes to implementing a programming language like JRuby, the concept of a scope permeates everything. After all, someone needs to track what variables are available at each level in the activation stack, and when the stack unwinds (either through normal use, or due to an exception), someone needs to unbind the variables in scope with it.

And, of course, Ruby has closures, which means that you have to carry (a.k.a. capture) free variables into the scope of the closure. But Ruby also has instance- and class-eval’ed code blocks, which have an entirely different scope. When you get down to it, Ruby is bound to test a scope algorithm’s limits.

So how does JRuby do it?

In my previous Distilling JRuby article I briefly mentioned a class called ‘DynamicScope’. I also hinted at another object called ‘StaticScope’. Both play a central role in handling this difficult problem.

The Static Scope

Static scoping in JRuby is all about the variable access as seen by the parser. From a class-hierarchy standpoint, it’s fairly simple, as Figure 1 indicates.

Figure 1: Note the package `org.jruby.parser`


We’ll get into the differences of these types momentarily. As the parser is traversing the language syntax, it creates a stack of static scope objects. The majority of the parser code is generated, so it’s not particularly beneficial for me to provide examples from the code on how it does this, but at a high-level calls are made at certain points in the parse routine to the methods ParserSupport.pushBlockScope, ParserSupport.pushLocalScope, and ParserSupport.popCurrentScope. These, as the method names indicate, create new scope objects as children of the current scope. Once created, they then subsequently become the current scope, and the old current scope is now their ‘enclosing’ scope.

When certain nodes of the AST tree are created (block and method nodes most notably), they are handed a StaticScope object representing their scope in the tree (from the parser, via ParserSupport.getCurrentScope()), which they will later use to execute. For example, when a method definition is hit by the parser, it is going to:

  • Push a new local static scope on to the stack
  • Create a MethodDefnNode object, passing the current static scope in.
  • Pop the local static scope from the stack

This method will effectively have a dedicated scope object associated with it; however there is only one StaticScope object for that method definition, no matter how many times it is called, hence the term static. The static variants of the scope are really templates defining what the parser has determined is available in that particular scope.

This method, when it is interpreted (as we saw in the previous article), will then create a new DynamicMethod object, which usually holds on to a copy of the static scope for the method. Later, I will discuss how it will use that static scope when it is invoked.

Blocks, like methods, will go through a similar construction phase, however blocks are given a BlockStaticScope object (as the name suggests). This scope is aware of the special scoping rules that Ruby blocks have.

Handling Variables

As the parser is working, it will also hit variable assignments and declarations. In these cases, the previously-created scope (whether for a block, or a method) is consulted. This allows the scope to learn about its internal elements (recording variables), and also allows the scope (which can be one of a few types) to help translate the token into an appropriate AST node (different nodes interpret different ways depending on the type of code being run).

Variable declaration is an interesting problem in Ruby implementations. In many languages (particularly compiled, strongly-typed languages), when a variable declaration is hit, there is generally some sort of identifier: in Java there is always a type declaration; similarly, in Scala there are the val and var keywords. Admittedly, these languages use those keywords for a purpose (typing and mutability, respectively); however, it also gives their respective compilers something to hold onto as the first declaration of the variable. Any references to that variable prior to that point are invalid.

Ruby, on the other hand (like many dynamic/scripting languages), doesn’t have this sort of handle; the language itself doesn’t have the concept of declaration. Instead, the first time you assign a variable is, in fact, when you declare it. So, in short, when JRuby finds an assignment, it could also be a declaration. The parser needs the scope to help it make this determination.
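A quick Ruby snippet illustrates the point: the first assignment is the declaration, and a bare name that was never assigned isn’t a local variable at all.

```ruby
# The first assignment both declares and initializes `x`.
x = 10
puts x  # 10

# A bare name that was never assigned is not a local variable; the
# parser treats it as a call, so Ruby hunts for a method at runtime
# and fails with NameError when none exists:
begin
  whatever
rescue NameError => e
  puts "no local or method named #{e.name}"  # no local or method named whatever
end
```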

That said, even though declaration is implicit in Ruby, the terminology “declare” does show up in the JRuby source - in this case, what is meant by “declare” is really a variable being referenced (such as when used as the right-hand side of an assignment).

Understanding the Parser

When trying to understand the scope algorithm, a major handicap is not understanding the node hierarchy created by the parser; after all, the node hierarchy in JRuby is what drives the scoping routine (and, for that matter, the invocation of the program). So what does the parser actually see when given any particular program? For example, let’s try this very simple Ruby script that juggles some basic variables:

a = 3
b = a

Thankfully, because JRuby is easily accessible from Java, we can write a quick Java program to spit out the node hierarchy as seen by the JRuby parser. Note that (and this may be an obvious comment) this only parses the top-level script; any method calls to other libraries are not traversed.

import java.util.List;

import org.jruby.Ruby;
import org.jruby.ast.Node;

public class NodeEmitter {

    public static void main(String[] args) {
        Node n = Ruby.getGlobalRuntime().parseFromMain(NodeEmitter.class.getClassLoader().getResourceAsStream("test.rb"), "test.rb");
        printNode(n, 0);
    }

    private static void printNode(Node n, int depth) {
        for(int i=0; i<depth; i++) {
            System.out.print("\t");
        }
        System.out.println(n.getNodeType().toString() + " pos: " + n.getPosition());
        List<Node> children = n.childNodes();
        for(Node child : children) {
            printNode(child, depth+1);
        }
    }
}

(There are much more ‘creative’ ways to instrument the running system to get information as it runs, but for the purposes of this analysis, this gives us a lot of valuable info.)

If we spit out the node hierarchy of our Ruby script using this tool, we’ll get this in the console:

ROOTNODE pos: test.rb:0
	BLOCKNODE pos: test.rb:0
		NEWLINENODE pos: test.rb:0
			LOCALASGNNODE pos: test.rb:0
				FIXNUMNODE pos: test.rb:0
		NEWLINENODE pos: test.rb:1
			LOCALASGNNODE pos: test.rb:1
				LOCALVARNODE pos: test.rb:1

So, we can see that we have a LocalAsgnNode, representing the assignment of the following (child) FixNumNode. Then, on the next line, we can see that we have yet another LocalAsgnNode, but this time it has been given a LocalVarNode object as its assignment value. When we discuss the Dynamic Scope below, we’ll explore how these nodes actually work, but suffice it to say that when these nodes interpret, they know how to interact with their particular scope, getting and setting values, and the assignment node will ask the value node to give it the value it represents.

Context-Aware Nodes

This is all well and good, however I glossed over how these nodes managed to show up in the tree, pre-wired to their appropriate scope. As mentioned previously, the parser consults with the StaticScope objects whenever it runs into variable assignment and references.

When the parser hits a simple ‘declaration’ (think variable reference), it makes a call into the current StaticScope object’s StaticScope.declare(ISourcePosition position,String name) method.

Similarly, when the parser hits an assignment, it tells the ParserSupport class via a call to the ParserSupport.assignable(Token,Node) method. The token represents the left-hand side of the assignment, and the passed-in node is the right-hand side of the assignment (either our FixNum or our Local variable constructed via the declaration routine we just discussed). The token is analyzed by the support class to figure out what type of variable reference it is (class variables, instance variables, globals, etc). In the case that the variable is just a standard/local identifier, the ParserSupport will ask the current static scope to help construct the node, by calling StaticScope.assign(ISourcePosition,String,Node).

This is where the two primary scope variants (block and local) diverge; the primary difference being the special scoping rules that Ruby blocks are afforded. A Ruby block is a lazy “capture” of the parent scope, yet it has its own scope as well.

Let’s walk through the local scenario first, as it’s the less complicated of the two. Here is what declaring a variable looks like in the local-scope scenario:

public Node declare(ISourcePosition position, String name, int depth) {
    int slot = exists(name);
    if (slot >= 0) {
        // mark as captured if from containing scope
        if (depth > 0) capture(slot);
        return new LocalVarNode(position, ((depth << 16) | slot), name);
    }
    return new VCallNode(position, name);
}

Note that in the local case, it attempts to look up the variable by name, receiving something called a ‘slot’. If it can’t find a variable in the declaration stack, it ultimately assumes that the usage of the variable in this case must be a ‘call node’ - which in turn translates to a method call (which I have already discussed in a previous article). Ruby will then try to find that method at runtime (remember, methods, unlike local variables, can be added and removed at runtime freely).

So this is where our LocalVarNode comes from, and how it knows its position in the static scope. The block-scope variant of ‘declare’ is similar:

public Node declare(ISourcePosition position, String name, int depth) {
    int slot = exists(name);
    if (slot >= 0) {
        // mark as captured if from containing scope
        if (depth > 0) capture(slot);
        return new DVarNode(position, ((depth << 16) | slot), name);
    }
    return enclosingScope.declare(position, name, depth + 1);
}

The block variant is very similar; however, it constructs a DVarNode instead of a LocalVarNode, and, unlike the local scenario, if it can’t find the variable in the declaration stack, it will ask the enclosing scope to figure it out, passing in an incremented depth. This is how the block effectively tells the parent stack that it’s using a variable. As you can see, in both methods, if the depth is greater than zero, the scope internally marks the variable in that slot as “captured”; now the parent scope knows that a child scope is using one of its variables.
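In plain Ruby terms, this captured-variable bookkeeping is what makes code like the following work: the lambda’s block scope has no local of its own named `count`, so the lookup walks up to the enclosing method scope (depth 1), and the method scope marks `count` as captured.

```ruby
def counter
  count = 0
  # `count` is not declared in the lambda's own scope, so the parser
  # resolves it one scope up and marks it as captured there.
  increment = lambda { count += 1 }
  3.times { increment.call }
  count
end

puts counter  # 3
```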

The ‘assign’ routines parallel the ‘declare’ routines; however, in this particular case, the local assign must also handle the concept of being the “top” (or root) scope of the program:

The Local Scope ‘Assign’ Routine:

public AssignableNode assign(ISourcePosition position, String name, Node value, StaticScope topScope, int depth) {
    int slot = exists(name);
    if (slot >= 0) {
        // mark as captured if from containing scope
        if (depth > 0) capture(slot);
        return new LocalAsgnNode(position, name, ((depth << 16) | slot), value);
    } else if (topScope == this) {
        slot = addVariable(name);
        return new LocalAsgnNode(position, name, slot, value);
    }
    // We know this is a block scope because a local scope cannot be within a local scope
    // If topScope was itself it would have created a LocalAsgnNode above.
    return ((BlockStaticScope) topScope).addAssign(position, name, value);
}

The Block Scope ‘Assign’ Routine:

protected AssignableNode assign(ISourcePosition position, String name, Node value, StaticScope topScope, int depth) {
    int slot = exists(name);
    if (slot >= 0) {
        // mark as captured if from containing scope
        if (depth > 0) capture(slot);
        return new DAsgnNode(position, name, ((depth << 16) | slot), value);
    }
    return enclosingScope.assign(position, name, value, topScope, depth + 1);
}

The top scope algorithm handles the auto-declaration discussed above. If the current static scope is the top scope (such as a method, or the top of your program), then it can capture the variable for the first time (hence the ‘addVariable’ logic); if it is not, then it will ask the real top-scope (a block) to do so. From there, we can see that this is where our LocalAsgnNode comes from, and conversely, where the DAsgnNode comes from.

The handling of blocks at runtime to complement this static scoping will be discussed more below, when we get into runtime scoping rules.

Variable “Slots”

The code above has this concept of a ‘slot’. All variables have a calculated identifier. The JRuby code describes the routine for calculating the slot variable this way:

High 16 bits is how many scopes down and low 16 bits is what index in the right scope to set the value.

In other words, variables are referenced via an integer representing a pair of values - the first portion of the number represents the scope-depth, and the second portion represents the array-index of the variable.

Both the static scope and the dynamic scope use arrays internally to hold variable information, and so when a particular piece of information for a variable is required, the correct scope is found using the high 16 bits to get the depth, and once that scope is found, the lower 16 bits are used to find the right index in the array.

This slot variable is very important as it is calculated by the parser initially, and then later will be used during the runtime to find the variable again in the scope. Obviously, serializing the local variable reference into a slot value is much cheaper than looking for the variable by name (which has to do string equality checks, rather than integer lookups in an array).
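The packing scheme itself is trivial to sketch. These helper names are hypothetical, but the arithmetic mirrors the `((depth << 16) | slot)` expressions in the parser code above:

```ruby
# Hypothetical helpers mirroring JRuby's slot encoding:
# high 16 bits carry the scope depth, low 16 bits the array index.
def pack_slot(depth, index)
  (depth << 16) | index
end

def slot_depth(slot)
  slot >> 16
end

def slot_index(slot)
  slot & 0xffff
end

slot = pack_slot(2, 5)   # two scopes down, sixth variable slot
puts slot_depth(slot)    # 2
puts slot_index(slot)    # 5
```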

On Control Structures

One other thing that confused me because of my Java background: control structures in Ruby (if, while, etc) do not have a distinct scope. That means this code is legal, and will print out “9”:

if some_condition
  my_var = 9
end

puts "#{my_var}"

Obviously in Java, that isn’t possible; the variable goes out of scope with the “if” block.

I spent a good while analyzing the two scope implementations trying to figure out how an “if” block could fit in the picture before this occurred to me. These control blocks don’t behave like a method (which has no parent scope); however, I saw variable access and manipulation in an “if” block using LocalVarNodes and LocalAsgnNodes, which (erroneously) suggested to me that if-blocks were using a LocalStaticScope.

The reason for this, of course, is that the “if” block just shares the scope of its parent (be it a local or a block scope), and my particular example was in a method (which is local), so I was simply being misled by my Java-ish past.

Dynamic Scoping

DynamicScope is to the running system what StaticScope is to the parser: the DynamicScope handles the variable values, while the StaticScope tracks the variable names. In the running system, the DynamicScope and StaticScope are paired - every DynamicScope has a StaticScope peer that it consults to figure out things about the current location. Note that the dynamic scope, unlike the static scope, is instantiated on a per-call basis: where there will be a single StaticScope for every method, there will be a single DynamicScope for every call to a method.
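The per-call nature is easy to observe from plain Ruby: every invocation builds a fresh dynamic scope, so locals never survive between calls.

```ruby
def tally
  count = 0   # a brand-new scope (and a brand-new `count`) on every call
  count += 1
end

puts tally  # 1
puts tally  # 1 again; the previous call's scope is gone
```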

Most of the runtime (including our previous LocalAsgnNode and LocalVarNode examples) use the DynamicScope object to interact with the system, and in turn, the DynamicScope makes heavy use of the StaticScope to assist with tracking variables.

As a thread is invoking a method, for example, it constructs a DynamicScope and pairs it with the static scope for that method. That scope will then be used for the lifecycle of that method execution.

Here is a description of DynamicScope from the code itself:

Represents the dynamic portion of scoping information. The variableValues are the values of assigned local or block variables. The staticScope identifies which sort of scope this is (block or local).

Properties of Dynamic Scopes:

  1. static and dynamic scopes have the same number of names to values
  2. size of variables (and thus names) is determined during parsing. So those structures do not need to change

The DynamicScope class is an abstract parent to a variety of implementations (as seen in Figure 2).

Figure 2: Arity-Based Scopes


(Side Note: This trend of having specialized NoArg, OneArg, TwoArg (etc.) objects shows up frequently in JRuby (usually with the word “arity” associated with it); the idea being that the ‘ManyArgs’ variant will inherently be slower and more expensive to work with, as it’s collection-based. These counted-argument implementations are bound to be a little more painful to maintain, but also provide a boost in performance for the majority of cases).

Using the Scope

The DynamicScope class has a variety of methods on it - two of the most central to the operation of the language are #getValue(int offset, int depth) and #setValue(int offset, IRubyObject val, int depth). These two (and a ton of variants) are where the real values are actually passed to and from the activation stack of the program. As previously discussed, all variables carry an identifier that has a depth (the high 16 bits) and an index (the low 16 bits). The arguments to these methods (offset and depth) are those numbers.

So now, if we go back to our LocalVarNode, we can look at the interpret method of that node, and see that it does this:

@Override
public IRubyObject interpret(Ruby runtime, ThreadContext context, IRubyObject self, Block aBlock) {
    IRubyObject result = context.getCurrentScope().getValue(getIndex(), getDepth());
    return result == null ? runtime.getNil() : result;
}

As you can see, it simply returns the value from the current dynamic scope (as determined by the ThreadContext, which we’ll discuss shortly). You’ll see it uses ‘getIndex()’ and ‘getDepth()’ methods - here those are, using the mathematical bit-masking as discussed previously:

/**
 * How many scopes should we burrow down to until we need to set the block variable value.
 *
 * @return 0 for current scope, 1 for one down, ...
 */
public int getDepth() {
    return location >> 16;
}

/**
 * Gets the index within the scope construct that actually holds the eval'd value
 * of this local variable
 *
 * @return Returns an int offset into storage structure
 */
public int getIndex() {
    return location & 0xffff;
}

Meanwhile, if we go look at the LocalAsgnNode, we’ll see it works the other way:

@Override
public IRubyObject interpret(Ruby runtime, ThreadContext context, IRubyObject self, Block aBlock) {
    // ignore compiler pragmas
    if (location == 0xFFFFFFFF) return runtime.getNil();

    return context.getCurrentScope().setValue(
        getIndex(),
        getValueNode().interpret(runtime, context, self, aBlock),
        getDepth()
    );
}

The AsgnNode is asking the value node to interpret (which is either our FixNumNode, or in the second case, our LocalVarNode, as seen above). So in our variable-to-variable assignment, here is what is happening:

  • Assignment-Node (representing ‘b’) asks Value-Node for its value.
  • Value-Node (representing ‘a’) asks the current dynamic scope for its value, and returns it.
  • Assignment-Node takes value and asks dynamic scope to store it for ‘b’.

That, in a nutshell, is how values are traded between scopes, and (conceptually at least) the API for using the scope is quite simple: get the current scope, and call get/set on it. Internally, the scope has all the information it needs to recursively find the correct variables to retrieve or update (whichever the case may be). “getValue” is quite straightforward at this point (this is the general purpose, ManyVars version):

public IRubyObject getValue(int offset, int depth) {
    if (depth > 0) {
        return parent.getValue(offset, depth - 1);
    }
    assertGetValue(offset, depth);
    return variableValues[offset];
}

Setting is just as simple:

public IRubyObject setValue(int offset, IRubyObject value, int depth) {
    if (depth > 0) {
        assertParent();
        return parent.setValue(offset, value, depth - 1);
    } else {
        assertSetValue(offset, value);
        return setValueDepthZero(value, offset);
    }
}
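The two routines above can be mirrored with a toy Ruby scope to show the depth/offset addressing end to end. ToyScope is purely illustrative (and ignores the arity-specialized variants): walk `depth` parents, then index into the values array.

```ruby
# A toy dynamic scope using the same depth/offset addressing as
# JRuby's ManyVars implementation (hypothetical class, for illustration).
class ToyScope
  def initialize(size, parent = nil)
    @values = Array.new(size)   # the variableValues array
    @parent = parent            # the enclosing dynamic scope, if any
  end

  def get(offset, depth)
    depth > 0 ? @parent.get(offset, depth - 1) : @values[offset]
  end

  def set(offset, value, depth)
    depth > 0 ? @parent.set(offset, value, depth - 1) : @values[offset] = value
  end
end

method_scope = ToyScope.new(2)
block_scope  = ToyScope.new(1, method_scope)
block_scope.set(0, 42, 1)     # depth 1: the write lands in the parent scope
puts method_scope.get(0, 0)   # 42
```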

The Thread Context

Up to this point, I have largely glossed over how the dynamic scoping gets its parentage. For static scopes, we discussed that they were constructed by the parser as it scanned the program. A central component of the JRuby runtime (very central, actually) is the ThreadContext object. This is a ThreadLocal-like object (internally it uses a specialized RubyThread->SoftReference map, but the concept is roughly the same). As the runtime is executing, and is doing things like dispatching to methods, it will always consult the ThreadContext to push and pop the dynamic scope for the current executing thread so it’s ready for the various nodes to use.

There are a number of context-specific methods on the ThreadContext - they typically start with “pre” and “post”, and are meant to be called in pairs. Let’s walk through the two most interesting scenarios, methods and blocks.

Method Dispatching and Scope

Method invocations are always gated by calls to the correct “pre-” and “post-” methods on the ThreadContext.

ThreadContext.preMethod(...);
try {
    this.call(...);
}
finally {
    ThreadContext.postMethod(...);
}

In practice, it is more complicated than that, but it’s largely so that corner cases (like backtraces, and special optimizations) can be handled. Going back to the previous method-dispatching lesson, there are two methods on InterpretedMethod: “pre”, and “post” which are called as part of their ‘call’ routine (the one that interprets the AST node), and these methods in turn work with the ThreadContext pre/post methods to setup and tear-down the context as appropriate.

Here is what the method scope construction looks like on ThreadContext:

public void preMethodFrameAndScope(RubyModule clazz, String name, IRubyObject self, Block block, StaticScope staticScope) {
    RubyModule implementationClass = staticScope.getModule();
    pushCallFrame(clazz, name, self, block);
    pushScope(DynamicScope.newDynamicScope(staticScope));
    pushRubyClass(implementationClass);
}

As you can see, in this case (among some other stuff) the dynamic scope is being created to pair with the method’s static scope, and this dynamic scope has no parent (as no local variables are visible outside of the method).

Block Invocation and Variable Capturing

One of the ‘killer features’ of modern languages is closures. If you don’t have closures, you just aren’t cool (I’m looking at you, Java). One of the oddities about closures is their ability to ‘capture’ free variables from a parent scope for their own use – of course, you have to keep in mind that the closure is going to go about its business, and may not be called for a long time after the method/parent in which it was declared has gone out of scope and left the building.

It turns out that since the parser already assigned the variables in the scope the appropriate ‘slots’, the dynamic scope, when it is being told to set a variable value or get a variable value, already knows which parent to look at; it can traverse directly to the correct depth in the parentage, and can get or set the correct index.

In other words, all of the hard work done by the parser to create the Local/Block hierarchy has effectively created an easy-to-track hierarchy for the DynamicScope. However, for that pass-through to work, the DynamicScope representing the parent method at the block’s “instantiation” point needs to be captured, and saved.
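The (depth, index) bookkeeping described above can be sketched in plain Ruby. The class and method names below are illustrative only (the real DynamicScope is a Java class with several specialized subclasses); this is just a toy model of the traversal:

```ruby
# Toy model of the DynamicScope lookup described above: each scope holds
# a slot array plus a parent link, and each variable reference carries a
# (index, depth) pair assigned by the parser.
class ToyDynamicScope
  def initialize(slots, parent = nil)
    @values = Array.new(slots)
    @parent = parent
  end

  # Walk 'depth' parents up the chain, then read the slot at 'index'.
  def get_value(index, depth)
    depth.zero? ? @values[index] : @parent.get_value(index, depth - 1)
  end

  # Same traversal for writes.
  def set_value(index, value, depth)
    if depth.zero?
      @values[index] = value
    else
      @parent.set_value(index, value, depth - 1)
    end
  end
end

method_scope = ToyDynamicScope.new(2)
method_scope.set_value(0, "captured", 0)
block_scope = ToyDynamicScope.new(1, method_scope)
# From inside the block, depth 1 / index 0 reaches the method's variable:
block_scope.get_value(0, 1)   # => "captured"
```

With this structure, the parser's job reduces to assigning each variable reference a fixed (index, depth) pair, and the runtime's job reduces to a short, predictable walk up the parent chain.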

This is done by the various block implementations when they are constructed. Blocks are implemented much like methods - there is a org.jruby.runtime.BlockBody class which several real implementations extend; it’s not directly relevant enough to discuss here, but nevertheless, each implementation has a point during the constructor/initialization in which it will create a “Binding” object, which is done via a call to one of the variations of ThreadContext.currentBinding(…).

This binding object snapshots the call frame at that point in time (along with the dynamic scope). Then, later, when the block is invoked (an example of which can be seen in InterpretedBlock.yield), it will in turn call ThreadContext.preYield*Block(…), all of which deal with the concept of invoking a block from a different context. The preYieldSpecificBlock method is relevant to this particular example. Here is what it looks like:

public Frame preYieldSpecificBlock(Binding binding, StaticScope scope, RubyModule klass) {
    Frame lastFrame = preYieldNoScope(binding, klass);
    // new scope for this invocation of the block, based on parent scope
    pushScope(DynamicScope.newDynamicScope(scope, binding.getDynamicScope()));
    return lastFrame;
}

As you can see, this is similar to the method ‘pre-’ variant, except this one does consider the binding’s dynamic scope as a parent when constructing the scope for the block invocation. This allows the previously-bound scope to be used during the invocation; any standing variables in that scope will be accessible.
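The effect of this capture is observable in plain Ruby; nothing in this example is JRuby-specific:

```ruby
# The lambda returned by make_counter keeps the method's scope alive
# (in JRuby terms, its Binding holds the method's DynamicScope), even
# after make_counter has returned and its frame is off the call stack.
def make_counter
  count = 0               # local variable in the method's scope
  lambda { count += 1 }   # the block captures 'count' from the parent scope
end

counter = make_counter
counter.call   # => 1
counter.call   # => 2  ('count' persists in the captured scope)
```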

Call Stack

So far, most of the conversation has been centered around how the parser and runtime work together to ensure that the correct variables are visible and editable at the correct times. The other important task is the process of managing the call stack. This, along with creating the scope objects, is done by the ThreadContext. When a call to the thread context is made to 'prepare' for a method or block yield, it constructs a DynamicScope object. As the two method examples above show, however, it also puts that DynamicScope into a stack-like structure on the ThreadContext (via pushScope).

This structure represents the call stack of the thread, and as method calls are made, it is manipulated via push/pop calls. In this process, JRuby seamlessly keeps the correct variables available for each method/block in the call hierarchy, and calls to ThreadContext#currentScope will always return the appropriate scope for that point in time (recall that LocalVarNode and LocalAsgnNode were given the thread context object in their ‘interpret’ call).
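Conceptually, the scope stack behaves like the toy below. The names are made up for illustration; the real ThreadContext also manages frames, backtrace data, and the Ruby class stack:

```ruby
# Minimal sketch of the push/pop discipline on ThreadContext.
class ToyThreadContext
  def initialize
    @scopes = []
  end

  # Mirrors the "pre" step: push the scope for the new invocation.
  def push_scope(scope)
    @scopes.push(scope)
  end

  # Mirrors the "post" step: pop it when the invocation completes.
  def pop_scope
    @scopes.pop
  end

  # What LocalVarNode/LocalAsgnNode consult during interpretation.
  def current_scope
    @scopes.last
  end
end

ctx = ToyThreadContext.new
ctx.push_scope(:script_scope)
begin
  ctx.push_scope(:method_scope)   # "pre" before dispatching the call
  ctx.current_scope               # => :method_scope
ensure
  ctx.pop_scope                   # "post" runs even if the call raises
end
ctx.current_scope                 # => :script_scope
```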

All Together Now

Figure 3: Call Stack Example


Figure 3 shows an example where the current call on the stack is represented by a block's DynamicScope (the first blue node), which holds a reference to another DynamicScope; in this case, that is the scope for the method in which the block was instantiated (it could alternatively be another block, in which case this tree would recurse deeper).

The red nodes represent the corresponding static-scope objects for the block and method respectively.

Interestingly, the parent DynamicScope referenced by a block may be frozen somewhere higher up in the call stack waiting for the block to complete (such as a block passed into a ‘each’ method on a collection), it could have been popped from the call stack long ago (such as when a block is used as a listener or callback on some other object), or it could even be somewhere on the call stack of another thread (which is typically seen when blocks are used as atomic units of work in a multi-threaded program).
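That last case is easy to demonstrate in plain Ruby: the block's captured scope travels with it to another thread's call stack.

```ruby
# The worker thread executes the block against the binding captured in
# the spawning thread, so 'message' resolves even though the worker's
# own call stack never declared it.
results = Queue.new
message = "captured from the spawning scope"
worker = Thread.new { results << message }
worker.join
results.pop   # => "captured from the spawning scope"
```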

Conclusion

Scoping in Ruby is less rigid than in many languages, which makes this implementation simpler in some ways and more complicated in others. JRuby's implementation of the scoping is an interesting lesson, and having gone through it, I feel like I better understand what is expected of the Ruby platform/"specification" when it comes to scoping.

Stay tuned - more to come!

Distilling JRuby: Method Dispatching 101


To better understand how JRuby combines the Java world with the Ruby world, I have recently been delving into the source code (available via git), and while the implementation therein is always bound to evolve and change, it seemed that there would probably be some value in me documenting my journey through the guts of JRuby.

JRuby is a huge beast, so it can be hard to find a place to start - nonetheless, as the joke goes, you eat an elephant one bite at a time, so I figured I'd start somewhere at least somewhat familiar.

One of the first areas I picked up was the method dispatch code. This is an area I have seen discussed in a number of blog entries by various JRuby committers, and given the criticality of the code in this area, I knew it was probably a fairly mainstream section of functionality, heavily used by a running JRuby application.

Unfortunately, it is also right in the middle of the implementation, so it’s a bit like starting to eat the elephant right in the middle. Nonetheless, I have made my way through a good bit of it, and learned a lot in the process, so let’s get started.

Disclaimer: I am admittedly an amateur when it comes to the JRuby code. Nothing I say on here should be considered JRuby gospel; consider it a diary of my understanding of the code, and a good starting point for your analysis should you want to make one. I welcome any constructive input on where I may have gone off the reservation.

First, the concept: Any Ruby implementation has to take the code written by a developer, parse it, and then translate it into a series of execution steps. Effectively every method written in Ruby is going to call other methods (even the venerable puts 'Hello World' requires the ‘puts’ method). Therefore, it’s critically important for a Ruby implementation to be able to dispatch invocations of methods by a developer to the appropriate method implementations. So, the real question at the center of all of this is after JRuby parses your code and sees that you want to call the method ‘puts’, how does it know:

  • Where to find ‘puts’?
  • How to call ‘puts’?
  • How to give you the result from ‘puts’?

Method Handles

One of the primary classes in the middle of all of this framework is org.jruby.internal.runtime.methods.DynamicMethod. I’ll quote the Javadoc (as of JRuby 1.3.x):

DynamicMethod represents a method handle in JRuby, to provide both entry points into AST and bytecode interpreters, but also to provide handles to JIT-compiled and hand-implemented Java methods. All methods invokable from Ruby code are referenced by method handles, either directly or through delegation or callback mechanisms.

The DynamicMethod class, as the documentation suggests, is the primary handle JRuby uses to reference another block of code. It turns out that DynamicMethod is the abstract parent class of several method implementations:

Arrows indicate some subclasses of interest


What we can determine from this is that as JRuby is interpreting your code it is collecting, and in turn invoking, DynamicMethod objects representing the various calls being made. Distilling the various implementations, we can see that each one is tailored to a specific type of call:

  • InterpretedMethod - Has a handle on an AST node representing the code of that method. Eventually asks the AST node to interpret itself.
  • JittedMethod - Counterpart to the interpreted method. JRuby has an innovative JIT compiler that translates Ruby code into a series of Java bytecode instructions. This bytecode is then stored and executed via a loaded Java class. Several blog entries could be spent explaining the internals of this one.
  • DefaultMethod - Special "shell-game" method that can either interpret or use a JIT'd method as returned by the JITCompiler. Internally manages the handle to one of the two methods for clients, telling them which reference to cache. Since the JIT operates on a threshold, this method may interpret on the first several calls before swapping out to a JIT'd variety.

  • JavaMethod - Top-level class for any method handle that makes a call into Java. The majority of the core Ruby libraries are implemented in Java in JRuby (just as they are largely implemented in C for MRI). There are a number of specialized Java method handles. The most straightforward general-purpose implementation is ReflectedJavaMethod, which simply makes a reflected call into the Java counterpart; however, it's not the most commonly used nowadays, as we'll see.

At a high-level, these method handles provide a fairly good abstraction of how one block of code talks to another. But how does it know which type of method handle to use when? They have to come from somewhere. It’s simplest to think of the steps taken by JRuby in interpreted mode, as that is how all methods generally start, and certainly how the top level of your script will be invoked.

Chaining Method Invocations

First - all scripts/methods/etc. are translated into an AST of org.jruby.ast.Node objects. All Node objects have an 'interpret' method, the concept being that as the AST is traversed, the various nodes provide the actual interpretation algorithm (a fairly standard approach to implementing an interpreted language). Without spending a lot of time delving into the actual node implementations, you can see roughly how this works by looking at a relatively simple node: the AndNode (representing &&). As of 1.3.x, here is the interpret method of AndNode:

@Override
public IRubyObject interpret(Ruby runtime, ThreadContext context, IRubyObject self, Block aBlock) {
    IRubyObject result = firstNode.interpret(runtime, context, self, aBlock);
    if (!result.isTrue()) return result;
    return secondNode.interpret(runtime, context, self, aBlock);
}

As you can see, it interprets the first child node and returns that result immediately if it is not true; only otherwise does it interpret the second node, which implements the expected short-circuit behavior.

Anyway, back to method calling. In this hierarchy of nodes, method calls are generally represented by some variant of the org.jruby.ast.*CallNode object (there are a variety of implementations and subclasses depending on the type of receiver and number of arguments). CallNode objects internally use an instance of org.jruby.runtime.CallSite to invoke all methods. These CallSite objects provide an abstraction of the point of invocation of a method. They perform caching to improve performance for the interpreter, and represent different types of calls depending on the case (there are CallSite hot-wirings to handle >, <, =, and so forth when running in -fast mode for example).
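The caching a CallSite performs can be illustrated with a toy monomorphic inline cache in Ruby. This is not JRuby's actual CallSite code (the real one caches against a class serial number and handles polymorphism and invalidation); it is just the shape of the idea:

```ruby
# Remember the last receiver class and its resolved method; re-resolve
# only when the receiver's class differs from the cached one.
class ToyCallSite
  attr_reader :lookups

  def initialize(name)
    @name = name
    @cached_class = nil
    @cached_method = nil
    @lookups = 0
  end

  def call(receiver, *args)
    klass = receiver.class
    unless klass.equal?(@cached_class)
      @lookups += 1                              # slow path: full method search
      @cached_method = klass.instance_method(@name)
      @cached_class = klass
    end
    @cached_method.bind(receiver).call(*args)    # fast path thereafter
  end
end

site = ToyCallSite.new(:upcase)
site.call("a")   # => "A"  (miss: one lookup performed)
site.call("b")   # => "B"  (hit: cached method reused)
site.lookups     # => 1
```

On repeated calls with receivers of the same class, the expensive search is skipped entirely, which is why call-site caching pays off in hot interpreter loops.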

For the most part, however, the method lookup proceeds from the CallSite through what is called the ‘receiver Node’, or the AST node representing the receiver of the method call. As it is running, the parser constructs a node for all variable references in the code (LocalVarNode, InstVarNode, DVarNode, GlobalVarNode, etc). These variable references know how to look themselves up in the current scope. JRuby tracks the current call stack via the object org.jruby.runtime.DynamicScope, which, while being a critical component of this process, for this discussion I will simply hand-wave as something that tracks the currently available variables; how it does so can be analyzed deeper in the future. (Additionally it should be noted, the parser, which is invoked at runtime, uses the current static scope to determine what type of variable node should be created).

As an example, here is the interpret implementation of LocalVarNode which represents a local variable:

@Override
public IRubyObject interpret(Ruby runtime, ThreadContext context, IRubyObject self, Block aBlock) {
    IRubyObject result = context.getCurrentScope().getValue(getIndex(), getDepth());
    return result == null ? runtime.getNil() : result;
}

As you can see it simply asks the current scope (the DynamicScope object at the moment) to find the object given the index and depth of the variable node. Assuming there is no error in the Java code, the lookup on the scope for the variable node will come back with an IRubyObject representing the variable. This can then be used to find the appropriate RubyClass, which can then in turn be used to find a method that matches the signature being called.

This is an important point - we are looking up the corresponding method at runtime by analyzing the object held in the variable - there is no compile-time binding to the method. The magical JIT-ing done by the JRuby committers will bring this a lot closer to a compile-time binding (as will invokedynamic), however all of those features really just hot-wire a runtime-discovered binding. In short, Ruby (without any fancy type-assistance like that implemented in Duby or Surinx) will always runtime-bind variable types, which is both good (for capabilities) and bad (for relative performance limitations).

Anywho - way down in the bowels of CallSite, there is a call against the RubyClass for the object, targeting the method RubyModule#searchMethodInner. This method looks like this:

protected DynamicMethod searchMethodInner(String name) {
    DynamicMethod method = getMethods().get(name);
    if (method != null) return method;
    return superClass == null ? null : superClass.searchMethodInner(name);
}

As you can see, this is a basic recursive algorithm that looks in what is effectively a hash-map of methods for a DynamicMethod handle. So we’ve now made it all the way to our method handle.
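The same search can be modeled in a few lines of Ruby for clarity (the class name is illustrative): a per-class method table with a superclass fallback.

```ruby
# Each class holds a name -> method-handle map; lookup walks the
# superclass chain until a handle is found or the chain is exhausted,
# mirroring RubyModule#searchMethodInner above.
class ToyRubyModule
  def initialize(methods, superclass = nil)
    @methods = methods
    @superclass = superclass
  end

  def search_method_inner(name)
    @methods[name] || @superclass&.search_method_inner(name)
  end
end

object_cls = ToyRubyModule.new("to_s" => :object_to_s_handle)
string_cls = ToyRubyModule.new({ "upcase" => :string_upcase_handle }, object_cls)

string_cls.search_method_inner("upcase")   # => :string_upcase_handle
string_cls.search_method_inner("to_s")     # => :object_to_s_handle (found on superclass)
string_cls.search_method_inner("nope")     # => nil
```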

Loading Classes and Methods

Even though we have traced the invocation process to where the method handles are sourced (the owning class), what we haven’t seen yet, is how the methods are actually loaded into the RubyClass. Where did that magic hashmap of methods come from? As with everything in JRuby - it depends.

Loading Ruby Classes

If the class being loaded is implemented in Ruby, the load process is much like the method dispatch process. The various AST nodes work together to load the class into the runtime. Ignoring the majority of the class-load semantics (they are interesting, but not particularly relevant here), we can jump straight to the abstract MethodDefNode, which has two subclasses: DefnNode and DefsNode. DefnNode represents standard method definitions, and DefsNode represents singleton methods.

The standard methods eventually boil down to this sequence of events:

DynamicMethod newMethod = DynamicMethodFactory.newInterpretedMethod(
        runtime, containingClass, name, scope, body, argsNode,
        visibility, getPosition());

containingClass.addMethod(name, newMethod);

In other words, JRuby is creating a new ‘interpreted’ method (which, as discussed previously, will normally be a DefaultMethod object), that has the body, args, and other meta-information available to it for if/when it is invoked.

The code then adds the method to the RubyClass magic-map so that later it will be returned when the CallSite asks for it via ‘getMethods()’.

All things considered, fairly simple.
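You can see the same pattern from plain Ruby itself: defining a method at runtime just inserts an entry into the class's method table, which later lookups then find.

```ruby
class Greeter; end

# Runtime definition, analogous in spirit to containingClass.addMethod(...):
Greeter.send(:define_method, :greet) { |name| "hello, #{name}" }

Greeter.new.greet("world")        # => "hello, world"
Greeter.method_defined?(:greet)   # => true
```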

Loading Java-Backed Classes

The story behind Java-based classes is a little more tricky - for one thing, it depends on whether we're dealing with a core library, or a user-provided Java object wrapped by the Java integration support.

As I mentioned previously, several of the core libraries in JRuby are implemented in Java. This is done both for performance reasons, and for the simple fact that some of the core libraries (Kernel for example) cannot be implemented in Ruby as they are needed for all other Ruby objects to function.

In any case, during JRuby startup, the method Ruby#initCore() is called. This, in turn, loads a number of Java peers. Some examples include:

  • org.jruby.RubyKernel
  • org.jruby.RubyIO
  • org.jruby.RubyString
  • org.jruby.RubyInteger

… and the list goes on. All of these have an associated RubyClass object that needs to have Java method handles in its ‘methods’ collection. What #initCore does is pre-fill these various method handles by calling special static methods on these core classes. The majority of these static initializers on the core classes wind back up to a method on RubyClass called defineAnnotatedMethods. This method uses an API called TypePopulator, which exists solely to bind Java methods to Ruby classes.

But how does it do it?

In all cases, these special Ruby-library-implementing Java classes must use a special suite of Java annotations to mark what methods on their corresponding Ruby class they are providing. Here is RubyKernel.puts for a concrete example:

@JRubyMethod(name = "puts", rest = true, module = true, visibility = PRIVATE)
public static IRubyObject puts(ThreadContext context, IRubyObject recv, IRubyObject[] args) {
    IRubyObject defout = context.getRuntime().getGlobalVariables().get("$>");
    return RubyIO.puts(context, defout, args);
}

These annotations come in a variety of flavors depending on the case. For example, if we were to look at RubyString.chop, you’ll see there are actually two implementations: one to support Ruby 1.8 and one to support Ruby 1.9 (as it was bug-fixed/altered in 1.9 to support string encodings):

@JRubyMethod(name = "chop", compat = CompatVersion.RUBY1_8)
public IRubyObject chop(ThreadContext context) {
    if (value.realSize == 0) return newEmptyString(context.getRuntime(), getMetaClass()).infectBy(this);
    return makeShared(context.getRuntime(), 0, choppedLength());
}

@JRubyMethod(name = "chop", compat = CompatVersion.RUBY1_9)
public IRubyObject chop19(ThreadContext context) {
    Ruby runtime = context.getRuntime();
    if (value.realSize == 0) return newEmptyString(runtime, getMetaClass(), value.encoding).infectBy(this);
    return makeShared19(runtime, 0, choppedLength19(runtime));
}

As you can see, these annotations have a compatibility flag to determine which version of Ruby the implementation supports.

The TypePopulator class is meant to scan the corresponding class for these annotations, and turn them into DynamicMethod objects that can be registered on the RubyClass. There is a default (naive) implementation of TypePopulator that does this at runtime in a fairly straightforward process. However, there is also an APT build process to generate special instances of TypePopulator at compile-time that are then stored in the org.jruby.gen package. These TypePopulator implementations exist on a per-class basis, and have the Java method registrations ‘hard-coded’ in them as individual lines. This is meant to significantly improve the initial load time for the Java libraries.

The defineAnnotatedMethods method previously mentioned boils down to trying to lookup these TypePopulator objects at runtime, falling back to the default if it can’t find them:

try {
    String qualifiedName = "org.jruby.gen." + clazz.getCanonicalName().replace('.', '$');
    if (DEBUG) System.out.println("looking for " + qualifiedName + "$Populator");
    Class populatorClass = Class.forName(qualifiedName + "$Populator");
    populator = (TypePopulator)populatorClass.newInstance();
} catch (Throwable t) {
    if (DEBUG) System.out.println("Could not find it, using default populator");
    populator = TypePopulator.DEFAULT;
}

The methods actually registered in the method collection are very different from those we registered before. Since they front Java methods, they can't be simple recursive 'interpreted' methods. Instead, they have to use a different mechanism. The original tie was the ReflectedJavaMethod, which would simply use reflection to call the Java peer. Some time later (around JRuby 1.1, I think), Charles Nutter implemented a special Java method that compiles a 'mini-class' that invokes the method via compiled bytecode, which is much faster (and easier for Java to JIT) than the reflection code. This is captured as a generated subclass of CompiledMethod.

As for Java objects that are provided by the user, and are in turn handled by the Java integration support, I hesitate to delve too deeply into this for a few reasons:

  • Java classes, unlike Ruby, have a ton of special cases that make the code very tedious to parse.
  • The JRuby crew is working on revitalizing this code in earnest as part of the next release of JRuby, so whatever I cover here will be out-of-date very soon.

However, in concept it's fairly simple. A RubyClass is constructed and cached for the Java class (by iterating its class metadata). In that Ruby class, a special method peer (of one of the above types) is constructed that binds to each corresponding Java method. Note that the Java integration jumps through some hoops to provide Ruby-syntax-ish method names, which were all covered in the EngineYard blog entries.

Once that class is created and bound to the runtime, it can function like any other RubyClass.

Incidentally, that is how adding a method to a Java object is made possible; it is simply bound to the RubyClass peer (that’s also why the Java peer can’t see it).

In Closing

This was not so brief, but was about as short as I could make it and still cover all high-level components of the method dispatching in JRuby.

There is a lot more Ruby internals to touch on in the future, so stay tuned!

Taking Advantage of Java in JRuby

For those who may not be familiar, Tom Enebo and Charles Nutter were full-time Sun employees until late July when they switched companies to Engine Yard. As many others have said, it’s a different experience (and in my opinion, a very good one) seeing JRuby posts show up on the EngineYard blog.

You have to make a decision at some point when developing a JRuby application as to whether or not you are going to hide the 'Java' in your JRuby application from the main bulk of your code. Obviously, if you are developing a Swing GUI, it's pretty much a foregone conclusion that it's a JRuby app, and JRuby alone. On the other hand, it's very possible nowadays to code a Rails app that can run concurrently on JRuby, MRI, and any other compliant platform.

No answer is 'right'. Each approach has its own virtues, and it really depends on your goals. A Java developer who simply wants to make coding Java apps easier could easily switch to JRuby and sprinkle Java references throughout their app. On the flip-side, you may just choose to deploy an existing Ruby app on JRuby for its compelling performance characteristics and its alternative deployment/scalability options; or perhaps your company would rather manage a Glassfish cluster than a Mongrel cluster - there are a variety of possible reasons.

Tom’s article (part 2) delves a little deeper into the influence of Java from an API perspective (as opposed to the basic Java-integration language constructs). His comments on delegation are compelling:

Delegation as a concept is not specific to Ruby, but is worth bringing up. Why decorate a Java class and get all of that Java class’s methods exposed when you can hide them behind another Ruby class that only exposes what you want? This is especially powerful when you want to write a common API that works across multiple Ruby implementations (e.g. JRuby, MRI, and Rubinius), but has different native backends.

In practice, even if you have committed to JRuby as your platform and have Java libraries referenced all over the place, it can still be very valuable to abstract APIs around the original Java implementation - if for no other reason than Java APIs needing some TLC to feel natural in Ruby. Likewise, there are cases where using Java libraries as the driver for a particular component of your application may give you a competitive advantage on JRuby.

In that vein, I was recently working on a high-concurrency component of a JRuby application. In my opinion, the Java libraries for concurrency are much further evolved than those in Ruby; particularly after the advent of java.util.concurrent in Java 5. One of those areas where Ruby is lacking is the existence of a read-write lock. Jonah Burke previously blogged about implementing a read-write lock using the standard Ruby mutex objects. His implementation is simple, well-implemented, and should be quite reliable. However, Java’s ReentrantReadWriteLock is fully integrated into the underlying Java platform, including in the instrumentation libraries (detailed briefly here). To me, if you plan to run on a Java platform, being able to independently monitor locks held by the Ruby runtime is a Good Thing.

Here is a simple JRuby implementation of a ReadWriteLock that uses a Java lock via delegation:

require 'java'

class ReadWriteLock
    include_package 'java.util.concurrent.locks'

    def initialize
        @java_lock = ReentrantReadWriteLock.new
    end

    def read(&block)
        read_lock = @java_lock.read_lock
        read_lock.lock
        begin
            block.call
        ensure
            read_lock.unlock
        end
    end

    def write(&block)
        write_lock = @java_lock.write_lock
        write_lock.lock
        begin
            block.call
        ensure
            write_lock.unlock
        end
    end
end

As you can see, JRuby makes using Java objects and extending them very straightforward.

This particular implementation is not 100% API-compatible with the one provided by Jonah; however, with a little manipulation it easily could be (I'll leave that as an exercise for the reader). This is really more an example of how Java can provide the underlying engine for Ruby libraries when running on JRuby (this is how the majority of JRuby is implemented, after all) - and if this were API-compatible with Jonah's, you could construct the lock in an isolated factory method, making a switch of implementations as simple as flipping a flag.
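As a sketch of that factory idea (the MutexReadWriteLock stand-in below is my own illustrative simplification; a faithful pure-Ruby version would allow concurrent readers, as Jonah's does):

```ruby
# Pure-Ruby stand-in with the same read/write block API. Deliberately
# simplified: it serializes readers too, unlike a true read-write lock.
class MutexReadWriteLock
  def initialize
    @mutex = Mutex.new
  end

  def read(&block)
    @mutex.synchronize(&block)
  end

  def write(&block)
    @mutex.synchronize(&block)
  end
end

# Factory: hand back the Java-backed lock on JRuby, and the pure-Ruby
# fallback everywhere else. Callers never see which one they got.
def new_read_write_lock
  if RUBY_PLATFORM == 'java'
    ReadWriteLock.new      # the delegating class shown earlier
  else
    MutexReadWriteLock.new
  end
end

lock = new_read_write_lock
lock.read  { :reading }    # => :reading
lock.write { :writing }    # => :writing
```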

Note that instead of delegation, you could also simply extend the existing Java object with more goodies. Here is an alternate implementation of these same block methods that simply appends these methods to the Java class itself:

require 'java'

class java::util::concurrent::locks::ReentrantReadWriteLock
    def read(&block)
        read_lock = self.read_lock
        read_lock.lock
        begin
            block.call
        ensure
            read_lock.unlock
        end
    end

    def write(&block)
        write_lock = self.write_lock
        write_lock.lock
        begin
            block.call
        ensure
            write_lock.unlock
        end
    end
end

In practice this works almost identically to the first example. This implementation has some API-leakage (the Java methods on ReentrantReadWriteLock are available to call by the client code), but also has one less object involved, as the methods are added to the Java lock class itself, rather than provided by a proxy object. Which approach you would use for your particular use-case is really dependent upon the scenario in question, and your goals for the API.

Either way for this example, the core usage of this library is identical irrespective of which implementation you choose:

lock = ReadWriteLock.new
# Alternatively, the second example would be created this way:
# require 'java'
# import java.util.concurrent.locks.ReentrantReadWriteLock
# lock = ReentrantReadWriteLock.new

# Usage is identical
lock.read do
    puts 'Executing with read lock'
end

lock.write do
    puts 'Executing with write lock'
end