Kotlin Libraries: Concurrency

Today’s Kotlin library article is about the kotlin.concurrent package, and everything it adds to the platform.

Java’s concurrency package is already quite sophisticated, and rather than re-invent so many extremely delicate abstractions, the Kotlin authors focused on making the libraries better suited to the language by decorating and shortening various features.


Kotlin Libraries: I/O operations

This is the first in a series of articles that will quickly go over some of the more interesting bits of the Kotlin standard libraries for Java. Today’s run-through is about some of the more interesting affordances for I/O based programming.


Is There Curry In This Dish?

Now that Java is dipping its toes into the waters of supporting functional programming styles, many of the tenets of functional programming come to mind, and it’s worth exploring what is possible in this brave new world. Two of those features are currying and partial application. So the question is, can Java be functional?

Currying and Partial Application - A Refresher

Currying (named after Haskell Curry) is the process of generating function invocations with multiple parameters via single-parameter functions that return follow-up functions. In a language that is purely functional, currying is the only way to handle multiple parameters, as a pure function only accepts one parameter, and always returns one value.

Most languages in practice have convenience support for multiple parameters (sometimes implemented via currying, like Haskell), or tuples, or both, so currying isn’t a requirement, but it is a very powerful tool.

As a concrete example, we can use a language most folks understand that has some functional bits: Javascript (you may not love it, but you can probably read it!)

Javascript does not require curried functions, and doesn’t have language-level knowledge of currying, but can be forced to do it since functions are first-class (it’s just not nearly as pretty or “always-on” as in Haskell, for example). In Javascript we can have this function:

function add(a, b) {
	return a + b;
}

add(1,2); // returns 3.

We can revisit this implementation as a curried version instead, and it could look like this:

function add(a) {
	return function(b) {
		return a + b;
	}
}

add(1)(2); // returns 3.

This allows us to define, and then carry around a halfway completed function (it allows you to “specialize” the function).

var addOne = add(1);
var val1 = addOne(2); // returns 3
var val2 = addOne(3); // returns 4.

What the add(1) invocation has done is partially apply parameters to the underlying operation of adding a and b.

This leads us to the idea of partial application, which is the process of taking a function that involves multiple parameters, and generating a new function where some of those parameters are fixed or built-in.

Note that you can have partial application without currying. Where currying is the process of composing a multi-parameter function chain out of many small single parameter functions (increasing the number of parameters), partial application is the process of producing a simpler and more specific function out of a more general function (reducing the number of parameters).

Let’s flip our previous example on its head, and use partial application on our initial add function:

function add(a, b) {
	return a + b;
}

// Partially apply this function in a new function.
var addOne = function(b) {
	return add(1, b);
};

var val1 = addOne(2); // returns 3.

Now, we’ve taken our add function, applied one of the parameters and passed through the other parameter, converting a two-parameter function into a one-parameter function that is more specialized for the task.

So What About Java 8?

There are many arguments that can be made in this space regarding Java’s ability for partial application and currying, and much of it depends on where you are coming from to make your arguments.

Academically speaking, Java unequivocally does not have currying or partial application, simply because Java 8 still does not have first class functions. What Java 8 has are compiler-synthetic lambdas that are generated (via fancy invokedynamic static method lambda meta factory generation trickery) into functional interface objects.

So really, when you get down to the core of things, you’re still dealing with objects, and anything you can do with objects hasn’t really changed, other than a lot of syntactic sugar and some performance benefits as compared to spinning a bunch of anonymous inner classes.

However, from a practical standpoint, the answer is more like “maybe” or “kind-of”.

Keep in mind that at the core of Java’s lambda support is the translation of a lambda into a functional interface. Assuming you know the functional interfaces you’re dealing with (such as the standard java.util.function interfaces), then you can create some reusable utilities that can help. For example, consider partial application using a BiFunction example. We can partially apply parameters by hand:

BiFunction<Integer,Integer,Integer> adder = (a,b) -> a + b;
Function<Integer,Integer> addOne = (b) -> adder.apply(1, b);

All we’ve effectively done here is create an adapter object around the initial functional object; but in practice it is indeed partial application. You could also make this partial application a utility:

public static <A,B,R> Function<B,R> partial(BiFunction<A,B,R> func, A aVal) {
	// An expression lambda returns its value implicitly; no "return" keyword.
	return (b) -> func.apply(aVal, b);
}

BiFunction<Integer,Integer,Integer> adder = (a,b) -> a + b;
Function<Integer,Integer> addOne = partial(adder, 1);

From this viewpoint, currying is also “practically” possible, although the verbose type signatures can make it less than ideal, as the curried nature of the value is quite explicit in the typing. Consider our curried add:

Function<Integer,Function<Integer,Integer>> add = (a) -> (b) -> a + b;

Here we’ve defined a curried add function. We can partially apply as before:

Function<Integer,Integer> addOne = add.apply(1);
addOne.apply(2); // returns 3
addOne.apply(3); // returns 4
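The hand-written conversions above can also be packaged as small helpers. The following is a minimal sketch, assuming only the standard java.util.function interfaces; the class name Currying and the helper names curry and uncurry are made up for illustration and are not part of the JDK.

```java
import java.util.function.BiFunction;
import java.util.function.Function;

final class Currying {
    // Turns a two-argument function into a chain of one-argument functions.
    static <A, B, R> Function<A, Function<B, R>> curry(BiFunction<A, B, R> f) {
        return a -> b -> f.apply(a, b);
    }

    // The inverse: collapses a curried chain back into a two-argument function.
    static <A, B, R> BiFunction<A, B, R> uncurry(Function<A, Function<B, R>> f) {
        return (a, b) -> f.apply(a).apply(b);
    }

    public static void main(String[] args) {
        BiFunction<Integer, Integer, Integer> adder = (a, b) -> a + b;
        Function<Integer, Function<Integer, Integer>> curried = curry(adder);
        Function<Integer, Integer> addOne = curried.apply(1);
        System.out.println(addOne.apply(2));              // prints 3
        System.out.println(uncurry(curried).apply(4, 5)); // prints 9
    }
}
```

Note that these helpers only cover the two-argument case; every other arity would need its own pair.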

The Problems with Generalizing

One of the strengths of true functional currying and partial application is that it can be done in a very general way. Since functions are a core “currency” of the language, you can effectively take any function and curry it into a decomposed chain of functions, allowing you to pass those parts around. Ideally in Java, you could take a three parameter function and turn it into a 3-deep single parameter function chain, but, in practice it’s a lot more sketchy.

It’s worth considering that in Java a “function” is really defined via the implementation of a particular interface (with Java 8 shipping specific “standard” implementations), and despite having a standard library, it’s also meant to bridge the compatibility gap with existing libraries that existed before Java 8, so you can have a lambda become an implementation of any SAM interface (single abstract method).

Once you bake a lambda into one of these interface implementations, you kind of lose the “general” nature of it, and it becomes a specific (and opaque) thing; even if that thing is semantically identical to another thing.

This is quite relevant when looking at Google Guava. Guava ships a small suite of functional programming tools, and is very common in Java projects active today. Most of these functional APIs are superseded by Java 8, but replacing them will be a gradual process for a lot of teams.

For example, Guava has a Function interface. While there are some extra bits in Java 8, at their core the Guava and Java Function interfaces are equivalent:

// Guava
public interface Function<F,T> {
	T apply(F param);
}

// Java 8
public interface Function<F,T> {
	T apply(F param);
}

Now - consider this Guava-interfacing code on Java 8 (using Guava’s Iterables utility class):

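List<Integer> ints = Arrays.asList(1,2,3);
// Object::toString rather than Integer::toString: the latter is ambiguous here
// between the static Integer.toString(int) and the instance toString().
Iterable<String> strings = Iterables.transform(ints, Object::toString);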

Internally, what Java has actually done here (Java-7-style) can be thought of like this:

List<Integer> ints = Arrays.asList(1,2,3);
Iterable<String> strings = Iterables.transform(ints, new com.google.common.base.Function<Integer,String>() {
  public String apply(Integer x) {
    return x.toString();
  }
});

You’ll get a compile time error if you try and use a standard type intermediately (I’ve left explicit packages in the code for clarity):

// Object::toString is used because Integer::toString is ambiguous for these
// target types (static toString(int) vs. instance toString()).
java.util.function.Function<Integer,String> func = Object::toString;
com.google.common.base.Function<Integer,String> googFunc = Object::toString;

// OK:
Iterable<String> strings = Iterables.transform(ints, googFunc);
// Compile-time error:
strings = Iterables.transform(ints, func);

Thankfully, this mismatch can easily be resolved in code by declaring the lambda as the correct type as I illustrated, or by adapting it through using something like a method reference:

// Construct (or receive) the wrong type
// (Object::toString, since Integer::toString is ambiguous for this target type).
java.util.function.Function<Integer,String> func = Object::toString;
// Use a method reference to adapt the wrong type into the right type via a lambda.
Iterable<String> strings = Iterables.transform(ints, func::apply);

Here we’re actually creating a new functional wrapper of the “google” type around our core Java function using method references as the short-hand to do so.
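If this adaptation shows up often, it can be wrapped in a pair of helpers. The sketch below is self-contained, so it does not use Guava itself: LegacyFunction and transform are hypothetical stand-ins for a pre-Java-8 library's function type and one of its methods, and toLegacy and fromLegacy are illustrative names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.Function;

final class SamAdapters {
    // Hypothetical stand-in for a pre-Java-8 library's function type.
    interface LegacyFunction<F, T> {
        T apply(F input);
    }

    // Hypothetical library method that only understands LegacyFunction.
    static <F, T> List<T> transform(List<F> input, LegacyFunction<? super F, ? extends T> fn) {
        List<T> out = new ArrayList<>();
        for (F item : input) {
            out.add(fn.apply(item));
        }
        return out;
    }

    // Wraps a java.util.function.Function in the library's type.
    static <F, T> LegacyFunction<F, T> toLegacy(Function<F, T> fn) {
        return fn::apply;
    }

    // Wraps the library's type in a java.util.function.Function.
    static <F, T> Function<F, T> fromLegacy(LegacyFunction<F, T> fn) {
        return fn::apply;
    }

    public static void main(String[] args) {
        Function<Integer, String> describe = i -> "#" + i;
        List<String> labels = transform(Arrays.asList(1, 2, 3), toLegacy(describe));
        System.out.println(labels); // prints [#1, #2, #3]
    }
}
```

Each adapter allocates one extra wrapper object per conversion, which is the same cost as the inline func::apply form.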

However, because functions and their accepted parameters and type declarations are not a “universal” type in the language (at the same ubiquitous level as Object), there will unfortunately always be impedance mismatches of the sort seen above. This friction is easy to resolve in code, as shown above, but it will tend to cause library log-jams as you try to use techniques like partial application and currying.

Concretely speaking, what happens when you want a three-parameter function? You could of course create your own interface for it:

public interface TriFunction<A,B,C,D> {
	public D apply(A a, B b, C c);
}

TriFunction<String,Integer,String,String> wat = (a,b,c) -> a + String.valueOf(b) + c;
String result = wat.apply("test",5,"testagain"); // test5testagain

But this will not be recognized by other libraries. Or you could get fancy with currying:

Function<String, Function<Integer, Function<String,String>>> spicy =
	(a) -> (b) -> (c) -> a + String.valueOf(b) + c;
String result = spicy.apply("test").apply(5).apply("testagain");

Unfortunately, this isn’t what I’d call idiomatic or friendly to most developers (Java or no), and it still runs into the issue that adapting through libraries will be bumpy depending on what they chose.
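The two shapes can at least be bridged mechanically. As a rough sketch (the TriFunction interface is redeclared here only so the example compiles on its own, and curry is an illustrative helper name, not a JDK method):

```java
import java.util.function.Function;

final class TriCurry {
    // Same shape as the hand-rolled interface above.
    interface TriFunction<A, B, C, R> {
        R apply(A a, B b, C c);
    }

    // Converts the three-argument form into a 3-deep chain of single-argument functions.
    static <A, B, C, R> Function<A, Function<B, Function<C, R>>> curry(TriFunction<A, B, C, R> f) {
        return a -> b -> c -> f.apply(a, b, c);
    }

    public static void main(String[] args) {
        TriFunction<String, Integer, String, String> wat = (a, b, c) -> a + b + c;
        // Fix the first argument and carry the remaining chain around.
        Function<Integer, Function<String, String>> testPrefixed = curry(wat).apply("test");
        System.out.println(testPrefixed.apply(5).apply("testagain")); // prints test5testagain
    }
}
```

This illustrates the generalization problem rather than solving it: the helper is tied to one interface and one arity, so a four-parameter function or a different library's interface needs yet another helper.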

Summary

In short, Java can now generate, compose, and transform functional objects into different shapes with a far more concise syntax than before, but it’s still far more verbose and type-y than true functional counterparts (even Javascript as I illustrated here).

That doesn’t mean that Java isn’t orders of magnitude better off than it was; simply that it is still a good distance from being (and probably never will be) a bastion of functional programming.

Understanding Method References

Java 8 introduces four types of method references:

  • Static - Converts any static method into a lambda with matching formal parameters.
  • Instance-Capturing - Converts a method call on a specific object into a lambda.
  • Constructor - Converts constructors into factory lambdas for a specific type.
  • Arbitrary Instance - Converts a method on all instances of a type into a lambda that accepts instances of that type.

Since method references apply more magic behind the scenes than straight lambdas, I thought it might be good to walk through each with examples from the pre-lambda Java world:

Static

Static method references are probably the easiest to understand. For example, the method Integer.valueOf(String) can be converted into a Function<String,Integer>. Intuitively, that’s exactly what it is already - a context-less method that takes a String and returns an Integer.

Using the functional interfaces in java.util.function, we can illustrate the various ways to build a string to integer conversion function:

// The Java 7 way - wish this syntax a fond farewell!
Function<String,Integer> f1 = new Function<String,Integer>() {
  public Integer apply(String x) {
    return Integer.valueOf(x);
  }
};

// The naive (and overly verbose) Java 8 way.
Function<String,Integer> f2 = (String str) -> {
  return Integer.valueOf(str);
};

// The inference-y short-hand naive Java 8 way.
Function<String,Integer> f3 = (str) -> Integer.valueOf(str);

// The method reference Java 8 way.
Function<String,Integer> f4 = Integer::valueOf;

Instance-Capturing

Instance-capturing method references are, as the name implies, a way to capture a specific instance from the enclosing scope. Consider Integer.toString(), which converts an integer instance back into a String:

Integer x = Integer.valueOf(123);
// Create a string supplier off of the instance Java 7 style.
Supplier<String> s1 = new Supplier<String>() {
  public String get() {
    return x.toString();
  }
};

// Short-hand lambda version.
Supplier<String> s2 = () -> x.toString();

// Method reference version.
Supplier<String> s3 = x::toString;

In effect, the lambda closes over the variable and uses it later when evaluated, and that’s what the method reference is doing internally. We use a supplier in this case because we have no method parameters to toString(), and the instance is already known, so we can simply call the no-arg method to get the resultant value.

But what about an instance method that takes a parameter? That’s no problem, so long as the underlying type that the lambda is being coerced into accepts matching parameters.

So what does that mean? Let’s look at an example. String.indexOf(String) is an instance method on String that accepts a string, and returns an int. We can translate it into a Function like so:

String str = "abcdefg";
Function<String,Integer> indexOfFunc = str::indexOf;
Integer x = indexOfFunc.apply("abc"); // returns 0.
x = indexOfFunc.apply("def"); // returns 3.
// etc.

However, it’s worth noting that we can also go directly to primitives with Java 8, as there are primitive-friendly functional interfaces in the java.util.function package as well:

ToIntFunction<String> indexOfFuncPrim = str::indexOf;
int idx = indexOfFuncPrim.applyAsInt("abc"); // returns 0.

This avoids the boxing and type indirection, at the expense that the API is less generally reusable. There will be some interesting learning curves for API designers regarding whether to and how to properly flow primitive variants through their APIs.

It’s important to note here that the type inferencing implied by the assignment of the method reference is how Java will choose between overloaded methods as well. This is quite important for our next type of method reference.
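As a small illustration of that overload selection, the same method reference text can bind to two different overloads of Integer.valueOf depending on the functional interface it is assigned to. The class and field names here are only for the example.

```java
import java.util.function.Function;
import java.util.function.IntFunction;

final class OverloadSelection {
    // The target type's parameter is a String, so this binds to Integer.valueOf(String).
    static final Function<String, Integer> PARSE = Integer::valueOf;

    // The target type's parameter is an int, so this binds to Integer.valueOf(int).
    static final IntFunction<Integer> BOX = Integer::valueOf;

    public static void main(String[] args) {
        System.out.println(PARSE.apply("42")); // prints 42
        System.out.println(BOX.apply(42));     // prints 42
    }
}
```

When the target type cannot single out one method, the compiler rejects the reference as ambiguous. Integer::toString assigned to a Function<Integer,String> is a common instance of this, because both the static toString(int) and the instance toString() fit.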

Constructors

Constructor references are effectively shorthand for factories of Objects. These will go a long way towards getting rid of the factory-hell in Java land that has been incurred in the name of making things “pluggable” and “flexible”.

Supplier<List<String>> listMaker = ArrayList::new;
// Calls new ArrayList<String>();
List<String> newList = listMaker.get();

IntFunction<List<String>> sizedListMaker = ArrayList::new;
// Calls new ArrayList<String>(int) -- setting initial capacity.
List<String> newList2 = sizedListMaker.apply(5);

// Same with boxed params using standard function interface:
Function<Integer, List<String>> sizedMaker2 = ArrayList::new;
List<String> newList3 = sizedMaker2.apply(10); // boxes to Integer.valueOf(10) to call.
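To show how this replaces a hand-written factory class, here is a minimal sketch of a "pluggable" API that takes the factory as a Supplier. The copyInto helper and the class name are hypothetical, written only for this example.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.List;
import java.util.TreeSet;
import java.util.function.Supplier;

final class CollectionFactories {
    // Hypothetical helper: the caller chooses the concrete collection via a factory.
    static <T, C extends Collection<T>> C copyInto(Iterable<T> source, Supplier<C> factory) {
        C target = factory.get();
        for (T item : source) {
            target.add(item);
        }
        return target;
    }

    public static void main(String[] args) {
        List<String> names = Arrays.asList("b", "a", "b");
        List<String> asList = copyInto(names, ArrayList::new);
        TreeSet<String> asSet = copyInto(names, TreeSet::new);
        System.out.println(asList); // prints [b, a, b]
        System.out.println(asSet);  // prints [a, b]
    }
}
```

The caller picks the implementation with a single token, and no ListFactory or SetFactory interface has to exist.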

Arbitrary-Instance

These types of method references are perhaps the most mind-bending when you first work with them, as they imply there is a magic instance somewhere they work against, however, once you understand the conversion into a lambda, they are quite straightforward.

The goal of an arbitrary instance method reference is to allow you to refer to a method on an instance that will be encountered at execution time. In the Java implementation, this means the instance will be the first argument of the lambda invocation (aka, passing in “self”), and the remainder of the parameters will be as encountered on the method itself.

Let’s walk through an example using Integer.toString() once again. This time the method reference produces a lambda that operates on the integer instance being passed in instead of capturing one in the current context:

// Java 7 implementation.
Function<Integer,String> f1 = new Function<Integer,String>() {
  public String apply(Integer x) {
    return x.toString();
  }
};
// Standard lambda.
Function<Integer,String> f2 = (x) -> x.toString();

// Method reference. Written against Object because Integer::toString is
// rejected as ambiguous (the static Integer.toString(int) also matches).
Function<Integer,String> f3 = Object::toString;

So, where-as our “instance-capturing” method reference produced a method signature with no parameters (aka Supplier's String get() method), our “arbitrary instance” method reference produces a method that still returns a String, but also accepts the instance to operate against; in this case as a Function with the method signature: String apply(Integer self).

Where this gets more interesting (and a little more confusing initially) is when you have methods that accept parameters. Consider our String.indexOf(String) case again. What does a String::indexOf method reference produce? If you remember the rule, the first parameter of the method signature will be our type, and the second-through-N will be the standard parameters of the method. While Java doesn’t ship a whole suite of arbitrarily tupled functional interfaces, it does have a BiFunction that accepts two parameters:

// Java 7 form.
BiFunction<String,String,Integer> bf1 = new BiFunction<String,String,Integer>() {
  public Integer apply(String self, String arg) {
    return self.indexOf(arg);
  }
};

// Java 8 standard lambda.
BiFunction<String,String,Integer> bf2 = (str,arg) -> str.indexOf(arg);

// Method reference:
BiFunction<String,String,Integer> bf3 = String::indexOf;

Admittedly, these typed signatures are long-winded, but keep in mind that generally speaking you won’t see them directly; lambdas are most often declared directly at the call site. So instead you’d have something that would accept a BiFunction (or similar), making the syntax flow like this:

factory.setIndexFinder(String::indexOf);

// Signature of this method might look like this:
public class SomethingFactory<T> {
  public void setIndexFinder(BiFunction<T,String,Integer> finder) { ... }
}

We can also go so far as to apply primitives in this case as well; Java does have a ToIntBiFunction that allows us to eliminate the boxing of the return value:

ToIntBiFunction<String,String> bf = String::indexOf;
int x = bf.applyAsInt("abc", "c"); // returns 2.
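In everyday code, arbitrary-instance references mostly appear inline at call sites in the standard library. A short sketch using only JDK APIs (the class and method names are just for the example):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

final class CallSiteReferences {
    static List<String> sortedIgnoringCase(List<String> input) {
        List<String> copy = new ArrayList<>(input);
        // Comparator<String>.compare(a, b) becomes a.compareToIgnoreCase(b).
        copy.sort(String::compareToIgnoreCase);
        return copy;
    }

    static List<Integer> lengths(List<String> input) {
        // Function<String,Integer>.apply(s) becomes s.length().
        return input.stream().map(String::length).collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<String> words = Arrays.asList("pear", "Apple", "fig");
        System.out.println(sortedIgnoringCase(words)); // prints [Apple, fig, pear]
        System.out.println(lengths(words));            // prints [4, 5, 3]
    }
}
```

In both cases the first lambda argument becomes the receiver of the call, and any remaining arguments are passed through as method parameters.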

The Increased Importance of Generic Exceptions in API Design

One of the more interesting things with Java 8 will be how it impacts the way API design is done. This is already being discussed a lot with lambdas, and indeed, they are the key driver for much of the API change. However, it’s not as simple as making sure you accept single method interfaces everywhere. There are a lot of subtle changes to consider. I will walk through one here with exceptions.

The ability to generically define the exceptions in a throws clause for a method in Java isn’t a new thing (although most devs don’t seem to know or care about it). Pop open any Java 7 project, and you’ll be able to do this:

public abstract class Thingy<T extends Throwable> {
  public void doSomething() throws T {
    // ...
  }
}

public class FileThingy extends Thingy<FileNotFoundException> {
  @Override
  public void doSomething() throws FileNotFoundException {
    // ...
  }
}

Unfortunately, in practical terms this is of limited value in most cases, as the context of handling the exception usually needs to know the explicit type, making the polymorphic benefits of generics somewhat minimal.

In fact, the only case I’ve ever had where this provided legitimate value was in tunneling typed exceptions through a transaction-handling method.

A Rare Useful Case Pre Java-8

public void saveThingy(Thingy thingy) throws NotFoundException, AlreadyExistsException, InvalidInputException {
  try {
    inTransaction(new TransactionalOp() {
      public void exec() throws Throwable {
        // may save, or may throw one of many checked exceptions.
        throw new AlreadyExistsException();
      }
    });
  }
  catch(TransactionException e) {
    e.throwIf(NotFoundException.class)
     .or(AlreadyExistsException.class)
     .or(InvalidInputException.class)
     .elseRuntimeException();
  }
}

// the signature of the generic "inTransaction" method:
public void inTransaction(TransactionalOp op) throws TransactionException { ... }

In this case, we want our transactional op to be able to throw checked exceptions so our outer-method can declare them (and force API users to handle them), but we can’t go changing the signature of our reusable inTransaction method.

To handle a “general” exception bubbling workflow, the inTransaction method catches all Throwables, and wraps them in a new checked exception that can then be unraveled using this fairly clean and fluid API. Effectively, the entire goal of this is to make it easier to type than a whole bunch of if(e.getCause() instanceof XYZ) throw (XYZ)e.getCause(); switches.

The TransactionException implementation could look something like this:

public class TransactionException extends Exception {
  public TransactionException(Throwable cause) {
    super(cause);
  }

  public <T extends Throwable> TransactionException throwIf(Class<T> type) throws T {
    if(type.isInstance(getCause())) throw type.cast(getCause());
    return this;
  }

  public <T extends Throwable> TransactionException or(Class<T> type) throws T { return throwIf(type); }

  public void elseRuntimeException() {
    throw new RuntimeException(getCause());
  }
}
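For completeness, the wrapping side of this could be sketched as follows. This is an assumption-laden outline rather than the original implementation: TransactionalOp is reconstructed from the usage above, the exception class is a pared-down stand-in for the one just shown, and the begin/commit/rollback steps are only indicated by comments.

```java
final class Transactions {
    // Hypothetical operation type matching the usage above: it may throw anything.
    interface TransactionalOp {
        void exec() throws Throwable;
    }

    // Minimal stand-in for the TransactionException shown above.
    static class TransactionException extends Exception {
        TransactionException(Throwable cause) {
            super(cause);
        }
    }

    // Runs the operation and funnels every failure into one checked type.
    static void inTransaction(TransactionalOp op) throws TransactionException {
        // A real implementation would begin the transaction here (assumed, not shown).
        try {
            op.exec();
            // ...and commit here.
        } catch (Throwable t) {
            // ...and roll back here before re-throwing the wrapped cause.
            throw new TransactionException(t);
        }
    }

    public static void main(String[] args) {
        try {
            inTransaction(() -> { throw new IllegalStateException("already exists"); });
        } catch (TransactionException e) {
            System.out.println(e.getCause().getMessage()); // prints already exists
        }
    }
}
```

Because TransactionalOp has a single abstract method, a Java 8 caller can pass a lambda instead of the anonymous class.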

So, like I said - that’s the only time it’s really benefited me.

Coming Soon with Java 8: More Generic Exceptions

In Java 8, thanks to the utility and terse syntax of lambdas, this generification of throws parameters is becoming much more useful, and is something you can start considering to assist developers in using your APIs. Consider, for example, the Java 8 Optional class:

public void loadString() throws IOException {
  Optional<String> maybeStr = // load from somewhere.
  String str = maybeStr.orElseThrow(() -> new IOException());
}

In this case, we’re asking the Optional to throw our custom exception (an IOException) if the String is not present. To support this flow, the optional class has a method that takes a supplier of exceptions of a certain type:

public <E extends Throwable> T orElseThrow(Supplier<? extends E> supplier) throws E {
  // effective impl (get() itself throws when empty, so test for presence first):
  if(isPresent()) return get();
  else throw supplier.get();
}

Since you can specify an implementation of the exception supplier in a very brief lambda definition, this provides a very clear and concise flow, and makes orElseThrow fit your use case, instead of the typical scenario of checked exceptions, where you have to work around things to meet the need of the API that throws the exception.
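The same pattern is easy to apply in your own APIs. Here is a minimal sketch, where firstOrThrow is a hypothetical helper invented for this example:

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.Collections;
import java.util.List;
import java.util.function.Supplier;

final class Lookups {
    // Hypothetical helper: returns the first element, or throws whatever the caller supplies.
    static <T, E extends Throwable> T firstOrThrow(List<T> items, Supplier<? extends E> onEmpty) throws E {
        if (items.isEmpty()) {
            throw onEmpty.get();
        }
        return items.get(0);
    }

    public static void main(String[] args) throws IOException {
        // E is inferred as IOException, so only IOException has to be declared or handled.
        String first = firstOrThrow(Arrays.asList("a", "b"), () -> new IOException("empty"));
        System.out.println(first); // prints a

        // E is inferred as an unchecked type here, so the compiler demands nothing.
        try {
            firstOrThrow(Collections.<String>emptyList(), () -> new IllegalStateException("empty"));
        } catch (IllegalStateException e) {
            System.out.println(e.getMessage()); // prints empty
        }
    }
}
```

The caller decides whether the failure is checked or unchecked, and the method's throws clause follows that choice.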

Bi-Directional References in Google App Engine with ID Pre-Allocation

It’s not uncommon when dealing with any database that you’ll occasionally have records where you need to navigate both from A to B, and from B to A - aka bi-directional relationships. In cases where your database is generating your IDs for you, you have a chicken-egg problem; to insert both records and establish the link at-once isn’t generally possible, as only one of the records will have a generated ID ready in time.

In the relational world, you typically handle this by having foreign-key constraints going both directions, with one being nullable and the other not. You perform both inserts, establishing the link back to the first on the second, and then perform an update on the first record to point to the second. Another approach is to move away from database auto-generated IDs to some sort of Hi-Lo generator you manage in the application, or similar.

In the Google App Engine / Google Cloud Datastore world, you can of course do this the same way using the insert-then-update pattern. Here is a sketch of what this might look like using Objectify 4:

EntityA a = // ...
EntityB b = // ...
ofy().save().entity(a).now();
b.setA(Ref.create(a));
ofy().save().entity(b).now();
a.setB(Ref.create(b));
ofy().save().entity(a);

If you are at all familiar with Google Cloud Datastore (and where the costs are), this example is probably making you cringe. We just made three individual round-trips to the datastore, and further, so we could get the allocated IDs in a synchronous fashion, we used the now() join method on the first two calls, tying all of the latency up in our active thread. This is brutal.

Now to be fair, without having any additional tools in our bag, we could optimize this a good bit to just two round-trips by using batch saves with null refs on both sides:

EntityA a = // ...
EntityB b = // ...
ofy().save().entities(a, b).now();
b.setA(Ref.create(a));
a.setB(Ref.create(b));
ofy().save().entities(a, b);

This is better, but still far from ideal. We still have the synchronous block waiting for both A and B to be confirmed as saved and given IDs, and we’re writing both entities twice, which means we’re spending more money than we’d like.

Thankfully, we can do better still.

The GAE datastore has the ability to allocate IDs explicitly on the client. This is also exposed through the Objectify APIs. We can use this to pre-allocate IDs so we not only eliminate the double write cost, but also eliminate the synchronous blocking for the datastore.

Here’s how:

EntityA a = // ...
EntityB b = // ...
a.setId(ofy().factory().allocateId(EntityA.class));
b.setId(ofy().factory().allocateId(EntityB.class));
a.setB(Ref.create(b));
b.setA(Ref.create(a));
ofy().save().entities(a, b);

Now we’ve found a way to optimize away almost all of our extra datastore interaction - success!

A Caveat

There is at least one caveat as of the time of this writing regarding this approach. In modern GAE deployments, the automatic ID generation uses a “scattered” model, where IDs emitted are distributed all over the 51-bit floating-point-safe long integer range. This is, somewhat opaquely, intended to optimize datastore performance. There are two ways this would likely help performance:

  1. The scattered ID generation might require the client to chat less with the datastore regarding ID ranges. I’m not totally aware of how GAE performs incremental ID range bucketing to avoid conflicts on multiple clients, but I suspect the scattered approach allows for less frequent unique range check-ins from the client to avoid collisions.
  2. The scattered IDs likely distribute better in the key partitions in the datastore. With IDs that are numerically close, it’s possible that the hash-ring locations for records clump together more than would normally be desired, meaning that your application is unnecessarily biased to a certain part of the datastore.

I bring this up because, at least for now, the client-side ID allocation is still configured to generate the classic incrementally managed identifiers, and not the scattered IDs that were introduced earlier this year.


Objectify Entity Subclass Migrations

If you’re using Google App Engine with Java, chances are good that you’re using Objectify. While Objectify 4.0 final is not technically released, the release-candidate has been available for some time, and has shown to be quite stable.

Unlike with a relational database, the generally preferred way to migrate data in a NoSQL datastore where you may have terabytes of data is gradually, and on an as-needed basis. Typically this manifests in two potential ways, either:

  1. When loading the data, apply a transformation to it to fit the new structure, and re-save it right then, or at least mark it for re-saving later.
  2. When saving a record with new changes, look for any transformations that need to be applied to upgrade it, and apply them prior to saving.

Whichever is chosen, devs often also decide to asynchronously migrate records in the database concurrently to the main application flow, by simply loading them and re-saving them in a background task queue. Each load and re-save forces whichever migration hook was put in place above to run.

This assertive asynchronous process provides the advantage that at some point you can remove some of your old migration hitches, with the primary disadvantage that you have to visit a lot more data in a fixed time-window. Sometimes this isn’t feasible (particularly on large log rolls of data), but it can be a useful technique.

Objectify provides all kinds of powerful tools for gradually migrating data in your uber-big GAE datastore to a new model. In particular it has:

  • @OnLoad for applying transformations inside your entity class right after it was loaded.
  • @OnSave for applying transformations inside your entity class right before it is saved.
  • @IgnoreSave for disabling an old field after you have loaded and transformed it.
  • @IgnoreLoad for disabling an old field from being loaded, but still allowing you to save it.
  • @AlsoLoad for loading other field names into a new field that is a composite.

These allow you to apply all sorts of transformations to entities, but there are always places that can be problematic. One such area is introducing polymorphism into entity records in your environment.

Say for example you have a record type of WidgetEvent that you have used to track when a widget is enabled in your application. Then, in a subsequent release you realize that you also want to track widget disables, and you will want to refer to both enabled and disabled events as the more abstract WidgetEvent in your application code, and have common Refs from entities to them, as seen here:

@Entity
public class MyOtherEntity {
	// ...

	// May be a WidgetEnabled or WidgetDisabled
	private Ref<WidgetEvent> event;
}

To be able to have those common refs, you’re going to need polymorphic support from Objectify. This allows Objectify to ask for a common data type by key, and then load it into a specific runtime type based on stored values. So, you decide you want this final entity structure for your app:

// Make the old type an abstract super-class, and push enabled-specific logic down.
@Entity
public abstract class WidgetEvent { }

// Make a new subclass to represent existing data.
@EntitySubclass(name="we")
public class WidgetEnabled extends WidgetEvent { }

// Make a new subclass to represent the new data.
@EntitySubclass(name="wd")
public class WidgetDisabled extends WidgetEvent { }

Starting fresh, this is no problem with Objectify. In concrete terms, Objectify will store records with a hidden ^d property in GAE (meaning, discriminator). When saving, this value is set to what you specify in the annotation. When loading, Objectify looks at the value coming from the datastore ^d field, and constructs the appropriate sub-type based on your registered annotations.
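To make the mechanism concrete, here is a plain-Java analogy of discriminator-based loading. It is not Objectify's implementation and uses none of its APIs; the registry map and the instantiate method are invented purely to illustrate the idea, with the entity classes reduced to empty shells.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

final class DiscriminatorDemo {
    static abstract class WidgetEvent { }
    static class WidgetEnabled extends WidgetEvent { }
    static class WidgetDisabled extends WidgetEvent { }

    // Maps stored discriminator values to factories for the matching subtype.
    private static final Map<String, Supplier<? extends WidgetEvent>> SUBTYPES = new HashMap<>();
    static {
        SUBTYPES.put("we", WidgetEnabled::new);
        SUBTYPES.put("wd", WidgetDisabled::new);
    }

    // Picks the runtime type for a stored record based on its discriminator value.
    static WidgetEvent instantiate(String discriminator) {
        Supplier<? extends WidgetEvent> factory = SUBTYPES.get(discriminator);
        if (factory == null) {
            // No registered subtype (including a missing discriminator): nothing concrete to build.
            throw new IllegalStateException("No subtype registered for discriminator: " + discriminator);
        }
        return factory.get();
    }

    public static void main(String[] args) {
        System.out.println(instantiate("we").getClass().getSimpleName()); // prints WidgetEnabled
        System.out.println(instantiate("wd").getClass().getSimpleName()); // prints WidgetDisabled
    }
}
```

In this toy version a record with no discriminator has no factory to fall back on, since the only remaining candidate is the abstract base type.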

Unfortunately, there is a rather confounding issue of introducing hierarchies like this in the form of migrating existing prod data. How can we load existing data that doesn’t have the discriminator value persisted with it? If you just leave it alone, Objectify will quickly start throwing runtime errors in this case because it can’t instantiate the abstract WidgetEvent class.

You could, of course, do this:

@Entity(name="WidgetEvent") // forces widget enabled to use the old event kind name.
public class WidgetEnabled { }

@EntitySubclass(name="wd")
public class WidgetDisabled extends WidgetEnabled { }

This will allow you to have the root type represent your original stored value. But… that doesn’t make much sense, does it? A disabled event doesn’t extend an enabled event. So while this works, it’s messy at best. If you have enabled-specific logic, it will leak down into your disabled class.

Instead, you might want to try something like this to provide your soft migration path:

@Entity
public abstract class WidgetEvent { }

// Try to load null as a discriminator for this sub-type to make it the default.
@EntitySubclass(name="we", alsoLoad=null)
public class WidgetEnabled extends WidgetEvent { }

@EntitySubclass(name="wd")
public class WidgetDisabled extends WidgetEvent { }

The alsoLoad property is a system in Objectify to allow you to take ownership of multiple discriminator value types for one subclass, so it seems perfect for this case. Here we’re trying to say “if there is no discriminator, choose WidgetEnabled”. Unfortunately, while this may seem logical, Objectify has a short-circuit on loading the event type that always chooses the root type (WidgetEvent) when it encounters null for the discriminator.

In fact, I’ve opened a bug to see if this can be changed to support migrating in this scenario, where null is explicitly specified on a subclass annotation.
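To make the load path described above a bit more concrete, here is a minimal sketch of discriminator-based loading. It is a simplified model and not Objectify’s actual code: the DiscriminatorLoader class, its register and load methods, and the use of a plain Map as the stored record are all assumptions made for illustration, and WidgetEvent is kept concrete here so the null case has something to instantiate.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical stand-ins for the entity classes in the article.
class WidgetEvent { }
class WidgetEnabled extends WidgetEvent { }
class WidgetDisabled extends WidgetEvent { }

// Simplified model of discriminator-based loading (an assumption, not Objectify's real code).
class DiscriminatorLoader {
    static final String DISCRIMINATOR = "^d";

    private final Supplier<WidgetEvent> root;
    private final Map<String, Supplier<? extends WidgetEvent>> subclasses = new HashMap<>();

    DiscriminatorLoader(Supplier<WidgetEvent> root) {
        this.root = root;
    }

    // Associates a discriminator value with the factory for one subclass.
    void register(String name, Supplier<? extends WidgetEvent> factory) {
        subclasses.put(name, factory);
    }

    // Picks the runtime type from the stored discriminator value.
    WidgetEvent load(Map<String, Object> raw) {
        Object discriminator = raw.get(DISCRIMINATOR);
        if (discriminator == null) {
            // The short-circuit: no discriminator always means "build the root type".
            // With an abstract root, this is the step that would fail at runtime.
            return root.get();
        }
        Supplier<? extends WidgetEvent> factory = subclasses.get(discriminator);
        if (factory == null) {
            throw new IllegalStateException("Unknown discriminator: " + discriminator);
        }
        return factory.get();
    }
}
```

In this model, a record without a ^d value never reaches the subclass table at all, which is why registering null against a subclass has no effect.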

In the meantime, what do you do? Well, one decent workaround that avoids downtime (although, unfortunately, all of the data still has to be migrated ahead of time) is to patch your production environment and then use the raw datastore service to migrate your PROD data.

Using our previous example, that process looks like this.

Step 1: Create the Patched PROD Version

@Entity
public class WidgetEvent {
	// All of the existing logic.
}

@EntitySubclass(name="we")
public class WidgetEnabled extends WidgetEvent {
	// Has no body - simply an empty subclass!
}

Once you’ve done this, you can update your code where you create new un-persisted WidgetEvent instances to create WidgetEnabled instances (theoretically this would be the place where, you know, widgets are enabled).

Note that you don’t want to just refactor+rename WidgetEvent to WidgetEnabled and create a new abstract super-class called WidgetEvent in its place, because you want the rest of your code to work directly with the existing superclass. The reason for this is simple: Objectify, upon encountering an unmarked instance in the database, will not create WidgetEnabled; it will create WidgetEvent. Therefore, it’s extremely important that your application treats all instances the same way it did before (as the superclass), with the single exception that where you construct new instances, you construct the subclass, and therefore force Objectify to store the ^d value.

So, in short, new data will get the correct ^d value in the database, and the rest will just plug along as usual.

Step 2: Begin Migrating Using Low-Level API

Now that you’ve got a “moving-forward” solution in place, you can start to work on the existing data. Unfortunately, this workaround still requires that you touch/migrate all of your data before you can upgrade to a polymorphic data-set, but at least you don’t have to incur any downtime.

Migrating the existing data is as simple as iterating over the entities using the DatastoreService, and updating them in-place:

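import com.google.appengine.api.datastore.DatastoreService;
import com.google.appengine.api.datastore.DatastoreServiceFactory;
import com.google.appengine.api.datastore.Entity;
import com.google.appengine.api.datastore.FetchOptions;
import com.google.appengine.api.datastore.PreparedQuery;
import com.google.appengine.api.datastore.Query;

DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
Query q = new Query("WidgetEvent");
PreparedQuery pq = ds.prepare(q);
// Fetch in chunks of 200 rather than loading the whole kind at once.
Iterable<Entity> all = pq.asIterable(FetchOptions.Builder.withPrefetchSize(200).chunkSize(200).limit(Integer.MAX_VALUE));
int count = 0;
for (Entity e : all) {
    e.setProperty("^d", "we"); // mark the existing record as a WidgetEnabled
    ds.put(e);
    count++; // running total of migrated entities
}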

This will go through and update every single instance using an iterative fetching process, 200 entities at a time.
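The loop above fetches in chunks but still issues one put per entity. If that turns out to be too chatty, one option is to buffer the updated entities and write them in groups. Here is a minimal, datastore-agnostic sketch of that idea; BatchingWriter and its sink callback are hypothetical names of my own, and the sink stands in for whatever bulk write you use (the low-level API does have a put overload that accepts an Iterable of entities).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

// Hypothetical helper: collects items and hands them to a sink in fixed-size groups.
class BatchingWriter<T> {
    private final int batchSize;
    private final Consumer<List<T>> sink;
    private final List<T> buffer = new ArrayList<>();

    BatchingWriter(int batchSize, Consumer<List<T>> sink) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        this.batchSize = batchSize;
        this.sink = sink;
    }

    // Buffers one item and writes the buffer out once it is full.
    void add(T item) {
        buffer.add(item);
        if (buffer.size() >= batchSize) {
            flush();
        }
    }

    // Writes out whatever is buffered; call once more after the loop ends.
    void flush() {
        if (buffer.isEmpty()) {
            return;
        }
        sink.accept(new ArrayList<>(buffer)); // copy so the sink owns its list
        buffer.clear();
    }
}
```

In the migration loop, you would call add(e) where ds.put(e) is today, and flush() once after the loop so the final partial batch is not lost.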

You can choose to run this as a single long-running process, or split it up into a bunch of sub-tasks on a task queue by serializing batches of IDs fetched via q.setKeysOnly() or something similar.
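As a rough sketch of the “split it up” option, the snippet below partitions a list of already-fetched IDs into fixed-size batches that could each be handed to a separate task. IdBatcher and partition are hypothetical names, the IDs are assumed to be numeric (Long), and the actual task-queue enqueueing is left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: splits a list of entity IDs into batches for sub-tasks.
class IdBatcher {
    static List<List<Long>> partition(List<Long> ids, int batchSize) {
        if (batchSize <= 0) {
            throw new IllegalArgumentException("batchSize must be positive");
        }
        List<List<Long>> batches = new ArrayList<>();
        for (int start = 0; start < ids.size(); start += batchSize) {
            int end = Math.min(start + batchSize, ids.size());
            // Copy the slice so each batch can be serialized independently.
            batches.add(new ArrayList<>(ids.subList(start, end)));
        }
        return batches;
    }
}
```

Each batch is small enough to serialize into a task payload, and a failed task only needs to retry its own slice of IDs.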

Summary

This is one of the more complex migrations via Objectify. While it’s unfortunate that you can’t let the old data stay at rest and only migrate as needed, it would take a significant amount of data to make the “migrate-ahead-of-time” solution overly burdensome.

Reclaiming the Underscore

Java, as a language, has historically been quite careful to avoid changes that are forwards-incompatible. This is quite obvious to anyone who has spent any time coding against the JVM. Very few changes come in that don’t allow applications to migrate forward naturally. There seem to be different tiers of protection here, with the first being binary compatibility. Ideally, applications compiled with Java 1.4 will still run on JVMs today, which says a lot.

Library compatibility has also been a big effort over the years. Deprecations run amok in the core libraries, and there are duplicate classes for many core pieces of functionality.

Finally, we have source compatibility. Again, most changes to the syntax of Java have been forward-compatible, and in turn the syntax of the language has seen some odd choices to meet this need (the use of : for the “foreach” construct, for example).

That’s why I found it interesting that Java 8 will start reserving the _ character as a variable identifier. Before you grab your pitchfork, note that this only applies when _ is used alone (not as part of a larger identifier name), and for all existing cases it’s only a warning that it may become unusable in future releases. Here are some examples:

// This is totally fine.
String _test = "test";

// This will produce a compiler warning.
String _ = "test";

// This will produce an error.
Consumer<String> op = (String _) -> { /* ... */ };

Note that in the lambda case, they are immediately failing on the keyword for the “lambda formal”, as this is a position you can’t possibly have in your existing code prior to Java 8.

When pressed for some details on this on the JDK-8 mailing list, this is what Brian Goetz said about the future of “underscore” as a variable name:

Yes, we are “reclaiming” the syntactic real estate of “_” from the space of identifiers for use in future language features. However, because there are existing programs that might use it, it is a warning for identifiers that occur in existing syntactic positions for 8, and an error for lambda formals (since there is no existing code with lambdas.)

Your suspicion is mostly right, except that we are certainly NOT going to do Scala’s “wunderbar”. However, things it might be used for include things like “I don’t want to give this variable a name” (such as catch parameters that are never used.)


JEP-171: Fence Intrinsics Keep Memory In Line

Doug Lea just posted a new Java enhancement proposal with JEP-171 - Fence Intrinsics. This enhancement is all about exposing memory fence controls into Java code so that the java.util.concurrent APIs can more accurately and efficiently control memory ordering and bounding.

This is an interesting JEP as it proposes no Java consumer-oriented APIs; the only changes would be on the sun.misc.Unsafe class - which is already a fascinating pivot point for many of the advanced concurrency features of Java. Here is a Stack Overflow article that recaps many of them better than I could: StackOverflow.com: “Interesting Uses of sun.misc.Unsafe”. This class (as you can guess from the package) is not intended for regular Java devs to access; it’s an implementation-specific class, and is really only meant for internal use by class library devs on the JDK. (That said, many folks have taken this class by the horns to wrangle the most out of the JVM anyway.)

What is proposed instead is to make memory-ordering fences a first-class citizen in the implementation layer of the JDK, so that the various core concurrency APIs can use fencing directly without having to resort to the side effects of other lower-level intrinsics (something that is done regularly today).

You may be asking why memory fencing is important. Modern CPUs can freely re-order memory loads and stores when they can see from the upcoming instructions that re-ordering will not change the outcome of the program as observed by a single thread. When you start throwing multiple threads or CPUs at a problem, out-of-order memory operations that would otherwise go unnoticed can instead cause all kinds of data corruption and confusion. That’s why effectively all of the core synchronization and atomic operations in Java today implicitly carry a memory fence along with them; it’s part of the larger equation of protecting memory access.
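To illustrate the implicit fencing that ordinary Java code relies on today, here is a small sketch of publishing a value through a volatile flag. The VolatilePublish class and its method names are made up for this example. The volatile write and read of ready are what carry the ordering; the data field itself is a plain field.

```java
// Hypothetical example: a volatile flag orders access to a plain field.
class VolatilePublish {
    private int data;               // plain field, no ordering guarantees of its own
    private volatile boolean ready; // volatile accesses carry the implicit fence

    void writer() {
        data = 42;    // must not be re-ordered after the volatile store below
        ready = true; // publishes everything written before it
    }

    int reader() {
        while (!ready) {
            Thread.yield(); // spin until the volatile read observes the flag
        }
        return data; // sees the write made before ready was set
    }

    // Runs one writer and one reader thread and returns what the reader saw.
    static int runOnce() {
        VolatilePublish box = new VolatilePublish();
        int[] seen = new int[1];
        Thread readerThread = new Thread(() -> seen[0] = box.reader());
        Thread writerThread = new Thread(box::writer);
        readerThread.start();
        writerThread.start();
        try {
            readerThread.join();
            writerThread.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        return seen[0];
    }
}
```

If ready were a plain boolean, nothing would stop the two stores in writer() from being observed out of order, and the reader loop might never see the flag change at all.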

This JEP includes three operations in the proposal:

  • Unsafe.loadFence() - Prevent reordering of load operations before this call with loads and stores after this call.
  • Unsafe.storeFence() - Prevent reordering of store operations before this call with loads and stores after this call.
  • Unsafe.fullFence() - Prevent reordering of all memory operations before this call with loads and stores after this call.
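For a feel of how fences like these are used, here is a hedged sketch written against the static fence methods on java.lang.invoke.VarHandle, which exist in Java 9 and later and are not part of this JEP. As far as I know, acquireFence, releaseFence and fullFence there line up with loadFence, storeFence and fullFence respectively. The FencedMailbox class is hypothetical, and real code would normally reach for volatile or the VarHandle access modes rather than bare fences.

```java
import java.lang.invoke.VarHandle;

// Hypothetical example: explicit fences around plain fields (requires Java 9+).
class FencedMailbox {
    private int payload;   // plain field
    private boolean ready; // plain field; ordering comes from the fences below

    void publish(int value) {
        payload = value;
        // Store-side fence: the payload store above is not re-ordered
        // with the ready store below.
        VarHandle.releaseFence();
        ready = true;
    }

    // Returns the payload if it has been published, or null otherwise.
    Integer tryRead() {
        boolean seen = ready;
        // Load-side fence: the ready load above is not re-ordered
        // with the payload load below.
        VarHandle.acquireFence();
        return seen ? payload : null;
    }
}
```

The writer fences between its two stores and the reader fences between its two loads, so a reader that sees the flag also sees the payload written before it.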

You can read more about memory fences in the Wikipedia Memory Barrier article.

It’s worth noting that the JEP does consider potentially surfacing memory fence operations to all developers at some point in the future, given that sun.misc.Unsafe is already platform specific (making it risky for external libs to access), and may become impossible to access given the efforts of Jigsaw:

Adding these methods at the VM level permits use by JDK libraries in support of JDK 8 features, while also opening up the possibility of later exporting the base functionality via new java.util.concurrent APIs. This may become essential to allow people developing non-JDK low-level libraries if upcoming modularity support makes these methods impossible for others to access.