Bi-Directional References in Google App Engine with ID Pre-Allocation

It’s not uncommon when dealing with any database that you’ll occasionally have records where you need to navigate both from A to B, and from B to A - aka bi-directional relationships. In cases where your database is generating your IDs for you, you have a chicken-and-egg problem: inserting both records and establishing the link at once isn’t generally possible, as only one of the records will have a generated ID ready in time.

In the relational world, you typically handle this by having foreign-key constraints going both directions, with one being nullable and the other not. You perform both inserts, establishing the link back to the first on the second, and then perform an update on the first record to point to the second. Another approach is to move away from database auto-generated IDs to some sort of Hi-Lo generator you manage in the application, or similar.
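As an aside on that second option, a Hi-Lo allocator is simple to sketch. Here is a minimal, single-process illustration (all names hypothetical): one "database" round-trip reserves an entire block of IDs, and the rest are handed out locally with no further coordination.

```java
import java.util.concurrent.atomic.AtomicLong;

// Minimal single-process sketch of a Hi-Lo ID generator (names hypothetical).
// One round-trip reserves a block of `blockSize` IDs; subsequent IDs are
// handed out locally with no further coordination.
public class HiLoGenerator {
    private final int blockSize;
    private final AtomicLong dbSequence = new AtomicLong(); // stands in for a DB-held sequence
    private long hi;  // currently reserved block number
    private int lo;   // next offset within the block

    public HiLoGenerator(int blockSize) {
        this.blockSize = blockSize;
        this.lo = blockSize; // forces a block reservation on first use
    }

    public synchronized long nextId() {
        if (lo >= blockSize) {
            hi = dbSequence.getAndIncrement(); // in a real system: the DB round-trip
            lo = 0;
        }
        return hi * blockSize + lo++;
    }
}
```

In a real deployment the `dbSequence` increment would be the only remote call, amortized over `blockSize` allocations.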

In the Google App Engine / Cloud Datastore world, you can of course do this the same way using the insert-then-update pattern. Here is a sketch of what this might look like using Objectify 4:

EntityA a = // ...
EntityB b = // ...
ofy().save().entity(a).now(); // round-trip 1: a gets its ID
b.setA(Ref.create(a));
ofy().save().entity(b).now(); // round-trip 2: b gets its ID
a.setB(Ref.create(b));
ofy().save().entity(a);       // round-trip 3: update a with the back-reference

If you are at all familiar with the App Engine datastore (and where the costs are), this example is probably making you cringe. We just made three individual round-trips to the datastore, and further, so we could get the allocated IDs in a synchronous fashion, we used the now() join method on the first two calls, tying all of the latency up in our active thread. This is brutal.

Now to be fair, without having any additional tools in our bag, we could optimize this a good bit to just two round-trips by using batch saves with null refs on both sides:

EntityA a = // ...
EntityB b = // ...
ofy().save().entities(a, b).now(); // round-trip 1: both entities get IDs
a.setB(Ref.create(b));
b.setA(Ref.create(a));
ofy().save().entities(a, b);       // round-trip 2: batch update with the refs

This is better, but still far from ideal. We still have the synchronous block waiting for both A and B to be confirmed as saved and given IDs, and we’re writing both entities twice, which means we’re spending more money than we’d like.

Thankfully, we can do better still.

The GAE datastore has the ability to allocate IDs explicitly on the client. This is also exposed through the Objectify APIs. We can use this to pre-allocate IDs so we not only eliminate the double write cost, but also eliminate the synchronous blocking for the datastore.

Here’s how:

EntityA a = // ...
EntityB b = // ...
a.setId(ofy().factory().allocateId(EntityA.class).getId()); // pre-allocate IDs client-side
b.setId(ofy().factory().allocateId(EntityB.class).getId());
a.setB(Ref.create(b));
b.setA(Ref.create(a));
ofy().save().entities(a, b); // single asynchronous batch save - no now() needed

Now we’ve found a way to optimize away almost all of our extra datastore interaction - success!

A Caveat

There is at least one caveat as of the time of this writing regarding this approach. In modern GAE deployments, the automatic ID generation uses a “scattered” model, where IDs emitted are distributed all over the 51-bit floating-point-safe long integer range. This is, somewhat opaquely, intended to optimize datastore performance. There are two ways this would likely help performance:

  1. The scattered ID generation might require the client to chat less with the datastore regarding ID ranges. I’m not totally sure how GAE performs incremental ID range bucketing to avoid conflicts on multiple clients, but I suspect the scattered approach allows for less frequent unique range check-ins from the client to avoid collisions.
  2. The scattered IDs likely distribute better in the key partitions of the datastore. With IDs that are numerically close, it’s possible that the hash-ring locations for records clump together more than would normally be desired, meaning that your application is unnecessarily biased to a certain part of the datastore.
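To see why scattering helps distribution, here is a purely illustrative sketch - not GAE’s actual algorithm - in which bit-reversing a sequential counter spreads numerically adjacent IDs across the whole key space, while raw sequential IDs all land in the same partition bucket:

```java
import java.util.Set;
import java.util.TreeSet;

// Illustrative only: GAE generates scattered IDs server-side, but bit-reversing
// a sequential counter shows the idea. Adjacent counters land far apart in the
// ID space, so they spread across key partitions.
public class ScatterDemo {
    static final int ID_BITS = 52; // stay within a float-safe integer range

    // Reverse the counter's bits into the top of the 52-bit ID space.
    static long scatter(long counter) {
        return Long.reverse(counter) >>> (64 - ID_BITS);
    }

    // Count how many of `buckets` equal-width key partitions are hit by the
    // first n IDs, either sequential or scattered.
    static int bucketsHit(int n, boolean scattered, int buckets) {
        long bucketWidth = (1L << ID_BITS) / buckets;
        Set<Long> hit = new TreeSet<>();
        for (long i = 1; i <= n; i++) {
            long id = scattered ? scatter(i) : i;
            hit.add(id / bucketWidth);
        }
        return hit.size();
    }
}
```

With 8 partitions, the first thousand sequential IDs all fall in a single bucket, while the scattered versions of those same counters hit every bucket.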

I bring this up because, at least for now, the client-side ID allocation is still configured to generate the classic incrementally managed identifiers, and not the scattered IDs that were introduced earlier this year.


Objectify Entity Subclass Migrations

If you’re using Google App Engine with Java, chances are good that you’re using Objectify. While Objectify 4.0 final is not technically released, the release candidate has been available for some time, and has proven to be quite stable.

Unlike with a relational database, the generally preferred way to migrate data in a NoSQL datastore where you may have terabytes of data is gradually, and on an as-needed basis. Typically this manifests in two potential ways, either:

  1. When loading the data, apply a transformation to it to fit the new structure, and re-save it right then, or at least mark it for re-saving later.
  2. When saving a record with new changes, look for any transformations that need to be applied to upgrade it, and apply them prior to saving.
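Strategy 1 can be sketched with a hypothetical version-stamped record (plain Java here for illustration; in Objectify the transformation would live in an @OnLoad method):

```java
// Hypothetical sketch of strategy 1: records carry a schema version, and a
// transformation runs right after load to bring old data up to the current
// shape. In Objectify this logic would live in an @OnLoad method.
public class WidgetRecord {
    static final int CURRENT_VERSION = 2;

    int schemaVersion;
    String legacyName; // v1 field, superseded in v2
    String name;       // v2 field

    // Returns true if anything changed, signaling the caller to re-save the
    // record now (or mark it for re-saving later).
    boolean migrateOnLoad() {
        boolean changed = false;
        if (schemaVersion < 2) {
            name = legacyName; // move data into the new field
            legacyName = null;
            schemaVersion = 2;
            changed = true;
        }
        return changed;
    }
}
```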

Whichever is chosen, devs often also decide to asynchronously migrate records in the database concurrently with the main application flow, by simply loading them and re-saving them in a background task queue. This forces the migrations put in place above to run.

This assertive asynchronous process provides the advantage that at some point you can remove some of your old migration hitches, with the primary disadvantage that you have to visit a lot more data in a fixed time-window. Sometimes this isn’t feasible (particularly on large log rolls of data), but it can be a useful technique.

Objectify provides all kinds of powerful tools for gradually migrating data in your uber-big GAE datastore to a new model. In particular it has:

  • @OnLoad for applying transformations inside your entity class right after it was loaded.
  • @OnSave for applying transformations inside your entity class right before it is saved.
  • @IgnoreSave for disabling an old field after you have loaded and transformed it.
  • @IgnoreLoad for preventing an old field from being loaded, but still allowing you to save it.
  • @AlsoLoad for loading other field names into a new field that is a composite.

These allow you to apply all sorts of transformations to entities, but there are always places that can be problematic. One such area is introducing polymorphism into entity records in your environment.

Say for example you have a record type of WidgetEvent that you have used to track when a widget is enabled in your application. Then, in a subsequent release you realize that you also want to track widget disables, and you will want to refer to both enabled and disabled events as the more abstract WidgetEvent in your application code, and have common Refs from entities to them, as seen here:

@Entity
public class MyOtherEntity {
	// ...

	// May be a WidgetEnabled or WidgetDisabled
	private Ref<WidgetEvent> event;
}

To be able to have those common refs, you’re going to need polymorphic support from Objectify. This allows Objectify to ask for a common data type by key, and then load it into a specific runtime type based on stored values. So, you decide you want this final entity structure for your app:

// Make the old type an abstract super-class, and push enabled-specific logic down.
@Entity
public abstract class WidgetEvent { }

// Make a new subclass to represent existing data.
@EntitySubclass(name="we")
public class WidgetEnabled extends WidgetEvent { }

// Make a new subclass to represent the new data.
@EntitySubclass(name="wd")
public class WidgetDisabled extends WidgetEvent { }

Starting fresh, this is no problem with Objectify. In concrete terms, Objectify will store records with a hidden ^d property in GAE (meaning, discriminator). When saving, this value is set to what you specify in the annotation. When loading, Objectify looks at the value coming from the datastore ^d field, and constructs the appropriate sub-type based on your registered annotations.

Unfortunately, there is a rather confounding issue with introducing hierarchies like this when migrating existing prod data. How can we load existing data that doesn’t have the discriminator value persisted with it? If you just leave it alone, Objectify will quickly start throwing runtime errors in this case, because it can’t instantiate the abstract WidgetEvent class.
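Conceptually - and this is an illustrative sketch, not Objectify’s actual implementation - the load-time dispatch looks something like the following, which makes the failure mode obvious:

```java
// Illustrative sketch (not Objectify's actual code) of discriminator-based
// dispatch: the stored ^d value selects the concrete runtime type, and a
// missing discriminator leaves only the now-abstract root type.
public class DiscriminatorDemo {
    abstract static class WidgetEvent { }
    static class WidgetEnabled extends WidgetEvent { }
    static class WidgetDisabled extends WidgetEvent { }

    static WidgetEvent instantiate(String discriminator) {
        if ("we".equals(discriminator)) return new WidgetEnabled();
        if ("wd".equals(discriminator)) return new WidgetDisabled();
        // Legacy records carry no ^d value, and the root type can no longer
        // be constructed - hence the runtime errors.
        throw new IllegalStateException("cannot instantiate abstract WidgetEvent");
    }
}
```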

You could, of course, do this:

@Entity(name="WidgetEvent") // forces widget enabled to use the old event kind name.
public class WidgetEnabled { }

@EntitySubclass(name="wd")
public class WidgetDisabled extends WidgetEnabled { }

This will allow you to have the root type represent your original stored value. But… that doesn’t make much sense, does it? A disabled event doesn’t extend an enabled event. So while this works, it’s messy at best. If you have enabled-specific logic, it will leak down into your disabled class.

Instead, you might want to try something like this to provide your soft migration path:

@Entity
public abstract class WidgetEvent { }

// Try to load null as a discriminator for this sub-type to make it the default.
@EntitySubclass(name="we", alsoLoad=null)
public class WidgetEnabled extends WidgetEvent { }

@EntitySubclass(name="wd")
public class WidgetDisabled extends WidgetEvent { }

The alsoLoad property is a mechanism in Objectify that allows one subclass to take ownership of multiple discriminator values, so it seems perfect for this case. Here we’re trying to say “if there is no discriminator, choose WidgetEnabled”. Unfortunately, while this may seem logical, Objectify has a short-circuit when loading the entity that always chooses the root type (WidgetEvent) when it encounters null for the discriminator.

In fact, I’ve opened a bug to see if this can be changed to support migrating in this scenario, where null is explicitly specified on a subclass annotation.

In the meantime, what do you do? Well, one decent workaround that allows you to migrate over time (but unfortunately all ahead of time) is to patch your production environment and then use the raw datastore service to migrate your PROD data.

Using our previous example, that process looks like this.

Step 1: Create the Patched PROD Version

@Entity
public class WidgetEvent {
	// All of the existing logic.
}

@EntitySubclass(name="we")
public class WidgetEnabled extends WidgetEvent {
	// Has no body - simply an empty subclass!
}

Once you’ve done this, you can update your code where you create new un-persisted WidgetEvent instances to create WidgetEnabled instances (theoretically this would be the place where, you know, widgets are enabled).

Note that you don’t want to just refactor+rename WidgetEvent to WidgetEnabled and create a new abstract super-class called WidgetEvent in its place, because you want the rest of your code to work directly with the existing superclass. The reason for this is simple: Objectify, upon encountering an unmarked instance in the database, will not create WidgetEnabled; it will create WidgetEvent. Therefore, it’s extremely important that your application treat all instances the same way you did before (as the superclass), with the single exception that where you construct new instances, you construct the subclass, and thereby force Objectify to store the ^d value.

So, in short, new data will get the correct ^d value in the database, and the rest will just plug along as usual.

Step 2: Begin Migrating Using Low-Level API

Now that you’ve got a “moving-forward” solution in place, you can start to work on the existing data. Unfortunately, this workaround still requires that you touch/migrate all of your data before you can upgrade to a polymorphic data-set, but at least you don’t have to incur any downtime.

Migrating the existing data is as simple as iterating over the entities using the DataStoreService, and updating them in-place:

DatastoreService ds = DatastoreServiceFactory.getDatastoreService();
Query q = new Query("WidgetEvent");
PreparedQuery pq = ds.prepare(q);
Iterable<Entity> all = pq.asIterable(FetchOptions.Builder.withPrefetchSize(200).chunkSize(200).limit(Integer.MAX_VALUE));
int count = 0;
for (Entity e : all) {
    e.setProperty("^d", "we");
    ds.put(e);
    count++;
}

This will go through and update every single instance using an iterative fetching process, 200 at a time.

You can choose to run this as a single long-running process, or split it up into a bunch of sub-tasks on a task queue by serializing batches of IDs fetched via q.setKeysOnly() or something similar.
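For the task-queue variant, the batching itself is trivial; here is a hypothetical helper (name and shape are my own) that chops a keys-only result into ID batches, each of which could be serialized into its own task:

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: chop a keys-only result into fixed-size ID batches,
// each of which can be serialized into its own task-queue task.
public class BatchSplitter {
    static List<List<Long>> toBatches(List<Long> ids, int batchSize) {
        List<List<Long>> batches = new ArrayList<>();
        for (int i = 0; i < ids.size(); i += batchSize) {
            // Copy the sub-list so each batch is independently serializable.
            batches.add(new ArrayList<>(ids.subList(i, Math.min(i + batchSize, ids.size()))));
        }
        return batches;
    }
}
```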


This is one of the more complex migrations via Objectify. While it’s unfortunate that you can’t let the old data stay at rest and only migrate as needed, it would take a significant amount of data to make the “migrate-ahead-of-time” solution overly burdensome.