Most Rails developers quickly learn that minimizing the number of database queries by properly specifying eager loads is critical for good application performance. Unfortunately, specifying eager loads is error-prone and can cause encapsulation problems. In this post we'll explore having Rails automatically handle eager loads.
Problems with Eager Loading
Let's explore some of the issues with eager loading by considering the following models for a blogging application:
The Rails app has a PostsController with the following index method:
The list of posts is rendered with the following view:
Things are working well, and then some yahoo updates the view to include the post's author:
Uh oh. The app just got slower. Looking at the query log, we see that we're running a query per post to retrieve its author (the classic N+1 query problem):
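The shape of this problem is easy to reproduce outside Rails. The following is a hypothetical plain-Ruby simulation, not ActiveRecord: the Author and Post structs, the load_posts, find_author, and render_lazily helpers, and the log array are illustrative stand-ins, and each simulated query is simply appended to the log so we can count round trips.

```ruby
# Hypothetical stand-ins for database rows (not ActiveRecord models).
Author = Struct.new(:id, :name)
Post = Struct.new(:id, :title, :author_id)

AUTHORS = [Author.new(1, "Ann"), Author.new(2, "Bob")].freeze
POSTS = [Post.new(1, "First", 1), Post.new(2, "Second", 2), Post.new(3, "Third", 1)].freeze

# Simulates loading the posts: one query.
def load_posts(log)
  log << "SELECT posts.* FROM posts"
  POSTS
end

# Simulates a lazy post.author lookup: one query per call.
def find_author(log, id)
  log << "SELECT authors.* FROM authors WHERE authors.id = #{id}"
  AUTHORS.find { |author| author.id == id }
end

# The view-like loop: every post triggers its own author query.
def render_lazily(log)
  load_posts(log).map do |post|
    "#{post.title} by #{find_author(log, post.author_id).name}"
  end
end

log = []
render_lazily(log)
puts log
```

With three posts the log holds four entries: one for the posts plus one per post for its author, so the query count grows linearly with the number of posts rendered.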
Looks like our yahoo developer forgot to update the eager loads in the PostsController. The fix is fairly straightforward:
Examining the query log after the fix, we see that we're now running one query to retrieve all of the authors for our posts:
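In a hypothetical plain-Ruby simulation (again not ActiveRecord; the structs and the load_posts_with_authors helper are illustrative), the fix amounts to loading the posts and then fetching all of their authors with a single IN query. This is roughly the shape of what an eager load such as Post.includes(:author) asks ActiveRecord to do.

```ruby
# Hypothetical stand-ins for database rows (not ActiveRecord models).
Author = Struct.new(:id, :name)
Post = Struct.new(:id, :title, :author_id)

AUTHORS = [Author.new(1, "Ann"), Author.new(2, "Bob")].freeze
POSTS = [Post.new(1, "First", 1), Post.new(2, "Second", 2), Post.new(3, "Third", 1)].freeze

# Simulates eager loading: one query for the posts, then a single IN query
# for all of their authors, regardless of how many posts there are.
def load_posts_with_authors(log)
  log << "SELECT posts.* FROM posts"
  author_ids = POSTS.map(&:author_id).uniq
  log << "SELECT authors.* FROM authors WHERE authors.id IN (#{author_ids.join(', ')})"
  matching = AUTHORS.select { |author| author_ids.include?(author.id) }
  authors_by_id = matching.to_h { |author| [author.id, author] }
  # Pair each post with its preloaded author; no further queries are needed.
  POSTS.map { |post| [post, authors_by_id.fetch(post.author_id)] }
end

log = []
load_posts_with_authors(log).each { |post, author| puts "#{post.title} by #{author.name}" }
puts log.size # => 2
```

The query count stays at two no matter how many posts are rendered, which is the whole point of the eager load. The catch, discussed next, is who has to know that the author association will be needed.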
Tools like Bullet can help detect these problems earlier in the development lifecycle, but there's something fundamentally wrong here. Why should the PostsController care what associations the view is going to use? That's an implementation detail of the view that the controller shouldn't need to know about. This overly simplified example used a controller and a view, but the problem is more general: it can happen whenever an object uses associations on a collection of models that were loaded by another object. Let's see if we can do something to make this better.
A Better Approach?
Let's start by seeing if we can simplify the problem a bit. What if we assume that all models retrieved by a query are used in a uniform way? In other words, if code iterating over a collection of blogs accesses one blog's posts, can we assume it's highly likely to access all the other blogs' posts too? Based on an analysis of the source code for the Rails app we're developing at Salsify, the answer is yes.
Let's run with this idea. What if Rails automatically eager loaded an association for a collection of models the first time the association was used on any one of them? Ideally, we'd see queries like these executed:
The loading of associated models is both lazy, because it isn't triggered until the association is first used, and eager, because we prefetch the associated models for every model in the collection.
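Here is a minimal plain-Ruby sketch of this lazy-but-eager behavior, assuming every record remembers the collection that the same query returned (a hypothetical load_group). It is not Goldiloader's implementation; the Blog class, the PostRow struct, load_posts_for, and the log array are illustrative stand-ins.

```ruby
# Hypothetical row type standing in for the posts table.
PostRow = Struct.new(:id, :blog_id, :title)

POST_ROWS = [
  PostRow.new(1, 1, "Hello"),
  PostRow.new(2, 1, "Again"),
  PostRow.new(3, 2, "Other")
].freeze

class Blog
  attr_reader :id
  attr_accessor :load_group # every blog returned by the same query
  attr_writer :posts

  def initialize(id, log)
    @id = id
    @log = log
    @posts = nil
  end

  def posts_loaded?
    !@posts.nil?
  end

  # Lazy: nothing is fetched until posts is first called.
  # Eager: that first call fetches posts for every unloaded blog in the group.
  def posts
    Blog.load_posts_for(load_group.reject(&:posts_loaded?), @log) unless posts_loaded?
    @posts
  end

  # One IN query for the whole batch; blogs without posts get an empty array.
  def self.load_posts_for(blogs, log)
    ids = blogs.map(&:id)
    log << "SELECT posts.* FROM posts WHERE posts.blog_id IN (#{ids.join(', ')})"
    grouped = POST_ROWS.select { |row| ids.include?(row.blog_id) }.group_by(&:blog_id)
    blogs.each { |blog| blog.posts = grouped.fetch(blog.id, []) }
  end

  # Simulates loading all blogs: every returned blog shares the same load group.
  def self.all(log)
    log << "SELECT blogs.* FROM blogs"
    blogs = [1, 2, 3].map { |id| new(id, log) }
    blogs.each { |blog| blog.load_group = blogs }
  end
end

log = []
Blog.all(log).each { |blog| puts "blog #{blog.id}: #{blog.posts.size} post(s)" }
puts log
```

Only two queries appear in the log: one for the blogs and a single IN query for all of their posts, triggered by the first call to posts. Neither the loop nor its caller had to declare the association up front.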
Sounds too good to be true? What's the catch? Well, clearly this doesn't work well if the models in the collection aren't accessed uniformly, but even then some over-fetching may still perform better than falling back to completely lazy loading. A thornier problem arises when we call empty? on an association. Consider the following example with our mythical automatic eager loading:
Where did that extra query come from? The first time around the loop, the posts association wasn't loaded, so Rails computed empty? by running a SQL count query. Then the posts association was accessed, forcing the association to load for all the blogs in the collection. The next time around the loop, Rails computed empty? by checking the size of the already loaded array of associated models. Hmm. What's the right thing to do here? Should we guess that if we call posts.empty?, then we're likely to iterate over the posts collection? That seems like a dangerous assumption to make. We've got a similar problem with the following association methods:
first
second
third
fourth
fifth
last
size
ids_reader
empty?
exists?
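The extra query can be reproduced with a hypothetical plain-Ruby simulation (not ActiveRecord). Here posts_empty? is an illustrative stand-in for blog.posts.empty?: when the association isn't loaded yet it issues a COUNT query, as described above, and otherwise it checks the loaded array. The posts method performs the automatic eager loading over a shared load_group.

```ruby
PostRow = Struct.new(:id, :blog_id, :title)
POST_ROWS = [PostRow.new(1, 1, "Hello"), PostRow.new(2, 2, "Other")].freeze

class Blog
  attr_reader :id
  attr_accessor :load_group # every blog returned by the same query
  attr_writer :posts

  def initialize(id, log)
    @id = id
    @log = log
    @posts = nil
  end

  def posts_loaded?
    !@posts.nil?
  end

  # Stand-in for blog.posts.empty?: COUNT query if unloaded, else check the array.
  def posts_empty?
    return @posts.empty? if posts_loaded?

    @log << "SELECT COUNT(*) FROM posts WHERE posts.blog_id = #{id}"
    POST_ROWS.none? { |row| row.blog_id == id }
  end

  # Automatic eager loading: the first access loads posts for every
  # unloaded blog in the load group with a single IN query.
  def posts
    unless posts_loaded?
      batch = load_group.reject(&:posts_loaded?)
      ids = batch.map(&:id)
      @log << "SELECT posts.* FROM posts WHERE posts.blog_id IN (#{ids.join(', ')})"
      grouped = POST_ROWS.select { |row| ids.include?(row.blog_id) }.group_by(&:blog_id)
      batch.each { |blog| blog.posts = grouped.fetch(blog.id, []) }
    end
    @posts
  end
end

def load_blogs(log)
  log << "SELECT blogs.* FROM blogs"
  blogs = [1, 2].map { |id| Blog.new(id, log) }
  blogs.each { |blog| blog.load_group = blogs }
end

# Check empty? before iterating, as in the loop described above.
def run_loop
  log = []
  load_blogs(log).each do |blog|
    next if blog.posts_empty?

    blog.posts.each { |post| post.title }
  end
  log
end

puts run_loop
```

The log shows three queries: the blogs, a COUNT for the first blog only, and then the batch load. By the second iteration the posts are already in memory, so that blog's empty? check is answered without a query.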
Fortunately, in the Salsify codebase this is only a problem for 2 out of 192 associations. For these two fairly low-cardinality associations (one of which is used in our most commonly called, performance-critical controller method), the usage is at least consistent: clients that call methods like empty? always iterate over all models in the associated collection. To handle these relatively rare cases, we can introduce a bit of metadata on the association:
When fully_load is set to true (the default is false), Rails should load all of the associated models when methods like empty? are called, which may in turn trigger automatic eager loading for the whole collection of models. It's certainly not perfect, but it seems better than specifying all of our eager loads by hand.
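To make the trade-off concrete, here is a hypothetical plain-Ruby simulation (not ActiveRecord and not the gem's code). The fully_load flag is passed as a constructor keyword purely for illustration; it mimics the association metadata described above but does not reproduce how the gem declares it. The posts_empty? method stands in for blog.posts.empty?, and load_blogs and run_loop are illustrative helpers.

```ruby
PostRow = Struct.new(:id, :blog_id, :title)
POST_ROWS = [PostRow.new(1, 1, "Hello"), PostRow.new(2, 2, "Other")].freeze

class Blog
  attr_reader :id
  attr_accessor :load_group # every blog returned by the same query
  attr_writer :posts

  # fully_load: hypothetical stand-in for the per-association flag.
  def initialize(id, log, fully_load:)
    @id = id
    @log = log
    @fully_load = fully_load
    @posts = nil
  end

  def posts_loaded?
    !@posts.nil?
  end

  # With fully_load, empty? goes through posts (which batch loads the
  # whole group) instead of issuing a COUNT query for this one blog.
  def posts_empty?
    return posts.empty? if @fully_load || posts_loaded?

    @log << "SELECT COUNT(*) FROM posts WHERE posts.blog_id = #{id}"
    POST_ROWS.none? { |row| row.blog_id == id }
  end

  # Automatic eager loading: the first access loads posts for every
  # unloaded blog in the load group with a single IN query.
  def posts
    unless posts_loaded?
      batch = load_group.reject(&:posts_loaded?)
      ids = batch.map(&:id)
      @log << "SELECT posts.* FROM posts WHERE posts.blog_id IN (#{ids.join(', ')})"
      grouped = POST_ROWS.select { |row| ids.include?(row.blog_id) }.group_by(&:blog_id)
      batch.each { |blog| blog.posts = grouped.fetch(blog.id, []) }
    end
    @posts
  end
end

def load_blogs(log, fully_load:)
  log << "SELECT blogs.* FROM blogs"
  blogs = [1, 2].map { |id| Blog.new(id, log, fully_load: fully_load) }
  blogs.each { |blog| blog.load_group = blogs }
end

def run_loop(fully_load:)
  log = []
  load_blogs(log, fully_load: fully_load).each do |blog|
    next if blog.posts_empty?

    blog.posts.each { |post| post.title }
  end
  log
end

puts run_loop(fully_load: false).size # => 3 (blogs, one COUNT, one batch load)
puts run_loop(fully_load: true).size  # => 2 (blogs, one batch load)
```

The flag only pays off when callers really do iterate after checking empty?; for an association whose callers usually stop at the check, the COUNT query would be the cheaper choice, which is why the default stays false.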
At this point you're probably thinking, "Great, we've constructed a magical world where I don't need to specify eager loads, but I live and write code in the real world." Well, you're in luck. We've written a gem called Goldiloader that implements exactly this functionality! Give it a whirl and let us know what you think.
Conclusion
Specifying Rails eager loads is critical for good application performance, but eager loads are error-prone and can cause encapsulation problems. We've written a gem called Goldiloader that provides just the right amount of eager loading (with some caveats and assumptions, of course). What do you think? Will this make your life easier? Is it a horrible idea? Add a comment and let us know your thoughts!