Example Domain Model
ID NAME TIPS ---------------------------------------------------- 1 swimming Swim in open water when you can. 2 cycling Know how to fix a flat tire. 3 running Find shoes that fit well.
ID USER_ID SPORT_ID NOTES CREATED_AT --------------------------------------------------------------- 1 1 2 First bike ride 2019-06-15 10:15:00 2 1 2 Second bike ride 2019-06-16 14:30:00 3 1 1 First swim 2019-06-18 18:00:00 4 1 1 Second swim 2019-06-20 17:45:00 5 1 3 First run 2019-06-21 07:00:00
Inspecting the data, we can see that the answer is these three records:
ID USER_ID SPORT_ID NOTES CREATED_AT --------------------------------------------------------------- 2 1 2 Second bike ride 2019-06-16 14:30:00 4 1 1 Second swim 2019-06-20 17:45:00 5 1 3 First run 2019-06-21 07:00:00
Writing the SQL query
We can find the same answer in SQL with a query that groups by sport type, orders by when the workout was recorded, and joins back to workouts to choose the record with the latest timestamp:
There are other ways to write this query. In Postgres you might want to replace the GROUP BY and JOIN with a DISTINCT ON expression, or use a window function like row_number(). You may want to experiment to find the query that yields the best performance with your data.
Another subtle point: the query above joins on created_at. In theory, you can have more than one row in workouts for a given sport_id with the same created_at values, which will cause this query to return multiple most recent workouts for that sport, violating the application's expectations.
Depending on your application, that situation may be impossible in practice. If it can happen, you need to refine the query to ensure you obtain a single most recent related item for each category. Unfortunately, when limiting ourselves to standard SQL, the more robust query becomes much harder to read:
Database-specific features really start to shine here. The Postgres version below obtains the same desired results. Look how much more concise it is!
For the rest of this example, we will assume that duplicate created_at values are impossible due to how our application works, so we can stick to the simpler standard SQL version.
Integrating the query into Rails
We want to use the results of this query in our Rails app, and write code like the following:
# run the SQL query above, load the results into Workout model instancesHow do we plug the SQL query into our ActiveRecord models?
Workout.most_recent_by_sport
# query sports, and eager load their associated most recent workouts Sport .where(name: ['swimming','cycling'], workouts: { user_id: 1 }) .includes(:most_recent_workout)
Scope for a custom query
We start by defining a scope in Workout, which is the "many" model in this example. We utilize the ActiveRecord from method to integrate our custom SQL.
There's an important detail here: We must wrap the plain query inside a ( ... ) workouts expression. This makes it so that the generated SQL will match up correctly with normal ActiveRecord queries.
With this in place, we now have enough to put the Workout.most_recent_by_sport expression in our app.
Association for a custom query
On the other side, in our Sport model, we also want to use this query when loading associated workouts. For normal Rails associations, we get association methods for free. For example, since
Sport has_many :workouts
we can write expressions like sport.workouts, or workout.sport.
In particular, we can use built-in association methods to avoid an N + 1 query. Say we have a web page where we display the list of sports along with all of their workouts. The approach below will prompt an N + 1 query:
We can prevent that problem by changing the first line above to Sport.includes(:workouts).all
The includes method will prompt ActiveRecord to load the sports and their associated workouts with only two queries. Side note: If you don't like remembering to eager load associations this way, check out Salsify's goldiloader gem!
It would be ideal to have the same methods available for our "most recent workout by sport" query. We can do that with a slightly more complicated association:
The association leverages ActiveRecord's merge method to bring in our custom query. Equivalently here, we could have interpolated the SQL into a string, like so:
from("(#{Workout.most_recent_by_sport.to_sql}) workouts")
That would have been safe in this case since we fully define that SQL in our source code, but in general SQL generated with string interpolation can create a security vulnerability. Use caution in situations where your query might incorporate user-input.
All set!
With this association in place, we obtain the normal functionality of ActiveRecord associations. Our web page could now display the list of sports along with their most recent respective workouts:
One final performance note: you may need indexes to make queries like this efficient.
Good index design depends on your table's columns, the distribution of record values, and the specific queries your app needs to execute, so I can't offer a universal recommendation.
As a starting point, indexes on the "many" table's category ID and timestamp (sport_id, created_at) may help the performance of the scope query.