Loading the Roles in a separate SQL query is actually an optimization called "Optimized Eager Loading":

Role Load (0.8ms)   SELECT * FROM `roles` WHERE (`roles`.`id` IN (2,1))

(It is doing this instead of loading each role separately, which would be the N+1 problem.) The Rails team found it was usually faster to use an IN query with the previously loaded ids than to do a big join. A join will only happen in this query if you add conditions on one of the other tables; Rails will detect this and do the join. For example:

User.all(:include => :roles, :conditions => "roles.name = 'Admin'")

See the original ticket, this previous Stack Overflow question, and Fabio Akita's blog post about Optimized Eager Loading.
As Damien pointed out, if you really want a single query every time, you should use :joins. But you might not want a single SQL call. Here's why (from here):

Optimized Eager Loading

Let's take a look at this:

Post.find(:all, :include => :comments)

Until Rails 2.0 we would see something like the following SQL query in the log:

SELECT `posts`.`id` AS t0_r0, `posts`.`title` AS t0_r1, `posts`.`body` AS t0_r2, `comments`.`id` AS t1_r0, `comments`.`body` AS t1_r1 FROM `posts` LEFT OUTER JOIN `comments` ON comments.post_id = posts.id

But now, in Rails 2.1, the same command will deliver different SQL queries. Actually at least 2, instead of 1. "And how can this be an improvement?" Let's take a look at the generated SQL queries:

SELECT `posts`.`id`, `posts`.`title`, `posts`.`body` FROM `posts`

SELECT `comments`.`id`, `comments`.`body` FROM `comments` WHERE (`comments`.post_id IN (130049073,226779025,269986261,921194568,972244995))

The :include keyword for Eager Loading was implemented to tackle the dreaded 1+N problem.
This problem happens when you have associations: you load the parent object, then load each association one at a time, hence the 1+N queries. If your parent object has 100 children, you would run 101 queries, which is not good. One way to optimize this is to join everything using an OUTER JOIN clause in the SQL, so that both the parent and children objects are loaded at once in a single query.
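The arithmetic behind the 1+N problem can be sketched in plain Ruby. This is a toy model of query counts, not real ActiveRecord code; the post ids are taken from the example log output in this answer:

```ruby
# Toy model of query counts (not real ActiveRecord code).
# The post ids come from the example queries in this answer.
post_ids = [130049073, 226779025, 269986261, 921194568, 972244995]

# Naive 1+N: one query for the posts, then one comments query per post.
naive_queries = 1 + post_ids.size

# Optimized eager loading: one posts query, then a single IN query
# built from the ids that were just loaded.
in_query = "SELECT * FROM comments WHERE (comments.post_id IN (#{post_ids.join(',')}))"
optimized_queries = 2

puts naive_queries      # => 6
puts optimized_queries  # => 2
puts in_query
```

Note that the optimized strategy stays at 2 queries no matter how many posts are loaded, while the naive one grows linearly with the number of parents.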
It seemed like a good idea, and it actually still is. But for some situations, the monster outer join becomes slower than many smaller queries. A lot of discussion has been going on, and you can take a look at the details at the tickets 9640, 9497, 9560, L109.
The bottom line is: generally it seems better to split a monster join into smaller queries, as you've seen in the above example. This avoids the cartesian product overload problem. For the uninitiated, let's run the outer join version of the query:

mysql> SELECT `posts`.`id` AS t0_r0, `posts`.`title` AS t0_r1, `posts`.`body` AS t0_r2, `comments`.`id` AS t1_r0, `comments`.`body` AS t1_r1 FROM `posts` LEFT OUTER JOIN `comments` ON comments.post_id = posts.id;
+-----------+--------------+--------+-----------+---------+
| t0_r0     | t0_r1        | t0_r2  | t1_r0     | t1_r1   |
+-----------+--------------+--------+-----------+---------+
| 130049073 | RailsConf    | MyText |      NULL | NULL    |
| 226779025 | Brazil       | MyText | 816076421 | MyText5 |
| 269986261 | World        | MyText |  61594165 | MyText3 |
| 269986261 | World        | MyText | 734198955 | MyText1 |
| 269986261 | World        | MyText | 765025994 | MyText4 |
| 269986261 | World        | MyText | 777406191 | MyText2 |
| 921194568 | Rails 2.1    | NULL   |      NULL | NULL    |
| 972244995 | AkitaOnRails | NULL   |      NULL | NULL    |
+-----------+--------------+--------+-----------+---------+
8 rows in set (0.00 sec)

Pay attention to this: do you see lots of duplication in the first 3 columns (t0_r0 up to t0_r2)? Those are the Post model columns, the remaining being each post's comment columns. Notice that the "World" post was repeated 4 times. That's what a join does: the parent rows are repeated for each child. That particular post has 4 comments, so it was repeated 4 times.
The problem is that this hits Rails hard, because it has to deal with several small and short-lived objects. The pain is felt on the Rails side, not so much on the MySQL side. Now, compare that to the smaller queries:

mysql> SELECT `posts`.`id`, `posts`.`title`, `posts`.`body` FROM `posts`;
+-----------+--------------+--------+
| id        | title        | body   |
+-----------+--------------+--------+
| 130049073 | RailsConf    | MyText |
| 226779025 | Brazil       | MyText |
| 269986261 | World        | MyText |
| 921194568 | Rails 2.1    | NULL   |
| 972244995 | AkitaOnRails | NULL   |
+-----------+--------------+--------+
5 rows in set (0.00 sec)

mysql> SELECT `comments`.`id`, `comments`.`body` FROM `comments` WHERE (`comments`.post_id IN (130049073,226779025,269986261,921194568,972244995));
+-----------+---------+
| id        | body    |
+-----------+---------+
|  61594165 | MyText3 |
| 734198955 | MyText1 |
| 765025994 | MyText4 |
| 777406191 | MyText2 |
| 816076421 | MyText5 |
+-----------+---------+
5 rows in set (0.00 sec)

Actually I am cheating a little bit: I manually removed the created_at and updated_at fields from all the above queries to make them a little clearer. So, there you have it: the posts result set, separated and not duplicated, and the comments result set with the same size as before. The longer and more complex the result set, the more this matters, because of the extra objects Rails would have to deal with. Allocating and deallocating hundreds or thousands of small duplicated objects is never a good deal.
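To make the duplication concrete, here is a small Ruby sketch, using the comment counts from the example result sets in this answer, of how many rows each strategy returns:

```ruby
# Comments per post, taken from the example result sets in this answer.
comments_per_post = {
  'RailsConf'    => 0,
  'Brazil'       => 1,
  'World'        => 4,
  'Rails 2.1'    => 0,
  'AkitaOnRails' => 0
}

# LEFT OUTER JOIN: each post row is repeated once per comment
# (and still appears once when it has no comments at all).
join_rows = comments_per_post.values.map { |n| [n, 1].max }.sum

# Split queries: one row per post plus one row per comment, no duplication.
split_rows = comments_per_post.size + comments_per_post.values.sum

puts join_rows   # => 8
puts split_rows  # => 10
```

The split strategy returns more rows in total (5 + 5 = 10 versus 8), but each post's columns are materialized exactly once, which is what saves Rails the duplicated-object churn.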
But this new feature is smart. Let's say you want something like this:

Post.find(:all, :include => :comments, :conditions => ["comments.created_at > ?", 1.week.ago.to_s(:db)])

In Rails 2.1, it will understand that there is a filtering condition on the `comments` table, so it will not break the query down into the smaller ones; instead, it will generate the old outer join version, like this:

SELECT `posts`.`id` AS t0_r0, `posts`.`title` AS t0_r1, `posts`.`body` AS t0_r2, `posts`.`created_at` AS t0_r3, `posts`.`updated_at` AS t0_r4, `comments`.`id` AS t1_r0, `comments`.`post_id` AS t1_r1, `comments`.`body` AS t1_r2, `comments`.`created_at` AS t1_r3, `comments`.`updated_at` AS t1_r4 FROM `posts` LEFT OUTER JOIN `comments` ON comments.post_id = posts.id WHERE (comments.created_at > '2008-05-18 18:06:34')

So nested joins, conditions, and so forth on join tables should still work fine.
Overall it should speed up your queries. Some reported that, because of the additional individual queries, MySQL seems to take a stronger punch CPU-wise. Do your homework and run your own stress tests and benchmarks to see what happens.
Thanks for the in-depth explanation. I understand that large joins are expensive; however, my code that's finding this user is in a before_filter to get current_user from the session, so I'm only ever choosing one user, and there's no massive join. I would think then that a join, rather than an include, would be faster for finding a single user with multiple roles. Any suggestions on how to do that? – brad Nov 9 '09 at 17:29

You can always do a find_by_sql and put in whatever SQL you want. – JRL Nov 9 '09 at 18:04
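Following JRL's suggestion, a single-query lookup via find_by_sql might look like the sketch below. The table and column names (roles_users as the join table, roles.name, etc.) are assumptions about the schema, and the Rails call is shown only in a comment; the runnable part just builds the SQL string so its shape is visible:

```ruby
# Hypothetical single-query lookup for one user and their roles.
# Table/column names (users, roles, roles_users) are assumed; adjust to your schema.
user_id = 42

sql = <<-SQL
SELECT users.*, roles.name AS role_name
FROM users
INNER JOIN roles_users ON roles_users.user_id = users.id
INNER JOIN roles ON roles.id = roles_users.role_id
WHERE users.id = #{user_id.to_i}
SQL

# In Rails 2.x this would be run as:
#   rows = User.find_by_sql(sql)
# which returns one User object per matching row (so one per role here),
# each carrying role_name as an extra read-only attribute.
puts sql
```

Since find_by_sql returns one object per row, you would still need to collect the role_name values off those objects yourself; the trade-off is one round trip to the database instead of two.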
Including a model loads the data, but in a second query. For what you want to do, you should use the :joins parameter:

user = User.find_by_id(x, :joins => :roles)

But user.roles makes a second query anyway, so that doesn't help at all – brad Nov 9 '09 at 17:24