Merge Join vs Nested Loop

From: Tobias Brox
I have some odd cases here joining two tables - the planner insists on
Merge Join, but Nested Loop is really faster - and that makes sense,
since I'm selecting just a small portion of the data available.  All
planner constants seem to be set at the default values; the only way to
get a shift towards Nested Loops seems to be to raise the constants.  I
believe our memory is big enough to hold the indices, and that
effective_cache_size is set to a sane value (but how to verify that,
anyway?).

What causes the nested loops to be estimated so costly - or is it the
merge joins that are estimated too cheaply?  Should I raise all the
planner cost constants, or only one of them?
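
For reference, the values the planner is currently working from can be
inspected in a psql session; this shows the configured settings, though
of course not whether they actually match the hardware:

  prod=> SHOW effective_cache_size;
  prod=> SELECT name, setting FROM pg_settings WHERE name LIKE '%cost%';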

Here are some sample explains:


prod=> explain analyze select * from ticket join users on users_id=users.id where ticket.created>'2006-09-25 17:00';
                                                                  QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=0.00..67664.15 rows=10977 width=675) (actual time=0.038..202.877 rows=10627 loops=1)
   ->  Index Scan using ticket_on_created on ticket  (cost=0.00..11665.94 rows=10977 width=80) (actual time=0.014..35.571 rows=10627 loops=1)
         Index Cond: (created > '2006-09-25 17:00:00'::timestamp without time zone)
   ->  Index Scan using users_pkey on users  (cost=0.00..5.00 rows=1 width=595) (actual time=0.007..0.008 rows=1 loops=10627)
         Index Cond: ("outer".users_id = users.id)
 Total runtime: 216.612 ms
(6 rows)

prod=> explain analyze select * from ticket join users on users_id=users.id where ticket.created>'2006-09-25 16:00';
                                                                     QUERY PLAN
----------------------------------------------------------------------------------------------------------------------------------------------------
 Merge Join  (cost=12844.93..68580.37 rows=11401 width=675) (actual time=106.631..1712.458 rows=11554 loops=1)
   Merge Cond: ("outer".id = "inner".users_id)
   ->  Index Scan using users_pkey on users  (cost=0.00..54107.38 rows=174508 width=595) (actual time=0.041..1215.221 rows=174599 loops=1)
   ->  Sort  (cost=12844.93..12873.43 rows=11401 width=80) (actual time=105.753..123.905 rows=11554 loops=1)
         Sort Key: ticket.users_id
         ->  Index Scan using ticket_on_created on ticket  (cost=0.00..12076.68 rows=11401 width=80) (actual time=0.074..65.297 rows=11554 loops=1)
               Index Cond: (created > '2006-09-25 16:00:00'::timestamp without time zone)
 Total runtime: 1732.452 ms
(8 rows)

Re: Merge Join vs Nested Loop

From: Tom Lane
Tobias Brox <tobias@nordicbet.com> writes:
> What causes the nested loops to be estimated so costly - or is it the
> merge joins that are estimated too cheaply?  Should I raise all the
> planner cost constants, or only one of them?

If your tables are small enough to fit (mostly) in memory, then the
planner tends to overestimate the cost of a nestloop because it fails to
account for caching effects across multiple scans of the inner table.
This is addressed in 8.2, but in earlier versions about all you can do
is reduce random_page_cost, and a sane setting of that (i.e. not less
than 1.0) may not be enough to push the cost estimates where you want
them.  Still, reducing random_page_cost ought to be your first recourse.
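
For example, it can be lowered for just the current session before
re-running the explains (the value 2.0 below is purely illustrative;
the default is 4):

  prod=> SET random_page_cost = 2.0;
  prod=> EXPLAIN ANALYZE SELECT ... ;   -- re-run the query against the new setting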

            regards, tom lane

Re: Merge Join vs Nested Loop

From: Tobias Brox
[Tom Lane - Tue at 06:09:56PM -0400]
> If your tables are small enough to fit (mostly) in memory, then the
> planner tends to overestimate the cost of a nestloop because it fails to
> account for caching effects across multiple scans of the inner table.
> This is addressed in 8.2, but in earlier versions about all you can do
> is reduce random_page_cost, and a sane setting of that (i.e. not less
> than 1.0) may not be enough to push the cost estimates where you want
> them.  Still, reducing random_page_cost ought to be your first recourse.

Thank you.  Reducing the random page cost did reduce the nested loop
cost significantly; sadly, the merge join costs were reduced even
further, causing the planner to favor those even more than before.
Setting effective_cache_size really low solved the issue, but I
believe we would rather have a high effective_cache_size.

As a last resort, setting effective_cache_size to near zero and setting
random_page_cost to 1 could maybe be a desperate measure.  Another one
is to turn off merge/hash joins and seq scans.  It could be a worthwhile
experiment if nothing else :-)
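
For the record, that experiment is easy to run per session; the enable_*
settings don't forbid a plan outright, they just penalize it heavily:

  prod=> SET enable_mergejoin = off;
  prod=> SET enable_hashjoin = off;
  prod=> SET enable_seqscan = off;
  prod=> EXPLAIN ANALYZE SELECT ... ;   -- then RESET ALL when done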

The bulk of our database is historical data that most often is not
touched at all, though one never knows for sure until the queries have
run through it all - so table partitioning does not seem to be an option.
My general idea is that nested loops would keep the most recent data and
the most important parts of the indexes in the OS cache.  Does this make
sense from an expert's point of view? :-)


Re: Merge Join vs Nested Loop

From: Scott Marlowe
On Wed, 2006-09-27 at 11:48 +0200, Tobias Brox wrote:
> [Tom Lane - Tue at 06:09:56PM -0400]
> > If your tables are small enough to fit (mostly) in memory, then the
> > planner tends to overestimate the cost of a nestloop because it fails to
> > account for caching effects across multiple scans of the inner table.
> > This is addressed in 8.2, but in earlier versions about all you can do
> > is reduce random_page_cost, and a sane setting of that (i.e. not less
> > than 1.0) may not be enough to push the cost estimates where you want
> > them.  Still, reducing random_page_cost ought to be your first recourse.
>
> Thank you.  Reducing the random page cost did reduce the nested loop
> cost significantly; sadly, the merge join costs were reduced even
> further, causing the planner to favor those even more than before.
> Setting effective_cache_size really low solved the issue, but I
> believe we would rather have a high effective_cache_size.
>
> As a last resort, setting effective_cache_size to near zero and setting
> random_page_cost to 1 could maybe be a desperate measure.  Another one
> is to turn off merge/hash joins and seq scans.  It could be a worthwhile
> experiment if nothing else :-)
>
> The bulk of our database is historical data that most often is not
> touched at all, though one never knows for sure until the queries have
> run through it all - so table partitioning does not seem to be an option.
> My general idea is that nested loops would keep the most recent data and
> the most important parts of the indexes in the OS cache.  Does this make
> sense from an expert's point of view? :-)

Have you tried changing the cpu_* cost options to see how they affect
merge versus nested loop?
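
For reference, those options can be tried out per session; the values
below are just the stock defaults, shown as a starting point:

  prod=> SET cpu_tuple_cost = 0.01;
  prod=> SET cpu_index_tuple_cost = 0.001;
  prod=> SET cpu_operator_cost = 0.0025;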

Re: Merge Join vs Nested Loop

From: Tobias Brox
[Scott Marlowe - Wed at 09:58:30AM -0500]
> Have you tried changing the cpu_* cost options to see how they affect
> merge versus nested loop?

As said in the original post, increasing any of them shifts the planner
towards nested loops instead of merge join.  I didn't check which of
the cost constants made the most impact.

Re: Merge Join vs Nested Loop

From: Scott Marlowe
On Wed, 2006-09-27 at 17:05 +0200, Tobias Brox wrote:
> [Scott Marlowe - Wed at 09:58:30AM -0500]
> > Have you tried changing the cpu_* cost options to see how they affect
> > merge versus nested loop?
>
> As said in the original post, increasing any of them shifts the planner
> towards nested loops instead of merge join.  I didn't check which of
> the cost constants made the most impact.

So, by decreasing them, you should move away from nested loops then,
right?  Has that not worked for some reason?

Re: Merge Join vs Nested Loop

From: Tobias Brox
[Scott Marlowe - Wed at 10:19:24AM -0500]
> So, by decreasing them, you should move away from nested loops then,
> right?  Has that not worked for some reason?

I want to move to nested loops; they are empirically faster in many of
our queries, and that makes sense since we've got quite big tables and
most of the queries only touch a small portion of the data.

I've identified that moving any of the cost constants (including
random_page_cost) upwards gives me the right result, but I'm still wary
of whether this is the right thing to do.  Even if so, which constants
should I target first?  I could of course try to analyze a bit which
constants give the biggest impact.  Then again, we have many more
queries hitting the database than the few I'm doing research into (and
those I'm researching are heavily simplified versions of the real
queries).
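
One low-risk way to run that analysis is to confine each experiment to a
transaction, so nothing leaks into other sessions (the constant and
value here are only an example):

  prod=> BEGIN;
  prod=> SET LOCAL cpu_tuple_cost = 0.05;
  prod=> EXPLAIN ANALYZE SELECT ... ;   -- compare plan and timing against the default
  prod=> ROLLBACK;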

Re: Merge Join vs Nested Loop

From: Scott Marlowe
On Wed, 2006-09-27 at 10:26, Tobias Brox wrote:
> [Scott Marlowe - Wed at 10:19:24AM -0500]
> > So, by decreasing them, you should move away from nested loops then,
> > right?  Has that not worked for some reason?
>
> I want to move to nested loops; they are empirically faster in many of
> our queries, and that makes sense since we've got quite big tables and
> most of the queries only touch a small portion of the data.
>
> I've identified that moving any of the cost constants (including
> random_page_cost) upwards gives me the right result, but I'm still wary
> of whether this is the right thing to do.  Even if so, which constants
> should I target first?  I could of course try to analyze a bit which
> constants give the biggest impact.  Then again, we have many more
> queries hitting the database than the few I'm doing research into (and
> those I'm researching are heavily simplified versions of the real
> queries).

Ahh, the other direction then.  I would think it's safer to nudge these
a bit than to drop random_page_cost to 1 or set effective_cache_size to
1000, etc.

But you should certainly test the other queries and/or keep an eye on
your database while it's running to make sure those changes don't impact
other users.
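
One simple way to keep that eye on things is the slow-query log, e.g. in
postgresql.conf (threshold in milliseconds, value illustrative; a
per-session change needs superuser rights):

  log_min_duration_statement = 500    # log every statement slower than 500 ms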

Re: Merge Join vs Nested Loop

From: Tobias Brox
[Scott Marlowe - Wed at 10:31:35AM -0500]
> And remember, you can always change any of those settings in session for
> just this one query to force the planner to make the right decision.

Sure ... I could identify the most problematic queries, and hack up the
software application to modify the config settings for exactly those
queries ... but it's a very ugly solution. :-)  Particularly if Tom Lane
is correct that the preference for merge join instead of nested loop is
indeed a bug.
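
For completeness, the in-session workaround quoted above would look
something like this, wrapped around each problematic query:

  prod=> SET enable_mergejoin = off;
  prod=> SELECT ... ;                   -- the problematic query
  prod=> RESET enable_mergejoin;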

Re: Merge Join vs Nested Loop

From: Tobias Brox
I found a way to survive yet some more weeks :-)

One of the queries we've had most problems with today is essentially
something like:

  select A.*, sum(B.*) from A join B on ... where A.created > x and ...
  group by A.* order by A.created desc limit 32

There are on average two rows in B for every row in A.
Note the 'limit 32' part.  I rewrote the query to:

  select A.*, (select sum(B.*) from B ...) from A where A.created > x
  and ... order by A.created desc limit 32;

And voila, the planner found out it needed just a few rows from A,
and execution time was cut from 1-2 minutes down to 20 ms. :-)
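
Spelled out with hypothetical tables a and b (column names invented for
illustration), the rewrite looks like this:

  -- original shape: the whole join is aggregated before the limit applies
  select a.id, a.created, sum(b.amount)
    from a join b on b.a_id = a.id
   where a.created > '2006-09-25'
   group by a.id, a.created
   order by a.created desc limit 32;

  -- rewritten shape: the correlated subquery runs only for the 32 rows kept
  select a.id, a.created,
         (select sum(b.amount) from b where b.a_id = a.id) as total
    from a
   where a.created > '2006-09-25'
   order by a.created desc limit 32;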

I've also started thinking a bit harder about table partitioning; if we
add some more redundancy both to the queries and the database, it may
help us drastically reduce the real cost of some of the merge joins...