Thread: Bad plan for nested loop + limit

Bad plan for nested loop + limit

From: Alexander Staubo
This dinky little query takes about 4 seconds to run:

  select event_occurrences.*
  from event_occurrences
  join section_items on section_items.subject_id = event_occurrences.event_id
    and section_items.subject_type = 'Event'
    and section_items.sandbox_id = 16399
  where event_occurrences.start_time > '2009-02-14 18:15:14.739411 +0100'
  order by event_occurrences.start_time
  limit 4;

Output from "explain analyze":

 Limit  (cost=0.00..973.63 rows=4 width=48) (actual
time=61.554..4039.704 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..70101.65 rows=288 width=48) (actual
time=61.552..4039.700 rows=1 loops=1)
         ->  Nested Loop  (cost=0.00..68247.77 rows=297 width=52)
(actual time=61.535..4039.682 rows=1 loops=1)
               ->  Index Scan using
index_event_occurrences_on_start_time on event_occurrences
(cost=0.00..13975.01 rows=159718 width=48) (actual time=0.024..398.152
rows=155197 loops=1)
                     Index Cond: (start_time > '2009-02-14
18:15:14.739411+01'::timestamp with time zone)
               ->  Index Scan using
index_section_items_on_subject_type_and_subject_id on section_items
(cost=0.00..0.33 rows=1 width=4) (actual time=0.023..0.023 rows=0
loops=155197)
                     Index Cond: (((section_items.subject_type)::text
= 'Event'::text) AND (section_items.subject_id =
event_occurrences.event_id))
                     Filter: (section_items.sandbox_id = 16399)
         ->  Index Scan using event_instances_pkey on events
(cost=0.00..6.23 rows=1 width=4) (actual time=0.014..0.015 rows=1
loops=1)
               Index Cond: (events.id = event_occurrences.event_id)
               Filter: (events.read_permissions = (-1))
 Total runtime: 4039.788 ms

Now, if I use "limit 50" it uses a plan that is several orders of
magnitude more efficient:

 Limit  (cost=6202.38..6202.51 rows=50 width=48) (actual
time=0.170..0.171 rows=1 loops=1)
   ->  Sort  (cost=6202.38..6203.20 rows=326 width=48) (actual
time=0.170..0.170 rows=1 loops=1)
         Sort Key: event_occurrences.start_time
         Sort Method:  quicksort  Memory: 25kB
         ->  Nested Loop  (cost=5.09..6191.55 rows=326 width=48)
(actual time=0.160..0.161 rows=1 loops=1)
               ->  Bitmap Heap Scan on section_items
(cost=5.09..328.94 rows=96 width=4) (actual time=0.024..0.087 rows=7
loops=1)
                     Recheck Cond: (sandbox_id = 16399)
                     Filter: ((subject_type)::text = 'Event'::text)
                     ->  Bitmap Index Scan on
index_section_items_on_sandbox_id  (cost=0.00..5.06 rows=107 width=0)
(actual time=0.018..0.018 rows=7 loops=1)
                           Index Cond: (sandbox_id = 16399)
               ->  Index Scan using
index_event_occurrences_on_event_id on event_occurrences
(cost=0.00..60.14 rows=74 width=48) (actual time=0.010..0.010 rows=0
loops=7)
                     Index Cond: (event_occurrences.event_id =
section_items.subject_id)
                     Filter: (event_occurrences.start_time >
'2009-02-14 18:15:14.739411+01'::timestamp with time zone)
 Total runtime: 0.210 ms

Similarly, if I disable nested loop joins with "set enable_nestloop = off":

 Limit  (cost=10900.13..10900.14 rows=4 width=48) (actual
time=191.476..191.478 rows=1 loops=1)
   ->  Sort  (cost=10900.13..10900.95 rows=326 width=48) (actual
time=191.474..191.475 rows=1 loops=1)
         Sort Key: event_occurrences.start_time
         Sort Method:  quicksort  Memory: 25kB
         ->  Hash Join  (cost=8944.52..10895.24 rows=326 width=48)
(actual time=162.104..191.463 rows=1 loops=1)
               Hash Cond: (section_items.subject_id =
event_occurrences.event_id)
               ->  Bitmap Heap Scan on section_items
(cost=5.09..328.94 rows=96 width=4) (actual time=0.026..0.050 rows=7
loops=1)
                     Recheck Cond: (sandbox_id = 16399)
                     Filter: ((subject_type)::text = 'Event'::text)
                     ->  Bitmap Index Scan on
index_section_items_on_sandbox_id  (cost=0.00..5.06 rows=107 width=0)
(actual time=0.019..0.019 rows=7 loops=1)
                           Index Cond: (sandbox_id = 16399)
               ->  Hash  (cost=5580.54..5580.54 rows=157752 width=48)
(actual time=161.832..161.832 rows=155197 loops=1)
                     ->  Seq Scan on event_occurrences
(cost=0.00..5580.54 rows=157752 width=48) (actual time=0.030..75.406
rows=155197 loops=1)
                           Filter: (start_time > '2009-02-14
18:15:14.739411+01'::timestamp with time zone)
 Total runtime: 192.496 ms

Some statistics:

# select attname, n_distinct from pg_stats where tablename = 'event_occurrences';
  attname   | n_distinct
------------+------------
 id         |         -1
 created_at |  -0.291615
 updated_at |  -0.294081
 created_by |        715
 updated_by |        715
 event_id   |       2146
 start_time |   -0.10047
 end_time   |       5602

# select attname, n_distinct from pg_stats where tablename = 'section_items';
       attname        | n_distinct
----------------------+------------
 id                   |         -1
 created_by           |       1612
 created_at           |  -0.708649
 updated_at           |   -0.83635
 updated_by           |       1190
 posted_at            |  -0.930831
 section_id           |        456
 sandbox_id           |        455
 reference            |          2
 subject_id           |  -0.546929
 subject_type         |          5
 conversation_id      |       1981
 read_permissions     |          8
 permission_policy_id |         11

Anything I can do to fix the query?

This is PostgreSQL 8.3.5. Standard planner configs. Before testing I
reindexed, vacuumed and analyzed the tables.
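
Roughly along these lines for both tables (paraphrased, not pasted
from my session):

  reindex table event_occurrences;
  reindex table section_items;
  vacuum analyze event_occurrences;
  vacuum analyze section_items;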

Alexander.

Re: Bad plan for nested loop + limit

From: David Wilson
On Sat, Feb 14, 2009 at 5:25 PM, Alexander Staubo <alex@bengler.no> wrote:
>
> Output from "explain analyze":
>
>  Limit  (cost=0.00..973.63 rows=4 width=48) (actual
> time=61.554..4039.704 rows=1 loops=1)
>   ->  Nested Loop  (cost=0.00..70101.65 rows=288 width=48) (actual
> time=61.552..4039.700 rows=1 loops=1)
>         ->  Nested Loop  (cost=0.00..68247.77 rows=297 width=52)
> (actual time=61.535..4039.682 rows=1 loops=1)

Those estimates are pretty far off. Did you try increasing the
statistics target? Also, is the first query repeatable (that is, is it
already in cache when you do the test, or alternately, are all queries
*out* of cache when you test?)
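
If you want to try the per-column route first, something along these
lines should do it (the column and target value here are just an
example -- point it at whichever column's estimate looks off):

  alter table section_items alter column subject_id set statistics 100;
  analyze section_items;
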
--
- David T. Wilson
david.t.wilson@gmail.com

Re: Bad plan for nested loop + limit

From: Alexander Staubo
On Sun, Feb 15, 2009 at 5:29 AM, David Wilson <david.t.wilson@gmail.com> wrote:
> On Sat, Feb 14, 2009 at 5:25 PM, Alexander Staubo <alex@bengler.no> wrote:
>>
>> Output from "explain analyze":
>>
>>  Limit  (cost=0.00..973.63 rows=4 width=48) (actual
>> time=61.554..4039.704 rows=1 loops=1)
>>   ->  Nested Loop  (cost=0.00..70101.65 rows=288 width=48) (actual
>> time=61.552..4039.700 rows=1 loops=1)
>>         ->  Nested Loop  (cost=0.00..68247.77 rows=297 width=52)
>> (actual time=61.535..4039.682 rows=1 loops=1)
>
> Those estimates are pretty far off. Did you try increasing the
> statistics target? Also, is the first query repeatable (that is, is it
> already in cache when you do the test, or alternately, are all queries
> *out* of cache when you test?)

It's all in the cache when I do the test. OK, so upping the statistics
target to 100 on section_items.subject_id fixed it:

 Limit  (cost=3530.95..3530.96 rows=4 width=48) (actual
time=0.107..0.107 rows=1 loops=1)
   ->  Sort  (cost=3530.95..3531.12 rows=66 width=48) (actual
time=0.106..0.106 rows=1 loops=1)
         Sort Key: event_occurrences.start_time
         Sort Method:  quicksort  Memory: 25kB
         ->  Nested Loop  (cost=0.00..3529.96 rows=66 width=48)
(actual time=0.098..0.100 rows=1 loops=1)
               ->  Index Scan using index_section_items_on_sandbox_id
on section_items  (cost=0.00..104.29 rows=22 width=4) (actual
time=0.017..0.033 rows=7 loops=1)
                     Index Cond: (sandbox_id = 16399)
                     Filter: ((subject_type)::text = 'Event'::text)
               ->  Index Scan using
index_event_occurrences_on_event_id on event_occurrences
(cost=0.00..154.79 rows=74 width=48) (actual time=0.008..0.008 rows=0
loops=7)
                     Index Cond: (event_occurrences.event_id =
section_items.subject_id)
                     Filter: (event_occurrences.start_time >
'2009-02-14 18:15:14.739411+01'::timestamp with time zone)
 Total runtime: 0.142 ms

Thanks.

Alexander.

Re: Bad plan for nested loop + limit

From: Alexander Staubo
On Sun, Feb 15, 2009 at 5:45 PM, Alexander Staubo <alex@bengler.no> wrote:
> On Sun, Feb 15, 2009 at 5:29 AM, David Wilson <david.t.wilson@gmail.com> wrote:
>> On Sat, Feb 14, 2009 at 5:25 PM, Alexander Staubo <alex@bengler.no> wrote:
>>>
>>> Output from "explain analyze":
>>>
>>>  Limit  (cost=0.00..973.63 rows=4 width=48) (actual
>>> time=61.554..4039.704 rows=1 loops=1)
>>>   ->  Nested Loop  (cost=0.00..70101.65 rows=288 width=48) (actual
>>> time=61.552..4039.700 rows=1 loops=1)
>>>         ->  Nested Loop  (cost=0.00..68247.77 rows=297 width=52)
>>> (actual time=61.535..4039.682 rows=1 loops=1)
>>
>> Those estimates are pretty far off. Did you try increasing the
>> statistics target? Also, is the first query repeatable (that is, is it
>> already in cache when you do the test, or alternately, are all queries
>> *out* of cache when you test?)

All right, this query keeps coming back to bite me. If this part of the join:

  ... and section_items.sandbox_id = 16399

yields a sufficiently large number of matches, then performance goes
'boink', like so:

 Limit  (cost=0.00..34.86 rows=4 width=48) (actual
time=4348.696..4348.696 rows=0 loops=1)
   ->  Nested Loop  (cost=0.00..60521.56 rows=6944 width=48) (actual
time=4348.695..4348.695 rows=0 loops=1)
         ->  Index Scan using index_event_occurrences_on_start_time on
event_occurrences  (cost=0.00..11965.38 rows=145712 width=48) (actual
time=0.093..138.029 rows=145108 loops=1)
               Index Cond: (start_time > '2009-02-27
18:01:14.739411+01'::timestamp with time zone)
         ->  Index Scan using
index_section_items_on_subject_type_and_subject_id on section_items
(cost=0.00..0.32 rows=1 width=4) (actual time=0.029..0.029 rows=0
loops=145108)
               Index Cond: (((section_items.subject_type)::text =
'Event'::text) AND (section_items.subject_id =
event_occurrences.event_id))
               Filter: (section_items.sandbox_id = 9)
 Total runtime: 4348.777 ms

In this case:

# select count(*) from section_items where sandbox_id = 9;
 count
-------
  3126

If I remove the start_time > ... clause, performance is fine. Upping
the statistics setting on any of the columns involved seems to have no
effect.

Is this a pathological border case, or is there something I can do to
*generally* make this query run fast? Keep in mind that the query
itself returns no rows at all. I want to avoid doing an initial
"select count(...)" just to avoid the bad plan. Suffice to say, having
a web request take 5 seconds is asking too much from our users.

Alexander.

Re: Bad plan for nested loop + limit

From: Robert Haas
On Fri, Feb 27, 2009 at 3:18 PM, Alexander Staubo <alex@bengler.no> wrote:
> On Sun, Feb 15, 2009 at 5:45 PM, Alexander Staubo <alex@bengler.no> wrote:
>> On Sun, Feb 15, 2009 at 5:29 AM, David Wilson <david.t.wilson@gmail.com> wrote:
>>> On Sat, Feb 14, 2009 at 5:25 PM, Alexander Staubo <alex@bengler.no> wrote:
>>>>
>>>> Output from "explain analyze":
>>>>
>>>>  Limit  (cost=0.00..973.63 rows=4 width=48) (actual
>>>> time=61.554..4039.704 rows=1 loops=1)
>>>>   ->  Nested Loop  (cost=0.00..70101.65 rows=288 width=48) (actual
>>>> time=61.552..4039.700 rows=1 loops=1)
>>>>         ->  Nested Loop  (cost=0.00..68247.77 rows=297 width=52)
>>>> (actual time=61.535..4039.682 rows=1 loops=1)
>>>
>>> Those estimates are pretty far off. Did you try increasing the
>>> statistics target? Also, is the first query repeatable (that is, is it
>>> already in cache when you do the test, or alternately, are all queries
>>> *out* of cache when you test?)
>
> All right, this query keeps coming back to bite me. If this part of the join:
>
>  ... and section_items.sandbox_id = 16399
>
> yields a sufficiently large number of matches, then performance goes
> 'boink', like so:
>
>  Limit  (cost=0.00..34.86 rows=4 width=48) (actual
> time=4348.696..4348.696 rows=0 loops=1)
>   ->  Nested Loop  (cost=0.00..60521.56 rows=6944 width=48) (actual
> time=4348.695..4348.695 rows=0 loops=1)
>         ->  Index Scan using index_event_occurrences_on_start_time on
> event_occurrences  (cost=0.00..11965.38 rows=145712 width=48) (actual
> time=0.093..138.029 rows=145108 loops=1)
>               Index Cond: (start_time > '2009-02-27
> 18:01:14.739411+01'::timestamp with time zone)
>         ->  Index Scan using
> index_section_items_on_subject_type_and_subject_id on section_items
> (cost=0.00..0.32 rows=1 width=4) (actual time=0.029..0.029 rows=0
> loops=145108)
>               Index Cond: (((section_items.subject_type)::text =
> 'Event'::text) AND (section_items.subject_id =
> event_occurrences.event_id))
>               Filter: (section_items.sandbox_id = 9)
>  Total runtime: 4348.777 ms
>
> In this case:
>
> # select count(*) from section_items where sandbox_id = 9;
>  count
> -------
>  3126
>
> If I remove the start_time > ... clause, performance is fine. Upping
> the statistics setting on any of the columns involved seems to have no
> effect.
>
> Is this a pathological border case, or is there something I can do to
> *generally* make this query run fast? Keep in mind that the query
> itself returns no rows at all. I want to avoid doing an initial
> "select count(...)" just to avoid the bad plan. Suffice to say, having
> a web request take 5 seconds is asking too much from our users.

The problem here is that the planner estimates the cost of a Limit
plan node by adding up (1) the startup cost of the underlying plan
node, in this case 0 for the nestjoin, and (2) a percentage of the run
cost, based on the ratio of the number of rows expected to be returned
to the total number of rows.  In this case, the nested loop is
expected to return 6944 rows, so it figures it won't have to get very
far to find the 4 you requested.
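
To put numbers on it, using the estimates from your plan:

  limit cost = startup + (4 / 6944) * run cost
             = 0 + (4 / 6944) * 60521.56
             =~ 34.86

which is exactly the 0.00..34.86 shown on the Limit node, and a tiny
fraction of what the nested loop actually costs when there are no
matching rows.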

So when the LIMIT clause is a little bigger, or missing, the planner
tries to minimize the cost of the whole operation, whereas when the
limit is very small, it picks a plan that is much slower overall on
the theory that it will be able to quit long before finishing the whole
thing.  When that turns out to be false, you get burned.

That means that the root cause of the problem is the fact that the
join is estimated to return hundreds or thousands of rows.  But it's
hard to see how you could make that estimate much better.  The
nestloop is expected to output 6944 rows, and the index scan on
event_occurrences is expected to return 145712 rows.  So the planner
knows that only a tiny fraction of the rows in event_occurrences are
going to have a match in section_items - it just doesn't think the
fraction is quite tiny enough to keep it from making a bad decision.
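
In round numbers, 6944 / 145712 is about 4.8%: small, but apparently
not small enough, and in reality the fraction is zero.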

Interestingly, I think the solution Tom and I were talking about to
another problem in this area would make your case MUCH WORSE.

http://archives.postgresql.org/message-id/603c8f070902180908j3ae46774g535d96ece2c90e74@mail.gmail.com

I will think about this some more but nothing is occurring to me off
the top of my head.

...Robert

Re: Bad plan for nested loop + limit

From: Alexander Staubo
On Fri, Feb 27, 2009 at 11:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> The problem here is that the planner estimates the cost of a Limit
> plan node by adding up (1) the startup cost of the underlying plan
> node, in this case 0 for the nestjoin, and (2) a percentage of the run
> cost, based on the ratio of the number of rows expected to be returned
> to the total number of rows.  In this case, the nested loop is
> expected to return 6944 rows, so it figures it won't have to get very
> far to find the 4 you requested.
[...]
> I will think about this some more but nothing is occurring to me off
> the top of my head.

Thanks for explaining. Is there any way to rewrite the query in a way
that will avoid the nested loop join -- other than actually disabling
nested loop joins? If I do the latter, the resulting query uses a hash
join and completes in 80-100 ms, which is still pretty horrible,
especially for a query that returns nothing, but a vast improvement
over the unthinkable 4-5 seconds for the current query.
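
Would fencing the join inside a subselect with "offset 0", so the
planner can't apply the limit's discount to the join itself, be a sane
way to do it? Something like this (untested sketch):

  select * from (
    select event_occurrences.*
    from event_occurrences
    join section_items on section_items.subject_id = event_occurrences.event_id
      and section_items.subject_type = 'Event'
      and section_items.sandbox_id = 16399
    where event_occurrences.start_time > '2009-02-14 18:15:14.739411 +0100'
    offset 0
  ) as candidates
  order by start_time
  limit 4;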

Alexander.

Re: Bad plan for nested loop + limit

From: Robert Haas
On Sat, Feb 28, 2009 at 11:20 AM, Alexander Staubo <alex@bengler.no> wrote:
> On Fri, Feb 27, 2009 at 11:54 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>> The problem here is that the planner estimates the cost of a Limit
>> plan node by adding up (1) the startup cost of the underlying plan
>> node, in this case 0 for the nestjoin, and (2) a percentage of the run
>> cost, based on the ratio of the number of rows expected to be returned
>> to the total number of rows.  In this case, the nested loop is
>> expected to return 6944 rows, so it figures it won't have to get very
>> far to find the 4 you requested.
> [...]
>> I will think about this some more but nothing is occurring to me off
>> the top of my head.
>
> Thanks for explaining. Is there any way to rewrite the query in a way
> that will avoid the nested loop join -- other than actually disabling
> nested loop joins? If I do the latter, the resulting query uses a hash
> join and completes in 80-100 ms, which is still pretty horrible,
> especially for a query that returns nothing, but extremely auspicious
> compared to the unthinkable 4-5 seconds for the current query.

Can you post the schema for the two tables in question?  Feel free to
omit any columns that aren't included in the query, but make sure to
include any unique indices, etc.

What do you have default_statistics_target set to?  If it's less than
100, you should probably raise it to 100 and re-analyze (the default
value for 8.4 will be 100, but for 8.3 and prior it is 10).
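
For a quick test you can do it at the session level, something like:

  set default_statistics_target = 100;
  analyze section_items;
  analyze event_occurrences;

and then make it permanent in postgresql.conf if it helps.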

What is the approximate total number of rows in each of these two
tables?  Of the rows in section_items, how many have subject_type =
'Event'?
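
Rough figures are fine, e.g. from:

  select count(*) from event_occurrences;
  select count(*) from section_items;
  select count(*) from section_items where subject_type = 'Event';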

...Robert

Re: Bad plan for nested loop + limit

From: Alexander Staubo
On Sun, Mar 1, 2009 at 4:32 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> What do you have default_statistics_target set to?  If it's less than
> 100, you should probably raise it to 100 and re-analyze (the default
> value for 8.4 will be 100, but for 8.3 and prior it is 10).

Changing it to 100 fixed the problem. Thanks for alerting me to the
existence of default_statistics_target.

Alexander.