Bad plan for nested loop + limit - Mailing list pgsql-performance

From: Alexander Staubo
Subject: Bad plan for nested loop + limit
Msg-id: 88daf38c0902141425s6972d849i9dad904f55d93c77@mail.gmail.com
List: pgsql-performance
This dinky little query takes about 4 seconds to run:

  select event_occurrences.*
  from event_occurrences
  join section_items on section_items.subject_id = event_occurrences.event_id
    and section_items.subject_type = 'Event'
    and section_items.sandbox_id = 16399
  where event_occurrences.start_time > '2009-02-14 18:15:14.739411 +0100'
  order by event_occurrences.start_time
  limit 4;

Output from "explain analyze":

 Limit  (cost=0.00..973.63 rows=4 width=48) (actual time=61.554..4039.704 rows=1 loops=1)
   ->  Nested Loop  (cost=0.00..70101.65 rows=288 width=48) (actual time=61.552..4039.700 rows=1 loops=1)
         ->  Nested Loop  (cost=0.00..68247.77 rows=297 width=52) (actual time=61.535..4039.682 rows=1 loops=1)
               ->  Index Scan using index_event_occurrences_on_start_time on event_occurrences  (cost=0.00..13975.01 rows=159718 width=48) (actual time=0.024..398.152 rows=155197 loops=1)
                     Index Cond: (start_time > '2009-02-14 18:15:14.739411+01'::timestamp with time zone)
               ->  Index Scan using index_section_items_on_subject_type_and_subject_id on section_items  (cost=0.00..0.33 rows=1 width=4) (actual time=0.023..0.023 rows=0 loops=155197)
                     Index Cond: (((section_items.subject_type)::text = 'Event'::text) AND (section_items.subject_id = event_occurrences.event_id))
                     Filter: (section_items.sandbox_id = 16399)
         ->  Index Scan using event_instances_pkey on events  (cost=0.00..6.23 rows=1 width=4) (actual time=0.014..0.015 rows=1 loops=1)
               Index Cond: (events.id = event_occurrences.event_id)
               Filter: (events.read_permissions = (-1))
 Total runtime: 4039.788 ms
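As I understand the costing (all numbers taken from the plan above), the planner assumes the 288 estimated matches are spread uniformly through the start_time index order, so it prices the Limit node at only 4/288 of the full nested-loop cost and expects to stop early. A quick sanity check of that arithmetic:

```python
# Standard LIMIT costing with zero startup cost: the planner scales the
# full join cost by the fraction of estimated rows it thinks it needs.
full_join_cost = 70101.65  # total cost of the nested-loop join node
estimated_rows = 288       # planner's row estimate for the join
limit_n = 4                # rows requested by LIMIT

limit_cost = full_join_cost * limit_n / estimated_rows
print(round(limit_cost, 2))  # -> 973.63, the cost shown on the Limit node
```

In reality only one row matches, so the scan cannot stop early and walks all 155,197 index entries.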

Now, if I use "limit 50" it switches to a plan that is roughly four orders of magnitude faster (0.2 ms vs. 4 s):

 Limit  (cost=6202.38..6202.51 rows=50 width=48) (actual time=0.170..0.171 rows=1 loops=1)
   ->  Sort  (cost=6202.38..6203.20 rows=326 width=48) (actual time=0.170..0.170 rows=1 loops=1)
         Sort Key: event_occurrences.start_time
         Sort Method:  quicksort  Memory: 25kB
         ->  Nested Loop  (cost=5.09..6191.55 rows=326 width=48) (actual time=0.160..0.161 rows=1 loops=1)
               ->  Bitmap Heap Scan on section_items  (cost=5.09..328.94 rows=96 width=4) (actual time=0.024..0.087 rows=7 loops=1)
                     Recheck Cond: (sandbox_id = 16399)
                     Filter: ((subject_type)::text = 'Event'::text)
                     ->  Bitmap Index Scan on index_section_items_on_sandbox_id  (cost=0.00..5.06 rows=107 width=0) (actual time=0.018..0.018 rows=7 loops=1)
                           Index Cond: (sandbox_id = 16399)
               ->  Index Scan using index_event_occurrences_on_event_id on event_occurrences  (cost=0.00..60.14 rows=74 width=48) (actual time=0.010..0.010 rows=0 loops=7)
                     Index Cond: (event_occurrences.event_id = section_items.subject_id)
                     Filter: (event_occurrences.start_time > '2009-02-14 18:15:14.739411+01'::timestamp with time zone)
 Total runtime: 0.210 ms

Similarly, if I disable nested-loop joins with "set enable_nestloop = off":

 Limit  (cost=10900.13..10900.14 rows=4 width=48) (actual time=191.476..191.478 rows=1 loops=1)
   ->  Sort  (cost=10900.13..10900.95 rows=326 width=48) (actual time=191.474..191.475 rows=1 loops=1)
         Sort Key: event_occurrences.start_time
         Sort Method:  quicksort  Memory: 25kB
         ->  Hash Join  (cost=8944.52..10895.24 rows=326 width=48) (actual time=162.104..191.463 rows=1 loops=1)
               Hash Cond: (section_items.subject_id = event_occurrences.event_id)
               ->  Bitmap Heap Scan on section_items  (cost=5.09..328.94 rows=96 width=4) (actual time=0.026..0.050 rows=7 loops=1)
                     Recheck Cond: (sandbox_id = 16399)
                     Filter: ((subject_type)::text = 'Event'::text)
                     ->  Bitmap Index Scan on index_section_items_on_sandbox_id  (cost=0.00..5.06 rows=107 width=0) (actual time=0.019..0.019 rows=7 loops=1)
                           Index Cond: (sandbox_id = 16399)
               ->  Hash  (cost=5580.54..5580.54 rows=157752 width=48) (actual time=161.832..161.832 rows=155197 loops=1)
                     ->  Seq Scan on event_occurrences  (cost=0.00..5580.54 rows=157752 width=48) (actual time=0.030..75.406 rows=155197 loops=1)
                           Filter: (start_time > '2009-02-14 18:15:14.739411+01'::timestamp with time zone)
 Total runtime: 192.496 ms
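If forcing the plan this way turned out to be the only option, the override could at least be scoped to a single transaction with SET LOCAL, so it does not leak into other queries on the same connection. A sketch (untested beyond the plans above):

```sql
begin;
set local enable_nestloop = off;

select event_occurrences.*
from event_occurrences
join section_items on section_items.subject_id = event_occurrences.event_id
  and section_items.subject_type = 'Event'
  and section_items.sandbox_id = 16399
where event_occurrences.start_time > '2009-02-14 18:15:14.739411 +0100'
order by event_occurrences.start_time
limit 4;

commit;  -- the setting reverts automatically at transaction end
```

That said, I would rather fix the estimates than disable a join method globally.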

Some statistics:

# select attname, n_distinct from pg_stats where tablename = 'event_occurrences';
  attname   | n_distinct
------------+------------
 id         |         -1
 created_at |  -0.291615
 updated_at |  -0.294081
 created_by |        715
 updated_by |        715
 event_id   |       2146
 start_time |   -0.10047
 end_time   |       5602

# select attname, n_distinct from pg_stats where tablename = 'section_items';
       attname        | n_distinct
----------------------+------------
 id                   |         -1
 created_by           |       1612
 created_at           |  -0.708649
 updated_at           |   -0.83635
 updated_by           |       1190
 posted_at            |  -0.930831
 section_id           |        456
 sandbox_id           |        455
 reference            |          2
 subject_id           |  -0.546929
 subject_type         |          5
 conversation_id      |       1981
 read_permissions     |          8
 permission_policy_id |         11
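One thing I could try (not yet verified to help here) is raising the statistics target on the skewed columns and re-analyzing, so the planner samples more of the distribution — sandbox_id 16399 actually matches only 7 section_items rows, while the planner guesses ~100:

```sql
-- Larger sample for the columns driving the misestimate
-- (1000 is the maximum target in 8.3).
alter table section_items alter column sandbox_id set statistics 1000;
alter table event_occurrences alter column start_time set statistics 1000;

analyze section_items;
analyze event_occurrences;
```

Even so, the planner has no cross-column correlation statistics, so it may still assume the matching rows are spread uniformly through the start_time ordering.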

Anything I can do to fix the query?

This is PostgreSQL 8.3.5. Standard planner configs. Before testing I
reindexed, vacuumed and analyzed the tables.
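One workaround I have seen suggested on this list (but have not verified against this schema) is to hide the LIMIT from the join with an OFFSET 0 subquery, which PostgreSQL treats as an optimization fence: the join is then planned on its own, and the ORDER BY/LIMIT is applied to its (small) result:

```sql
select *
from (
  select event_occurrences.*
  from event_occurrences
  join section_items on section_items.subject_id = event_occurrences.event_id
    and section_items.subject_type = 'Event'
    and section_items.sandbox_id = 16399
  where event_occurrences.start_time > '2009-02-14 18:15:14.739411 +0100'
  offset 0  -- fence: prevents the outer LIMIT from being pushed into the join
) matched
order by start_time
limit 4;
```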

Alexander.
