Adding "LIMIT 1" kills performance. - Mailing list pgsql-performance

From Chris Shoemaker
Subject Adding "LIMIT 1" kills performance.
Date
Msg-id 20080529154734.GA18273@pe.Belkin
Whole thread Raw
Responses Re: Adding "LIMIT 1" kills performance.  (Shane Ambler <pgsql@Sheeky.Biz>)
List pgsql-performance
[Attn list-queue maintainers: Please drop the earlier version
of this email that I accidentally sent from an unsubscribed address. ]

Hi,

I'm having a strange problem with a slow-running select query.  The
query I use in production ends in "LIMIT 1", and it runs very slowly.
But when I remove the "LIMIT 1", the query runs quite quickly.  This
behavior has stumped a couple smart DBAs.

The full queries and EXPLAIN ANALYZE plans are included below, but by
way of explanation/observation:

1) The "LIMIT 1" case will sometimes be quicker (but still much slower
than the non-"LIMIT 1" case) for different values of
calendar_group_id.

2) The query below is a slightly simplified version of the one I
actually use.  The real one includes more conditions which explain why
each table is joined.  For reference, the original query is quoted at
the end [1].  The original query exhibits the same behavior as the
simplified versions w.r.t. the "LIMIT 1" case taking _much_ longer
(even longer than the simplified version) than the non-"LIMIT 1" case,
and uses the same plans.


Can anyone explain why such a slow plan is chosen when the "LIMIT 1"
is present?  Is there anything I can do to speed this query up?
Thanks.

-chris


production=> select version();
                                                      version
------------------------------------------------------------------------------
 PostgreSQL 8.2.6 on x86_64-pc-linux-gnu, compiled by GCC x86_64-pc-linux-gnu-gcc (GCC) 4.1.2 (Gentoo 4.1.2 p1.0.2)
(1 row)

production=> analyze calendar_groups;
ANALYZE
production=> analyze calendar_links;
ANALYZE
production=> analyze calendars;
ANALYZE
production=> analyze event_updates;
ANALYZE
production=> EXPLAIN ANALYZE SELECT event_updates.*
             FROM event_updates
             INNER JOIN calendars ON event_updates.feed_id = calendars.id
             INNER JOIN calendar_links ON calendars.id = calendar_links.source_tracker_id
             WHERE (calendar_links.calendar_group_id = 3640)
             ORDER BY event_updates.id DESC
             LIMIT 1;
                                                                                               QUERY PLAN
                                                                                
-----------------------------------------------------------------------------------------------------------
 Limit  (cost=16.55..91.73 rows=1 width=2752) (actual time=27810.058..27810.059 rows=1 loops=1)
   ->  Nested Loop  (cost=16.55..695694.18 rows=9254 width=2752) (actual time=27810.054..27810.054 rows=1 loops=1)
         Join Filter: (event_updates.feed_id = calendars.id)
         ->  Index Scan Backward using event_updates_pkey on event_updates  (cost=0.00..494429.30 rows=8944370
width=2752)(actual time=0.030..7452.142 rows=5135706 loops=1) 
         ->  Materialize  (cost=16.55..16.56 rows=1 width=8) (actual time=0.001..0.002 rows=1 loops=5135706)
               ->  Nested Loop  (cost=0.00..16.55 rows=1 width=8) (actual time=0.029..0.034 rows=1 loops=1)
                     ->  Index Scan using index_calendar_links_on_calendar_group_id_and_source_tracker_id on
calendar_links (cost=0.00..8.27 rows=1 width=4) (actual time=0.012..0.013 rows=1 loops=1) 
                           Index Cond: (calendar_group_id = 3640)
                     ->  Index Scan using harvest_trackers_pkey on calendars  (cost=0.00..8.27 rows=1 width=4) (actual
time=0.012..0.013rows=1 loops=1) 
                           Index Cond: (calendars.id = calendar_links.source_tracker_id)
 Total runtime: 27810.161 ms
(11 rows)

production=> EXPLAIN ANALYZE SELECT event_updates.* FROM event_updates
             INNER JOIN calendars ON event_updates.feed_id = calendars.id
             INNER JOIN calendar_links ON calendars.id = calendar_links.source_tracker_id
             WHERE (calendar_links.calendar_group_id = 3640)
             ORDER BY event_updates.id DESC;
                                                                                            QUERY PLAN
                                                                          

-----------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=43376.36..43399.50 rows=9256 width=2752) (actual time=10.178..10.205 rows=36 loops=1)
   Sort Key: event_updates.id
   ->  Nested Loop  (cost=249.86..31755.56 rows=9256 width=2752) (actual time=9.957..10.098 rows=36 loops=1)
         ->  Nested Loop  (cost=0.00..16.55 rows=1 width=8) (actual time=9.868..9.873 rows=1 loops=1)
               ->  Index Scan using index_calendar_links_on_calendar_group_id_and_source_tracker_id on calendar_links
(cost=0.00..8.27rows=1 width=4) (actual time=9.824..9.825 rows=1 loops=1) 
                     Index Cond: (calendar_group_id = 3640)
               ->  Index Scan using harvest_trackers_pkey on calendars  (cost=0.00..8.27 rows=1 width=4) (actual
time=0.034..0.036rows=1 loops=1) 
                     Index Cond: (calendars.id = calendar_links.source_tracker_id)
         ->  Bitmap Heap Scan on event_updates  (cost=249.86..31623.01 rows=9280 width=2752) (actual time=0.080..0.138
rows=36loops=1) 
               Recheck Cond: (event_updates.feed_id = calendars.id)
               ->  Bitmap Index Scan on index_event_updates_on_feed_id_and_feed_type  (cost=0.00..247.54 rows=9280
width=0)(actual time=0.056..0.056 rows=36 loops=1) 
                     Index Cond: (event_updates.feed_id = calendars.id)
 Total runtime: 10.337 ms
(13 rows)



---------
[1] The original, unsimplified query:
SELECT event_updates.* FROM event_updates
INNER JOIN calendars ON (event_updates.feed_id = calendars.id AND event_updates.feed_type = E'Calendar')
INNER JOIN calendar_links ON (calendars.id = calendar_links.source_tracker_id AND calendars.type = E'SourceTracker')
WHERE (calendar_links.calendar_group_id = 3640 AND calendars.deactivated_at IS NULL)
ORDER BY event_updates.id DESC
LIMIT 1

pgsql-performance by date:

Previous
From: "Joshua D. Drake"
Date:
Subject: Re: 2GB or not 2GB
Next
From: Shane Ambler
Date:
Subject: Re: Adding "LIMIT 1" kills performance.