Adding "LIMIT 1" kills performance. - Mailing list pgsql-performance
From | Chris Shoemaker |
---|---|
Subject | Adding "LIMIT 1" kills performance. |
Date | |
Msg-id | 20080529154734.GA18273@pe.Belkin Whole thread Raw |
Responses |
Re: Adding "LIMIT 1" kills performance.
|
List | pgsql-performance |
[Attn list-queue maintainers: Please drop the earlier version of this email that I accidentally sent from an unsubscribed address. ] Hi, I'm having a strange problem with a slow-running select query. The query I use in production ends in "LIMIT 1", and it runs very slowly. But when I remove the "LIMIT 1", the query runs quite quickly. This behavior has stumped a couple smart DBAs. The full queries and EXPLAIN ANALYZE plans are included below, but by way of explanation/observation: 1) The "LIMIT 1" case will sometimes be quicker (but still much slower than the non-"LIMIT 1" case) for different values of calendar_group_id. 2) The query below is a slightly simplified version of the one I actually use. The real one includes more conditions which explain why each table is joined. For reference, the original query is quoted at the end [1]. The original query exhibits the same behavior as the simplified versions w.r.t. the "LIMIT 1" case taking _much_ longer (even longer than the simplified version) than the non-"LIMIT 1" case, and uses the same plans. Can anyone explain why such a slow plan is chosen when the "LIMIT 1" is present? Is there anything I can do to speed this query up? Thanks. -chris production=> select version(); version ------------------------------------------------------------------------------ PostgreSQL 8.2.6 on x86_64-pc-linux-gnu, compiled by GCC x86_64-pc-linux-gnu-gcc (GCC) 4.1.2 (Gentoo 4.1.2 p1.0.2) (1 row) production=> analyze calendar_groups; ANALYZE production=> analyze calendar_links; ANALYZE production=> analyze calendars; ANALYZE production=> analyze event_updates; ANALYZE production=> EXPLAIN ANALYZE SELECT event_updates.* FROM event_updates INNER JOIN calendars ON event_updates.feed_id = calendars.id INNER JOIN calendar_links ON calendars.id = calendar_links.source_tracker_id WHERE (calendar_links.calendar_group_id = 3640) ORDER BY event_updates.id DESC LIMIT 1; QUERY PLAN ----------------------------------------------------------------------------------------------------------- Limit (cost=16.55..91.73 rows=1 width=2752) (actual time=27810.058..27810.059 rows=1 loops=1) -> Nested Loop (cost=16.55..695694.18 rows=9254 width=2752) (actual time=27810.054..27810.054 rows=1 loops=1) Join Filter: (event_updates.feed_id = calendars.id) -> Index Scan Backward using event_updates_pkey on event_updates (cost=0.00..494429.30 rows=8944370 width=2752)(actual time=0.030..7452.142 rows=5135706 loops=1) -> Materialize (cost=16.55..16.56 rows=1 width=8) (actual time=0.001..0.002 rows=1 loops=5135706) -> Nested Loop (cost=0.00..16.55 rows=1 width=8) (actual time=0.029..0.034 rows=1 loops=1) -> Index Scan using index_calendar_links_on_calendar_group_id_and_source_tracker_id on calendar_links (cost=0.00..8.27 rows=1 width=4) (actual time=0.012..0.013 rows=1 loops=1) Index Cond: (calendar_group_id = 3640) -> Index Scan using harvest_trackers_pkey on calendars (cost=0.00..8.27 rows=1 width=4) (actual time=0.012..0.013rows=1 loops=1) Index Cond: (calendars.id = calendar_links.source_tracker_id) Total runtime: 27810.161 ms (11 rows) production=> EXPLAIN ANALYZE SELECT event_updates.* FROM event_updates INNER JOIN calendars ON event_updates.feed_id = calendars.id INNER JOIN calendar_links ON calendars.id = calendar_links.source_tracker_id WHERE (calendar_links.calendar_group_id = 3640) ORDER BY event_updates.id DESC; QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------- Sort (cost=43376.36..43399.50 rows=9256 width=2752) (actual time=10.178..10.205 rows=36 loops=1) Sort Key: event_updates.id -> Nested Loop (cost=249.86..31755.56 rows=9256 width=2752) (actual time=9.957..10.098 rows=36 loops=1) -> Nested Loop (cost=0.00..16.55 rows=1 width=8) (actual time=9.868..9.873 rows=1 loops=1) -> Index Scan using index_calendar_links_on_calendar_group_id_and_source_tracker_id on calendar_links (cost=0.00..8.27rows=1 width=4) (actual time=9.824..9.825 rows=1 loops=1) Index Cond: (calendar_group_id = 3640) -> Index Scan using harvest_trackers_pkey on calendars (cost=0.00..8.27 rows=1 width=4) (actual time=0.034..0.036rows=1 loops=1) Index Cond: (calendars.id = calendar_links.source_tracker_id) -> Bitmap Heap Scan on event_updates (cost=249.86..31623.01 rows=9280 width=2752) (actual time=0.080..0.138 rows=36loops=1) Recheck Cond: (event_updates.feed_id = calendars.id) -> Bitmap Index Scan on index_event_updates_on_feed_id_and_feed_type (cost=0.00..247.54 rows=9280 width=0)(actual time=0.056..0.056 rows=36 loops=1) Index Cond: (event_updates.feed_id = calendars.id) Total runtime: 10.337 ms (13 rows) --------- [1] The original, unsimplified query: SELECT event_updates.* FROM event_updates INNER JOIN calendars ON (event_updates.feed_id = calendars.id AND event_updates.feed_type = E'Calendar') INNER JOIN calendar_links ON (calendars.id = calendar_links.source_tracker_id AND calendars.type = E'SourceTracker') WHERE (calendar_links.calendar_group_id = 3640 AND calendars.deactivated_at IS NULL) ORDER BY event_updates.id DESC LIMIT 1
pgsql-performance by date: