Thread: Two fast queries get slow when combined
I have two small queries which are both very fast to evaluate separately. The first query, "Query 1", calculates some statistics and the the second query, "Query 2", finds a subset of relevant keys. When combined into a single query which calculates statistics from only the subset of relevant keys the evaluation plan explodes and uses both seq scans and bitmap heap scans. How can I improve the performance of the combined query? Queries and output from EXPLAIN ANALYZE can be seen here with some syntax highlighting: http://rafb.net/p/BJIW4p69.html I will also paste it here: ============================================================================= QUERY 1 (very *fast*): ============================================================================= SELECT keyId, count(1) as num_matches FROM stats GROUP BY keyId LIMIT 50 Limit (cost=0.00..23.65 rows=50 width=8) (actual time=0.090..2.312 rows=50 loops=1) -> GroupAggregate (cost=0.00..4687.46 rows=9912 width=8) (actual time=0.085..2.145 rows=50 loops=1) -> Index Scan using stats_keyId on stats (cost=0.00..3820.19 rows=99116 width=8) (actual time=0.031..1.016 rows=481 loops=1) Total runtime: 2.451 ms (4 rows) ============================================================================= QUERY 2 (very *fast*): ============================================================================= SELECT keyId, sortNum FROM items i WHERE sortNum > 123 ORDER BY sortNum LIMIT 50 Limit (cost=0.01..9.87 rows=50 width=8) (actual time=0.068..0.610 rows=50 loops=1) InitPlan -> Limit (cost=0.00..0.01 rows=1 width=0) (actual time=0.009..0.025 rows=1 loops=1) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.006..0.007 rows=1 loops=1) -> Index Scan using items_sortNum on items i (cost=0.00..1053.67 rows=5344 width=8) (actual time=0.063..0.455 rows=50 loops=1) Index Cond: (sortNum >= $0) Total runtime: 0.749 ms (7 rows) ============================================================================= COMBINED QUERY (very *slow*): ============================================================================= SELECT keyId, sortNum, count(1) FROM stats s, items i WHERE s.keyId = i.keyId AND i.sortNum > 123 GROUP BY i.keyId, i.sortNum ORDER BY i.sortNum LIMIT 50 Limit (cost=3281.72..3281.84 rows=50 width=16) (actual time=435.838..436.043 rows=50 loops=1) InitPlan -> Limit (cost=0.00..0.01 rows=1 width=0) (actual time=0.016..0.021 rows=1 loops=1) -> Result (cost=0.00..0.01 rows=1 width=0) (actual time=0.012..0.013 rows=1 loops=1) -> Sort (cost=3281.71..3289.97 rows=3304 width=16) (actual time=435.833..435.897 rows=50 loops=1) Sort Key: i.sortNum -> Hash Join (cost=2745.80..3088.59 rows=3304 width=16) (actual time=364.247..413.164 rows=8490 loops=1) Hash Cond: (s.keyId = i.keyId) -> HashAggregate (cost=2270.53..2394.43 rows=9912 width=8) (actual time=337.869..356.533 rows=9911 loops=1) -> Seq Scan on items (cost=0.00..1527.16 rows=99116 width=8) (actual time=0.016..148.118 rows=99116 loops=1) -> Hash (cost=408.47..408.47 rows=5344 width=12) (actual time=26.342..26.342 rows=4491 loops=1) -> Bitmap Heap Scan on items i (cost=121.67..408.47 rows=5344 width=12) (actual time=5.007..16.898 rows=4491 loops=1) Recheck Cond: (sortNum >= $0) -> Bitmap Index Scan on items_sortNum (cost=0.00..120.33 rows=5344 width=0) (actual time=4.273..4.273 rows=13375 loops=1) Index Cond: (sortNum >= $0) Total runtime: 436.421 ms (16 rows)
cluster wrote: > SELECT keyId, sortNum, count(1) > FROM stats s, items i > WHERE s.keyId = i.keyId AND i.sortNum > 123 > GROUP BY i.keyId, i.sortNum > ORDER BY i.sortNum > LIMIT 50 > > Limit (cost=3281.72..3281.84 rows=50 width=16) (actual > time=435.838..436.043 rows=50 loops=1) > InitPlan > -> Limit (cost=0.00..0.01 rows=1 width=0) (actual > time=0.016..0.021 rows=1 loops=1) > -> Result (cost=0.00..0.01 rows=1 width=0) (actual > time=0.012..0.013 rows=1 loops=1) > -> Sort (cost=3281.71..3289.97 rows=3304 width=16) (actual > time=435.833..435.897 rows=50 loops=1) > Sort Key: i.sortNum > -> Hash Join (cost=2745.80..3088.59 rows=3304 width=16) > (actual time=364.247..413.164 rows=8490 loops=1) > Hash Cond: (s.keyId = i.keyId) > -> HashAggregate (cost=2270.53..2394.43 rows=9912 > width=8) (actual time=337.869..356.533 rows=9911 loops=1) > -> Seq Scan on items (cost=0.00..1527.16 > rows=99116 width=8) (actual time=0.016..148.118 rows=99116 loops=1) > -> Hash (cost=408.47..408.47 rows=5344 width=12) > (actual time=26.342..26.342 rows=4491 loops=1) > -> Bitmap Heap Scan on items i > (cost=121.67..408.47 rows=5344 width=12) (actual time=5.007..16.898 > rows=4491 loops=1) > Recheck Cond: (sortNum >= $0) > -> Bitmap Index Scan on items_sortNum > (cost=0.00..120.33 rows=5344 width=0) (actual time=4.273..4.273 > rows=13375 loops=1) > Index Cond: (sortNum >= $0) > Total runtime: 436.421 ms > (16 rows) There's something odd about that plan. It's doing both a seq scan and a bitmap scan on "items", but I can't see stats table being mentioned anywhere. Looking at the row count, I believe that seq scan is actually on the stats table, not items like it says above. Is that really a verbatim copy of the output you got? Which version of Postgres is this? You could try rewriting the query like this: SELECT keyId, sortNum, (SELECT count(*) FROM stats s WHERE s.keyId = i.keyId) AS stats_cnt FROM items i WHERE i.sortNum > 123 ORDER BY sortNum LIMIT 50 -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
cluster <skrald@amossen.dk> writes: > Queries and output from EXPLAIN ANALYZE can be seen here with some > syntax highlighting: > http://rafb.net/p/BJIW4p69.html You are lying to us about how those queries were posed to Postgres (and no I don't feel a need to explain how I know). In future please present the full truth about what you are doing, not a simplification that you think is sufficient. But I think the short answer to your question is that query 1 is fast because it need only select the first 50 rows in some ordering, and query 2 is fast because it need only select the first 50 rows in some ordering, but they are not the same ordering so the join query doesn't get to exploit that shortcut. regards, tom lane
> There's something odd about that plan. It's doing both a seq scan and a > bitmap scan on "items", but I can't see stats table being mentioned > anywhere. Huh? Aaah, sorry. I made a major search/replace-refactoring (that obviously went wrong) on all open files in the editor before posting to this newsgroup, and one of these files was my preparation of the queries and planner output. Damn. Sorry for wasting your time! :-( However, your suggestion worked perfectly. Thanks a lot!
> You are lying to us about how those queries were posed to Postgres > (and no I don't feel a need to explain how I know). Sorry. The "lying" was not intended as explained in my reply to Heikku. Thanks for the tips anyways.