sum of left join greater than its parts - Mailing list pgsql-performance
From | Robert Treat |
---|---|
Subject | sum of left join greater than its parts |
Date | |
Msg-id | 1137517669.28011.728.camel@camel Whole thread Raw |
Responses |
Re: sum of left join greater than its parts
|
List | pgsql-performance |
8.1.1, everything vacuumed/analyzed. basically i have two queries that when executed individually run quite quickly, but if I try to left join the second query onto the first, everything gets quite a bit slower. rms=# explain analyze rms-# SELECT rms-# software_download.* rms-# FROM rms-# ( rms(# SELECT rms(# host_id, max(mtime) as mtime rms(# FROM rms(# software_download rms(# WHERE rms(# bds_status_id not in (6,17,18) rms(# GROUP BY rms(# host_id, software_binary_id rms(# ) latest_download rms-# JOIN software_download using (host_id,mtime) rms-# JOIN software_binary b USING (software_binary_id) rms-# WHERE rms-# binary_type_id IN (3,5,6); QUERY PLAN ----------------------------------------------------------------------------------------------------------------------------------------- Hash Join (cost=870.00..992.56 rows=1 width=96) (actual time=90.566..125.782 rows=472 loops=1) Hash Cond: (("outer".host_id = "inner".host_id) AND ("outer"."?column2?" = "inner".mtime)) -> HashAggregate (cost=475.88..495.32 rows=1555 width=16) (actual time=51.300..70.761 rows=10870 loops=1) -> Seq Scan on software_download (cost=0.00..377.78 rows=13080 width=16) (actual time=0.010..23.700 rows=13167loops=1) Filter: ((bds_status_id <> 6) AND (bds_status_id <> 17) AND (bds_status_id <> 18)) -> Hash (cost=379.37..379.37 rows=2949 width=96) (actual time=39.167..39.167 rows=639 loops=1) -> Hash Join (cost=5.64..379.37 rows=2949 width=96) (actual time=0.185..37.808 rows=639 loops=1) Hash Cond: ("outer".software_binary_id = "inner".software_binary_id) -> Seq Scan on software_download (cost=0.00..277.16 rows=13416 width=96) (actual time=0.008..19.338 rows=13416loops=1) -> Hash (cost=5.59..5.59 rows=20 width=4) (actual time=0.149..0.149 rows=22 loops=1) -> Seq Scan on software_binary b (cost=0.00..5.59 rows=20 width=4) (actual time=0.011..0.108 rows=22loops=1) Filter: ((binary_type_id = 3) OR (binary_type_id = 5) OR (binary_type_id = 6)) Total runtime: 126.704 ms (13 rows) rms=# explain analyze rms-# SELECT rms-# entityid, rmsbinaryid, rmsbinaryid as software_binary_id, timestamp as downloaded,ia.host_id rms-# FROM rms-# (SELECT rms(# entityid, rmsbinaryid,max(msgid) as msgid rms(# FROM rms(# msg306u rms(# WHERE rms(# downloadstatus=1 rms(# GROUP BY entityid,rmsbinaryid rms(# ) a1 rms-# JOIN myapp_app ia on (entityid=myapp_app_id) rms-# JOIN rms-# (SELECT * rms(# FROM msg306u rms(# WHERE rms(# downloadstatus != 0 rms(# ) a2 USING(entityid,rmsbinaryid,msgid) rms-# ; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=1733.79..4620.38 rows=1 width=20) (actual time=81.160..89.826 rows=238 loops=1) -> Nested Loop (cost=1733.79..4615.92 rows=1 width=20) (actual time=81.142..86.826 rows=238 loops=1) Join Filter: ("outer".rmsbinaryid = "inner".rmsbinaryid) -> HashAggregate (cost=1733.79..1740.92 rows=570 width=12) (actual time=81.105..81.839 rows=323 loops=1) -> Bitmap Heap Scan on msg306u (cost=111.75..1540.65 rows=25752 width=12) (actual time=4.490..41.233 rows=25542loops=1) -> Bitmap Index Scan on rht3 (cost=0.00..111.75 rows=25752 width=0) (actual time=4.248..4.248 rows=25542loops=1) -> Index Scan using msg306u_entityid_msgid_idx on msg306u (cost=0.00..5.02 rows=1 width=20) (actual time=0.008..0.010rows=1 loops=323) Index Cond: (("outer".entityid = msg306u.entityid) AND ("outer"."?column3?" = msg306u.msgid)) Filter: (downloadstatus <> '0'::text) -> Index Scan using myapp_app_pkey on myapp_app ia (cost=0.00..4.44 rows=1 width=8) (actual time=0.006..0.007 rows=1loops=238) Index Cond: ("outer".entityid = ia.myapp_app_id) Total runtime: 90.270 ms (12 rows) and here are the two queries left joined together. rms=# explain analyze rms-# select * from ( rms(# SELECT rms(# software_download.* rms(# FROM rms(# ( rms(# SELECT rms(# host_id, max(mtime) as mtime rms(# FROM rms(# software_download rms(# WHERE rms(# bds_status_id not in (6,17,18) rms(# GROUP BY rms(# host_id, software_binary_id rms(# ) latest_download rms(# JOIN software_download using (host_id,mtime) rms(# JOIN software_binary b USING (software_binary_id) rms(# WHERE rms(# binary_type_id IN (3,5,6) rms(# ) ld rms-# LEFT JOIN rms-# (SELECT rms(# entityid, rmsbinaryid, rmsbinaryid as software_binary_id, timestamp as downloaded,ia.host_id rms(# FROM rms(# (SELECT rms(# entityid, rmsbinaryid,max(msgid) as msgid rms(# FROM rms(# msg306u rms(# WHERE rms(# downloadstatus=1 rms(# GROUP BY entityid,rmsbinaryid rms(# ) a1 rms(# JOIN myapp_app ia on (entityid=myapp_app_id) rms(# JOIN rms(# (SELECT * rms(# FROM msg306u rms(# WHERE rms(# downloadstatus != 0 rms(# ) a2 USING(entityid,rmsbinaryid,msgid) rms(# ) aa USING (host_id,software_binary_id); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------ Nested Loop Left Join (cost=2603.79..5612.95 rows=1 width=112) (actual time=181.988..4359.330 rows=472 loops=1) Join Filter: (("outer".host_id = "inner".host_id) AND ("outer".software_binary_id = "inner".rmsbinaryid)) -> Hash Join (cost=870.00..992.56 rows=1 width=96) (actual time=92.048..131.154 rows=472 loops=1) Hash Cond: (("outer".host_id = "inner".host_id) AND ("outer"."?column2?" = "inner".mtime)) -> HashAggregate (cost=475.88..495.32 rows=1555 width=16) (actual time=52.302..73.892 rows=10870 loops=1) -> Seq Scan on software_download (cost=0.00..377.78 rows=13080 width=16) (actual time=0.010..24.181 rows=13167loops=1) Filter: ((bds_status_id <> 6) AND (bds_status_id <> 17) AND (bds_status_id <> 18)) -> Hash (cost=379.37..379.37 rows=2949 width=96) (actual time=39.645..39.645 rows=639 loops=1) -> Hash Join (cost=5.64..379.37 rows=2949 width=96) (actual time=0.187..38.265 rows=639 loops=1) Hash Cond: ("outer".software_binary_id = "inner".software_binary_id) -> Seq Scan on software_download (cost=0.00..277.16 rows=13416 width=96) (actual time=0.008..19.905rows=13416 loops=1) -> Hash (cost=5.59..5.59 rows=20 width=4) (actual time=0.151..0.151 rows=22 loops=1) -> Seq Scan on software_binary b (cost=0.00..5.59 rows=20 width=4) (actual time=0.011..0.109rows=22 loops=1) Filter: ((binary_type_id = 3) OR (binary_type_id = 5) OR (binary_type_id = 6)) -> Nested Loop (cost=1733.79..4620.38 rows=1 width=20) (actual time=0.196..8.620 rows=238 loops=472) -> Nested Loop (cost=1733.79..4615.92 rows=1 width=16) (actual time=0.186..5.702 rows=238 loops=472) Join Filter: ("outer".rmsbinaryid = "inner".rmsbinaryid) -> HashAggregate (cost=1733.79..1740.92 rows=570 width=12) (actual time=0.173..0.886 rows=323 loops=472) -> Bitmap Heap Scan on msg306u (cost=111.75..1540.65 rows=25752 width=12) (actual time=4.372..41.248rows=25542 loops=1) -> Bitmap Index Scan on rht3 (cost=0.00..111.75 rows=25752 width=0) (actual time=4.129..4.129rows=25542 loops=1) -> Index Scan using msg306u_entityid_msgid_idx on msg306u (cost=0.00..5.02 rows=1 width=20) (actual time=0.008..0.010rows=1 loops=152456) Index Cond: (("outer".entityid = msg306u.entityid) AND ("outer"."?column3?" = msg306u.msgid)) Filter: (downloadstatus <> '0'::text) -> Index Scan using myapp_app_pkey on myapp_app ia (cost=0.00..4.44 rows=1 width=8) (actual time=0.005..0.007rows=1 loops=112336) Index Cond: ("outer".entityid = ia.myapp_app_id) Total runtime: 4360.552 ms (26 rows) istm this query should be able to return quite a bit faster, and setting enable_nestloop = off seems to back up this theory: rms=# explain analyze rms-# select * from ( rms(# SELECT rms(# software_download.* rms(# FROM rms(# ( rms(# SELECT rms(# host_id, max(mtime) as mtime rms(# FROM rms(# software_download rms(# WHERE rms(# bds_status_id not in (6,17,18) rms(# GROUP BY rms(# host_id, software_binary_id rms(# ) latest_download rms(# JOIN software_download using (host_id,mtime) rms(# JOIN software_binary b USING (software_binary_id) rms(# WHERE rms(# binary_type_id IN (3,5,6) rms(# ) ld rms-# LEFT JOIN rms-# (SELECT rms(# entityid, rmsbinaryid, rmsbinaryid as software_binary_id, timestamp as downloaded,ia.host_id rms(# FROM rms(# (SELECT rms(# entityid, rmsbinaryid,max(msgid) as msgid rms(# FROM rms(# msg306u rms(# WHERE rms(# downloadstatus=1 rms(# GROUP BY entityid,rmsbinaryid rms(# ) a1 rms(# JOIN myapp_app ia on (entityid=myapp_app_id) rms(# JOIN rms(# (SELECT * rms(# FROM msg306u rms(# WHERE rms(# downloadstatus != 0 rms(# ) a2 USING(entityid,rmsbinaryid,msgid) rms(# ) aa USING (host_id,software_binary_id); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------ Hash Left Join (cost=6976.52..7099.10 rows=1 width=112) (actual time=500.681..537.894 rows=472 loops=1) Hash Cond: (("outer".host_id = "inner".host_id) AND ("outer".software_binary_id = "inner".rmsbinaryid)) -> Hash Join (cost=870.00..992.56 rows=1 width=96) (actual time=91.738..127.423 rows=472 loops=1) Hash Cond: (("outer".host_id = "inner".host_id) AND ("outer"."?column2?" = "inner".mtime)) -> HashAggregate (cost=475.88..495.32 rows=1555 width=16) (actual time=52.025..71.872 rows=10870 loops=1) -> Seq Scan on software_download (cost=0.00..377.78 rows=13080 width=16) (actual time=0.009..23.959 rows=13167loops=1) Filter: ((bds_status_id <> 6) AND (bds_status_id <> 17) AND (bds_status_id <> 18)) -> Hash (cost=379.37..379.37 rows=2949 width=96) (actual time=39.612..39.612 rows=639 loops=1) -> Hash Join (cost=5.64..379.37 rows=2949 width=96) (actual time=0.183..38.220 rows=639 loops=1) Hash Cond: ("outer".software_binary_id = "inner".software_binary_id) -> Seq Scan on software_download (cost=0.00..277.16 rows=13416 width=96) (actual time=0.008..19.511rows=13416 loops=1) -> Hash (cost=5.59..5.59 rows=20 width=4) (actual time=0.147..0.147 rows=22 loops=1) -> Seq Scan on software_binary b (cost=0.00..5.59 rows=20 width=4) (actual time=0.011..0.108rows=22 loops=1) Filter: ((binary_type_id = 3) OR (binary_type_id = 5) OR (binary_type_id = 6)) -> Hash (cost=6106.52..6106.52 rows=1 width=20) (actual time=408.915..408.915 rows=238 loops=1) -> Merge Join (cost=5843.29..6106.52 rows=1 width=20) (actual time=338.516..408.477 rows=238 loops=1) Merge Cond: (("outer".rmsbinaryid = "inner".rmsbinaryid) AND ("outer".msgid = "inner".msgid) AND ("outer".entityid= "inner".entityid)) -> Sort (cost=1857.37..1858.80 rows=570 width=16) (actual time=88.816..89.179 rows=323 loops=1) Sort Key: a1.rmsbinaryid, a1.msgid, a1.entityid -> Hash Join (cost=1793.98..1831.28 rows=570 width=16) (actual time=86.452..88.074 rows=323 loops=1) Hash Cond: ("outer".entityid = "inner".myapp_app_id) -> HashAggregate (cost=1733.79..1740.92 rows=570 width=12) (actual time=80.772..81.320 rows=323loops=1) -> Bitmap Heap Scan on msg306u (cost=111.75..1540.65 rows=25752 width=12) (actual time=4.515..40.984rows=25542 loops=1) -> Bitmap Index Scan on rht3 (cost=0.00..111.75 rows=25752 width=0) (actual time=4.271..4.271rows=25542 loops=1) -> Hash (cost=55.95..55.95 rows=1695 width=8) (actual time=5.663..5.663 rows=1695 loops=1) -> Seq Scan on myapp_app ia (cost=0.00..55.95 rows=1695 width=8) (actual time=0.006..2.888rows=1695 loops=1) -> Sort (cost=3985.92..4050.30 rows=25752 width=20) (actual time=249.682..286.295 rows=25542 loops=1) Sort Key: public.msg306u.rmsbinaryid, public.msg306u.msgid, public.msg306u.entityid -> Seq Scan on msg306u (cost=0.00..1797.28 rows=25752 width=20) (actual time=0.010..80.572 rows=25542loops=1) Filter: (downloadstatus <> '0'::text) Total runtime: 540.284 ms (31 rows) i've been banging on this one off and on for awhile now with little progress, can someone explain why it is choosing the initial slower plan and/or how to get it to run something closer to the second faster plan? Robert Treat -- Build A Brighter Lamp :: Linux Apache {middleware} PostgreSQL
pgsql-performance by date: