Thread: Help speeding up a left join aggregate
I have a pretty well tuned setup, with appropriate indexes and 16GB of available RAM. Should this be taking this long? I forced it to not use a sequential scan and that only knocked a second off the plan. QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------ Hash Right Join (cost=105882.35..105882.47 rows=3 width=118) (actual time=3931.567..3931.583 rows=4 loops=1) Hash Cond: (songs_downloaded.advertisement_id = a.id) -> HashAggregate (cost=105881.21..105881.26 rows=4 width=13) (actual time=3931.484..3931.489 rows=3 loops=1) -> Seq Scan on songs_downloaded (cost=0.00..95455.96 rows=1042525 width=13) (actual time=0.071..1833.680 rows=1034752 loops=1) Filter: (advertiser_id = 6553406) -> Hash (cost=1.10..1.10 rows=3 width=46) (actual time=0.050..0.050 rows=4 loops=1) Buckets: 1024 Batches: 1 Memory Usage: 1kB -> Seq Scan on advertisements a (cost=0.00..1.10 rows=3 width=46) (actual time=0.037..0.041 rows=4 loops=1) Filter: (advertiser_id = 6553406) Total runtime: 3931.808 ms (10 rows) SELECT a.id, sd.price, COALESCE(sd.downloads,0) AS downloads, COALESCE(sd.download_revenue,0) AS download_revenue FROM advertisements a LEFT JOIN (SELECT advertisement_id, AVG(price) AS price, SUM(price) AS download_revenue, COUNT(1) AS downloads FROM songs_downloaded WHERE advertiser_id = 6553406 GROUP BY advertisement_id) AS sd ON a.id = sd.advertisement_id WHERE advertiser_id = 6553406
On 31 Jan 2012, at 4:55, Nick wrote: > I have a pretty well tuned setup, with appropriate indexes and 16GB of > available RAM. Should this be taking this long? I forced it to not use > a sequential scan and that only knocked a second off the plan. > > QUERY > PLAN > ------------------------------------------------------------------------------------------------------------------------------------------ > Hash Right Join (cost=105882.35..105882.47 rows=3 width=118) (actual > time=3931.567..3931.583 rows=4 loops=1) > Hash Cond: (songs_downloaded.advertisement_id = a.id) > -> HashAggregate (cost=105881.21..105881.26 rows=4 width=13) > (actual time=3931.484..3931.489 rows=3 loops=1) > -> Seq Scan on songs_downloaded (cost=0.00..95455.96 > rows=1042525 width=13) (actual time=0.071..1833.680 rows=1034752 > loops=1) > Filter: (advertiser_id = 6553406) > -> Hash (cost=1.10..1.10 rows=3 width=46) (actual > time=0.050..0.050 rows=4 loops=1) > Buckets: 1024 Batches: 1 Memory Usage: 1kB > -> Seq Scan on advertisements a (cost=0.00..1.10 rows=3 > width=46) (actual time=0.037..0.041 rows=4 loops=1) > Filter: (advertiser_id = 6553406) > Total runtime: 3931.808 ms > (10 rows) I bet the group by query would be far more selective on advertisement_id than on the advertiser_id it's currently using,wouldn't it? Perhaps the query planner chooses the wrong filter here because the advertiser_id is in the inner query, while the advertisement_idis outside it. You could try and see what happens if you move the advertiser_id into the join condition: SELECT a.id, sd.price, COALESCE(sd.downloads,0) AS downloads, COALESCE(sd.download_revenue,0) AS download_revenue FROM advertisements a LEFT JOIN (SELECT advertisement_id, AVG(price) AS price, SUM(price) AS download_revenue, COUNT(1) AS downloads FROM songs_downloaded GROUP BY advertisement_id) AS sd ON a.id = sd.advertisement_id AND a.advertiser_id = sd.advertiser_id WHERE advertiser_id = 6553406 Alban Hertroys -- Screwing up is an excellent way to attach something to the ceiling.
Nick wrote: > I have a pretty well tuned setup, with appropriate indexes and 16GB of > available RAM. Should this be taking this long? I forced it to not use > a sequential scan and that only knocked a second off the plan. > > QUERY > PLAN > ------------------------------------------------------------------------------------------------------------------------------------------ > Hash Right Join (cost=105882.35..105882.47 rows=3 width=118) (actual > time=3931.567..3931.583 rows=4 loops=1) > Hash Cond: (songs_downloaded.advertisement_id = a.id) > -> HashAggregate (cost=105881.21..105881.26 rows=4 width=13) > (actual time=3931.484..3931.489 rows=3 loops=1) > -> Seq Scan on songs_downloaded (cost=0.00..95455.96 > rows=1042525 width=13) (actual time=0.071..1833.680 rows=1034752 > loops=1) > Filter: (advertiser_id = 6553406) > -> Hash (cost=1.10..1.10 rows=3 width=46) (actual > time=0.050..0.050 rows=4 loops=1) > Buckets: 1024 Batches: 1 Memory Usage: 1kB > -> Seq Scan on advertisements a (cost=0.00..1.10 rows=3 > width=46) (actual time=0.037..0.041 rows=4 loops=1) > Filter: (advertiser_id = 6553406) > Total runtime: 3931.808 ms > (10 rows) What indexes do you have? Can you show some? I bet you need something like (advertiser_id, advertisement_id), because plain index would not be sorted right. > SELECT a.id, sd.price, COALESCE(sd.downloads,0) AS downloads, > COALESCE(sd.download_revenue,0) AS download_revenue > FROM advertisements a > LEFT JOIN (SELECT advertisement_id, AVG(price) AS price, SUM(price) AS > download_revenue, COUNT(1) AS downloads FROM songs_downloaded WHERE > advertiser_id = 6553406 GROUP BY advertisement_id) AS sd ON a.id = > sd.advertisement_id > WHERE advertiser_id = 6553406 -- Sphinx of black quartz judge my vow.