Thread: Help speeding up a left join aggregate

Help speeding up a left join aggregate

From
Nick
Date:
I have a pretty well tuned setup, with appropriate indexes and 16GB of
available RAM. Should this be taking this long? I forced it to not use
a sequential scan and that only knocked a second off the plan.

                                                                QUERY
PLAN

------------------------------------------------------------------------------------------------------------------------------------------
 Hash Right Join  (cost=105882.35..105882.47 rows=3 width=118) (actual
time=3931.567..3931.583 rows=4 loops=1)
   Hash Cond: (songs_downloaded.advertisement_id = a.id)
   ->  HashAggregate  (cost=105881.21..105881.26 rows=4 width=13)
(actual time=3931.484..3931.489 rows=3 loops=1)
         ->  Seq Scan on songs_downloaded  (cost=0.00..95455.96
rows=1042525 width=13) (actual time=0.071..1833.680 rows=1034752
loops=1)
               Filter: (advertiser_id = 6553406)
   ->  Hash  (cost=1.10..1.10 rows=3 width=46) (actual
time=0.050..0.050 rows=4 loops=1)
         Buckets: 1024  Batches: 1  Memory Usage: 1kB
         ->  Seq Scan on advertisements a  (cost=0.00..1.10 rows=3
width=46) (actual time=0.037..0.041 rows=4 loops=1)
               Filter: (advertiser_id = 6553406)
 Total runtime: 3931.808 ms
(10 rows)

SELECT a.id, sd.price, COALESCE(sd.downloads,0) AS downloads,
COALESCE(sd.download_revenue,0) AS download_revenue
FROM advertisements a
LEFT JOIN (SELECT advertisement_id, AVG(price) AS price, SUM(price) AS
download_revenue, COUNT(1) AS downloads FROM songs_downloaded WHERE
advertiser_id = 6553406 GROUP BY advertisement_id) AS sd ON a.id =
sd.advertisement_id
WHERE advertiser_id = 6553406

Re: Help speeding up a left join aggregate

From
Alban Hertroys
Date:
On 31 Jan 2012, at 4:55, Nick wrote:

> I have a pretty well tuned setup, with appropriate indexes and 16GB of
> available RAM. Should this be taking this long? I forced it to not use
> a sequential scan and that only knocked a second off the plan.
>
>                                                                QUERY
> PLAN
>
------------------------------------------------------------------------------------------------------------------------------------------
> Hash Right Join  (cost=105882.35..105882.47 rows=3 width=118) (actual
> time=3931.567..3931.583 rows=4 loops=1)
>   Hash Cond: (songs_downloaded.advertisement_id = a.id)
>   ->  HashAggregate  (cost=105881.21..105881.26 rows=4 width=13)
> (actual time=3931.484..3931.489 rows=3 loops=1)
>         ->  Seq Scan on songs_downloaded  (cost=0.00..95455.96
> rows=1042525 width=13) (actual time=0.071..1833.680 rows=1034752
> loops=1)
>               Filter: (advertiser_id = 6553406)
>   ->  Hash  (cost=1.10..1.10 rows=3 width=46) (actual
> time=0.050..0.050 rows=4 loops=1)
>         Buckets: 1024  Batches: 1  Memory Usage: 1kB
>         ->  Seq Scan on advertisements a  (cost=0.00..1.10 rows=3
> width=46) (actual time=0.037..0.041 rows=4 loops=1)
>               Filter: (advertiser_id = 6553406)
> Total runtime: 3931.808 ms
> (10 rows)

I bet the group by query would be far more selective on advertisement_id than on the advertiser_id it's currently
using,wouldn't it? 
Perhaps the query planner chooses the wrong filter here because the advertiser_id is in the inner query, while the
advertisement_idis outside it. You could try and see what happens if you move the advertiser_id into the join
condition:

SELECT a.id, sd.price, COALESCE(sd.downloads,0) AS downloads,
COALESCE(sd.download_revenue,0) AS download_revenue
FROM advertisements a
LEFT JOIN (SELECT advertisement_id, AVG(price) AS price, SUM(price) AS
download_revenue, COUNT(1) AS downloads FROM songs_downloaded
 GROUP BY advertisement_id) AS sd ON a.id =
sd.advertisement_id AND a.advertiser_id = sd.advertiser_id
WHERE advertiser_id = 6553406

Alban Hertroys

--
Screwing up is an excellent way to attach something to the ceiling.


Re: Help speeding up a left join aggregate

From
Volodymyr Kostyrko
Date:
Nick wrote:
> I have a pretty well tuned setup, with appropriate indexes and 16GB of
> available RAM. Should this be taking this long? I forced it to not use
> a sequential scan and that only knocked a second off the plan.
>
>                                                                  QUERY
> PLAN
>
------------------------------------------------------------------------------------------------------------------------------------------
>   Hash Right Join  (cost=105882.35..105882.47 rows=3 width=118) (actual
> time=3931.567..3931.583 rows=4 loops=1)
>     Hash Cond: (songs_downloaded.advertisement_id = a.id)
>     ->   HashAggregate  (cost=105881.21..105881.26 rows=4 width=13)
> (actual time=3931.484..3931.489 rows=3 loops=1)
>           ->   Seq Scan on songs_downloaded  (cost=0.00..95455.96
> rows=1042525 width=13) (actual time=0.071..1833.680 rows=1034752
> loops=1)
>                 Filter: (advertiser_id = 6553406)
>     ->   Hash  (cost=1.10..1.10 rows=3 width=46) (actual
> time=0.050..0.050 rows=4 loops=1)
>           Buckets: 1024  Batches: 1  Memory Usage: 1kB
>           ->   Seq Scan on advertisements a  (cost=0.00..1.10 rows=3
> width=46) (actual time=0.037..0.041 rows=4 loops=1)
>                 Filter: (advertiser_id = 6553406)
>   Total runtime: 3931.808 ms
> (10 rows)

What indexes do you have? Can you show some?

I bet you need something like (advertiser_id, advertisement_id), because
plain index would not be sorted right.

> SELECT a.id, sd.price, COALESCE(sd.downloads,0) AS downloads,
> COALESCE(sd.download_revenue,0) AS download_revenue
> FROM advertisements a
> LEFT JOIN (SELECT advertisement_id, AVG(price) AS price, SUM(price) AS
> download_revenue, COUNT(1) AS downloads FROM songs_downloaded WHERE
> advertiser_id = 6553406 GROUP BY advertisement_id) AS sd ON a.id =
> sd.advertisement_id
> WHERE advertiser_id = 6553406

--
Sphinx of black quartz judge my vow.