Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3 - Mailing list pgsql-general

From Greg Stark
Subject Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3
Date
Msg-id 87u1fapx6k.fsf@stark.dyndns.tv
Whole thread Raw
In response to Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Greg Stark <gsstark@mit.edu> writes:
> > Hm, CVS doesn't seem to be using a hash aggregate. At least, if it is it isn't
> > obvious from the plan.
>
> > SELECT hier.level_0_id as parent_id,
> >        (select localized_text from localized_text where text_id = hier.short_name_text_id and lang_code = 'en') as
name,
> >        *
> >   FROM hier LEFT OUTER JOIN (
> >         SELECT distinct level_0_id, level_1_id
> >           FROM cache_foo JOIN foo_hier USING (foo_id)
> >          WHERE key_value = 839
> >            AND dist < 60
> >      ) AS cache ON (hier.hier_id = cache.level_1_id)
> >  WHERE level = 1
> >  ORDER BY 1,2
>
> Why would you expect hash aggregation to be used here?  There's no
> aggregates ... nor even any GROUP BY.

Well, "SELECT distinct level_0_id, level_1_id"  is equivalent to a GROUP BY
level_0_id, level_1_id.

Um, I think I grabbed the wrong query from the logs though, sorry. Here's a
better example from the actual code, in fact I think it's what the above query
turned into after more work.

There's only a small decrease in speed from 7.3 to CVS now, but I was hoping
for a big speed increase from hash aggregates since most of the time is being
sunk into that sort. But it definitely isn't using them. I guess TNSTAAFL.


SELECT hier.level_0_id as parent_id,
       (select localized_text from localized_text where text_id = hier.short_name_text_id and lang_code = 'en') as
name,
       *
  FROM hier LEFT OUTER JOIN (
        SELECT min(dist) AS mindist, count(distinct foo_id) AS num_foos, level_1_id
          FROM cache_foos JOIN foo_hier USING (foo_id)
         WHERE key_id = 839
           AND dist < 60
         GROUP BY level_0_id, level_1_id
     ) AS cache ON (hier.hier_id = cache.level_1_id)
 WHERE level = 1
 ORDER BY level_0_id;

CVS:

                                                                                   QUERY PLAN
                                                         

---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=3936.87..3937.06 rows=76 width=1596) (actual time=1047.13..1047.16 rows=31 loops=1)
   Sort Key: hier.level_0_id
   ->  Hash Left Join  (cost=2989.02..3934.50 rows=76 width=1596) (actual time=1044.90..1046.87 rows=31 loops=1)
         Hash Cond: ("outer".hier_id = "inner".level_1_id)
         ->  Seq Scan on hier  (cost=0.00..63.86 rows=14 width=1576) (actual time=7.92..8.13 rows=31 loops=1)
               Filter: ("level" = 1)
         ->  Hash  (cost=2951.21..2951.21 rows=15122 width=24) (actual time=1033.78..1033.78 rows=0 loops=1)
               ->  Subquery Scan "cache"  (cost=2686.58..2951.21 rows=15122 width=24) (actual time=917.66..1033.60
rows=31loops=1) 
                     ->  GroupAggregate  (cost=2686.58..2951.21 rows=15122 width=24) (actual time=917.64..1033.40
rows=31loops=1) 
                           ->  Sort  (cost=2686.58..2724.38 rows=15122 width=24) (actual time=913.00..931.40 rows=16440
loops=1)
                                 Sort Key: foo_hier.level_0_id, foo_hier.level_1_id
                                 ->  Merge Join  (cost=123.56..1636.78 rows=15122 width=24) (actual time=280.80..779.05
rows=16440loops=1) 
                                       Merge Cond: ("outer".foo_id = "inner".foo_id)
                                       ->  Index Scan using foo_hier_foo on foo_hier  (cost=0.00..1173.54 rows=45140
width=12)(actual time=0.04..225.13 rows=45140 loops=1) 
                                       ->  Sort  (cost=123.56..123.73 rows=67 width=12) (actual time=280.69..302.62
rows=16441loops=1) 
                                             Sort Key: cache_foos.foo_id
                                             ->  Index Scan using idx_cache_foos on cache_foos  (cost=0.00..121.53
rows=67width=12) (actual time=0.05..128.19 rows=16486 loops=1) 
                                                   Index Cond: ((key_id = 839) AND (dist < 60::double precision))
         SubPlan
           ->  Index Scan using localized_text_pkey on localized_text  (cost=0.00..4.01 rows=1 width=516) (actual
time=0.03..0.03rows=1 loops=31) 
                 Index Cond: ((text_id = $0) AND (lang_code = 'en'::bpchar))
 Total runtime: 1058.63 msec


7.3:

                                                                                     QUERY PLAN
                                                            

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=4103.84..4103.92 rows=31 width=97) (actual time=1033.79..1033.82 rows=31 loops=1)
   Sort Key: hier.level_0_id
   ->  Merge Join  (cost=4094.32..4103.08 rows=31 width=97) (actual time=1031.62..1033.54 rows=31 loops=1)
         Merge Cond: ("outer".hier_id = "inner".level_1_id)
         ->  Sort  (cost=64.63..64.71 rows=31 width=77) (actual time=7.92..7.96 rows=31 loops=1)
               Sort Key: hier.hier_id
               ->  Seq Scan on hier  (cost=0.00..63.86 rows=31 width=77) (actual time=7.25..7.77 rows=31 loops=1)
                     Filter: ("level" = 1)
         ->  Sort  (cost=4029.69..4033.87 rows=1674 width=24) (actual time=1023.54..1023.58 rows=31 loops=1)
               Sort Key: "cache".level_1_id
               ->  Subquery Scan "cache"  (cost=3730.75..3940.04 rows=1674 width=24) (actual time=829.88..1023.35
rows=31loops=1) 
                     ->  Aggregate  (cost=3730.75..3940.04 rows=1674 width=24) (actual time=829.86..1023.14 rows=31
loops=1)
                           ->  Group  (cost=3730.75..3856.32 rows=16743 width=24) (actual time=822.88..940.44
rows=16440loops=1) 
                                 ->  Sort  (cost=3730.75..3772.61 rows=16743 width=24) (actual time=822.86..841.47
rows=16440loops=1) 
                                       Sort Key: foo_hier.level_0_id, foo_hier.level_1_id
                                       ->  Hash Join  (cost=1410.87..2556.15 rows=16743 width=24) (actual
time=347.17..662.63rows=16440 loops=1) 
                                             Hash Cond: ("outer".foo_id = "inner".foo_id)
                                             ->  Index Scan using idx_cache_foos on cache_foos  (cost=0.00..810.43
rows=16743width=12) (actual time=0.07..152.56 rows=16486 loops=1) 
                                                   Index Cond: ((key_id = 839) AND (dist < 60::double precision))
                                             ->  Hash  (cost=746.40..746.40 rows=45140 width=12) (actual
time=345.53..345.53rows=0 loops=1) 
                                                   ->  Seq Scan on foo_hier  (cost=0.00..746.40 rows=45140 width=12)
(actualtime=0.06..213.59 rows=45140 loops=1) 
         SubPlan
           ->  Index Scan using localized_text_pkey on localized_text  (cost=0.00..4.03 rows=1 width=17) (actual
time=0.03..0.03rows=1 loops=31) 
                 Index Cond: ((text_id = $0) AND (lang_code = 'en'::bpchar))
 Total runtime: 1039.57 msec





--
greg

pgsql-general by date:

Previous
From: Phil Howard
Date:
Subject: mailing list archives
Next
From: Tom Lane
Date:
Subject: Re: Fw: Priority against catalog