Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3 - Mailing list pgsql-general

From Tom Lane
Subject Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3
Date
Msg-id 27373.1044991037@sss.pgh.pa.us
Whole thread Raw
In response to Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3  (Greg Stark <gsstark@mit.edu>)
Responses Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3  (Greg Stark <gsstark@mit.edu>)
List pgsql-general
Greg Stark <gsstark@mit.edu> writes:
> I don't see how you're ever going to reliably come up with a good
> estimate for this.

Well, it'll probably never be really good, but it's only been a few
weeks that we've had code that tried to do it at all; I'm not prepared
to write off the concept without even trying.  And I see it's not doing
that badly in your example, now that you've ANALYZEd --- 124 estimated
groups vs 31 actual is well within what would make me happy.

But with only 124 estimated groups, it's certainly not the size of the
hashtable that's dissuading it from hashing.  [ Looks at code... ]
Oh, here's the problem:

             * Executor doesn't support hashed aggregation with DISTINCT
             * aggregates.  (Doing so would imply storing *all* the input
             * values in the hash table, which seems like a certain loser.)

The count(distinct) you've got in there turns it off.

> If the prediction is wrong is it just a performance penalty? The hash can
> still proceed if it has to go to disk?

Long as you don't run out of swap space, sure ;-).  So the estimate only
really has to be right within a factor of (swap space)/sort_mem.

            regards, tom lane

pgsql-general by date:

Previous
From: Greg Stark
Date:
Subject: Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3
Next
From: Greg Stark
Date:
Subject: Re: Is Hash Agg being used? 7.4 seems to handle this query worse than 7.3