Tomas Vondra <tv@fuzzy.cz> writes:
> On 20.8.2013 18:24, Tom Lane wrote:
>> No, I don't think so. I'm pretty sure the reason
>> choose_hashed_distinct is like that is that I subconsciously assumed
>> hash_agg_entry_size would produce zero for numAggs = 0; but in fact
>> it does not and should not, because there's still some overhead for
>> the per-group hash entry whether or not there's any aggregates. So
>> the right fix is that choose_hashed_distinct should add
>> hash_agg_entry_size(0) onto its hashentrysize estimate.
> Hmmm. I think the main 'issue' here is that the queries behave quite
> differently although it seems like they should do the same thing (well,
> I understand they're not the same).

They are the same, assuming we choose to use hashed grouping for both.
It's somewhat unfortunate that the planner treats work_mem as a hard
boundary; if the estimated memory requirement is 1 byte over the limit,
it will not consider a hashed aggregation, period. It might be better if
we applied some kind of sliding penalty. On the other hand, given that
the calculation depends on an ndistinct estimate that's frequently pretty
bad, it doesn't seem wise to me to have a calculation that's encouraging
use of hashed aggregation by underestimating the space needed per row;
and the current code in choose_hashed_distinct is definitely doing that.
A quick experiment (grouping a single float8 column) says that the actual
space consumption per group is about 80 bytes, using HEAD on a 64-bit
machine. This compares to choose_hashed_grouping's estimate of 56 bytes
and choose_hashed_distinct's estimate of 24. I think the remaining
discrepancy is because the estimation code isn't allowing for palloc
overhead --- the actual space for each representative tuple is really
more than MAXALIGN(data_width) + MAXALIGN(sizeof(tuple_header)).
I'm not entirely sure if we should add in the palloc overhead, but
I am pretty sure that choose_hashed_distinct isn't doing anyone any
favors by being wrong by a factor of 3.
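
For comparison, the two estimates are built along these lines (a
paraphrase of the planner.c logic, not exact code; I'm reusing the
data_width/tuple_header notation from above):

    /* choose_hashed_grouping: representative tuple plus the per-group
     * hash entry overhead -- roughly where the 56-byte estimate above
     * comes from */
    hashentrysize = MAXALIGN(data_width) + MAXALIGN(sizeof(tuple_header));
    hashentrysize += hash_agg_entry_size(numAggs);

    /* choose_hashed_distinct: only the representative tuple is counted,
     * giving the 24-byte estimate; and neither function charges anything
     * for palloc chunk overhead */
    hashentrysize = MAXALIGN(data_width) + MAXALIGN(sizeof(tuple_header));
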
> Anyway, I still don't understand why the same logic around
> hash_agg_entry_size should not apply to choose_hashed_grouping as well?
> Well, it would make it slower in this particular corner case, but
> wouldn't it be more correct?

choose_hashed_grouping has it right, or at least more nearly right.
choose_hashed_distinct is simply failing to account for space that
will in fact be consumed. Not fixing that is not a good way to
deal with inaccurate number-of-groups estimates; if that estimate
is low rather than high, the consequences will be a lot worse than
they are here.
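
So the concrete change I have in mind is simply to charge the per-group
hash entry overhead in choose_hashed_distinct too, along the lines of
(sketch, not a tested patch):

    /* count per-group hash entry overhead even with zero aggregates */
    hashentrysize += hash_agg_entry_size(0);
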
regards, tom lane