Thread: Additional size of hash table is always zero for hash aggregates
Hi hackers,

While reading the grouping sets code, I found that the additional size of the hash table for hash aggregates is always zero. This seems incorrect to me; attached is a patch to fix it, please help check.
Thanks,
Pengzhou
Hi,

On 2020-03-12 16:35:15 +0800, Pengzhou Tang wrote:
> When reading the grouping sets codes, I find that the additional size of
> the hash table for hash aggregates is always zero, this seems to be
> incorrect to me, attached a patch to fix it, please help to check.

Indeed, that's incorrect. Causes the number of buckets for the hashtable
to be set higher - the size is just used for that. I'm a bit wary of
changing this in the stable branches - could cause performance changes?

Greetings,

Andres Freund
On Thu, Mar 12, 2020 at 12:16:26PM -0700, Andres Freund wrote:
> On 2020-03-12 16:35:15 +0800, Pengzhou Tang wrote:
> > When reading the grouping sets codes, I find that the additional size of
> > the hash table for hash aggregates is always zero, this seems to be
> > incorrect to me, attached a patch to fix it, please help to check.
>
> Indeed, that's incorrect. Causes the number of buckets for the hashtable
> to be set higher - the size is just used for that. I'm a bit wary of
> changing this in the stable branches - could cause performance changes?

I found that it was working when Andres implemented TupleHashTable, but
broke at:

| b5635948ab Support hashed aggregation with grouping sets.

So affects v11 and v12. entrysize isn't used for anything else.

--
Justin
>>>>> "Justin" == Justin Pryzby <pryzby@telsasoft.com> writes:

> On Thu, Mar 12, 2020 at 12:16:26PM -0700, Andres Freund wrote:
>> Indeed, that's incorrect. Causes the number of buckets for the
>> hashtable to be set higher - the size is just used for that. I'm a
>> bit wary of changing this in the stable branches - could cause
>> performance changes?

I think (offhand, not tested) that the number of buckets would only be
affected if the (planner-supplied) numGroups value would cause work_mem
to be exceeded; the planner doesn't plan a hashagg at all in that case
unless forced to (grouping by a hashable but not sortable column). Note
that for various reasons the planner tends to over-estimate the memory
requirement anyway.

Or maybe if work_mem had been reduced between plan time and execution
time....

So this is unlikely to be causing any issue in practice, so backpatching
may not be called for. I'll deal with it in HEAD only, unless someone
else has a burning desire to take it.

--
Andrew (irc:RhodiumToad)
Hi,

On 2020-03-13 00:34:22 +0000, Andrew Gierth wrote:
> >>>>> "Justin" == Justin Pryzby <pryzby@telsasoft.com> writes:
>
> > On Thu, Mar 12, 2020 at 12:16:26PM -0700, Andres Freund wrote:
> >> Indeed, that's incorrect. Causes the number of buckets for the
> >> hashtable to be set higher - the size is just used for that. I'm a
> >> bit wary of changing this in the stable branches - could cause
> >> performance changes?
>
> I think (offhand, not tested) that the number of buckets would only be
> affected if the (planner-supplied) numGroups value would cause work_mem
> to be exceeded; the planner doesn't plan a hashagg at all in that case
> unless forced to (grouping by a hashable but not sortable column). Note
> that for various reasons the planner tends to over-estimate the memory
> requirement anyway.

That's a good point.

> Or maybe if work_mem had been reduced between plan time and execution
> time....
>
> So this is unlikely to be causing any issue in practice, so backpatching
> may not be called for.

Sounds sane to me.

> I'll deal with it in HEAD only, unless someone else has a burning desire
> to take it.

Feel free.

I wonder if we should just remove the parameter though? I'm not sure
there's much point in having it, given it's just callers filling
->additionalstate. And the nbuckets is passed in externally anyway - so
there needs to have been a memory sizing determination previously
anyway? The other users just specify 0 already.

Greetings,

Andres Freund
> On 2020-03-12 16:35:15 +0800, Pengzhou Tang wrote:
> > When reading the grouping sets codes, I find that the additional size of
> > the hash table for hash aggregates is always zero, this seems to be
> > incorrect to me, attached a patch to fix it, please help to check.
>
> Indeed, that's incorrect. Causes the number of buckets for the hashtable
> to be set higher - the size is just used for that. I'm a bit wary of
> changing this in the stable branches - could cause performance changes?
Thanks for confirming this.
On Fri, Mar 13, 2020 at 8:34 AM Andrew Gierth <andrew@tao11.riddles.org.uk> wrote:
> >>>>> "Justin" == Justin Pryzby <pryzby@telsasoft.com> writes:
>
> > On Thu, Mar 12, 2020 at 12:16:26PM -0700, Andres Freund wrote:
> >> Indeed, that's incorrect. Causes the number of buckets for the
> >> hashtable to be set higher - the size is just used for that. I'm a
> >> bit wary of changing this in the stable branches - could cause
> >> performance changes?
>
> I think (offhand, not tested) that the number of buckets would only be
> affected if the (planner-supplied) numGroups value would cause work_mem
> to be exceeded; the planner doesn't plan a hashagg at all in that case
> unless forced to (grouping by a hashable but not sortable column). Note
> that for various reasons the planner tends to over-estimate the memory
> requirement anyway.
That makes sense, thanks.

Thanks, Andres Freund and Andrew Gierth.

Relatedly, can I invite you to help review the parallel grouping sets
patches? It would be great to hear some comments from you, since you
contributed most of the code for grouping sets.

The thread is https://www.postgresql.org/message-id/CAG4reAQ8rFCc%2Bi0oju3VjaW7xSOJAkvLrqa4F-NYZzAG4SW7iQ%40mail.gmail.com
Thanks,
Pengzhou
On Fri, Mar 13, 2020 at 3:16 AM Andres Freund <andres@anarazel.de> wrote:
> Hi,
>
> On 2020-03-12 16:35:15 +0800, Pengzhou Tang wrote:
> > When reading the grouping sets codes, I find that the additional size of
> > the hash table for hash aggregates is always zero, this seems to be
> > incorrect to me, attached a patch to fix it, please help to check.
>
> Indeed, that's incorrect. Causes the number of buckets for the hashtable
> to be set higher - the size is just used for that. I'm a bit wary of
> changing this in the stable branches - could cause performance changes?
>
> Greetings,
>
> Andres Freund
On Fri, 2020-03-13 at 00:34 +0000, Andrew Gierth wrote:
> >>>>> "Justin" == Justin Pryzby <pryzby@telsasoft.com> writes:
>
> > On Thu, Mar 12, 2020 at 12:16:26PM -0700, Andres Freund wrote:
> >> Indeed, that's incorrect. Causes the number of buckets for the
> >> hashtable to be set higher - the size is just used for that. I'm a
> >> bit wary of changing this in the stable branches - could cause
> >> performance changes?

It's also used to set the 'entrysize' field of the TupleHashTable, which
doesn't appear to be used for anything? Maybe we should just remove that
field... it confused me for a moment as I was looking into this.

> I think (offhand, not tested) that the number of buckets would only be
> affected if the (planner-supplied) numGroups value would cause work_mem
> to be exceeded; the planner doesn't plan a hashagg at all in that case

Now that we have Disk-based HashAgg, which already tries to choose the
number of buckets with work_mem in mind, and no other caller specifies a
non-zero additionalsize, why not just get rid of that argument
completely? It can still sanity check against work_mem for the sake of
other callers, but it doesn't need 'additionalsize' to do so.

Or, we can keep the 'additionalsize' argument but put it to work storing
the AggStatePerGroupData inline in the hash table. That would allow us
to remove the 'additional' pointer from TupleHashEntryData, saving 8
bytes plus the chunk header for every group. That sounds very tempting.

If we want to get even more clever, we could try to squish
AggStatePerGroupData into 8 bytes by putting the flags (transValueIsNull
and noTransValue) into unused bits of the Datum. That would work if the
transtype is by-ref (low bits of the pointer will be unused), or if the
type's size is less than 8, or if the particular aggregate doesn't need
either of those booleans. It would get messy, but saving 8 bytes per
group is non-trivial.

Regards,
Jeff Davis
Hi,

On 2020-03-21 17:45:31 -0700, Jeff Davis wrote:
> Or, we can keep the 'additionalsize' argument but put it to work
> storing the AggStatePerGroupData inline in the hash table. That would
> allow us to remove the 'additional' pointer from TupleHashEntryData,
> saving 8 bytes plus the chunk header for every group. That sounds very
> tempting.

I don't see how? That'd require making the hash bucket addressing deal
with variable sizes, which'd be bad for performance reasons. Since there
can be aggstate->numtrans AggStatePerGroupDatas for each hash table
entry, I don't see how to avoid a variable size?

> If we want to get even more clever, we could try to squish
> AggStatePerGroupData into 8 bytes by putting the flags
> (transValueIsNull and noTransValue) into unused bits of the Datum.
> That would work if the transtype is by-ref (low bits of the pointer
> will be unused), or if the type's size is less than 8, or if the
> particular aggregate doesn't need either of those booleans. It would
> get messy, but saving 8 bytes per group is non-trivial.

I'm somewhat doubtful it's worth going for those per-type optimizations
- the wins don't seem large enough, relative to other per-group space
needs. Also adds additional instructions to fetching those values...

If we want to optimize memory usage, I think I'd first go for allocating
the group's "firstTuple" together with all the AggStatePerGroupDatas.

Greetings,

Andres Freund
On Sat, 2020-03-21 at 18:26 -0700, Andres Freund wrote:
> I don't see how? That'd require making the hash bucket addressing deal
> with variable sizes, which'd be bad for performance reasons. Since
> there can be aggstate->numtrans AggStatePerGroupDatas for each hash
> table entry, I don't see how to avoid a variable size?

It would not vary for a given hash table. Do you mean the compile-time
specialization (of simplehash.h) would not work any more?

If we aren't storing the "additional" inline in the hash entry, I don't
see any purpose for the argument to BuildTupleHashTableExt(), nor the
purpose of the "entrysize" field of TupleHashTableData.

> If we want to optimize memory usage, I think I'd first go for
> allocating the group's "firstTuple" together with all the
> AggStatePerGroupDatas.

That's a good idea.

Regards,
Jeff Davis
Hi,

On 2020-03-23 13:29:02 -0700, Jeff Davis wrote:
> On Sat, 2020-03-21 at 18:26 -0700, Andres Freund wrote:
> > I don't see how? That'd require making the hash bucket addressing
> > deal with variable sizes, which'd be bad for performance reasons.
> > Since there can be aggstate->numtrans AggStatePerGroupDatas for each
> > hash table entry, I don't see how to avoid a variable size?
>
> It would not vary for a given hash table. Do you mean the compile-time
> specialization (of simplehash.h) would not work any more?

Yes.

> If we aren't storing the "additional" inline in the hash entry, I
> don't see any purpose for the argument to BuildTupleHashTableExt(),
> nor the purpose of the "entrysize" field of TupleHashTableData.

Yea, that was my conclusion too. It looked like Andrew was going to
commit a fix, but that hasn't happened yet.

Greetings,

Andres Freund