On Thu, Dec 12, 2019 at 06:10:50PM -0800, Jeff Davis wrote:
>On Thu, 2019-11-28 at 18:46 +0100, Tomas Vondra wrote:
>> 13) As for this:
>>
>>     /* make sure that we don't exhaust the hash bits */
>>     if (partition_bits + input_bits >= 32)
>>         partition_bits = 32 - input_bits;
>>
>> We already ran into this issue (exhausting bits in a hash value) in
>> hashjoin batching, we should be careful to use the same approach in
>> both places (not the same code, just general approach).
>
>I assume you're talking about ExecHashIncreaseNumBatches(), and in
>particular, commit 8442317b. But that's a 10-year-old commit, so
>perhaps you're talking about something else?
>
>It looks like that code in HJ is protecting against having a very large
>number of batches, such that we can't allocate an array of pointers for
>each batch. And it seems like the concern is more related to a planner
>error causing such a large nbatch.
>
>I don't quite see the analogous case in HashAgg. npartitions is already
>constrained to a maximum of 256. And the batches are individually
>allocated, held in a list, not an array.
>
>It could perhaps use some defensive programming to make sure that we
>don't run into problems if the max is set very high.
>
>Can you clarify what you're looking for here?
>
I'm talking about this recent discussion on pgsql-bugs:

https://www.postgresql.org/message-id/CA%2BhUKGLyafKXBMFqZCSeYikPbdYURbwr%2BjP6TAy8sY-8LO0V%2BQ%40mail.gmail.com

I.e. when the number of batches/partitions and buckets is high enough, we
may end up with very few bits in one of the parts.
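
To make the concern concrete, here is a rough sketch of how a 32-bit hash
gets carved up. This is not the actual hashagg or hashjoin code: the helper
names are made up, and taking the partition bits from the most significant
end of the hash is an assumption made only for illustration. The clamp
itself is the one quoted above.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper wrapping the quoted clamp. input_bits is the number
 * of hash bits already consumed; the result is how many bits are actually
 * left for choosing a partition.
 */
static int
clamp_partition_bits(int partition_bits, int input_bits)
{
    assert(partition_bits >= 0);
    assert(input_bits >= 0 && input_bits <= 32);

    /* make sure that we don't exhaust the hash bits */
    if (partition_bits + input_bits >= 32)
        partition_bits = 32 - input_bits;

    return partition_bits;
}

/*
 * Hypothetical helper: pick a partition from the bits right after the
 * ones already used, counting from the most significant end (assumed).
 */
static uint32_t
partition_of(uint32_t hash, int input_bits, int partition_bits)
{
    assert(input_bits >= 0 && partition_bits >= 0);
    assert(input_bits + partition_bits <= 32);

    /* out of bits: every tuple lands in the same partition */
    if (partition_bits == 0)
        return 0;

    /* drop the used bits, then keep the top partition_bits of the rest */
    return (uint32_t) (hash << input_bits) >> (32 - partition_bits);
}
```

With 28 bits already used, a request for 8 partition bits is clamped to 4,
so only 16 distinct partitions are possible no matter how many were asked
for. With all 32 bits used the clamp yields 0 and everything ends up in a
single partition.
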
>Perhaps I can also add a comment saying that we can have less than
>HASH_MIN_PARTITIONS when running out of bits.
>
Maybe.

regards

--
Tomas Vondra http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services