Thread: Replace hashtable growEnable flag

Replace hashtable growEnable flag

From
Hubert Zhang
Date:
Hi all,

When we build hash table for a hash join node etc., we split tuples into different hash buckets. Since tuples could not all be held in memory. Postgres splits each bucket into batches, only the current batch of bucket is in memory while other batches are written to disk.

During ExecHashTableInsert(), if the memory cost exceeds the operator allowed limit(hashtable->spaceAllowed), batches will be split on the fly by calling ExecHashIncreaseNumBatches().

In past, if data is distributed unevenly, the split of batch may failed(All the tuples falls into one split batch and the other batch is empty) Then Postgres will set hashtable->growEnable to false. And never expand batch number any more.

If tuples become diverse in future, spliting batch is still valuable and could avoid the current batch become too big and finally OOM.

To fix this, we introduce a penalty on hashtable->spaceAllowed, which is the threshold to determine whether to increase batch number.
If batch split failed, we increase the penalty instead of just turn off the growEnable flag.

Any comments?


--
Thanks

Hubert Zhang
Attachment

Re: Replace hashtable growEnable flag

From
Tomas Vondra
Date:
On Wed, May 15, 2019 at 06:19:38PM +0800, Hubert Zhang wrote:
>Hi all,
>
>When we build hash table for a hash join node etc., we split tuples into
>different hash buckets. Since tuples could not all be held in memory.
>Postgres splits each bucket into batches, only the current batch of bucket
>is in memory while other batches are written to disk.
>
>During ExecHashTableInsert(), if the memory cost exceeds the operator
>allowed limit(hashtable->spaceAllowed), batches will be split on the fly by
>calling ExecHashIncreaseNumBatches().
>
>In past, if data is distributed unevenly, the split of batch may failed(All
>the tuples falls into one split batch and the other batch is empty) Then
>Postgres will set hashtable->growEnable to false. And never expand batch
>number any more.
>
>If tuples become diverse in future, spliting batch is still valuable and
>could avoid the current batch become too big and finally OOM.
>
>To fix this, we introduce a penalty on hashtable->spaceAllowed, which is
>the threshold to determine whether to increase batch number.
>If batch split failed, we increase the penalty instead of just turn off the
>growEnable flag.
>
>Any comments?
>

There's already another thread discussing various issues with how hashjoin
increases the number of batches, including various issues with how/when we
disable adding more batches.

    https://commitfest.postgresql.org/23/2109/

In general I think you're right something like this is necessary, but I
think we may need to rethink growEnable a bit more.

For example, the way you implemented it, after reaching the increased
limit, we just increase the number of batches just like today, and then
decide whether it actually helped. But that means we double the number of
BufFile entries, which uses more and more memory (because each is 8kB and
we need 1 per batch). I think in this case (after increasing the limit) we
should check whether increasing batches makes sense or not. And only do it
if it helps. Otherwise we'll double the amount of memory for BufFile(s)
and also the work_mem. That's not a good idea.

But as I said, there are other issues discussed on the other thread. For
example we only disable the growth when all rows fall into the same batch.
But that's overly strict.


regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: Replace hashtable growEnable flag

From
Hubert Zhang
Date:
Thanks Tomas.
I will follow this problem on your thread. This thread could be terminated.

On Thu, May 16, 2019 at 3:58 AM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On Wed, May 15, 2019 at 06:19:38PM +0800, Hubert Zhang wrote:
>Hi all,
>
>When we build hash table for a hash join node etc., we split tuples into
>different hash buckets. Since tuples could not all be held in memory.
>Postgres splits each bucket into batches, only the current batch of bucket
>is in memory while other batches are written to disk.
>
>During ExecHashTableInsert(), if the memory cost exceeds the operator
>allowed limit(hashtable->spaceAllowed), batches will be split on the fly by
>calling ExecHashIncreaseNumBatches().
>
>In past, if data is distributed unevenly, the split of batch may failed(All
>the tuples falls into one split batch and the other batch is empty) Then
>Postgres will set hashtable->growEnable to false. And never expand batch
>number any more.
>
>If tuples become diverse in future, spliting batch is still valuable and
>could avoid the current batch become too big and finally OOM.
>
>To fix this, we introduce a penalty on hashtable->spaceAllowed, which is
>the threshold to determine whether to increase batch number.
>If batch split failed, we increase the penalty instead of just turn off the
>growEnable flag.
>
>Any comments?
>

There's already another thread discussing various issues with how hashjoin
increases the number of batches, including various issues with how/when we
disable adding more batches.

    https://commitfest.postgresql.org/23/2109/

In general I think you're right something like this is necessary, but I
think we may need to rethink growEnable a bit more.

For example, the way you implemented it, after reaching the increased
limit, we just increase the number of batches just like today, and then
decide whether it actually helped. But that means we double the number of
BufFile entries, which uses more and more memory (because each is 8kB and
we need 1 per batch). I think in this case (after increasing the limit) we
should check whether increasing batches makes sense or not. And only do it
if it helps. Otherwise we'll double the amount of memory for BufFile(s)
and also the work_mem. That's not a good idea.

But as I said, there are other issues discussed on the other thread. For
example we only disable the growth when all rows fall into the same batch.
But that's overly strict.


regards

--
Tomas Vondra                  https://urldefense.proofpoint.com/v2/url?u=http-3A__www.2ndQuadrant.com&d=DwIBAg&c=lnl9vOaLMzsy2niBC8-h_K-7QJuNJEsFrzdndhuJ3Sw&r=lz-kpGdw_rtpgYV2ho3DjDSB5Psxis_b-3VZKON7K7c&m=y2bI6_b4EPRd9aTQGv9Pio3c_ZtCWs_jzKd4t8CtJEI&s=XHLORM8U7I6XR_EDkgSFtJDvhxIVd2rDA7r-xvJa278&e=
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



--
Thanks

Hubert Zhang