pgsql: Fix hashjoin memory balancing logic - Mailing list pgsql-committers

From Tomas Vondra
Subject pgsql: Fix hashjoin memory balancing logic
Date
Msg-id E1v9r4I-002BwH-30@gemulon.postgresql.org
Whole thread Raw
List pgsql-committers
Fix hashjoin memory balancing logic

Commit a1b4f289beec improved the hashjoin sizing to also consider the
memory used by BufFiles for batches. The code however had multiple
issues, making it ineffective or not working as expected in some cases.

* The amount of memory needed by buffers was calculated using uint32,
  so it would overflow for nbatch >= 262144. If this happened the loop
  would exit prematurely and the memory usage would not be reduced.

  The nbatch overflow is fixed by reworking the condition to not use a
  multiplication at all, so there's no risk of overflow. An explicit
  cast was added to a similar calculation in ExecHashIncreaseBatchSize.

* The loop adjusting the nbatch value used hash_table_bytes to calculate
  the old/new size, but then updated only space_allowed. The consequence
  is the total memory usage was not reduced, but all the memory saved by
  reducing the number of batches was used for the internal hash table.

  This was fixed by using only space_allowed. This is also more correct,
  because hash_table_bytes does not account for skew buckets.

* The code was also doubling multiple parameters (e.g. the number of
  buckets for hash table), but was missing overflow protections.

  The loop now checks for overflow, and terminates if needed. It'd be
  possible to cap the value and continue the loop, but it's not worth
  the complexity. And the overflow implies the in-memory hash table is
  already very large anyway.

While at it, rework the comment explaining how the memory balancing
works, to make it more concise and easier to understand.

The initial nbatch overflow issue was reported by Vaibhav Jain. The
other issues were noticed by me and Melanie Plageman. Fix by me, with a
lot of review and feedback by Melanie.

Backpatch to 18, where the hashjoin memory balancing was introduced.

Reported-by: Vaibhav Jain <jainva@google.com>
Reviewed-by: Melanie Plageman <melanieplageman@gmail.com>
Backpatch-through: 18
Discussion: https://postgr.es/m/CABa-Az174YvfFq7rLS+VNKaQyg7inA2exvPWmPWqnEn6Ditr_Q@mail.gmail.com

Branch
------
REL_18_STABLE

Details
-------
https://git.postgresql.org/pg/commitdiff/aa151022ec13f6a24f70c30dbfb9a5747db620e8

Modified Files
--------------
src/backend/executor/nodeHash.c | 122 +++++++++++++++++++++-------------------
1 file changed, 64 insertions(+), 58 deletions(-)


pgsql-committers by date:

Previous
From: Tomas Vondra
Date:
Subject: pgsql: Fix hashjoin memory balancing logic
Next
From: David Rowley
Date:
Subject: pgsql: Englishify comment wording