pgsql: Improve hash join's handling of tuples with null join keys. - Mailing list pgsql-committers

From Tom Lane
Subject pgsql: Improve hash join's handling of tuples with null join keys.
Date
Msg-id E1w3IwB-000XHJ-3A@gemulon.postgresql.org
Whole thread Raw
List pgsql-committers
Improve hash join's handling of tuples with null join keys.

In a plain join, we can just summarily discard an input tuple
with null join key(s), since it cannot match anything from
the other side of the join (assuming a strict join operator).
However, if the tuple comes from the outer side of an outer join
then we have to emit it with null-extension of the other side.

Up to now, hash joins did that by inserting the tuple into the hash
table as though it were a normal tuple.  This is unnecessarily
inefficient though, since the required processing is far simpler than
for a potentially-matchable tuple.  Worse, if there are a lot of such
tuples they will bloat the hash bucket they go into, possibly causing
useless repeated attempts to split that bucket or increase the number
of batches.  We have a report of a large join vainly creating many
thousands of batches when faced with such input.

This patch improves the situation by keeping such tuples out of the
hash table altogether, instead pushing them into a separate tuplestore
from which we return them later.  (One might consider trying to return
them immediately; but that would require substantial refactoring, and
it doesn't work anyway for cases where we rescan an unmodified hash
table.)  This works even in parallel hash joins, because whichever
worker reads a null-keyed tuple can just return it; there's no need
for consultation with other workers.  Thus the tuplestores are local
storage even in a parallel join.

A pre-existing buglet that I noticed while analyzing the code's
behavior is that ExecHashRemoveNextSkewBucket fails to decrement
hashtable->skewTuples for tuples moved into the main hash table
from the skew hash table.  This invalidates ExecHashTableInsert's
calculation of the number of main-hash-table tuples, though probably
not by a lot since we expect the skew table to be small relative
to the main one.  Nonetheless, let's fix that too while we're here.

Bug: #18909
Reported-by: Sergey Koposov <Sergey.Koposov@ed.ac.uk>
Author: Tom Lane <tgl@sss.pgh.pa.us>
Reviewed-by: Chao Li <li.evan.chao@gmail.com>
Discussion: https://postgr.es/m/3061845.1746486714@sss.pgh.pa.us

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/1811f1af98fb237fdd5adb588cd4b57c433b75f8

Modified Files
--------------
src/backend/executor/execExpr.c         |  22 +--
src/backend/executor/nodeHash.c         |  96 ++++++++---
src/backend/executor/nodeHashjoin.c     | 283 +++++++++++++++++++++++++++-----
src/backend/utils/sort/tuplestore.c     |  32 ++++
src/include/executor/executor.h         |   2 +-
src/include/executor/hashjoin.h         |  22 ++-
src/include/executor/nodeHash.h         |   1 +
src/include/nodes/execnodes.h           |   8 +
src/include/utils/tuplestore.h          |   3 +
src/test/regress/expected/join.out      |   7 +-
src/test/regress/expected/join_hash.out |  15 +-
src/test/regress/sql/join.sql           |   4 +-
src/test/regress/sql/join_hash.sql      |   1 +
13 files changed, 414 insertions(+), 82 deletions(-)


pgsql-committers by date:

Previous
From: Tom Lane
Date:
Subject: pgsql: Avoid leaking duplicated file descriptors in corner cases.
Next
From: Masahiko Sawada
Date:
Subject: pgsql: Allow explicit casting between bytea and uuid.