pgsql: Parallel Hash Full Join. - Mailing list pgsql-committers

From Thomas Munro
Subject pgsql: Parallel Hash Full Join.
Msg-id E1pi11w-000XW9-0X@gemulon.postgresql.org
List pgsql-committers
Parallel Hash Full Join.

Full and right outer joins were not supported in the initial
implementation of Parallel Hash Join because of deadlock hazards (see
discussion).  Therefore FULL JOIN inhibited parallelism, as the other
join strategies can't perform it in parallel either.

Add a new PHJ phase PHJ_BATCH_SCAN that scans for unmatched tuples on
the inner side of one batch's hash table.  For now, sidestep the
deadlock problem by terminating parallelism there.  The last process to
arrive at that phase emits the unmatched tuples, while the others detach
and are free to work on other batches, if there are any; otherwise they
finish the join early.

That unfairness is considered acceptable for now, because it's better
than no parallelism at all.  The build and probe phases run in
parallel, and the new scan-for-unmatched phase, while serial, is usually
applied to the smaller of the two relations and is either limited by
some multiple of work_mem or, if the relation is too big, partitioned
into batches, in which case batch-level parallelism improves the
situation.
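The batch-level argument can be sketched with the same kind of toy C model, under the same assumptions: POSIX threads stand in for processes, and each batch is a hypothetical fixed-size array of matched flags. The "probe" is a made-up rule (worker `id` matches slot `id` of every even-numbered batch), and all names are illustrative only. Each worker visits the batches starting at a different offset. Whoever arrives last at a batch runs that batch's serial unmatched scan, while the others detach and move on, so scans of different batches can run on different threads at the same time.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define NBATCH 6
#define NWORKERS 3
#define BATCH_SIZE 4

/* Hypothetical per-batch state; not PostgreSQL's shared hash table layout. */
typedef struct {
    pthread_mutex_t lock;
    int attached;                     /* workers that have not yet arrived */
    atomic_bool matched[BATCH_SIZE];  /* one flag per inner tuple */
    int scanned_by;                   /* worker that ran the serial scan, or -1 */
    int unmatched;                    /* result of the serial scan */
} Batch;

static Batch batches[NBATCH];

static void init_batches(void)
{
    for (int b = 0; b < NBATCH; b++) {
        pthread_mutex_init(&batches[b].lock, NULL);
        batches[b].attached = NWORKERS;
        for (int j = 0; j < BATCH_SIZE; j++)
            atomic_init(&batches[b].matched[j], false);
        batches[b].scanned_by = -1;
        batches[b].unmatched = 0;
    }
}

static void *worker(void *arg)
{
    int id = (int)(intptr_t)arg;

    for (int k = 0; k < NBATCH; k++) {
        int b = (id + k) % NBATCH;    /* each worker starts on a different batch */
        Batch *batch = &batches[b];

        /* Toy probe: worker `id` matches inner slot `id` of even batches. */
        if (b % 2 == 0 && id < BATCH_SIZE)
            atomic_store(&batch->matched[id], true);

        /* Arrive; all but the last arriver detach and go to the next batch. */
        pthread_mutex_lock(&batch->lock);
        bool last = (--batch->attached == 0);
        pthread_mutex_unlock(&batch->lock);
        if (!last)
            continue;

        /* Serial scan of this batch only; other batches proceed meanwhile. */
        int n = 0;
        for (int j = 0; j < BATCH_SIZE; j++)
            if (!atomic_load(&batch->matched[j]))
                n++;
        batch->unmatched = n;
        batch->scanned_by = id;
    }
    return NULL;
}

static void run_join(void)
{
    pthread_t threads[NWORKERS];
    init_batches();
    for (int i = 0; i < NWORKERS; i++) {
        int rc = pthread_create(&threads[i], NULL, worker, (void *)(intptr_t)i);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < NWORKERS; i++)
        pthread_join(threads[i], NULL);
}
```

After `run_join()` returns, every batch has been scanned exactly once: even-numbered batches report 1 unmatched inner slot, odd-numbered ones report 4, for 15 in total. Which worker scans which batch depends on thread timing, which is the point: with more than one batch, the serial scans are spread across workers instead of all landing on one.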

Author: Melanie Plageman <melanieplageman@gmail.com>
Author: Thomas Munro <thomas.munro@gmail.com>
Reviewed-by: Thomas Munro <thomas.munro@gmail.com>
Discussion: https://postgr.es/m/CA%2BhUKG%2BA6ftXPz4oe92%2Bx8Er%2BxpGZqto70-Q_ERwRaSyA%3DafNg%40mail.gmail.com

Branch
------
master

Details
-------
https://git.postgresql.org/pg/commitdiff/11c2d6fdf5af1aacec9ca2005543f1b0fc4cc364

Modified Files
--------------
src/backend/executor/nodeHash.c         | 175 +++++++++++++++++++++++++++++++-
src/backend/executor/nodeHashjoin.c     |  81 ++++++++++-----
src/backend/optimizer/path/joinpath.c   |  14 ++-
src/include/executor/hashjoin.h         |   6 +-
src/include/executor/nodeHash.h         |   3 +
src/test/regress/expected/join_hash.out |  65 +++++++++++-
src/test/regress/sql/join_hash.sql      |  27 ++++-
7 files changed, 323 insertions(+), 48 deletions(-)

