Re: Support run-time partition pruning for hash join - Mailing list pgsql-hackers

From Richard Guo
Subject Re: Support run-time partition pruning for hash join
Date
Msg-id CAMbWs4_1sqDXMV_qWup_GknNpL7qL9PKTKA+oe_FqhtNzSE=WA@mail.gmail.com
Whole thread Raw
In response to Re: Support run-time partition pruning for hash join  (David Rowley <dgrowleyml@gmail.com>)
List pgsql-hackers

On Tue, Aug 22, 2023 at 2:38 PM David Rowley <dgrowleyml@gmail.com> wrote:
With Hash Join, it seems to me that the pruning must take place for
every row that makes it into the hash table.  There will be maybe
cases where the unioned set of partitions simply yields every
partition and all the work results in no savings. Pruning on a scalar
value seems much more likely to be able to prune away unneeded
Append/MergeAppend subnodes.

Yeah, you're right.  If we have 'pt HashJoin t', for a subnode of 'pt'
to be pruned, it needs every row of 't' to be able to prune that
subnode.  The situation may improve if we have more than 2-way hash
joins, because the final surviving subnodes would be the intersection of
matching subnodes in each Hash.

With parameterized nestloop I agree that it's more likely to be able to
prune subnodes at rescan of Append/MergeAppend nodes based on scalar
values.

Sometimes we may just not generate parameterized nestloop as final plan,
such as when there are no indexes and no lateral references in the
Append/MergeAppend node.  In this case I think it would be great if we
can still do some partition prunning.  So I think this new 'join
partition prunning mechanism' (maybe this is not a proper name) should
be treated as a supplement to, not a substitute for, the current
run-time partition prunning based on parameterized nestloop, and it is
so implemented in the patch.
 
Perhaps there can be something adaptive in Hash Join which stops
trying to prune when all partitions must be visited.  On a quick
glance of the patch, I don't see any code in ExecJoinPartitionPrune()
which gives up trying to prune when the number of members in
part_prune_result is equal to the prunable Append/MergeAppend
subnodes.

Yeah, we can do that.
 
But run-time pruning already works for Nested Loops... I must be
missing something here.

Here I mean nestloop with non-parameterized inner path.  As I explained
upthread, we need to have a Material node on the outer side for that to
work, which seems not possible in real world.

Thanks
Richard

pgsql-hackers by date:

Previous
From: Peter Eisentraut
Date:
Subject: Re: add timing information to pg_upgrade
Next
From: Amit Kapila
Date:
Subject: Re: persist logical slots to disk during shutdown checkpoint