Re: Support run-time partition pruning for hash join - Mailing list pgsql-hackers
From: Richard Guo
Subject: Re: Support run-time partition pruning for hash join
Msg-id: CAMbWs488_uMqmyQo0Cq-XrDSiEqiXCnfJtCyhXrWn_HdBen3Tw@mail.gmail.com
In response to: Re: Support run-time partition pruning for hash join (David Rowley <dgrowleyml@gmail.com>)
Responses: Re: Support run-time partition pruning for hash join
List: pgsql-hackers
On Fri, Aug 25, 2023 at 11:03 AM David Rowley <dgrowleyml@gmail.com> wrote:
> I'd suggest writing some cost which costs an execution of run-time
> pruning. With LIST and RANGE you probably want something like
> cpu_operator_cost * LOG2(nparts) once for each hashed tuple to account
> for the binary search over the sorted datum array. For HASH
> partitions, something like cpu_operator_cost * npartcols once for each
> hashed tuple.
>
> You'll need to then come up with some counter costs to subtract from
> the Append/MergeAppend. This is tricky, as discussed. Just come up
> with something crude for now.
>
> To start with, it could just be as crude as:
>
> total_costs *= (Min(expected_outer_rows, n_append_subnodes) /
> n_append_subnodes);
>
> i.e assume that every outer joined row will require exactly 1 new
> partition up to the total number of partitions. That's pretty much
> worst-case, but it'll at least allow the optimisation to work for
> cases like where the hash table is expected to contain just a tiny
> number of rows (fewer than the number of partitions)
>
> To make it better, you might want to look at join selectivity
> estimation and see if you can find something there to influence
> something better.
I had a go at writing some costing code according to your suggestion.
That's compute_partprune_cost() in the v2 patch.
For the hash side, this function computes the pruning cost as
cpu_operator_cost * LOG2(nparts) * inner_rows for LIST and RANGE, and
cpu_operator_cost * nparts * inner_rows for HASH.
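To illustrate the shape of that hash-side charge, here is a minimal
sketch (not the actual code from the v2 patch; the function and
parameter names are made up for illustration):

#include <math.h>
#include <stdbool.h>

/*
 * Illustrative sketch only: charge one pruning step per hashed (inner)
 * tuple.  LIST/RANGE partitioning prunes via a binary search over the
 * sorted bound array, hence the LOG2(nparts) factor; HASH partitioning
 * is charged per partition, matching the description above.
 * cpu_operator_cost is the usual planner GUC (a double).
 */
static double
hash_side_prune_cost(double inner_rows, int nparts, bool is_hash_partitioned,
                     double cpu_operator_cost)
{
    double      per_tuple;

    if (is_hash_partitioned)
        per_tuple = cpu_operator_cost * nparts;
    else                        /* LIST or RANGE */
        per_tuple = cpu_operator_cost * (log((double) nparts) / log(2.0));

    return per_tuple * inner_rows;
}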
For the Append/MergeAppend side, this function first estimates the size
of the outer side that matches, using the same idea as when we estimate
the joinrel size for JOIN_SEMI. Then it assumes that each outer joined
row occupies one new partition (the worst case) and computes how much
cost can be saved by partition pruning.
If the cost saved from the Append/MergeAppend side is larger than the
pruning cost from the Hash side, then we say that partition pruning is a
win.
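To make the comparison concrete, here is a rough sketch of the decision
for one Append/Hash pair (again with made-up names; outer_matched_rows
stands for the JOIN_SEMI-style estimate of matching outer rows mentioned
above):

#include <stdbool.h>

/*
 * Illustrative sketch only.  In the worst case each matched outer row
 * lands in a distinct partition, so at most Min(outer_matched_rows,
 * nparts) partitions survive pruning.  Scale the Append/MergeAppend
 * cost by that fraction and compare the saving against the hash-side
 * pruning charge computed above.
 */
static bool
prune_looks_like_a_win(double append_total_cost, int nparts,
                       double outer_matched_rows, double prune_cost)
{
    double      surviving = outer_matched_rows < nparts ? outer_matched_rows
                                                        : (double) nparts;
    double      cost_after_pruning = append_total_cost * (surviving / nparts);
    double      saved = append_total_cost - cost_after_pruning;

    return saved > prune_cost;
}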
Note that this costing logic runs for each Append-Hash pair, so it copes
with the case where we have multiple join levels.
With this costing logic added, I performed the same worst-case
performance comparisons as in [1], and here is what I got.
tuples    unpatched    patched    change
10000     44.66        44.37      -0.006493506
20000     52.41        52.29      -0.002289639
30000     61.11        61.12      +0.000163639
40000     67.87        68.24      +0.005451599
50000     74.51        74.75      +0.003221044
60000     82.3         81.55      -0.009113001
70000     87.16        86.98      -0.002065168
80000     93.49        93.89      +0.004278532
90000     101.52       100.83     -0.00679669
100000    108.34       108.56     +0.002030644
So the costing logic successfully avoids performing the partition
pruning in the worst case.
I also tested the cases where partition pruning is possible with
different sizes of the hash side.
tuples    unpatched    patched    change
100       36.86        2.4        -0.934888768
200       35.87        2.37       -0.933928074
300       35.95        2.55       -0.92906815
400       36.4         2.63       -0.927747253
500       36.39        2.85       -0.921681781
600       36.32        2.97       -0.918226872
700       36.6         3.23       -0.911748634
800       36.88        3.44       -0.906724512
900       37.02        3.46       -0.906537007
1000      37.25        37.21      -0.001073826
The first 9 rows show that the costing logic allows partition pruning to
be performed, and the pruning turns out to be a big win. The last row
shows that partition pruning is disallowed by the costing logic because
it thinks no partitions can be pruned (we have 1000 partitions in
total).
So it seems that the new costing logic is quite crude and tends to be
very conservative, but it can help avoid the large overhead in the worst
cases. I think this might be a good start to push this patch forward.
Any thoughts or comments?
[1] https://www.postgresql.org/message-id/CAMbWs49%2Bp6hBxXJHFiSwOtPCSkAHwhJj3hTpCR_pmMiUUVLZ1Q%40mail.gmail.com
Thanks
Richard