Home > mailing lists

Re: apply_scanjoin_target_to_paths and partitionwise join - Mailing list pgsql-hackers

From	Ashutosh Bapat
Subject	Re: apply_scanjoin_target_to_paths and partitionwise join
Date	May 6, 2024 12:58:14
Msg-id	CAExHW5utxnWq3xuVjEeejkP_=TqMEoXnVktDxu6myJ7Tb+kK-Q@mail.gmail.com Whole thread
In response to	Re: apply_scanjoin_target_to_paths and partitionwise join (Jakub Wartak <jakub.wartak@enterprisedb.com>)
Responses	Re: apply_scanjoin_target_to_paths and partitionwise join
List	pgsql-hackers

Tree view

On Mon, May 6, 2024 at 4:26 PM Jakub Wartak <jakub.wartak@enterprisedb.com> wrote:

Hi Ashutosh & hackers,

On Mon, Apr 15, 2024 at 9:00 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> Here's patch with
>
[..]
> Adding to the next commitfest but better to consider this for the next set of minor releases.

1. The patch does not pass cfbot -
https://cirrus-ci.com/task/5486258451906560 on master due to test
failure "not ok 206 + partition_join"

So I need to create a patch for master first. I thought CFBot somehow knew that the patch was created for PG 15. :)

2. Without the patch applied, the result of the meson test on master
was clean (no failures , so master is fine). After applying patch
there were expected some hunk failures (as the patch was created for
15_STABLE):

patching file src/backend/optimizer/plan/planner.c
Hunk #1 succeeded at 7567 (offset 468 lines).
Hunk #2 succeeded at 7593 (offset 468 lines).
patching file src/test/regress/expected/partition_join.out
Hunk #1 succeeded at 4777 (offset 56 lines).
Hunk #2 succeeded at 4867 (offset 56 lines).
patching file src/test/regress/sql/partition_join.sql
Hunk #1 succeeded at 1136 (offset 1 line).

3. Without patch there is performance regression/bug on master (cost
is higher with enable_partitionwise_join=on that without it):

data preparation:
-- Test the process_outer_partition() code path
CREATE TABLE plt1_adv (a int, b int, c text) PARTITION BY LIST (c);
CREATE TABLE plt1_adv_p1 PARTITION OF plt1_adv FOR VALUES IN ('0000',
'0001', '0002');
CREATE TABLE plt1_adv_p2 PARTITION OF plt1_adv FOR VALUES IN ('0003', '0004');
INSERT INTO plt1_adv SELECT i, i, to_char(i % 5, 'FM0000') FROM
generate_series(0, 24) i;
ANALYZE plt1_adv;

CREATE TABLE plt2_adv (a int, b int, c text) PARTITION BY LIST (c);
CREATE TABLE plt2_adv_p1 PARTITION OF plt2_adv FOR VALUES IN ('0002');
CREATE TABLE plt2_adv_p2 PARTITION OF plt2_adv FOR VALUES IN ('0003', '0004');
INSERT INTO plt2_adv SELECT i, i, to_char(i % 5, 'FM0000') FROM
generate_series(0, 24) i WHERE i % 5 IN (2, 3, 4);
ANALYZE plt2_adv;

CREATE TABLE plt3_adv (a int, b int, c text) PARTITION BY LIST (c);
CREATE TABLE plt3_adv_p1 PARTITION OF plt3_adv FOR VALUES IN ('0001');
CREATE TABLE plt3_adv_p2 PARTITION OF plt3_adv FOR VALUES IN ('0003', '0004');
INSERT INTO plt3_adv SELECT i, i, to_char(i % 5, 'FM0000') FROM
generate_series(0, 24) i WHERE i % 5 IN (1, 3, 4);
ANALYZE plt3_adv;

off:
EXPLAIN SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1
LEFT JOIN plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c
= t3.c) WHERE coalesce(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 !=
4 ORDER BY t1.c, t1.a, t2.a, t3.a;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Sort (cost=22.02..22.58 rows=223 width=27)
Sort Key: t1.c, t1.a, t2.a, t3.a
-> Hash Full Join (cost=4.83..13.33 rows=223 width=27)
[..]

with enable_partitionwise_join=ON (see the jump from cost 22.02 -> 27.65):
EXPLAIN SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1
LEFT JOIN plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c
= t3.c) WHERE coalesce(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 !=
4 ORDER BY t1.c, t1.a, t2.a, t3.a;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Sort (cost=27.65..28.37 rows=289 width=27)
Sort Key: t1.c, t1.a, t2.a, t3.a
-> Append (cost=2.23..15.83 rows=289 width=27)
-> Hash Full Join (cost=2.23..4.81 rows=41 width=27)
[..]
-> Hash Full Join (cost=2.45..9.57 rows=248 width=27)
[..]

However with the patch applied the plan with minimal cost is always
chosen ("22"):

explain SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 LEFT JOIN
plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c = t3.c) WHERE
coalesce(t1.a, 0 ) % 5 != 3 AND coalesce(t1.a, 0) % 5 != 4 ORDER BY
t1.c, t1.a, t2.a, t3.a;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Sort (cost=22.02..22.58 rows=223 width=27)
Sort Key: t1.c, t1.a, t2.a, t3.a
-> Hash Full Join (cost=4.83..13.33 rows=223 width=27)
[..]

set enable_partitionwise_join to on;
explain SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 LEFT JOIN
plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c = t3.c) WHERE
coalesce(t1.a, 0 ) % 5 != 3 AND coalesce(t1.a, 0) % 5 != 4 ORDER BY
t1.c, t1.a, t2.a, t3.a;
QUERY PLAN
-----------------------------------------------------------------------------------------------
Sort (cost=22.02..22.58 rows=223 width=27)
Sort Key: t1.c, t1.a, t2.a, t3.a
-> Hash Full Join (cost=4.83..13.33 rows=223 width=27)
[..]

with the patch applied, the minimal cost (with toggle on or off) the
cost always stays the minimal from the available ones. We cannot
provide a reproducer for real performance regression, but for the
affected customer it took 530+s (with enable_partitionwise_join=on)

and without that GUC it it was ~23s.

Thanks for providing actual timing. That's a huge difference.

4. meson test ends up with failures like below:

4/290 postgresql:regress / regress/regress
ERROR 32.67s
6/290 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade
ERROR 56.96s
35/290 postgresql:recovery / recovery/027_stream_regress
ERROR 40.20s

(all due to "regression tests pass" failures)

the partition_join.sql is failing for test 206, so for this:

-- partitionwise join with fractional paths
CREATE TABLE fract_t (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id);
CREATE TABLE fract_t0 PARTITION OF fract_t FOR VALUES FROM ('0') TO ('1000');
CREATE TABLE fract_t1 PARTITION OF fract_t FOR VALUES FROM ('1000') TO ('2000');

-- insert data
INSERT INTO fract_t (id) (SELECT generate_series(0, 1999));
ANALYZE fract_t;

-- verify plan; nested index only scans
SET max_parallel_workers_per_gather = 0;
SET enable_partitionwise_join = on;

the testsuite was expecting the below with enable_partitionwise_join = on;

EXPLAIN (COSTS OFF)
SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER
BY x.id ASC LIMIT 10;
QUERY PLAN
-----------------------------------------------------------------------
Limit
-> Merge Append
Sort Key: x.id
-> Merge Left Join
Merge Cond: (x_1.id = y_1.id)
-> Index Only Scan using fract_t0_pkey on fract_t0 x_1
-> Index Only Scan using fract_t0_pkey on fract_t0 y_1
-> Merge Left Join
Merge Cond: (x_2.id = y_2.id)
-> Index Only Scan using fract_t1_pkey on fract_t1 x_2
-> Index Only Scan using fract_t1_pkey on fract_t1 y_2

but actually with patch it gets this (here with costs):

EXPLAIN (COSTS) SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y
USING (id) ORDER BY x.id ASC LIMIT 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Limit (cost=1.10..2.21 rows=10 width=16)
-> Merge Left Join (cost=1.10..223.10 rows=2000 width=16)
Merge Cond: (x.id = y.id)
-> Append (cost=0.55..96.55 rows=2000 width=8)
[..]
-> Append (cost=0.55..96.55 rows=2000 width=8)
[..]

if you run it without patch and again with enable_partitionwise_join=on:

EXPLAIN SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y USING
(id) ORDER BY x.id ASC LIMIT 10;
QUERY PLAN
-------------------------------------------------------------------------------------------------------------
Limit (cost=1.11..2.22 rows=10 width=16)
-> Merge Append (cost=1.11..223.11 rows=2000 width=16)
Sort Key: x.id
-> Merge Left Join (cost=0.55..101.55 rows=1000 width=16)
[..]
-> Merge Left Join (cost=0.55..101.55 rows=1000 width=16)
[..]

So with the patch that SQL does not use partitionwise join as it finds
it more optimal to stick to a plan with cost of "1.10..2.21" instead
of "1.11..2.22" (w/ partition_join), nitpicking but still a failure
technically. Perhaps it could be even removed? (it's pretty close to
noise?).

I think we need to replace the failing query with something which uses partitionwise join even with the patch.

I will take a look at this after returning from a two week long vacation, unless someone else is interested in fixing this before that.

Best Wishes,

Ashutosh Bapat

pgsql-hackers by date:

From: "David G. Johnston"
Date: 06 May 2024, 12:55:45
Subject: Re: Increase the length of identifers from 63 characters to 128 characters or more

From: Ashutosh Bapat
Date: 06 May 2024, 13:22:41
Subject: Re: partitioning and identity column

Re: apply_scanjoin_target_to_paths and partitionwise join - Mailing list pgsql-hackers

Previous

Next