Re: apply_scanjoin_target_to_paths and partitionwise join - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: apply_scanjoin_target_to_paths and partitionwise join
Date
Msg-id CAExHW5utxnWq3xuVjEeejkP_=TqMEoXnVktDxu6myJ7Tb+kK-Q@mail.gmail.com
Whole thread Raw
In response to Re: apply_scanjoin_target_to_paths and partitionwise join  (Jakub Wartak <jakub.wartak@enterprisedb.com>)
Responses Re: apply_scanjoin_target_to_paths and partitionwise join
List pgsql-hackers


On Mon, May 6, 2024 at 4:26 PM Jakub Wartak <jakub.wartak@enterprisedb.com> wrote:
Hi Ashutosh & hackers,

On Mon, Apr 15, 2024 at 9:00 AM Ashutosh Bapat
<ashutosh.bapat.oss@gmail.com> wrote:
>
> Here's patch with
>
[..]
> Adding to the next commitfest but better to consider this for the next set of minor releases.

1. The patch does not pass cfbot -
https://cirrus-ci.com/task/5486258451906560 on master due to test
failure "not ok 206   + partition_join"

So I need to create a patch for master first. I thought CFBot somehow knew that the patch was created for PG 15. :)
 

2. Without the patch applied, the result of the meson test on master
was clean (no failures , so master is fine). After applying patch
there were expected some hunk failures (as the patch was created for
15_STABLE):

patching file src/backend/optimizer/plan/planner.c
Hunk #1 succeeded at 7567 (offset 468 lines).
Hunk #2 succeeded at 7593 (offset 468 lines).
patching file src/test/regress/expected/partition_join.out
Hunk #1 succeeded at 4777 (offset 56 lines).
Hunk #2 succeeded at 4867 (offset 56 lines).
patching file src/test/regress/sql/partition_join.sql
Hunk #1 succeeded at 1136 (offset 1 line).

3. Without patch there is performance regression/bug on master (cost
is higher with enable_partitionwise_join=on that without it):

data preparation:
-- Test the process_outer_partition() code path
CREATE TABLE plt1_adv (a int, b int, c text) PARTITION BY LIST (c);
CREATE TABLE plt1_adv_p1 PARTITION OF plt1_adv FOR VALUES IN ('0000',
'0001', '0002');
CREATE TABLE plt1_adv_p2 PARTITION OF plt1_adv FOR VALUES IN ('0003', '0004');
INSERT INTO plt1_adv SELECT i, i, to_char(i % 5, 'FM0000') FROM
generate_series(0, 24) i;
ANALYZE plt1_adv;

CREATE TABLE plt2_adv (a int, b int, c text) PARTITION BY LIST (c);
CREATE TABLE plt2_adv_p1 PARTITION OF plt2_adv FOR VALUES IN ('0002');
CREATE TABLE plt2_adv_p2 PARTITION OF plt2_adv FOR VALUES IN ('0003', '0004');
INSERT INTO plt2_adv SELECT i, i, to_char(i % 5, 'FM0000') FROM
generate_series(0, 24) i WHERE i % 5 IN (2, 3, 4);
ANALYZE plt2_adv;

CREATE TABLE plt3_adv (a int, b int, c text) PARTITION BY LIST (c);
CREATE TABLE plt3_adv_p1 PARTITION OF plt3_adv FOR VALUES IN ('0001');
CREATE TABLE plt3_adv_p2 PARTITION OF plt3_adv FOR VALUES IN ('0003', '0004');
INSERT INTO plt3_adv SELECT i, i, to_char(i % 5, 'FM0000') FROM
generate_series(0, 24) i WHERE i % 5 IN (1, 3, 4);
ANALYZE plt3_adv;

off:
EXPLAIN SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1
LEFT JOIN plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c
= t3.c) WHERE coalesce(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 !=
4 ORDER BY t1.c, t1.a, t2.a, t3.a;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Sort  (cost=22.02..22.58 rows=223 width=27)
   Sort Key: t1.c, t1.a, t2.a, t3.a
   ->  Hash Full Join  (cost=4.83..13.33 rows=223 width=27)
[..]


with enable_partitionwise_join=ON (see the jump from cost 22.02 -> 27.65):
EXPLAIN SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1
LEFT JOIN plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c
= t3.c) WHERE coalesce(t1.a, 0) % 5 != 3 AND coalesce(t1.a, 0) % 5 !=
4 ORDER BY t1.c, t1.a, t2.a, t3.a;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Sort  (cost=27.65..28.37 rows=289 width=27)
   Sort Key: t1.c, t1.a, t2.a, t3.a
   ->  Append  (cost=2.23..15.83 rows=289 width=27)
         ->  Hash Full Join  (cost=2.23..4.81 rows=41 width=27)
[..]
         ->  Hash Full Join  (cost=2.45..9.57 rows=248 width=27)
[..]


However with the patch applied the plan with minimal cost is always
chosen ("22"):

explain SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 LEFT JOIN
plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c = t3.c) WHERE
coalesce(t1.a, 0    ) % 5 != 3 AND coalesce(t1.a, 0) % 5 != 4 ORDER BY
t1.c, t1.a, t2.a, t3.a;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Sort  (cost=22.02..22.58 rows=223 width=27)
   Sort Key: t1.c, t1.a, t2.a, t3.a
   ->  Hash Full Join  (cost=4.83..13.33 rows=223 width=27)
[..]


set enable_partitionwise_join to on;
explain SELECT t1.a, t1.c, t2.a, t2.c, t3.a, t3.c FROM (plt1_adv t1 LEFT JOIN
plt2_adv t2 ON (t1.c = t2.c)) FULL JOIN plt3_adv t3 ON (t1.c = t3.c) WHERE
coalesce(t1.a, 0    ) % 5 != 3 AND coalesce(t1.a, 0) % 5 != 4 ORDER BY
t1.c, t1.a, t2.a, t3.a;
                                          QUERY PLAN
-----------------------------------------------------------------------------------------------
 Sort  (cost=22.02..22.58 rows=223 width=27)
   Sort Key: t1.c, t1.a, t2.a, t3.a
   ->  Hash Full Join  (cost=4.83..13.33 rows=223 width=27)
[..]

with the patch applied, the minimal cost (with toggle on or off) the
cost always stays the minimal from the available ones. We cannot
provide a reproducer for real performance regression, but for the
affected customer it took 530+s (with enable_partitionwise_join=on) 
and without that GUC it it was ~23s.
 
Thanks for providing actual timing. That's a huge difference.


4. meson test ends up with failures like below:

  4/290 postgresql:regress / regress/regress
                 ERROR          32.67s
  6/290 postgresql:pg_upgrade / pg_upgrade/002_pg_upgrade
                 ERROR          56.96s
 35/290 postgresql:recovery / recovery/027_stream_regress
                 ERROR          40.20s

(all due to "regression tests pass" failures)

the partition_join.sql is failing for test 206, so for this:

-- partitionwise join with fractional paths
CREATE TABLE fract_t (id BIGINT, PRIMARY KEY (id)) PARTITION BY RANGE (id);
CREATE TABLE fract_t0 PARTITION OF fract_t FOR VALUES FROM ('0') TO ('1000');
CREATE TABLE fract_t1 PARTITION OF fract_t FOR VALUES FROM ('1000') TO ('2000');

-- insert data
INSERT INTO fract_t (id) (SELECT generate_series(0, 1999));
ANALYZE fract_t;

-- verify plan; nested index only scans
SET max_parallel_workers_per_gather = 0;
SET enable_partitionwise_join = on;

the testsuite was expecting the below with enable_partitionwise_join = on;

EXPLAIN (COSTS OFF)
SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER
BY x.id ASC LIMIT 10;
                              QUERY PLAN
-----------------------------------------------------------------------
 Limit
   ->  Merge Append
         Sort Key: x.id
         ->  Merge Left Join
               Merge Cond: (x_1.id = y_1.id)
               ->  Index Only Scan using fract_t0_pkey on fract_t0 x_1
               ->  Index Only Scan using fract_t0_pkey on fract_t0 y_1
         ->  Merge Left Join
               Merge Cond: (x_2.id = y_2.id)
               ->  Index Only Scan using fract_t1_pkey on fract_t1 x_2
               ->  Index Only Scan using fract_t1_pkey on fract_t1 y_2

but actually with patch it gets this (here with costs):

EXPLAIN (COSTS) SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y
USING (id) ORDER BY x.id ASC LIMIT 10;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.10..2.21 rows=10 width=16)
   ->  Merge Left Join  (cost=1.10..223.10 rows=2000 width=16)
         Merge Cond: (x.id = y.id)
         ->  Append  (cost=0.55..96.55 rows=2000 width=8)
[..]
         ->  Append  (cost=0.55..96.55 rows=2000 width=8)
[..]


if you run it without patch and again with enable_partitionwise_join=on:

EXPLAIN SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y USING
(id) ORDER BY x.id ASC LIMIT 10;
                                                 QUERY PLAN
-------------------------------------------------------------------------------------------------------------
 Limit  (cost=1.11..2.22 rows=10 width=16)
   ->  Merge Append  (cost=1.11..223.11 rows=2000 width=16)
         Sort Key: x.id
         ->  Merge Left Join  (cost=0.55..101.55 rows=1000 width=16)
[..]
         ->  Merge Left Join  (cost=0.55..101.55 rows=1000 width=16)
[..]

So with the patch that SQL does not use partitionwise join as it finds
it more optimal to stick to a plan with cost of "1.10..2.21" instead
of "1.11..2.22" (w/ partition_join), nitpicking but still a failure
technically. Perhaps it could be even removed? (it's pretty close to
noise?).

I think we need to replace the failing query with something which uses partitionwise join even with the patch.

I will take a look at this after returning from a two week long vacation, unless someone else is interested in fixing this before that.
 
--
Best Wishes,
Ashutosh Bapat

pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: Increase the length of identifers from 63 characters to 128 characters or more
Next
From: Ashutosh Bapat
Date:
Subject: Re: partitioning and identity column