Thread: Should consider materializing the cheapest inner path in consider_parallel_nestloop()

Hi all,

   I recently ran a benchmark[1] against master and found a performance problem, shown below:

explain analyze select
  subq_0.c0 as c0,
  subq_0.c1 as c1,
  subq_0.c2 as c2
from
  (select
        ref_0.l_shipmode as c0,
        sample_0.l_orderkey as c1,
        sample_0.l_quantity as c2,
        ref_0.l_orderkey as c3,
        sample_0.l_shipmode as c5,
        ref_0.l_shipinstruct as c6
      from
        public.lineitem as ref_0
          left join public.lineitem as sample_0
          on ((select p_partkey from public.part order by p_partkey limit 1)
                 is not NULL)
      where sample_0.l_orderkey is NULL) as subq_0
where subq_0.c5 is NULL
limit 1;
                                                               QUERY PLAN                                                                
-----------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=78.00..45267050.75 rows=1 width=27) (actual time=299695.097..299695.099 rows=0 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=78.00..78.00 rows=1 width=8) (actual time=0.651..0.652 rows=1 loops=1)
           ->  Sort  (cost=78.00..83.00 rows=2000 width=8) (actual time=0.650..0.651 rows=1 loops=1)
                 Sort Key: part.p_partkey
                 Sort Method: top-N heapsort  Memory: 25kB
                 ->  Seq Scan on part  (cost=0.00..68.00 rows=2000 width=8) (actual time=0.013..0.428 rows=2000 loops=1)
   ->  Nested Loop Left Join  (cost=0.00..45266972.75 rows=1 width=27) (actual time=299695.096..299695.096 rows=0 loops=1)
         Join Filter: ($0 IS NOT NULL)
         Filter: ((sample_0.l_orderkey IS NULL) AND (sample_0.l_shipmode IS NULL))
         Rows Removed by Filter: 3621030625
         ->  Seq Scan on lineitem ref_0  (cost=0.00..1969.75 rows=60175 width=11) (actual time=0.026..6.225 rows=60175 loops=1)
         ->  Materialize  (cost=0.00..2270.62 rows=60175 width=27) (actual time=0.000..2.554 rows=60175 loops=60175)
               ->  Seq Scan on lineitem sample_0  (cost=0.00..1969.75 rows=60175 width=27) (actual time=0.004..8.169 rows=60175 loops=1)
 Planning Time: 0.172 ms
 Execution Time: 299695.501 ms
(16 rows)

After I set enable_material to off, the same query runs faster, as shown below:
set enable_material = off;
explain analyze  select
  subq_0.c0 as c0,
  subq_0.c1 as c1,
  subq_0.c2 as c2
from
  (select
        ref_0.l_shipmode as c0,
        sample_0.l_orderkey as c1,
        sample_0.l_quantity as c2,
        ref_0.l_orderkey as c3,
        sample_0.l_shipmode as c5,
        ref_0.l_shipinstruct as c6
      from
        public.lineitem as ref_0
          left join public.lineitem as sample_0
          on ((select p_partkey from public.part order by p_partkey limit 1)
                 is not NULL)
      where sample_0.l_orderkey is NULL) as subq_0
where subq_0.c5 is NULL
limit 1;
                                                                  QUERY PLAN                                                                  
-----------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1078.00..91026185.57 rows=1 width=27) (actual time=192669.605..192670.425 rows=0 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=78.00..78.00 rows=1 width=8) (actual time=0.662..0.663 rows=1 loops=1)
           ->  Sort  (cost=78.00..83.00 rows=2000 width=8) (actual time=0.661..0.662 rows=1 loops=1)
                 Sort Key: part.p_partkey
                 Sort Method: top-N heapsort  Memory: 25kB
                 ->  Seq Scan on part  (cost=0.00..68.00 rows=2000 width=8) (actual time=0.017..0.430 rows=2000 loops=1)
   ->  Gather  (cost=1000.00..91026107.57 rows=1 width=27) (actual time=192669.604..192670.422 rows=0 loops=1)
         Workers Planned: 1
         Params Evaluated: $0
         Workers Launched: 1
         ->  Nested Loop Left Join  (cost=0.00..91025107.47 rows=1 width=27) (actual time=192588.143..192588.144 rows=0 loops=2)
               Join Filter: ($0 IS NOT NULL)
               Filter: ((sample_0.l_orderkey IS NULL) AND (sample_0.l_shipmode IS NULL))
               Rows Removed by Filter: 1810515312
               ->  Parallel Seq Scan on lineitem ref_0  (cost=0.00..1721.97 rows=35397 width=11) (actual time=0.007..3.797 rows=30088 loops=2)
               ->  Seq Scan on lineitem sample_0  (cost=0.00..1969.75 rows=60175 width=27) (actual time=0.000..2.637 rows=60175 loops=60175)
 Planning Time: 0.174 ms
 Execution Time: 192670.458 ms
(19 rows)

I debugged the code and found that consider_parallel_nestloop() doesn't consider the materialized form of the cheapest inner path.
When enable_material = true, we can see that the Material path won in the first plan, but no Parallel Seq Scan node was added as the outer path. That is because
in try_partial_nestloop_path() the cost of the nestloop was computed using the seq scan path, not the material path.

[1] includes the test table schema and data, so you can reproduce the problem described above.

I try to fix this problem in the attached patch; I found that PostgreSQL 12.12 also has this issue. Please review my patch, thanks!
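To make the idea concrete, here is a minimal sketch of the shape of the change in consider_parallel_nestloop() (my reconstruction for illustration, not the attached patch itself; names follow src/backend/optimizer/path/joinpath.c):

/*
 * Sketch only: build a Material path over the cheapest total inner path
 * and try it for each partial outer path, alongside the existing
 * candidates.
 */
ListCell   *lc1;
Path       *matpath = NULL;

if (inner_cheapest_total != NULL &&
    inner_cheapest_total->parallel_safe)
    matpath = (Path *) create_material_path(innerrel, inner_cheapest_total);

foreach(lc1, outerrel->partial_pathlist)
{
    Path       *outerpath = (Path *) lfirst(lc1);
    List       *pathkeys;

    /* compute the join pathkeys, as the existing loop already does */
    pathkeys = build_join_pathkeys(root, joinrel, jointype,
                                   outerpath->pathkeys);

    /* ... existing try_partial_nestloop_path() calls ... */

    /* also try the materialized form of the cheapest inner path */
    if (matpath != NULL)
        try_partial_nestloop_path(root, joinrel, outerpath, matpath,
                                  pathkeys, jointype, extra);
}

With this, the Material path's cheaper rescan cost participates in the cost comparison inside try_partial_nestloop_path(), instead of the plain seq scan path always being used.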

Attachment
With the patch applied, the result is as below:
                                                                   QUERY PLAN                                                                  
------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=1078.00..26630101.20 rows=1 width=27) (actual time=160571.005..160571.105 rows=0 loops=1)
   InitPlan 1 (returns $0)
     ->  Limit  (cost=78.00..78.00 rows=1 width=8) (actual time=1.065..1.066 rows=1 loops=1)
           ->  Sort  (cost=78.00..83.00 rows=2000 width=8) (actual time=1.064..1.065 rows=1 loops=1)
                 Sort Key: part.p_partkey
                 Sort Method: top-N heapsort  Memory: 25kB
                 ->  Seq Scan on part  (cost=0.00..68.00 rows=2000 width=8) (actual time=0.046..0.830 rows=2000 loops=1)
   ->  Gather  (cost=1000.00..26630023.20 rows=1 width=27) (actual time=160571.003..160571.102 rows=0 loops=1)
         Workers Planned: 1
         Params Evaluated: $0
         Workers Launched: 1
         ->  Nested Loop Left Join  (cost=0.00..26629023.10 rows=1 width=27) (actual time=160549.257..160549.258 rows=0 loops=2)
               Join Filter: ($0 IS NOT NULL)
               Filter: ((sample_0.l_orderkey IS NULL) AND (sample_0.l_shipmode IS NULL))
               Rows Removed by Filter: 1810515312
               ->  Parallel Seq Scan on lineitem ref_0  (cost=0.00..1721.97 rows=35397 width=11) (actual time=0.010..3.393 rows=30088 loops=2)
               ->  Materialize  (cost=0.00..2270.62 rows=60175 width=27) (actual time=0.000..2.839 rows=60175 loops=60175)
                     ->  Seq Scan on lineitem sample_0  (cost=0.00..1969.75 rows=60175 width=27) (actual time=0.008..11.381 rows=60175 loops=2)
 Planning Time: 0.174 ms
 Execution Time: 160571.476 ms
(20 rows)

On Tue, Sep 5, 2023 at 4:52 PM tender wang <tndrwang@gmail.com> wrote:
>    I recently ran a benchmark[1] against master and found a performance problem, shown below:
> ...
>
> I debugged the code and found that consider_parallel_nestloop() doesn't consider the materialized form of the cheapest inner path.

Yeah, this seems an omission in commit 45be99f8.  I reviewed the patch
and here are some comments.

* I think we should not consider materializing the cheapest inner path
  if we're doing JOIN_UNIQUE_INNER, because in this case we have to
  unique-ify the inner path.

* I think we can check if it'd be parallel safe before creating the
  material path, thus avoiding the creation in unsafe cases.

* I don't think the test case you added exercises the code changes.
  Maybe a plan like the one below is better:

explain (costs off)
select * from tenk1, tenk2 where tenk1.two = tenk2.two;
                  QUERY PLAN
----------------------------------------------
 Gather
   Workers Planned: 4
   ->  Nested Loop
         Join Filter: (tenk1.two = tenk2.two)
         ->  Parallel Seq Scan on tenk1
         ->  Materialize
               ->  Seq Scan on tenk2
(7 rows)

Thanks
Richard


Richard Guo <guofenglinux@gmail.com> 于2023年9月5日周二 18:51写道:

> * I think we should not consider materializing the cheapest inner path
>   if we're doing JOIN_UNIQUE_INNER, because in this case we have to
>   unique-ify the inner path.

That's right. The V2 patch has been fixed.

> * I think we can check if it'd be parallel safe before creating the
>   material path, thus avoiding the creation in unsafe cases.

Agreed.

> * I don't think the test case you added exercises the code changes.
>   Maybe a plan like the one below is better: [...]

Agreed.
Attachment
On Tue, Sep 5, 2023 at 8:07 AM Richard Guo <guofenglinux@gmail.com> wrote:
> Yeah, this seems an omission in commit 45be99f8.

It's been a while, but I think I omitted this deliberately because I
didn't really understand the value of it and wanted to keep the
planning cost down.

The example query provided here seems rather artificial. Surely few
people write a join clause that references neither of the tables being
joined. Is there a more realistic case where this makes a big
difference?

--
Robert Haas
EDB: http://www.enterprisedb.com




On Fri, Sep 8, 2023 at 3:15 AM Robert Haas <robertmhaas@gmail.com> wrote:
> The example query provided here seems rather artificial. Surely few
> people write a join clause that references neither of the tables being
> joined. Is there a more realistic case where this makes a big
> difference?

Yes, the given example query is not that convincing.  I tried a query
with the plans below (after some GUC settings) which might be more
realistic in the real world.

unpatched:

explain select * from partsupp join lineitem on l_partkey > ps_partkey;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Gather  (cost=0.00..5522666.44 rows=160466667 width=301)
   Workers Planned: 4
   ->  Nested Loop  (cost=0.00..5522666.44 rows=40116667 width=301)
         Join Filter: (lineitem.l_partkey > partsupp.ps_partkey)
         ->  Parallel Seq Scan on lineitem  (cost=0.00..1518.44 rows=15044 width=144)
         ->  Seq Scan on partsupp  (cost=0.00..267.00 rows=8000 width=157)
(6 rows)

patched:

explain select * from partsupp join lineitem on l_partkey > ps_partkey;
                                      QUERY PLAN
--------------------------------------------------------------------------------------
 Gather  (cost=0.00..1807085.44 rows=160466667 width=301)
   Workers Planned: 4
   ->  Nested Loop  (cost=0.00..1807085.44 rows=40116667 width=301)
         Join Filter: (lineitem.l_partkey > partsupp.ps_partkey)
         ->  Parallel Seq Scan on lineitem  (cost=0.00..1518.44 rows=15044 width=144)
         ->  Materialize  (cost=0.00..307.00 rows=8000 width=157)
               ->  Seq Scan on partsupp  (cost=0.00..267.00 rows=8000 width=157)
(7 rows)

The execution times (ms) are (avg of 3 runs):

unpatched:  71769.21
patched:    65510.04

So we can see some (~9%) performance gains in this case.

Thanks
Richard
Hi Tom,
   Do you have any comments or suggestions on this issue? Thanks.

On Fri, 8 Sept 2023 at 09:41, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Sep 5, 2023 at 8:07 AM Richard Guo <guofenglinux@gmail.com> wrote:
> > Yeah, this seems an omission in commit 45be99f8.
>
> It's been a while, but I think I omitted this deliberately because I
> didn't really understand the value of it and wanted to keep the
> planning cost down.

I think the value is potentially not having to repeatedly execute some
expensive rescan on the inner side of the nested loop join, once for each
outer-side tuple.

The planning cost is something to consider for sure, but it seems
strange that we deemed it worthwhile to consider material paths for the
non-parallel input paths but drew the line at the parallel/partial
ones. It seems to me that the additional costs and the possible
benefits are the same for both.

David



On Fri, 8 Sept 2023 at 19:14, Richard Guo <guofenglinux@gmail.com> wrote:
> explain select * from partsupp join lineitem on l_partkey > ps_partkey;
>                                       QUERY PLAN
> --------------------------------------------------------------------------------------
>  Gather  (cost=0.00..1807085.44 rows=160466667 width=301)
>    Workers Planned: 4
>    ->  Nested Loop  (cost=0.00..1807085.44 rows=40116667 width=301)
>          Join Filter: (lineitem.l_partkey > partsupp.ps_partkey)
>          ->  Parallel Seq Scan on lineitem  (cost=0.00..1518.44 rows=15044 width=144)
>          ->  Materialize  (cost=0.00..307.00 rows=8000 width=157)
>                ->  Seq Scan on partsupp  (cost=0.00..267.00 rows=8000 width=157)
> (7 rows)
>
> The execution time (ms) are (avg of 3 runs):
>
> unpatched:  71769.21
> patched:    65510.04

This gap would be wider if the partsupp Seq Scan were filtering off
some rows and wider still if you added more rows to lineitem.
However, a clauseless seqscan is not the most compelling use case
below a material node. The inner side of the nested loop could be some
subquery that takes 6 days to complete. Running the 6 day query ~15044
times seems like something that would be good to avoid.
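To illustrate with a hypothetical variation of Richard's query (ps_availqty is a standard TPC-H partsupp column; the constant is made up), a selective filter on the inner side makes the point sharper, since the Materialize node caches only the surviving rows and each of the ~15044 rescans then skips both the heap scan and the qual evaluation:

explain
select * from lineitem join partsupp on l_partkey > ps_partkey
where ps_availqty < 100;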

It seems worth considering Material paths to me.  I think that the
above example could be tuned any way you like to make it look better
or worse.

David



Hi!

Thank you for your work on the subject.


I reviewed your patch and found that your commit message does not fully explain your code; in addition, I found several spelling mistakes.

I think it's better to change it to: "With a parallel seqscan, we should consider materializing the cheapest inner path in
the case of a nested loop, if it doesn't contain a unique node or if it is unsafe to use it in a subquery."


Besides, I couldn't understand why we check again that the material path is parallel safe:

if (matpath != NULL && matpath->parallel_safe)
    try_partial_nestloop_path(root, joinrel, outerpath, matpath,
                              pathkeys, jointype, extra);
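For context, create_material_path() in src/backend/optimizer/util/pathnode.c derives the new path's parallel safety from its input, along these lines (quoted from memory, so worth verifying against current sources):

pathnode->path.parallel_safe = rel->consider_parallel &&
    subpath->parallel_safe;

So if matpath is only built on top of a parallel-safe inner path of a rel with consider_parallel set, the matpath->parallel_safe re-check should always pass; it reads like a belt-and-braces guard (or an Assert candidate) rather than a necessary condition.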

-- 
Regards,
Alena Rybakina

> On 27 Sep 2023, at 16:06, tender wang <tndrwang@gmail.com> wrote:
>
>    Do you have any comments or suggestions on this issue? Thanks.
Hi Tender,

there are some review comments in the thread that you might be interested in.
I'll mark this [0] entry "Waiting on Author" and move it to the next CF.

Thanks!


Best regards, Andrey Borodin.

[0] https://commitfest.postgresql.org/47/4549/




Andrey M. Borodin <x4mmm@yandex-team.ru> 于2024年4月8日周一 17:40写道:
> there are some review comments in the thread that you might be interested in.
> I'll mark this [0] entry "Waiting on Author" and move it to the next CF.

Thank you for the reminder.  I will update the patch later.
I also deeply hope to get more advice about this patch
(even advice that it is not worth continuing to work on it).

Thanks.

--
Tender Wang
OpenPie:  https://en.openpie.com/


I have rebased onto master and fixed a plan diff case.
--
Tender Wang
OpenPie:  https://en.openpie.com/
Attachment
On Tue, 2024-04-23 at 16:59 +0800, Tender Wang wrote:
>
[ cut ]
>
> I have rebased onto master and fixed a plan diff case.

We (me, Paul Jungwirth, and Yuki Fujii) reviewed this patch
at the PgConf.dev Patch Review Workshop.
Here are our findings.

The patch tries to allow using materialization together
with parallel subqueries.
It applies cleanly on 8fea1bd5411b793697a4c9087c403887e050c4ac
(current HEAD).
Tests pass locally on macOS and on Linux in a VM under Windows.
Tests are also green in cfbot (for the last 2 weeks; they were
red previously, probably because a rebase was needed).

Please add more tests.  Especially please add some negative tests;
the current patch checks that it is safe to apply materialization.  It would
be helpful to add tests checking that materialization is not applied
in both checked cases:
1. when the inner join path is not parallel safe
2. when matpath is not parallel safe
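For the first case, a negative test could be shaped like the one below (a sketch only; the OFFSET clause makes the subquery parallel-restricted, so no parallel nestloop should be generated over it; a test of this form appears later in the thread):

explain (costs off)
select * from tenk1 t1, (select * from tenk2 t2 offset 0) t2
where t1.two > t2.two;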

This patch tries to apply materialization only when the join type
is not JOIN_UNIQUE_INNER.  The comment mentions this but does not
explain why, so please either add a comment describing the reason
or try enabling materialization in such a case.

Best regards.

--
Tomasz Rybak, Debian Developer <serpent@debian.org>
GPG: A565 CE64 F866 A258 4DDC F9C7 ECB7 3E37 E887 AA8C





Tomasz Rybak <tomasz.rybak@post.pl> 于2024年5月31日周五 04:21写道:
> We (me, Paul Jungwirth, and Yuki Fujii) reviewed this patch
> at the PgConf.dev Patch Review Workshop.

Thanks for reviewing this patch.

> Please add more tests.  Especially please add some negative tests; [...]
> 1. when the inner join path is not parallel safe
> 2. when matpath is not parallel safe

I added a test case where the inner rel is not parallel safe.  Actually, matpath will not be
created if the inner rel is not parallel safe, so I didn't add a test case for the second scenario.

> This patch tries to apply materialization only when the join type
> is not JOIN_UNIQUE_INNER.  The comment mentions this but does not
> explain why.

Yeah, Richard commented on the v1 patch about JOIN_UNIQUE_INNER in [1]:

> * I think we should not consider materializing the cheapest inner path
>   if we're doing JOIN_UNIQUE_INNER, because in this case we have to
>   unique-ify the inner path.

We don't consider a materialized inner path if the jointype is JOIN_UNIQUE_INNER in
match_unsorted_outer(), so the logic here is the same as in match_unsorted_outer().
I added comments to explain why.

[1] https://www.postgresql.org/message-id/CAMbWs49LbQF_Z0iKPRPnTHfsRECT7M-4rF6ft5vpW1ARSpBkPA%40mail.gmail.com

--
Tender Wang
OpenPie:  https://en.openpie.com/
Attachment

From: "Fujii.Yuki@df.MitsubishiElectric.co.jp"

Hi, Tender.

Thank you for the modification.

> From: Tender Wang <tndrwang@gmail.com>
> Sent: Tuesday, June 4, 2024 7:51 PM
> I added a test case where the inner rel is not parallel safe. Actually,
> matpath will not be created if the inner rel is not parallel safe, so I
> didn't add a test case for the second scenario.

Is there a case in which matpath is not parallel safe but the inner rel is parallel safe?
If so, I think it would be suitable to add a negative test for such a case.

Sincerely yours,
Yuuki Fujii

--
Yuuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation





> Is there a case in which matpath is not parallel safe but the inner rel is parallel safe?
> If so, I think it would be suitable to add a negative test for such a case.

I looked through the create_xxx_path() functions, and I found that path->parallel_safe is
mostly assigned from RelOptInfo->consider_parallel.  Some paths also take
subpath->parallel_safe into account (e.g. the Material path).  In most cases, Material is
parallel safe if its rel is parallel safe.  So far I haven't come up with a query plan in
which the Material path is not parallel-safe but its rel is parallel-safe.

--
Tender Wang
OpenPie:  https://en.openpie.com/

From: "Fujii.Yuki@df.MitsubishiElectric.co.jp"

Hi, Tender.

> I looked through the create_xxx_path() functions, and I found that
> path->parallel_safe is mostly assigned from RelOptInfo->consider_parallel. [...]

Thank you for looking into the source code.  I understand the situation now.

Sincerely yours,
Yuki Fujii

--
Yuki Fujii
Information Technology R&D Center Mitsubishi Electric Corporation

Hi Robert,

    Since this patch was reviewed at the PgConf.dev Patch Review Workshop, I have updated
it according to the review advice.  Since then no one else has commented on it,
and its status on the commitfest has stayed "Needs review" for a long time.
I want to know if it is ready to move to the next status, "Ready for Committer".

Thanks.

--
Tender Wang
On Tue, Jun 4, 2024 at 6:51 PM Tender Wang <tndrwang@gmail.com> wrote:
> Yeah, Richard commented on the v1 patch about JOIN_UNIQUE_INNER in [1]:
>
> * I think we should not consider materializing the cheapest inner path
> if we're doing JOIN_UNIQUE_INNER, because in this case we have to
> unique-ify the inner path.
>
> We don't consider a materialized inner path if the jointype is JOIN_UNIQUE_INNER in match_unsorted_outer().
> So the logic here is the same as in match_unsorted_outer().  I added comments to explain why.

I looked through the v4 patch and found an issue.  For the plan diff:

+         ->  Nested Loop
+               ->  Parallel Seq Scan on prt1_p1 t1_1
+               ->  Materialize
+                     ->  Sample Scan on prt1_p1 t2_1
+                           Sampling: system (t1_1.a) REPEATABLE (t1_1.b)
+                           Filter: (t1_1.a = a)

This does not seem correct to me.  The inner path is parameterized by
the outer rel, in which case it does not make sense to add a Materialize
node on top of it.

I updated the patch to include a check in consider_parallel_nestloop
ensuring that inner_cheapest_total is not parameterized by outerrel
before materializing it.  I also tweaked the comments, test cases and
commit message.
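Presumably the added guard is shaped like this (a sketch using the PATH_PARAM_BY_REL macro from joinpath.c, not the exact patch text):

/* Don't materialize an inner path that is parameterized by the outer rel. */
if (inner_cheapest_total->parallel_safe &&
    !PATH_PARAM_BY_REL(inner_cheapest_total, outerrel))
    matpath = (Path *) create_material_path(innerrel, inner_cheapest_total);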

Thanks
Richard

Attachment


Richard Guo <guofenglinux@gmail.com> 于2024年6月18日周二 17:24写道:
> I looked through the v4 patch and found an issue.  For the plan diff:
> [...]
> This does not seem correct to me.  The inner path is parameterized by
> the outer rel, in which case it does not make sense to add a Materialize
> node on top of it.

Yeah, you're right.

> I updated the patch to include a check in consider_parallel_nestloop
> ensuring that inner_cheapest_total is not parameterized by outerrel
> before materializing it.  I also tweaked the comments, test cases and
> commit message.

Thanks for the work.  Now it looks better.
I have changed the status from "Needs review" to "Ready for Committer" on the commitfest.

--
Tender Wang
On Wed, Jun 19, 2024 at 10:55 AM Tender Wang <tndrwang@gmail.com> wrote:
> Richard Guo <guofenglinux@gmail.com> 于2024年6月18日周二 17:24写道:
>> I updated the patch to include a check in consider_parallel_nestloop
>> ensuring that inner_cheapest_total is not parameterized by outerrel
>> before materializing it.  I also tweaked the comments, test cases and
>> commit message.
>
> Thanks for the work. Now it looks better.
> I have changed the status from "Needs review" to "Ready for Committer" on the commitfest.

Here is a new rebase.

I'm planning to push it next week, barring any objections.

Thanks
Richard

Attachment
On Sat, Jul 6, 2024 at 5:32 PM Richard Guo <guofenglinux@gmail.com> wrote:
> Here is a new rebase.
>
> I'm planning to push it next week, barring any objections.

Pushed.

Thanks
Richard





Richard Guo <guofenglinux@gmail.com> 于2024年7月12日周五 10:30写道:
> Pushed.

Thanks for pushing.

--
Tender Wang
Hello Richard,

12.07.2024 05:29, Richard Guo wrote:
> On Sat, Jul 6, 2024 at 5:32 PM Richard Guo <guofenglinux@gmail.com> wrote:
>> Here is a new rebase.
>>
>> I'm planning to push it next week, barring any objections.
> Pushed.

Please look at a recent buildfarm failure [1], which shows some
instability of that test addition:
  -- the joinrel is not parallel-safe due to the OFFSET clause in the subquery
  explain (costs off)
      select * from tenk1 t1, (select * from tenk2 t2 offset 0) t2 where t1.two > t2.two;
-                QUERY PLAN
--------------------------------------------
+                   QUERY PLAN
+-------------------------------------------------
   Nested Loop
     Join Filter: (t1.two > t2.two)
-   ->  Gather
-         Workers Planned: 4
-         ->  Parallel Seq Scan on tenk1 t1
+   ->  Seq Scan on tenk2 t2
     ->  Materialize
-         ->  Seq Scan on tenk2 t2
+         ->  Gather
+               Workers Planned: 4
+               ->  Parallel Seq Scan on tenk1 t1
  (7 rows)

I've managed to reproduce this plan change when running
multiple 027_stream_regress.pl instances simultaneously, with
parallel_schedule reduced to:
test: test_setup
test: create_misc
test: create_index
test: sanity_check
test: select_parallel

I've added the following to the test and got two verbose plans for
comparison (see the attachment).
  -- the joinrel is not parallel-safe due to the OFFSET clause in the subquery
  explain (costs off)
      select * from tenk1 t1, (select * from tenk2 t2 offset 0) t2 where t1.two > t2.two;
+\o plan.txt
+explain (verbose)
+    select * from tenk1 t1, (select * from tenk2 t2 offset 0) t2 where t1.two > t2.two;
+\o
  alter table tenk2 reset (parallel_workers);

[1] https://buildfarm.postgresql.org/cgi-bin/show_log.pl?nm=tamandua&dt=2024-07-17%2017%3A12%3A53

Best regards,
Alexander
Attachment
On Thu, Jul 18, 2024 at 4:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> Please look at a recent buildfarm failure [1], which shows some
> instability of that test addition: [...]

Thank you for the report and investigation.  Will have a look.

Thanks
Richard



On Thu, Jul 18, 2024 at 4:11 PM Richard Guo <guofenglinux@gmail.com> wrote:
> On Thu, Jul 18, 2024 at 4:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> > Please look at a recent buildfarm failure [1], which shows some
> > instability of that test addition: [...]
>
> Thank you for the report and investigation.  Will have a look.

The problematic plan is a non-parallel nestloop join.  It's just chance
which join order the planner will pick, and slight variations in
underlying statistics could result in a different displayed plan.
From the two verbose plans, we can see slight variations in the
statistics for the parallel seqscan of tenk1.

->  Parallel Seq Scan on public.tenk1 t1  (cost=0.00..370.00 rows=2500 width=244)

VS.

->  Parallel Seq Scan on public.tenk1 t1  (cost=0.00..369.99 rows=2499 width=244)

I have no idea why the underlying statistics changed, but it seems
that this slight change is sufficient to result in a different plan.

According to the discussion in [1], I think what we wanted to test
with this query is that a parallel nestloop join is not generated if the
inner path is not parallel-safe.  Therefore, I modified this test case
to use a lateral join, rendering the inner path not parallel-safe
while also enforcing the join order.  Please see attached.
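A lateral rewrite could be shaped like this (a sketch, not necessarily the committed test): the lateral reference to t1 parameterizes the subquery, which pins it to the inner side and fixes the join order, while the OFFSET clause keeps it parallel-restricted.

explain (costs off)
select * from tenk1 t1
  left join lateral
    (select t1.unique1 as x, * from tenk2 t2 offset 0) ss on true;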

[1] https://postgr.es/m/5641923793cef7706395a34e62538b75d05e498b.camel@post.pl

Thanks
Richard

Attachment
Hello Richard,

18.07.2024 17:30, Richard Guo wrote:
> The problematic plan is a non-parallel nestloop join.  It's just chance
> which join order the planner will pick, and slight variations in
> underlying statistics could result in a different displayed plan.
> [...]
> I have no idea why the underlying statistics changed, but it seems
> that this slight change is sufficient to result in a different plan.

I think it could be caused by the same reason as [1], and I can
easily (without multiple instances/loops, just with `make check`) reproduce
the failure with cranky-ConditionalLockBufferForCleanup.patch (but
targeted at "VACUUM ANALYZE tenk1;").

> According to the discussion in [1], I think what we wanted to test
> with this query is that a parallel nestloop join is not generated if the
> inner path is not parallel-safe.  Therefore, I modified this test case
> to use a lateral join [...]

The modified test survives my testing procedure. Thank you for the patch!

[1] https://www.postgresql.org/message-id/flat/66eb9a6e-fc67-a230-c5b1-2a741e8b88c6%40gmail.com

Best regards,
Alexander



On Fri, Jul 19, 2024 at 12:00 PM Alexander Lakhin <exclusion@gmail.com> wrote:
> 18.07.2024 17:30, Richard Guo wrote:
> > I have no idea why the underlying statistics changed, but it seems
> > that this slight change is sufficent to result in a different plan.
>
> I think it could be caused by the same reason as [1] and I really can
> easily (without multiple instances/loops. just with `make check`) reproduce
> the failure with cranky-ConditionalLockBufferForCleanup.patch (but
> targeted for "VACUUM ANALYZE tenk1;").

Yeah.  Anyway I think we need to make the test more tolerant of slight
variations in the statistics.

> > According to the discussion in [1], I think what we wanted to test
> > with this query is that parallel nestloop join is not generated if the
> > inner path is not parallel-safe.  Therefore, I modified this test case
> > to use a lateral join, rendering the inner path not parallel-safe
> > while also enforcing the join order.  Please see attached.
>
> The modified test survives my testing procedure. Thank you for the patch!

Thanks for testing this patch.  I've pushed it.

Thanks
Richard