Re: apply_scanjoin_target_to_paths and partitionwise join - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: apply_scanjoin_target_to_paths and partitionwise join
Msg-id: CA+TgmoZ6zmGo9r4EXi3WEzUZiFr=cqH89Hse+TGmCYU-bR=q5w@mail.gmail.com
In response to: Re: apply_scanjoin_target_to_paths and partitionwise join (Arne Roland <arne.roland@malkut.net>)
List: pgsql-hackers
On Wed, Oct 29, 2025 at 9:23 PM Arne Roland <arne.roland@malkut.net> wrote:
> The main factor in your example is that the number of rows handled by the (Merge) Append is different.

Right. Although that's the main thing here, I am inclined to suspect
there are other ways to hit this problem, maybe ways that are more
likely to happen in the real world, because...

> My second sentence just captured the mundane observation that if the join has significantly more tuples than any base relation, the placement of the (Merge) Append might be more relevant. If I join everything with a generate_series(1, 30000) I get more tuples to process.

...as you imply, joins that inflate the row count are somewhat
uncommon. They definitely do happen, but they're not the most typical
pattern, and there might well be other reasons why a partitionwise
join fails to win that we haven't figured out yet. These could even be
cases where, for example, a certain optimization that works in the
non-partitionwise case is not preserved in the partitionwise case. I
feel like I now understand *one* case where Ashutosh's patch can make
a demonstrable positive difference, but whether that's the only case
that exists seems quite uncertain.

> I'd like to make one more side note about this example: the planner punishes the partitionwise join for having an extra node that emits N rows (three Hash Joins + Append vs. two Appends + Hash Join). This plan is chosen because of the cpu_tuple_cost. I'm happy it picks the plan with the smaller memory footprint, but in my real-world experience, for a timing-based approach the default cpu_tuple_cost tends to be too high to get a fair comparison between partitionwise and non-partitionwise joins.

Have you localized the problem to cpu_tuple_cost specifically, vs.
cpu_index_tuple_cost or cpu_operator_cost? I've generally found that I
need to reduce random_page_cost and seq_page_cost significantly to
avoid getting sequential scans when index scans would be more
reasonable, but that goes in the opposite direction from what you
suggest here, in that it brings the I/O and CPU costs closer together,
whereas your suggestion would push them further apart. I remember that
Kevin Grittner used to say that the default value of this parameter
was bad, too, but he recommended *raising* it:

https://www.postgresql.org/message-id/1385148245.49487.YahooMailNeo%40web162904.mail.bf1.yahoo.com
https://www.postgresql.org/message-id/4FF179780200002500048CBD@gw.wicourts.gov
https://www.postgresql.org/message-id/CACjxUsNp4uEx3xsunw4wVpBDVomas7o6hnv_49bSbaz-HAVdyA%40mail.gmail.com

I don't actually know what's best in terms of settings in this area. I
don't have experience tuning for partitionwise join specifically.
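For anyone wanting to experiment, here is a sketch of the settings under
discussion, shown with their stock PostgreSQL defaults (the specific values
to try for a given workload are exactly what's in question here):

```sql
-- Partitionwise join is off by default and must be enabled explicitly.
SET enable_partitionwise_join = on;

-- CPU cost parameters mentioned above, at their stock defaults:
SET cpu_tuple_cost = 0.01;         -- cost per row processed
SET cpu_index_tuple_cost = 0.005;  -- cost per index entry processed
SET cpu_operator_cost = 0.0025;    -- cost per operator/function call

-- I/O cost parameters, at their stock defaults:
SET seq_page_cost = 1.0;
SET random_page_cost = 4.0;
```

Lowering seq_page_cost/random_page_cost narrows the gap between I/O and CPU
costs, while lowering cpu_tuple_cost widens it, which is the tension
described above.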

--
Robert Haas
EDB: http://www.enterprisedb.com
