Re: apply_scanjoin_target_to_paths and partitionwise join - Mailing list pgsql-hackers
| From | Robert Haas | 
|---|---|
| Subject | Re: apply_scanjoin_target_to_paths and partitionwise join | 
| Msg-id | CA+TgmoZ6zmGo9r4EXi3WEzUZiFr=cqH89Hse+TGmCYU-bR=q5w@mail.gmail.com |
| In response to | Re: apply_scanjoin_target_to_paths and partitionwise join (Arne Roland <arne.roland@malkut.net>) | 
| Responses | Re: apply_scanjoin_target_to_paths and partitionwise join | 
| List | pgsql-hackers | 
On Wed, Oct 29, 2025 at 9:23 PM Arne Roland <arne.roland@malkut.net> wrote:

> The main factor of your example is that the amount of rows handled by the (Merge) Append is different.

Right. Although that's the main thing here, I am inclined to suspect there are other ways to hit this problem, maybe ways that are more likely to happen in the real world, because...

> My second sentence just captured the mundane observation that, if the join has significantly more tuples than any base relation, the placement of the (Merge) Append might be more relevant. If I join everything with a generate_series(1, 30000), I get more tuples to process.

...as you imply, joins that inflate the row count are somewhat uncommon. They definitely do happen, but they're not the most typical pattern, and there might well be other reasons why a partitionwise join fails to win that we haven't figured out yet. These could even be cases where, for example, a certain optimization that works in the non-partitionwise case is not preserved in the partitionwise case. I feel like I now understand *one* case where Ashutosh's patch can make a demonstrable positive difference, but whether that's the only case that exists seems quite uncertain.

> I'd like to make one more side note about this example: The planner punishes the partitionwise join for having an extra node that emits N rows (three Hash Joins + Append vs. two Appends + Hash Join). This plan is chosen because of the cpu_tuple_cost. I'm happy it picks the plan with the smaller memory footprint, but in my real-world experience, for a timing-based approach the default cpu_tuple_cost tends to be too high to get a fair comparison between partitionwise and non-partitionwise joins.

Have you localized the problem to cpu_tuple_cost specifically, vs. cpu_index_tuple_cost or cpu_operator_cost?
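For reference, the three per-tuple CPU cost settings named here are ordinary PostgreSQL GUCs, and a quick session-level experiment is enough to see whether one of them is what tips the plan choice. This is only a sketch; the values shown are the stock documented defaults, and the query placeholder is left elided.

```sql
-- The CPU cost knobs under discussion, with their documented defaults:
SHOW cpu_tuple_cost;        -- 0.01   (charged per row processed)
SHOW cpu_index_tuple_cost;  -- 0.005  (charged per index entry processed)
SHOW cpu_operator_cost;     -- 0.0025 (charged per operator/function call)

-- Lower only cpu_tuple_cost for this session, then re-plan the query
-- to see whether the partitionwise plan stops being punished for the
-- extra Append node's per-row cost:
SET cpu_tuple_cost = 0.005;
EXPLAIN SELECT ...;   -- the query under test
RESET cpu_tuple_cost;
```

Because SET at session level affects only the current connection, the experiment can be repeated per parameter to isolate which setting actually changes the plan.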
I've generally found that I need to reduce random_page_cost and seq_page_cost significantly to avoid getting sequential scans when index scans would be more reasonable, but that goes in the opposite direction from what you suggest here, in that it brings the I/O and CPU costs closer together, whereas your suggestion would push them further apart. I remember that Kevin Grittner used to say that the default value of this parameter was bad, too, but he recommended *raising* it:

https://www.postgresql.org/message-id/1385148245.49487.YahooMailNeo%40web162904.mail.bf1.yahoo.com
https://www.postgresql.org/message-id/4FF179780200002500048CBD@gw.wicourts.gov
https://www.postgresql.org/message-id/CACjxUsNp4uEx3xsunw4wVpBDVomas7o6hnv_49bSbaz-HAVdyA%40mail.gmail.com

I don't actually know what's best in terms of settings in this area. I don't have experience tuning for partitionwise join specifically.

--
Robert Haas
EDB: http://www.enterprisedb.com
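As a footnote to the tuning discussion above, the I/O cost settings being contrasted with the CPU costs also have well-known defaults, and partitionwise join itself must be switched on before any of these comparisons apply. A sketch, with documented defaults in comments; the lowered random_page_cost value is a common choice for SSD-backed storage, not a recommendation from this thread.

```sql
-- I/O cost settings, with their documented defaults:
SHOW seq_page_cost;     -- 1.0
SHOW random_page_cost;  -- 4.0

-- Reducing random_page_cost relative to seq_page_cost (as described
-- above) brings I/O costs closer to the CPU costs:
SET random_page_cost = 1.1;

-- Partitionwise join is off by default and must be enabled for the
-- planner to consider those plan shapes at all:
SET enable_partitionwise_join = on;
```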