Re: apply_scanjoin_target_to_paths and partitionwise join - Mailing list pgsql-hackers

From Ashutosh Bapat
Subject Re: apply_scanjoin_target_to_paths and partitionwise join
Date
Msg-id CAExHW5uRW=Z==bmLR=NXm6Vv3JGH4rUvb+Rfft8TfjrfzUUm3g@mail.gmail.com
Whole thread Raw
In response to Re: apply_scanjoin_target_to_paths and partitionwise join  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, Dec 5, 2025 at 11:48 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Dec 3, 2025 at 9:53 PM Richard Guo <guofenglinux@gmail.com> wrote:
> > But I admit that I am unsure if this addresses
> > any real problems, so the effort might not be justified.  I agree that
> > maybe we should just go ahead with your current patch and see what
> > happens.
>
> Thanks, I have committed that patch. So far the buildfarm seems OK...

Thanks a lot Robert and Richard. Sorry for responding so late. I had
lost the context required to respond to the thread. I have fully
caught up now and reviewed the commit.

I repeated the experiments mentioned in [1] by Robert. I see the same
results as Robert. I ran the queries multiple times with assert
enabled dev binary as well as release binary with assertions disabled.
The results are consistent across the runs and in both the cases the
query was slower after enabling partitionwise join. This is a good
reproduction, required as per [2]. I looked again at the queries I
tried to reproduce this regression. Back then, I had tried many
variations of the query mentioned in [4] using many partitions as well
as large data sets. What stands out in Robert's query is it has 10000
* 10000 join, has an index on the column in join condition and that
column is the sole column in the table; all factors contributing to
the reproduction. I didn't go to that extreme. Thanks Robert for
coming up with elusive reproduction.

Generally the reworded comment is better than mine. However, I feel
the connection between "rel only has Append and MergeAppend paths" and
the condition IS_SIMPLE_REL(rel) is obscure; something that my wording
made explicit. However, I don't think we need to change that now.

> Another test case failure that would have happened had Ashutosh not
> compensated for it in the patch is with this query:
>
> EXPLAIN (COSTS OFF)
> SELECT x.id, y.id FROM fract_t x LEFT JOIN fract_t y USING (id) ORDER
> BY x.id ASC LIMIT 10;
>
> Now, currently, this chooses a partitionwise-join. The Limit node at
> the top of the plan tree has a cost of 2.22 and the underlying Merge
> Append node has a cost of 223.11. But if you apply the fix and not
> Ashutosh's adjustments to the test case, then you get a
> non-partitionwise join, where the Limit at the top of the plan tree
> has a cost of 2.21 and the underlying Merge Append node has a cost of
> 223.10. Since we're just trying to decide whether to Append some Merge
> Joins or Merge Join some Appends, it's not surprising that the costs
> are very close. However, I also find it slightly discouraging in terms
> of accepting Ashutosh's proposed fix. In the Q1 case, above, we
> apparently reduce the cost specifically by not flushing the path list.
> But here, we just end up picking a nearly equivalent path with a
> nearly-equivalent cost. At least, that means the test case isn't
> likely to be stable, and we could just patch around that, as Ashutosh
> did, by suppressing partitionwise join (it is not clear whether this
> compromises the goals of the test case, but it's not obvious that it
> does). But it might also be taken as a worrying indication that plans
> of this form are going to come out as either partitionwise or not
> based on essentially random factors, which could be viewed as a flaw
> in the approach. I'm not really sure which way to view it, and if is a
> flaw in the approach, then I'm not sure what to do instead.

This was discussed earlier in the thread around [5]. According to Arne
those queries were added to make sure that the code added by
6b94e7a6da2f1c6df1a42efe64251f32a444d174 was getting exercised even in
partitionwise join planning. Even with my adjustment to the test, it
was getting exercised, so I thought it was ok to adjust the expected
output instead of trying to make the query use partitionwise join.
However, your fix is better since the query still uses partitionwise
join and also exercises the required code. Thanks again. I verified
that the code gets exercised under a debugger.

My patch removed a redundant SET enable_partitionwise_join = on. That
change is not included in your commit, I believe, because it's
irrelevant to the fix. However, it's better to avoid the redundancy to
avoid confusion. PFA patch for the same.

[1] https://www.postgresql.org/message-id/CA%2BTgmoY9egbm1qehv%3DaSp%2BcwckOdbbdRePaXpBRSdL6PWDjvuQ%40mail.gmail.com
[2] https://www.postgresql.org/message-id/303774.1735851504%40sss.pgh.pa.us
[3] https://www.postgresql.org/message-id/786.1565541557%40sss.pgh.pa.us
[4] https://www.postgresql.org/message-id/CAKZiRmyaFFvxyEYGG_hu0F-EVEcqcnveH23MULhW6UY_jwykGw%40mail.gmail.com
[5] https://www.postgresql.org/message-id/CAExHW5uZh5fJ2siyYpwVWUq%2Br5EW_gatE1WRZ9H56x%2B115CUCg%40mail.gmail.com


--
Best Wishes,
Ashutosh Bapat

Attachment

pgsql-hackers by date:

Previous
From: Zsolt Parragi
Date:
Subject: Re: Periodic authorization expiration checks using GoAway message
Next
From: Tomas Vondra
Date:
Subject: Re: failed NUMA pages inquiry status: Operation not permitted