Home > mailing lists

Re: apply_scanjoin_target_to_paths and partitionwise join - Mailing list pgsql-hackers

From	Arne Roland
Subject	Re: apply_scanjoin_target_to_paths and partitionwise join
Date	October 29 13:43:10
Msg-id	b2af6332-8ecc-442c-8cd9-e50b2e944ff7@malkut.net Whole thread Raw
In response to	Re: apply_scanjoin_target_to_paths and partitionwise join (Robert Haas <robertmhaas@gmail.com>)
Responses	Re: apply_scanjoin_target_to_paths and partitionwise join
List	pgsql-hackers

Tree view

Hi Robert,

Richard already covered a lot. I mainly want to reiterate, that a public test case would be immensely helpful.

On 2025-10-28 21:17, Robert Haas wrote:

On Mon, Oct 27, 2025 at 5:12 PM Robert Haas <robertmhaas@gmail.com> wrote:

I haven't had a chance just yet to think through all the details of
the proposed patch, but I now believe we should commit something along
those lines. I still suspect that back-patching is unwise; even though
I now agree with Ashutosh's claim that this is a bug, because previous
experience with destabilizing plans in back-branches has not been
good. Hence, I'm inclined to fix only master. I do think the comments
in the patch need some work, and I plan to tackle that tomorrow.

It seems that, in the time sense this patch was originally posted,
it's been side-swiped by Richard Guo's commits 24225ad9aafc and
9b282a9359a1, with the result that the regression tests now fail with
the patch applied, and I'm not immediately certain how to clean that
up. I'm also not sure that the way the patch handles the test cases it
did adjust is optimal. Here is some preliminary analysis; opinions
appreciated.

With the patch as last posted applied, I see three regression test
failures. The first one is for this query:
 explain (verbose, costs off) select * from unique_tbl_p t1, unique_tbl_p t2 where (t1.a, t2.a) in (select a, a from unique_tbl_p t3) order by t1.a, t2.a;

You earlier requested a case, where we can in fact measure an advantage of the new plan. I think we won't be able to get rid of the disadvantages. You said yourself beautifully:

On 2025-01-02 20:43:12, Robert Haas wrote:

I don't actually have a clear understanding of why we need this. In
https://www.postgresql.org/message-id/CAKZiRmyaFFvxyEYGG_hu0F-EVEcqcnveH23MULhW6UY_jwykGw%40mail.gmail.com
Jakub says that an EDB customer experienced a case where the
partitionwise plan took 530+s and the non-partitionwise plan took 23s,
but unfortunately there's no public test case, and in the examples
shared publicly, either the partionwise plan is actually slower but is
mistakenly estimated to be faster, or the two are extremely close to
the same speed so it doesn't really matter. So the customer scenario
(which is not public) is justification for a code-change, but the
publicly-posted examples, as far as I can see, are not.

The Q1 you mentioned sadly isn't a real test case, where I can measure performance impact. More an academic difference in costs, which I don't fully comprehend as of now.

On 2025-10-28 21:17, Robert Haas wrote:

[...]In the Q1 case, above, we
apparently reduce the cost specifically by not flushing the path list.
But here, we just end up picking a nearly equivalent path with a
nearly-equivalent cost. At least, that means the test case isn't
likely to be stable, and we could just patch around that, as Ashutosh
did, by suppressing partitionwise join (it is not clear whether this
compromises the goals of the test case, but it's not obvious that it
does). But it might also be taken as a worrying indication that plans
of this form are going to come out as either partitionwise or not
based on essentially random factors, which could be viewed as a flaw
in the approach. I'm not really sure which way to view it, and if is a
flaw in the approach, then I'm not sure what to do instead.

While this is probably a common occurrence, the use of CPU cycles is close enough, that I suspect this wouldn't be a massive issue. The main problem I see between these two very similar plans seem to me the potentially very different memory footprint. The work_mem spill file is still independent per worker node. With the patch, I can easily see a world, where that never becomes a problem on some development database, but on the nearly identical live database. This behavior seems incredibly hard to test for.

Thoughts?

Did you encounter a case a in production, that made you reevaluate this thread? If so a public reproducer would be very appreciated.

Regards
Arne

pgsql-hackers by date:

From: "Joel Jacobson"
Date: 29 October, 13:33:07
Subject: Re: Optimize LISTEN/NOTIFY

From: Dilip Kumar
Date: 29 October, 13:44:53
Subject: Re: Logical Replication of sequences

Re: apply_scanjoin_target_to_paths and partitionwise join - Mailing list pgsql-hackers

Previous

Next