Re: Query result differences between PostgreSQL 17 vs 16 - Mailing list pgsql-bugs
From | Tom Lane |
---|---|
Subject | Re: Query result differences between PostgreSQL 17 vs 16 |
Date | |
Msg-id | 2748914.1740757698@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Query result differences between PostgreSQL 17 vs 16 (Richard Guo <guofenglinux@gmail.com>) |
Responses |
Re: Query result differences between PostgreSQL 17 vs 16
Re: Query result differences between PostgreSQL 17 vs 16 |
List | pgsql-bugs |
Richard Guo <guofenglinux@gmail.com> writes: > On Thu, Feb 27, 2025 at 1:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I've not actually worked through the bug case to verify my hypothesis >> that it's the lack of a representative for a clone group that is >> breaking things. (Maybe you did?) > Yeah, I think this is the cause of the failure. Specifically, when we > start with (A/B)/C and transform to A/(B/C) per OJ identity 3, we need > a way to prevent upper clauses that use only C Vars from being pushed > down and applied as a filter clause at the lower B/C join. To achieve > this, we consider that the pushed-down B/C join has not completed, and > hence do not include B/C join's ojrelid in its relid set (it will be > added when we form the A/B join). > This is why, in this case, the version of 'customer.cid IS NOT NULL' > that is not marked as nullable by the left join to customer is chosen: > we've assumed that that left join has not completed yet. Oh! So I found the right solution through the wrong chain of reasoning ;=). It's not that the clone group is missed during qual selection: it's that the selected representative lacks a nullingrel bit, so when add_join_clause_to_rel acts on it later, it thinks it's okay to throw away. > However, > performing the NOT NULL deduction on this clone would be unsafe. It > seems to me that, for clone clauses, the nullingrels mark is not that > reliable for performing such deductions. Precisely. We have to remove the nullingrel bit from the Var, else the placement cross-checks in setrefs.c will (correctly) complain that it's being evaluated at the wrong level. But relying on it in this particular way at this particular point in the proceedings gives the wrong conclusion. >> I'm a bit inclined to argue that we should put in the no-clone-quals >> limitation anyway, to protect future callers that might not manage >> to dodge the problem. > Now I'm inclined to agree with you. It seems we've been lucky so far > not to encounter issues with IS NULL deductions on clone clauses, but > this feels like a problem waiting to happen in the future. Yes, I'm on board with that too. I think the comments need more work than you've done though. I think they should say something about "nullingrel bits in clone clauses may not reflect reality, so we dare not draw conclusions from clones about whether Vars are guaranteed not-null". Also, I'm not convinced any more that this has anything to do with missing clone members, so we should remove that claim. > The no-clone-quals limitation in restriction_is_always_false makes the > fix in f00ab1fd1 unnecessary. I think we should revert those changes. Hmm, I'm less on board with that. The saving-and-restoring of last_rinfo_serial looks correct to me, even though this fix might cause the test case you found before to not tickle that bug. > Attached is the updated patch. I'm confused by the changes to existing test cases in this? It looks like you're undoing some test changes that are not from f00ab1fd1, so I don't understand why. My inclination here would be to add a new test that exposes the current bug, but leave the rest alone, even though they may no longer be testing exactly what they were originally meant to do. regards, tom lane
pgsql-bugs by date: