Thread: Query result differences between PostgreSQL 17 vs 16
Hi,
We have observed an issue after upgrading to PostgreSQL 17 that caused us to roll back to 16. Some of our queries are returning what I believe to be erroneous results.
I've attached two files here that can be used to reproduce what I'm seeing:
schema_plus_data.sql - This contains a pg_dump of a reproducible test case with a contrived schema and dataset of our use case.
reproducer.sql - This isn't necessary, but perhaps you find it helpful. This is a SQL script used to generate the schema and random data that hits this edge case. It was used to generate the previous file and it has reliably hit the issue so far from the handful of times I've tried.
The query I'm observing issues for is the following:
SELECT * FROM rc1
LEFT JOIN rc2 ON rc2.rc1_reference = rc1.description
LEFT JOIN rc3 ON rc2.id = rc3.rc2_reference
LEFT JOIN LATERAL rc_select(rc3.id) ON rc3.id IS NOT NULL;
Under PostgreSQL 17, I'm seeing ~400k results returned, whereas in PostgreSQL 16, I see ~6k. The results I believe to be erroneous are those that have 'BUG HIT' in the output for PostgreSQL 17. These are results joined from rc_select where rc3.id is null. I'm not expecting to see any of these rows as is the case in PostgreSQL 16 output (and 15 as well from prior experience).
I've observed this behavior in the latest PostgreSQL 17.4 but have also encountered this in 17.2 and 17.3. The OS being used is RHEL 9.5 (plow). Please let me know if you need any more information.
Thank you in advance for any help,
Ron
Attachment
On Fri, Feb 21, 2025 at 11:13:17AM -0500, Ronald Cruz wrote: > Hi, > > We have observed an issue after upgrading to PostgreSQL 17 that caused us to > roll back to 16. Some of our queries are returning what I believe to be > erroneous results. > > I've attached two files here that can be used to reproduce what I'm seeing: > > schema_plus_data.sql - This contains a pg_dump of a reproducible test case with > a contrived schema and dataset of our use case. > > reproducer.sql - This isn't necessary, but perhaps you find it helpful. This is > a SQL script used to generate the schema and random data that hits this edge > case. It was used to generate the previous file and it has reliably hit the > issue so far from the handful of times I've tried. > > The query I'm observing issues for is the following: > > SELECT * FROM rc1 > LEFT JOIN rc2 ON rc2.rc1_reference = rc1.description > LEFT JOIN rc3 ON rc2.id = rc3.rc2_reference > LEFT JOIN LATERAL rc_select(rc3.id) ON rc3.id IS NOT NULL; > > Under PostgreSQL 17, I'm seeing ~400k results returned, whereas in PostgreSQL > 16, I see ~6k. The results I believe to be erroneous are those that have 'BUG > HIT' in the output for PostgreSQL 17. These are results joined from rc_select > where rc3.id is null. I'm not expecting to see any of these rows as is the case > in PostgreSQL 16 output (and 15 as well from prior experience). > > I've observed this behavior in the latest PostgreSQL 17.4 but have also > encountered this in 17.2 and 17.3. The OS being used is RHEL 9.5 (plow). Please > let me know if you need any more information. We have a known problem with composite types and NULL constraints in PG 17 that I think we are fixing in PG 18. I saw IS NOT NULL in your query so I thought I would mention it: https://www.postgresql.org/message-id/Z37p0paENWWUarj-%40momjian.us We do have several NULL optimizations in PG 17. -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Bruce Momjian <bruce@momjian.us> writes: > On Fri, Feb 21, 2025 at 11:13:17AM -0500, Ronald Cruz wrote: >> We have observed an issue after upgrading to PostgreSQL 17 that caused us to >> roll back to 16. Some of our queries are returning what I believe to be >> erroneous results. > We have a known problem with composite types and NULL constraints in PG > 17 that I think we are fixing in PG 18. There's no composite type at hand here. I think the problem is an erroneous deduction from a column NOT NULL constraint. I can reproduce a faulty plan in the regression database with explain (costs off) select * from tenk1 left join int4_tbl i on (unique1 = f1) left join customer on (i.f1 = cid) left join int4_tbl j on cid is not null; v16 produces Hash Left Join Hash Cond: (tenk1.unique1 = i.f1) -> Seq Scan on tenk1 -> Hash -> Nested Loop Left Join Join Filter: (customer.cid IS NOT NULL) -> Hash Right Join Hash Cond: (customer.cid = i.f1) -> Seq Scan on customer -> Hash -> Seq Scan on int4_tbl i -> Materialize -> Seq Scan on int4_tbl j but HEAD produces Hash Left Join Hash Cond: (tenk1.unique1 = i.f1) -> Seq Scan on tenk1 -> Hash -> Nested Loop Left Join -> Hash Right Join Hash Cond: (customer.cid = i.f1) -> Seq Scan on customer -> Hash -> Seq Scan on int4_tbl i -> Materialize -> Seq Scan on int4_tbl j Note the lack of any IS NOT NULL test. I think the planner has convinced itself that the not null constraint on customer.cid makes that test redundant, despite the fact that what it is testing is a post-outer-join value that most certainly could be null. "git bisect" fingers this commit: b262ad440edecda0b1aba81d967ab560a83acb8a is the first bad commit commit b262ad440edecda0b1aba81d967ab560a83acb8a Author: David Rowley <drowley@postgresql.org> Date: Tue Jan 23 18:09:18 2024 +1300 Add better handling of redundant IS [NOT] NULL quals I've not looked at the code, but I suspect that it is failing to check varnullingrels before believing that it can trust the applicability of table constraints. regards, tom lane
On Sat, Feb 22, 2025 at 8:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > "git bisect" fingers this commit: > > b262ad440edecda0b1aba81d967ab560a83acb8a is the first bad commit > commit b262ad440edecda0b1aba81d967ab560a83acb8a > Author: David Rowley <drowley@postgresql.org> > Date: Tue Jan 23 18:09:18 2024 +1300 > > Add better handling of redundant IS [NOT] NULL quals > > > I've not looked at the code, but I suspect that it is failing > to check varnullingrels before believing that it can trust > the applicability of table constraints. Hmm, we do check varnullingrels in expr_is_nonnullable(). My best guess is that we have generated two versions of the qual 'customer.cid IS NOT NULL': one with customer.cid marked as nullable by the left join to customer, and one without. The latter is dropped because of the not null constraint on customer.cid, while the former fails to be applied on the left join to int4_tbl j. Thanks Richard
Richard Guo <guofenglinux@gmail.com> writes: > On Sat, Feb 22, 2025 at 8:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I've not looked at the code, but I suspect that it is failing >> to check varnullingrels before believing that it can trust >> the applicability of table constraints. > Hmm, we do check varnullingrels in expr_is_nonnullable(). My best > guess is that we have generated two versions of the qual 'customer.cid > IS NOT NULL': one with customer.cid marked as nullable by the left > join to customer, and one without. The latter is dropped because of > the not null constraint on customer.cid, while the former fails to be > applied on the left join to int4_tbl j. If the check against table not-null constraints is applied after we clone outer-join quals, that's probably bad. I think there are assumptions in there that every clone qual will have doppelgangers, so filtering NOT NULLs later would break that. Maybe not applying the filter to quals marked has_clone or is_clone would help? regards, tom lane
On Sat, Feb 22, 2025 at 11:32 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Richard Guo <guofenglinux@gmail.com> writes: > > On Sat, Feb 22, 2025 at 8:51 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > >> I've not looked at the code, but I suspect that it is failing > >> to check varnullingrels before believing that it can trust > >> the applicability of table constraints. > > > Hmm, we do check varnullingrels in expr_is_nonnullable(). My best > > guess is that we have generated two versions of the qual 'customer.cid > > IS NOT NULL': one with customer.cid marked as nullable by the left > > join to customer, and one without. The latter is dropped because of > > the not null constraint on customer.cid, while the former fails to be > > applied on the left join to int4_tbl j. > > If the check against table not-null constraints is applied after we > clone outer-join quals, that's probably bad. I think there are > assumptions in there that every clone qual will have doppelgangers, > so filtering NOT NULLs later would break that. Maybe not applying > the filter to quals marked has_clone or is_clone would help? Yeah, I think this approach can fix the issue. Perhaps we should check whether the RestrictInfo is a clone clause and avoid assuming that it's always true in that case, maybe by adding something like below at the start of restriction_is_always_true. + if (restrictinfo->has_clone || restrictinfo->is_clone) + return false; We may lose some optimization opportunities with NOT NULL quals, as this could prevent us from reducing certain such quals to constant true, but I think correctness should be our top priority. I'll go ahead and write a patch. Thanks Richard
On Sat, Feb 22, 2025 at 12:08 PM Richard Guo <guofenglinux@gmail.com> wrote: > On Sat, Feb 22, 2025 at 11:32 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > If the check against table not-null constraints is applied after we > > clone outer-join quals, that's probably bad. I think there are > > assumptions in there that every clone qual will have doppelgangers, > > so filtering NOT NULLs later would break that. Maybe not applying > > the filter to quals marked has_clone or is_clone would help? > Yeah, I think this approach can fix the issue. Perhaps we should > check whether the RestrictInfo is a clone clause and avoid assuming > that it's always true in that case, maybe by adding something like > below at the start of restriction_is_always_true. > > + if (restrictinfo->has_clone || restrictinfo->is_clone) > + return false; > > We may lose some optimization opportunities with NOT NULL quals, as > this could prevent us from reducing certain such quals to constant > true, but I think correctness should be our top priority. > > I'll go ahead and write a patch. Here is the patch. Thanks Richard
Attachment
Richard Guo <guofenglinux@gmail.com> writes: > Here is the patch. Thanks for that. The code and comment added to restriction_is_always_true look good, but I can't help wondering whether we don't need the same in restriction_is_always_false. Not very sure what a query proving the need for that would look like, but leaving it asymmetric feels wrong. regards, tom lane
On Wed, Feb 26, 2025 at 3:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Richard Guo <guofenglinux@gmail.com> writes: > > Here is the patch. > Thanks for that. The code and comment added to > restriction_is_always_true look good, but I can't help wondering > whether we don't need the same in restriction_is_always_false. > Not very sure what a query proving the need for that would > look like, but leaving it asymmetric feels wrong. Yeah, I think you are right. The thing here is that we don't have a reliable way to determine if the input expression of a NullTest is non-nullable if it's a clone clause. This applies to both restriction_is_always_true and restriction_is_always_false. So I think we should add the same check in restriction_is_always_false too. I'll give it a try to find a query that shows this is necessary. Thanks Richard
On Wed, Feb 26, 2025 at 4:03 PM Richard Guo <guofenglinux@gmail.com> wrote: > On Wed, Feb 26, 2025 at 3:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Thanks for that. The code and comment added to > > restriction_is_always_true look good, but I can't help wondering > > whether we don't need the same in restriction_is_always_false. > > Not very sure what a query proving the need for that would > > look like, but leaving it asymmetric feels wrong. > Yeah, I think you are right. The thing here is that we don't have a > reliable way to determine if the input expression of a NullTest is > non-nullable if it's a clone clause. This applies to both > restriction_is_always_true and restriction_is_always_false. So I > think we should add the same check in restriction_is_always_false too. > > I'll give it a try to find a query that shows this is necessary. After some attempts, I failed to find such a query. An IS NULL qual is not strict, and it makes the outer join not commutable with other joins per outer-join identity 3. We could add another strict qual to the join clauses to make the outer join commutable. However, in this case whether we reduce the IS NULL qual to a constant false would not affect the join outputs, due to the strict qual. I'm starting to believe that reducing an IS NULL clone clause to a constant false should be safe. Any thoughts? Thanks Richard
Richard Guo <guofenglinux@gmail.com> writes: >> On Wed, Feb 26, 2025 at 3:26 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> Thanks for that. The code and comment added to >>> restriction_is_always_true look good, but I can't help wondering >>> whether we don't need the same in restriction_is_always_false. > After some attempts, I failed to find such a query. An IS NULL qual > is not strict, and it makes the outer join not commutable with other > joins per outer-join identity 3. We could add another strict qual to > the join clauses to make the outer join commutable. However, in this > case whether we reduce the IS NULL qual to a constant false would not > affect the join outputs, due to the strict qual. > I'm starting to believe that reducing an IS NULL clone clause to a > constant false should be safe. Any thoughts? Not sure. I think the risk factor here is fooling the clone-clause mechanisms by failing to install some clone of an original outer join clause in a rel's restriction list. Looking at the callers of restriction_is_always_false: add_base_clause_to_rel add_join_clause_to_rels These two replace the clause with a Const "false" that is marked with the same rinfo_serial. Therefore the expectation that every group of clones will have a representative is still satisfied, even though the representative doesn't contain quite what you might expect. apply_child_basequals This just summarily rejects the child rel as not needing to be built. I think that might be okay, since we don't really make any further deductions with an appendrel child's quals. I've not actually worked through the bug case to verify my hypothesis that it's the lack of a representative for a clone group that is breaking things. (Maybe you did?) But if that is the direct cause of the failure, then it's plausible that none of these callers are subject to it. I'm a bit inclined to argue that we should put in the no-clone-quals limitation anyway, to protect future callers that might not manage to dodge the problem. OTOH, if we get rid of these functions altogether as I expressed my desire to do nearby, it's all moot. regards, tom lane
On Thu, Feb 27, 2025 at 1:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > I've not actually worked through the bug case to verify my hypothesis > that it's the lack of a representative for a clone group that is > breaking things. (Maybe you did?) But if that is the direct > cause of the failure, then it's plausible that none of these > callers are subject to it. Yeah, I think this is the cause of the failure. Specifically, when we start with (A/B)/C and transform to A/(B/C) per OJ identity 3, we need a way to prevent upper clauses that use only C Vars from being pushed down and applied as a filter clause at the lower B/C join. To achieve this, we consider that the pushed-down B/C join has not completed, and hence do not include B/C join's ojrelid in its relid set (it will be added when we form the A/B join). This is why, in this case, the version of 'customer.cid IS NOT NULL' that is not marked as nullable by the left join to customer is chosen: we've assumed that that left join has not completed yet. However, performing the NOT NULL deduction on this clone would be unsafe. It seems to me that, for clone clauses, the nullingrels mark is not that reliable for performing such deductions. > I'm a bit inclined to argue that we should put in the no-clone-quals > limitation anyway, to protect future callers that might not manage > to dodge the problem. Now I'm inclined to agree with you. It seems we've been lucky so far not to encounter issues with IS NULL deductions on clone clauses, but this feels like a problem waiting to happen in the future. The no-clone-quals limitation in restriction_is_always_false makes the fix in f00ab1fd1 unnecessary. I think we should revert those changes. Attached is the updated patch. > OTOH, if we get rid of these functions > altogether as I expressed my desire to do nearby, it's all moot. Yeah, if we can find a way to perform the NullTest deduction during eval_const_expressions, it will be much cleaner. Thanks Richard
Attachment
Richard Guo <guofenglinux@gmail.com> writes: > On Thu, Feb 27, 2025 at 1:28 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: >> I've not actually worked through the bug case to verify my hypothesis >> that it's the lack of a representative for a clone group that is >> breaking things. (Maybe you did?) > Yeah, I think this is the cause of the failure. Specifically, when we > start with (A/B)/C and transform to A/(B/C) per OJ identity 3, we need > a way to prevent upper clauses that use only C Vars from being pushed > down and applied as a filter clause at the lower B/C join. To achieve > this, we consider that the pushed-down B/C join has not completed, and > hence do not include B/C join's ojrelid in its relid set (it will be > added when we form the A/B join). > This is why, in this case, the version of 'customer.cid IS NOT NULL' > that is not marked as nullable by the left join to customer is chosen: > we've assumed that that left join has not completed yet. Oh! So I found the right solution through the wrong chain of reasoning ;=). It's not that the clone group is missed during qual selection: it's that the selected representative lacks a nullingrel bit, so when add_join_clause_to_rel acts on it later, it thinks it's okay to throw away. > However, > performing the NOT NULL deduction on this clone would be unsafe. It > seems to me that, for clone clauses, the nullingrels mark is not that > reliable for performing such deductions. Precisely. We have to remove the nullingrel bit from the Var, else the placement cross-checks in setrefs.c will (correctly) complain that it's being evaluated at the wrong level. But relying on it in this particular way at this particular point in the proceedings gives the wrong conclusion. >> I'm a bit inclined to argue that we should put in the no-clone-quals >> limitation anyway, to protect future callers that might not manage >> to dodge the problem. > Now I'm inclined to agree with you. It seems we've been lucky so far > not to encounter issues with IS NULL deductions on clone clauses, but > this feels like a problem waiting to happen in the future. Yes, I'm on board with that too. I think the comments need more work than you've done though. I think they should say something about "nullingrel bits in clone clauses may not reflect reality, so we dare not draw conclusions from clones about whether Vars are guaranteed not-null". Also, I'm not convinced any more that this has anything to do with missing clone members, so we should remove that claim. > The no-clone-quals limitation in restriction_is_always_false makes the > fix in f00ab1fd1 unnecessary. I think we should revert those changes. Hmm, I'm less on board with that. The saving-and-restoring of last_rinfo_serial looks correct to me, even though this fix might cause the test case you found before to not tickle that bug. > Attached is the updated patch. I'm confused by the changes to existing test cases in this? It looks like you're undoing some test changes that are not from f00ab1fd1, so I don't understand why. My inclination here would be to add a new test that exposes the current bug, but leave the rest alone, even though they may no longer be testing exactly what they were originally meant to do. regards, tom lane
I wrote: > Richard Guo <guofenglinux@gmail.com> writes: >> Yeah, I think this is the cause of the failure. Specifically, when we >> start with (A/B)/C and transform to A/(B/C) per OJ identity 3, we need >> a way to prevent upper clauses that use only C Vars from being pushed >> down and applied as a filter clause at the lower B/C join. To achieve >> this, we consider that the pushed-down B/C join has not completed, and >> hence do not include B/C join's ojrelid in its relid set (it will be >> added when we form the A/B join). >> This is why, in this case, the version of 'customer.cid IS NOT NULL' >> that is not marked as nullable by the left join to customer is chosen: >> we've assumed that that left join has not completed yet. > Oh! So I found the right solution through the wrong chain of > reasoning ;=). It's not that the clone group is missed during qual > selection: it's that the selected representative lacks a nullingrel > bit, so when add_join_clause_to_rel acts on it later, it thinks it's > okay to throw away. I found the time to trace through this example in detail, and your description is accurate. This is explained (perhaps not at sufficient length) in optimizer/README: As usual, outer join identity 3 complicates matters. If we start with (A leftjoin B on (Pab)) leftjoin C on (Pbc) then the parser will have marked any C Vars appearing above these joins with the RT index of the B/C join. If we now transform to A leftjoin (B leftjoin C on (Pbc)) on (Pab) then it would appear that a clause using only such Vars could be pushed down and applied as a filter clause (not a join clause) at the lower B/C join. But *this might not give the right answer* since the clause might see a non-null value for the C Var that will be replaced by null once the A/B join is performed. We handle this by saying that the pushed-down join hasn't completely performed the work of the B/C join and hence is not entitled to include that outer join relid in its relid set. When we form the A/B join, both outer joins' relids will be added to its relid set, and then the upper clause will be applied at the correct join level. (Note there is no problem when identity 3 is applied in the other direction: if we started with the second form then upper C Vars are marked with both outer join relids, so they cannot drop below whichever join is applied second.) Similarly, Vars representing the output of a pushed-down join do not acquire ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ nullingrel bits for that join until after the upper join is performed. ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ So in the case at hand, we generate a clone of the "rc3.id IS NOT NULL" clause that doesn't have a nullingrel bit for the left join to rc3, but nonetheless is expected to be applied after that join is performed. There is also a clone that does have a nullingrel bit, but that doesn't save us in this example because it will only be applied in join orders other than the one that gets chosen. It'd certainly be nice to make that cleaner, but I'm not sure how, and we wouldn't risk back-patching the nontrivial changes that would be needed. So I think rejecting has_clone/is_clone clauses is the only path forward for now. Also, I found a test case proving that restriction_is_always_false needs it too. If we patch only restriction_is_always_true, then try this modification of the submitted repro: regression=# explain (costs off) SELECT * FROM rc1 LEFT JOIN rc2 ON rc2.rc1_reference = rc1.description LEFT JOIN rc3 ON rc2.id = rc3.rc2_reference LEFT JOIN LATERAL rc_select(rc3.id) rs ON rc3.id IS NOT NULL and rc3.rc2_reference IS NULL; QUERY PLAN -------------------------------------------------------------------- Hash Right Join Hash Cond: ((rc2.rc1_reference)::text = (rc1.description)::text) -> Nested Loop Left Join Join Filter: ((rc3.id IS NOT NULL) AND false) -> Hash Right Join Hash Cond: (rc3.rc2_reference = rc2.id) -> Seq Scan on rc3 -> Hash -> Seq Scan on rc2 -> Result One-Time Filter: false -> Hash -> Seq Scan on rc1 (13 rows) restriction_is_always_false has decided that "rc3.rc2_reference IS NULL" is always false, although for exactly the same reasons discussed here, it isn't. One other interesting point here is that the code has also decided that the scan on rc_select can be reduced to dummy because the ON qual is unsatisfiable. I think that is actually correct in this case, but the conclusion is based on very faulty logic, so I don't have any faith that there aren't nearby cases where that would be another bug. With both is_always functions patched, I get regression=# explain (costs off) SELECT * FROM rc1 LEFT JOIN rc2 ON rc2.rc1_reference = rc1.description LEFT JOIN rc3 ON rc2.id = rc3.rc2_reference LEFT JOIN LATERAL rc_select(rc3.id) rs ON rc3.id IS NOT NULL and rc3.rc2_reference IS NULL; QUERY PLAN ----------------------------------------------------------------------------- Hash Right Join Hash Cond: ((rc2.rc1_reference)::text = (rc1.description)::text) -> Nested Loop Left Join Join Filter: ((rc3.id IS NOT NULL) AND (rc3.rc2_reference IS NULL)) -> Hash Right Join Hash Cond: (rc3.rc2_reference = rc2.id) -> Seq Scan on rc3 -> Hash -> Seq Scan on rc2 -> Function Scan on rc_select rs -> Hash -> Seq Scan on rc1 (12 rows) Maybe that's leaving something on the table, but I'm content to call it good for today. regards, tom lane
On Sat, Mar 1, 2025 at 12:48 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Richard Guo <guofenglinux@gmail.com> writes: > > Now I'm inclined to agree with you. It seems we've been lucky so far > > not to encounter issues with IS NULL deductions on clone clauses, but > > this feels like a problem waiting to happen in the future. > > Yes, I'm on board with that too. I think the comments need more work > than you've done though. I think they should say something about > "nullingrel bits in clone clauses may not reflect reality, so we dare > not draw conclusions from clones about whether Vars are guaranteed > not-null". Also, I'm not convinced any more that this has anything to > do with missing clone members, so we should remove that claim. Agreed. Thanks for the suggestion. > > The no-clone-quals limitation in restriction_is_always_false makes the > > fix in f00ab1fd1 unnecessary. I think we should revert those changes. > > Hmm, I'm less on board with that. The saving-and-restoring of > last_rinfo_serial looks correct to me, even though this fix might > cause the test case you found before to not tickle that bug. Fair point. > > Attached is the updated patch. > > I'm confused by the changes to existing test cases in this? It looks > like you're undoing some test changes that are not from f00ab1fd1, so > I don't understand why. My inclination here would be to add a new > test that exposes the current bug, but leave the rest alone, even > though they may no longer be testing exactly what they were originally > meant to do. Yeah, the patch modifies some existing test cases that are intended to test a join clause that is IS_NOT_NULL/IS_NULL qual can or cannot be reduced depending on whether its Var is nullable by any outer joins. And it happens that the join clauses in these test cases are clone clauses, which are excluded from reduction by the fix here. As a result, they are no longer testing what they are intended to. This patch modifies them to ensure that those join clauses are not clone clauses. This patch also adds a new test that exposes the current bug, and tests that clone clauses are not reduced to constant true/false. Here is an updated patch, which also includes the test case you found proving the no-clone-quals limitation in restriction_is_always_false is also needed. Thanks Richard
Attachment
On Sat, Mar 1, 2025 at 7:16 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > So in the case at hand, we generate a clone of the "rc3.id IS NOT > NULL" clause that doesn't have a nullingrel bit for the left join to > rc3, but nonetheless is expected to be applied after that join is > performed. There is also a clone that does have a nullingrel bit, but > that doesn't save us in this example because it will only be applied > in join orders other than the one that gets chosen. Exactly! > It'd certainly be nice to make that cleaner, but I'm not sure how, > and we wouldn't risk back-patching the nontrivial changes that would > be needed. So I think rejecting has_clone/is_clone clauses is the > only path forward for now. Yeah, a better fix might be to find a way to perform this NullTest deduction during eval_const_expressions, as you mentioned in the other thread. However, that would also require some nontrivial changes. > Also, I found a test case proving that restriction_is_always_false > needs it too. Thanks. I've included this test case in v3 patch. Thanks Richard
Richard Guo <guofenglinux@gmail.com> writes: > Here is an updated patch, which also includes the test case you found > proving the no-clone-quals limitation in restriction_is_always_false > is also needed. LGTM. regards, tom lane
On Tue, Mar 4, 2025 at 12:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Richard Guo <guofenglinux@gmail.com> writes: > > Here is an updated patch, which also includes the test case you found > > proving the no-clone-quals limitation in restriction_is_always_false > > is also needed. > LGTM. Pushed and back-patched to v17. The fix will be included in next minor release (17.5). Thanks Richard
On Tue, Mar 4, 2025 at 04:30:26PM +0900, Richard Guo wrote: > On Tue, Mar 4, 2025 at 12:22 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Richard Guo <guofenglinux@gmail.com> writes: > > > Here is an updated patch, which also includes the test case you found > > > proving the no-clone-quals limitation in restriction_is_always_false > > > is also needed. > > > LGTM. > > Pushed and back-patched to v17. The fix will be included in next > minor release (17.5). Uh, if we fixed this for NULL optimization and backpatched, should we do the same for: Parameter NOT NULL to CREATE DOMAIN not the same as CHECK (VALUE IS NOT NULL) https://www.postgresql.org/message-id/flat/173591158454.714.7664064332419606037%40wrigleys.postgresql.org -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Bruce Momjian <bruce@momjian.us> writes: > Uh, if we fixed this for NULL optimization and backpatched, should we do > the same for: > Parameter NOT NULL to CREATE DOMAIN not the same as CHECK (VALUE IS NOT NULL) > https://www.postgresql.org/message-id/flat/173591158454.714.7664064332419606037%40wrigleys.postgresql.org This is unrelated to that. regards, tom lane
On Tue, Mar 11, 2025 at 06:19:49PM -0400, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Uh, if we fixed this for NULL optimization and backpatched, should we do > > the same for: > > Parameter NOT NULL to CREATE DOMAIN not the same as CHECK (VALUE IS NOT NULL) > > https://www.postgresql.org/message-id/flat/173591158454.714.7664064332419606037%40wrigleys.postgresql.org > > This is unrelated to that. Well, I was trying to figure out if we should be changing the NULL handling in 17.X, but now that you have done it with one patch, it seems the above URL patch should also be applied and backpatched to 17, no? -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.
Bruce Momjian <bruce@momjian.us> writes: > Well, I was trying to figure out if we should be changing the NULL > handling in 17.X, but now that you have done it with one patch, it seems > the above URL patch should also be applied and backpatched to 17, no? If you want somebody to review that patch, shouldn't it be in the CF? regards, tom lane
On Tue, Mar 11, 2025 at 08:57:06PM -0400, Tom Lane wrote: > Bruce Momjian <bruce@momjian.us> writes: > > Well, I was trying to figure out if we should be changing the NULL > > handling in 17.X, but now that you have done it with one patch, it seems > > the above URL patch should also be applied and backpatched to 17, no? > > If you want somebody to review that patch, shouldn't it be in the CF? Uh, I wasn't sure if it would make it in the next minor release if I did that. Is it a good idea? -- Bruce Momjian <bruce@momjian.us> https://momjian.us EDB https://enterprisedb.com Do not let urgent matters crowd out time for investment in the future.