Thread: v16 regression - wrong query results with LEFT JOINs + join removal
While merging commits from c9f7f926484d69e2806e35343af7e472fadfede7 through 3db72ebcbe20debc6552500ee9ccb4b2007f12f8 into a fork, my colleague Rushabh Lathia and I noticed that the merge caused certain queries to return wrong answers. Rushabh then discovered that this also happens in unmodified PostgreSQL. I believe the problem is most likely with one of these three commits: 3bef56e116 Invent "join domains" to replace the below_outer_join hack. b448f1c8d8 Do assorted mop-up in the planner. 2489d76c49 Make Vars be outer-join-aware. The following test case, mostly due to Rushabh, demonstrates the problem on current master: rhaas=# create table one_row as values (1); SELECT 1 rhaas=# create table three_oids as select * from (values (10::oid), (20::oid), (30::oid)) v(oid); SELECT 3 rhaas=# create table just_one_oid as select * from (values (10::oid)) v(oid); SELECT 1 rhaas=# SELECT r.oid, s.oid FROM just_one_oid s LEFT JOIN (select three_oids.oid from one_row, three_oids LEFT JOIN pg_db_role_setting s ON three_oids.oid = s.setrole AND s.setdatabase = 0::oid) r ON r.oid = s.oid; oid | oid -----+----- 10 | 10 20 | 10 30 | 10 (3 rows) The answer is clearly wrong, because the join clause requires r.oid and s.oid to be equal, but in the displayed output, they are not. I'm not quite sure what's happening here. The join to pg_db_role_setting gets removed, which seems correct, because it has a primary key index on (setrole, setdatabase), and the join clause requires both of those columns to be equal to some specific value. The output of the whole subselect is correct. But the join between the subselect and just_one_oid somehow goes wrong: the join filter that ought to be there is missing. Nested Loop Left Join (cost=0.00..272137387.88 rows=16581375000 width=8) -> Seq Scan on just_one_oid s (cost=0.00..35.50 rows=2550 width=4) -> Materialize (cost=0.00..139272.12 rows=6502500 width=4) -> Nested Loop (cost=0.00..81358.62 rows=6502500 width=4) -> Seq Scan on one_row (cost=0.00..35.50 rows=2550 width=0) -> Materialize (cost=0.00..48.25 rows=2550 width=4) -> Seq Scan on three_oids (cost=0.00..35.50 rows=2550 width=4) -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > While merging commits from c9f7f926484d69e2806e35343af7e472fadfede7 > through 3db72ebcbe20debc6552500ee9ccb4b2007f12f8 into a fork, my > colleague Rushabh Lathia and I noticed that the merge caused certain > queries to return wrong answers. I believe this is a variant of the existing open issue about the outer-join-Vars stuff: https://www.postgresql.org/message-id/flat/0b819232-4b50-f245-1c7d-c8c61bf41827%40postgrespro.ru although perhaps different in detail --- EXPLAIN shows that the join condition has gone missing completely, rather than just being put at the wrong join level. Possibly related to the join removal that happened to pg_db_role_setting? I've been incredibly distracted by $real-life over the past few weeks, but am hoping to get the outer-join-Vars issues resolved in the next day or two. regards, tom lane
On Thu, May 11, 2023 at 9:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > While merging commits from c9f7f926484d69e2806e35343af7e472fadfede7 > > through 3db72ebcbe20debc6552500ee9ccb4b2007f12f8 into a fork, my > > colleague Rushabh Lathia and I noticed that the merge caused certain > > queries to return wrong answers. > > I believe this is a variant of the existing open issue about > the outer-join-Vars stuff: > > https://www.postgresql.org/message-id/flat/0b819232-4b50-f245-1c7d-c8c61bf41827%40postgrespro.ru Ouch, so we've had a known queries-returning-wrong-answers bug for more than two months. That's not great. I'll try to find time today to check whether the patches on that thread resolve this issue. > I've been incredibly distracted by $real-life over the past few > weeks, but am hoping to get the outer-join-Vars issues resolved > in the next day or two. Thanks. Hope the real life distractions are nothing too awful, but best wishes either way. -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, May 11, 2023 at 10:14 AM Robert Haas <robertmhaas@gmail.com> wrote: > Ouch, so we've had a known queries-returning-wrong-answers bug for > more than two months. That's not great. I'll try to find time today to > check whether the patches on that thread resolve this issue. I tried out the v3 patches from that thread and they don't seem to make any difference for me in this test case. So either (1) it's a different issue or (2) those patches don't fully fix it or (3) I'm bad at testing things. -- Robert Haas EDB: http://www.enterprisedb.com
Robert Haas <robertmhaas@gmail.com> writes: > On Thu, May 11, 2023 at 10:14 AM Robert Haas <robertmhaas@gmail.com> wrote: >> Ouch, so we've had a known queries-returning-wrong-answers bug for >> more than two months. That's not great. I'll try to find time today to >> check whether the patches on that thread resolve this issue. > I tried out the v3 patches from that thread and they don't seem to > make any difference for me in this test case. So either (1) it's a > different issue or (2) those patches don't fully fix it or (3) I'm bad > at testing things. Yeah, I've just traced the problem to remove_rel_from_query() deciding that it can drop the qual of interest :-(. I'd done this: - if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids)) + if (bms_is_member(ojrelid, rinfo->required_relids)) as part of an unfinished effort at getting rid of RestrictInfos' is_pushed_down flags, and this example shows that the replacement condition is faulty. What I'm inclined to do about it is just revert this particular change. But I'd better run around and see where else I did that, because the idea is evidently not ready for prime time. regards, tom lane
I wrote: > Yeah, I've just traced the problem to remove_rel_from_query() deciding > that it can drop the qual of interest :-(. I'd done this: > - if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids)) > + if (bms_is_member(ojrelid, rinfo->required_relids)) > as part of an unfinished effort at getting rid of RestrictInfos' > is_pushed_down flags, and this example shows that the replacement > condition is faulty. > What I'm inclined to do about it is just revert this particular > change. But I'd better run around and see where else I did that, > because the idea is evidently not ready for prime time. Looks like that was the only such change in 2489d76c4 or follow-ons, so done, with a test case based on your example. regards, tom lane
On Thu, May 11, 2023 at 1:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > What I'm inclined to do about it is just revert this particular > > change. But I'd better run around and see where else I did that, > > because the idea is evidently not ready for prime time. > > Looks like that was the only such change in 2489d76c4 or > follow-ons, so done, with a test case based on your example. Thanks, Tom! -- Robert Haas EDB: http://www.enterprisedb.com
On Thu, May 11, 2023 at 11:50 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, May 11, 2023 at 10:14 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Ouch, so we've had a known queries-returning-wrong-answers bug for
> more than two months. That's not great. I'll try to find time today to
> check whether the patches on that thread resolve this issue.
I tried out the v3 patches from that thread and they don't seem to
make any difference for me in this test case. So either (1) it's a
different issue or (2) those patches don't fully fix it or (3) I'm bad
at testing things.
--
Robert Haas
EDB: http://www.enterprisedb.com
Forgive the noob question... But does this trigger a regression test to be created?
And who tracks/pushes that?
Kirk...
On Thu, May 11, 2023 at 4:16 PM Kirk Wolak <wolakk@gmail.com> wrote: > Forgive the noob question... But does this trigger a regression test to be created? > And who tracks/pushes that? Tom included one in the commit. -- Robert Haas EDB: http://www.enterprisedb.com