Thread: v16 regression - wrong query results with LEFT JOINs + join removal

v16 regression - wrong query results with LEFT JOINs + join removal

From
Robert Haas
Date:
While merging commits from c9f7f926484d69e2806e35343af7e472fadfede7
through 3db72ebcbe20debc6552500ee9ccb4b2007f12f8 into a fork, my
colleague Rushabh Lathia and I noticed that the merge caused certain
queries to return wrong answers. Rushabh then discovered that this
also happens in unmodified PostgreSQL. I believe the problem is most
likely with one of these three commits:

3bef56e116 Invent "join domains" to replace the below_outer_join hack.
b448f1c8d8 Do assorted mop-up in the planner.
2489d76c49 Make Vars be outer-join-aware.

The following test case, mostly due to Rushabh, demonstrates the
problem on current master:

rhaas=# create table one_row as values (1);
SELECT 1
rhaas=# create table three_oids as select * from (values (10::oid),
(20::oid), (30::oid)) v(oid);
SELECT 3
rhaas=# create table just_one_oid as select * from (values (10::oid)) v(oid);
SELECT 1
rhaas=# SELECT r.oid, s.oid FROM just_one_oid s LEFT JOIN (select
three_oids.oid from one_row, three_oids LEFT JOIN pg_db_role_setting s
ON three_oids.oid = s.setrole AND s.setdatabase = 0::oid) r ON r.oid =
s.oid;
 oid | oid
-----+-----
  10 |  10
  20 |  10
  30 |  10
(3 rows)

The answer is clearly wrong, because the join clause requires r.oid
and s.oid to be equal, but in the displayed output, they are not. I'm
not quite sure what's happening here. The join to pg_db_role_setting
gets removed, which seems correct, because it has a primary key index
on (setrole, setdatabase), and the join clause requires both of those
columns to be equal to some specific value. The output of the whole
subselect is correct. But the join between the subselect and
just_one_oid somehow goes wrong: the join filter that ought to be
there is missing.

 Nested Loop Left Join  (cost=0.00..272137387.88 rows=16581375000 width=8)
   ->  Seq Scan on just_one_oid s  (cost=0.00..35.50 rows=2550 width=4)
   ->  Materialize  (cost=0.00..139272.12 rows=6502500 width=4)
         ->  Nested Loop  (cost=0.00..81358.62 rows=6502500 width=4)
               ->  Seq Scan on one_row  (cost=0.00..35.50 rows=2550 width=0)
               ->  Materialize  (cost=0.00..48.25 rows=2550 width=4)
                     ->  Seq Scan on three_oids  (cost=0.00..35.50
rows=2550 width=4)

-- 
Robert Haas
EDB: http://www.enterprisedb.com



Robert Haas <robertmhaas@gmail.com> writes:
> While merging commits from c9f7f926484d69e2806e35343af7e472fadfede7
> through 3db72ebcbe20debc6552500ee9ccb4b2007f12f8 into a fork, my
> colleague Rushabh Lathia and I noticed that the merge caused certain
> queries to return wrong answers.

I believe this is a variant of the existing open issue about
the outer-join-Vars stuff:

https://www.postgresql.org/message-id/flat/0b819232-4b50-f245-1c7d-c8c61bf41827%40postgrespro.ru

although perhaps different in detail --- EXPLAIN shows that the
join condition has gone missing completely, rather than just
being put at the wrong join level.  Possibly related to the
join removal that happened to pg_db_role_setting?

I've been incredibly distracted by $real-life over the past few
weeks, but am hoping to get the outer-join-Vars issues resolved
in the next day or two.

            regards, tom lane



Re: v16 regression - wrong query results with LEFT JOINs + join removal

From
Robert Haas
Date:
On Thu, May 11, 2023 at 9:57 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
> > While merging commits from c9f7f926484d69e2806e35343af7e472fadfede7
> > through 3db72ebcbe20debc6552500ee9ccb4b2007f12f8 into a fork, my
> > colleague Rushabh Lathia and I noticed that the merge caused certain
> > queries to return wrong answers.
>
> I believe this is a variant of the existing open issue about
> the outer-join-Vars stuff:
>
> https://www.postgresql.org/message-id/flat/0b819232-4b50-f245-1c7d-c8c61bf41827%40postgrespro.ru

Ouch, so we've had a known queries-returning-wrong-answers bug for
more than two months. That's not great. I'll try to find time today to
check whether the patches on that thread resolve this issue.

> I've been incredibly distracted by $real-life over the past few
> weeks, but am hoping to get the outer-join-Vars issues resolved
> in the next day or two.

Thanks. Hope the real life distractions are nothing too awful, but
best wishes either way.

--
Robert Haas
EDB: http://www.enterprisedb.com



Re: v16 regression - wrong query results with LEFT JOINs + join removal

From
Robert Haas
Date:
On Thu, May 11, 2023 at 10:14 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Ouch, so we've had a known queries-returning-wrong-answers bug for
> more than two months. That's not great. I'll try to find time today to
> check whether the patches on that thread resolve this issue.

I tried out the v3 patches from that thread and they don't seem to
make any difference for me in this test case. So either (1) it's a
different issue or (2) those patches don't fully fix it or (3) I'm bad
at testing things.

--
Robert Haas
EDB: http://www.enterprisedb.com



Robert Haas <robertmhaas@gmail.com> writes:
> On Thu, May 11, 2023 at 10:14 AM Robert Haas <robertmhaas@gmail.com> wrote:
>> Ouch, so we've had a known queries-returning-wrong-answers bug for
>> more than two months. That's not great. I'll try to find time today to
>> check whether the patches on that thread resolve this issue.

> I tried out the v3 patches from that thread and they don't seem to
> make any difference for me in this test case. So either (1) it's a
> different issue or (2) those patches don't fully fix it or (3) I'm bad
> at testing things.

Yeah, I've just traced the problem to remove_rel_from_query() deciding
that it can drop the qual of interest :-(.  I'd done this:

-       if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids))
+       if (bms_is_member(ojrelid, rinfo->required_relids))

as part of an unfinished effort at getting rid of RestrictInfos'
is_pushed_down flags, and this example shows that the replacement
condition is faulty.

What I'm inclined to do about it is just revert this particular
change.  But I'd better run around and see where else I did that,
because the idea is evidently not ready for prime time.

            regards, tom lane



I wrote:
> Yeah, I've just traced the problem to remove_rel_from_query() deciding
> that it can drop the qual of interest :-(.  I'd done this:
> -       if (RINFO_IS_PUSHED_DOWN(rinfo, joinrelids))
> +       if (bms_is_member(ojrelid, rinfo->required_relids))
> as part of an unfinished effort at getting rid of RestrictInfos'
> is_pushed_down flags, and this example shows that the replacement
> condition is faulty.

> What I'm inclined to do about it is just revert this particular
> change.  But I'd better run around and see where else I did that,
> because the idea is evidently not ready for prime time.

Looks like that was the only such change in 2489d76c4 or
follow-ons, so done, with a test case based on your example.

            regards, tom lane



Re: v16 regression - wrong query results with LEFT JOINs + join removal

From
Robert Haas
Date:
On Thu, May 11, 2023 at 1:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > What I'm inclined to do about it is just revert this particular
> > change.  But I'd better run around and see where else I did that,
> > because the idea is evidently not ready for prime time.
>
> Looks like that was the only such change in 2489d76c4 or
> follow-ons, so done, with a test case based on your example.

Thanks, Tom!

--
Robert Haas
EDB: http://www.enterprisedb.com



On Thu, May 11, 2023 at 11:50 AM Robert Haas <robertmhaas@gmail.com> wrote:
On Thu, May 11, 2023 at 10:14 AM Robert Haas <robertmhaas@gmail.com> wrote:
> Ouch, so we've had a known queries-returning-wrong-answers bug for
> more than two months. That's not great. I'll try to find time today to
> check whether the patches on that thread resolve this issue.

I tried out the v3 patches from that thread and they don't seem to
make any difference for me in this test case. So either (1) it's a
different issue or (2) those patches don't fully fix it or (3) I'm bad
at testing things.

--
Robert Haas
EDB: http://www.enterprisedb.com

Forgive the noob question...  But does this trigger a regression test to be created?
And who tracks/pushes that?

Kirk... 

Re: v16 regression - wrong query results with LEFT JOINs + join removal

From
Robert Haas
Date:
On Thu, May 11, 2023 at 4:16 PM Kirk Wolak <wolakk@gmail.com> wrote:
> Forgive the noob question...  But does this trigger a regression test to be created?
> And who tracks/pushes that?

Tom included one in the commit.

--
Robert Haas
EDB: http://www.enterprisedb.com