Thread: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN

BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18187
Logged by:          Zuming Jiang
Email address:      zuming.jiang@inf.ethz.ch
PostgreSQL version: 16.0
Operating system:   Ubuntu 20.04
Description:

My fuzzer finds a bug in Postgres 17devel, which triggers an unexpected
error.

--- Set up database ---
create table t2 (vkey int4, c9 text, primary key(vkey));
create view t4 as select 1 as c_2, 1 as c_3;
create view t5 as 
select  
    1 as c_0, 
    case when '1' ~<=~ ref_0.c9 then 1 else 1 end as c_3 
  from 
    ((t2 as ref_0
        inner join t2 as ref_1
        on (ref_0.vkey = ref_1.vkey))
      right outer join t2 as ref_2
      on (ref_1.vkey = ref_2.vkey ));
------------------------

The fuzzer generates a test case:

--- Test case ---
select  
  ref_5.c_3
from 
  (((select ref_1.c_3 as c_0 from t4 as ref_1) as subq_0
      right outer join t5 as ref_5
      on (subq_0.c_0 = ref_5.c_0))
    right outer join t4 as ref_6
    on (subq_0.c_0 = ref_6.c_2));
------------------------

--- Expected behavior ---
The test case should not trigger any error.

--- Actual behavior ---
The test case trigger an error: 

ERROR:  variable not found in subplan target lists

--- Postgres version ---
Github commit: 3c551ebede46194237f82062b54b92e474b5c743
Version: PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit

--- Platform information ---
Platform: Ubuntu 20.04
Kernel: Linux 5.4.0-147-generic


On 8/11/2023 00:04, PG Bug reporting form wrote:
> The following bug has been logged on the website:
> 
> Bug reference:      18187
> Logged by:          Zuming Jiang
> Email address:      zuming.jiang@inf.ethz.ch
> PostgreSQL version: 16.0
> Operating system:   Ubuntu 20.04
> Description:
> 
> My fuzzer finds a bug in Postgres 17devel, which triggers an unexpected
> error.
> 
> --- Set up database ---
> create table t2 (vkey int4, c9 text, primary key(vkey));
> create view t4 as select 1 as c_2, 1 as c_3;
> create view t5 as
> select
>      1 as c_0,
>      case when '1' ~<=~ ref_0.c9 then 1 else 1 end as c_3
>    from
>      ((t2 as ref_0
>          inner join t2 as ref_1
>          on (ref_0.vkey = ref_1.vkey))
>        right outer join t2 as ref_2
>        on (ref_1.vkey = ref_2.vkey ));
> ------------------------
> 
> The fuzzer generates a test case:
> 
> --- Test case ---
> select
>    ref_5.c_3
> from
>    (((select ref_1.c_3 as c_0 from t4 as ref_1) as subq_0
>        right outer join t5 as ref_5
>        on (subq_0.c_0 = ref_5.c_0))
>      right outer join t4 as ref_6
>      on (subq_0.c_0 = ref_6.c_2));
> ------------------------
> 
> --- Expected behavior ---
> The test case should not trigger any error.
> 
> --- Actual behavior ---
> The test case trigger an error:
> 
> ERROR:  variable not found in subplan target lists
> 
> --- Postgres version ---
> Github commit: 3c551ebede46194237f82062b54b92e474b5c743
> Version: PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu
> 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit
> 
> --- Platform information ---
> Platform: Ubuntu 20.04
> Kernel: Linux 5.4.0-147-generic
> 
Great,
I would like to have such fuzzer at the stage of development ;)
The issue looks like previous one, related to new SJE feature.

Reduced case:
CREATE TABLE t2 (vkey int4, c9 text, primary key(vkey));
SELECT * FROM (
   SELECT CASE WHEN '1' = ref_0.c9 THEN 1 ELSE 1 END AS c_3
FROM t2 as ref_0
   JOIN t2 AS ref_1
   ON ref_0.vkey = ref_1.vkey
     RIGHT OUTER JOIN t2 AS ref_2
     ON ref_1.vkey = ref_2.vkey) AS t5
   RIGHT OUTER JOIN (SELECT 1 AS c_2) AS t4
   ON t4.c_2 IS NOT NULL;

The key problem lies in the 'CASE' statement.

-- 
regards,
Andrei Lepikhov
Postgres Professional




> Great,
> I would like to have such fuzzer at the stage of development 😉 
Thanks for your appreciation!
I would like to open source the tool after my research paper on this 
tool get accepted.




On Wed, Nov 8, 2023 at 11:42 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
The issue looks like previous one, related to new SJE feature.

Reduced case:
CREATE TABLE t2 (vkey int4, c9 text, primary key(vkey));
SELECT * FROM (
   SELECT CASE WHEN '1' = ref_0.c9 THEN 1 ELSE 1 END AS c_3
FROM t2 as ref_0
   JOIN t2 AS ref_1
   ON ref_0.vkey = ref_1.vkey
     RIGHT OUTER JOIN t2 AS ref_2
     ON ref_1.vkey = ref_2.vkey) AS t5
   RIGHT OUTER JOIN (SELECT 1 AS c_2) AS t4
   ON t4.c_2 IS NOT NULL;

The key problem lies in the 'CASE' statement.

I've looked into this a little bit.  I think it's caused by the SJE
logic not properly removing references from PHVs.  Specifically, it
fails to replace the ref_0's Vars within phv->phexpr, leading them to be
added in ref_2/ref_1 join's targetlist.

Also, I noticed that in remove_rel_from_query() we perform replace_relid
for phv->phrels twice at line 475 and 478, which seems not right to me.

 475   phv->phrels = replace_relid(phv->phrels, relid, subst);
 476   phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
 477   phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
 478   phinfo->ph_var->phrels = replace_relid(phinfo->ph_var->phrels, relid, subst);

Attached is a hotfix.

Thanks
Richard
Attachment
On 9/11/2023 10:54, Richard Guo wrote:
> 
> On Wed, Nov 8, 2023 at 11:42 PM Andrei Lepikhov 
> <a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:
> 
>     The issue looks like previous one, related to new SJE feature.
> 
>     Reduced case:
>     CREATE TABLE t2 (vkey int4, c9 text, primary key(vkey));
>     SELECT * FROM (
>         SELECT CASE WHEN '1' = ref_0.c9 THEN 1 ELSE 1 END AS c_3
>     FROM t2 as ref_0
>         JOIN t2 AS ref_1
>         ON ref_0.vkey = ref_1.vkey
>           RIGHT OUTER JOIN t2 AS ref_2
>           ON ref_1.vkey = ref_2.vkey) AS t5
>         RIGHT OUTER JOIN (SELECT 1 AS c_2) AS t4
>         ON t4.c_2 IS NOT NULL;
> 
>     The key problem lies in the 'CASE' statement.
> 
> 
> I've looked into this a little bit.  I think it's caused by the SJE
> logic not properly removing references from PHVs.  Specifically, it
> fails to replace the ref_0's Vars within phv->phexpr, leading them to be
> added in ref_2/ref_1 join's targetlist.
> 
> Also, I noticed that in remove_rel_from_query() we perform replace_relid
> for phv->phrels twice at line 475 and 478, which seems not right to me.
> 
>   475   phv->phrels = replace_relid(phv->phrels, relid, subst);
>   476   phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
>   477   phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, 
> subst);
>   478   phinfo->ph_var->phrels = replace_relid(phinfo->ph_var->phrels, 
> relid, subst);

Thanks a lot!
Your patch looks correct. We really have been missing replacing the 
PlaceHolderVar expression references for all the development time.
The test in join.sql works correctly. Should we add a reference to the 
bug that triggered the issue as a comment to the test? Also, to be sure, 
maybe add column t4.code into the list of the coalesce parameters?

-- 
regards,
Andrei Lepikhov
Postgres Professional





On Thu, Nov 9, 2023 at 12:51 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
... Should we add a reference to the
bug that triggered the issue as a comment to the test?

Yeah, we can do that.  I guess something like:

-- Check that SJE removes references from PHVs correctly (bug #18187)
 
Also, to be sure,
maybe add column t4.code into the list of the coalesce parameters?

I don't think it's necessary.  It is t3 that SJE would remove, so it's
sufficient to have t3's Vars in the PHV expression to trigger this
error.

Thanks
Richard
On 9/11/2023 18:08, Richard Guo wrote:
> 
> On Thu, Nov 9, 2023 at 12:51 PM Andrei Lepikhov 
> <a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:
> 
>     ... Should we add a reference to the
>     bug that triggered the issue as a comment to the test?
> 
> 
> Yeah, we can do that.  I guess something like:
> 
> -- Check that SJE removes references from PHVs correctly (bug #18187)
> 
>     Also, to be sure,
>     maybe add column t4.code into the list of the coalesce parameters?
> 
> 
> I don't think it's necessary.  It is t3 that SJE would remove, so it's
> sufficient to have t3's Vars in the PHV expression to trigger this
> error.

Agree. I've considered the situations when something in the internal 
planner logic changes, and relations could arrive in a different order. 
In that case, we sort relids, and have determined behaviour. So, do we 
wait for Alexander's glance?

-- 
regards,
Andrei Lepikhov
Postgres Professional




On Thu, Nov 9, 2023 at 1:21 PM Andrei Lepikhov
<a.lepikhov@postgrespro.ru> wrote:
> On 9/11/2023 18:08, Richard Guo wrote:
> >
> > On Thu, Nov 9, 2023 at 12:51 PM Andrei Lepikhov
> > <a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote:
> >
> >     ... Should we add a reference to the
> >     bug that triggered the issue as a comment to the test?
> >
> >
> > Yeah, we can do that.  I guess something like:
> >
> > -- Check that SJE removes references from PHVs correctly (bug #18187)
> >
> >     Also, to be sure,
> >     maybe add column t4.code into the list of the coalesce parameters?
> >
> >
> > I don't think it's necessary.  It is t3 that SJE would remove, so it's
> > sufficient to have t3's Vars in the PHV expression to trigger this
> > error.
>
> Agree. I've considered the situations when something in the internal
> planner logic changes, and relations could arrive in a different order.
> In that case, we sort relids, and have determined behaviour. So, do we
> wait for Alexander's glance?

LGMT, pushed!

------
Regards,
Alexander Korotkov




On Thu, Nov 9, 2023 at 11:54 AM Richard Guo <guofenglinux@gmail.com> wrote:
I've looked into this a little bit.  I think it's caused by the SJE
logic not properly removing references from PHVs.

Looking closer at the codes, I think we still have some loose ends
regarding how SJE handles PHVs.

1) When we check PHVs in remove_self_joins_one_group() to see if there
is any PHV that prevents us from removing the self join, we do not check
ph_lateral.  This is dangerous because it leaves us with no guarantee of
what the Assert in remove_rel_from_query() asserts:

    Assert(!bms_is_member(relid, phinfo->ph_lateral));

To illustrate this problem, look at the query below which would trigger
this Assert.

create table t (a int primary key, b int);

explain (costs off)
select * from t t1 join t t2 on t1.a = t2.a left join
    lateral (select t1.a as t1a, * from generate_series(1,1) t3) s on true;
server closed the connection unexpectedly

2) Currently remove_self_joins_one_group() checks PHVs as below

    /* there isn't any other place to eval PHV */
    if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
        bms_is_subset(phinfo->ph_needed, joinrelids))
        break;

I'm wondering if we can relax this restriction because it seems to me
that a PHV evaluated/needed at or below the self join should not have
problem if we remove the self join.  For instance,

explain (costs off)
select * from generate_series(1,10) t1(a) left join lateral
    (select t1.a as t1a, t2.a from t t2 join t t3 on t2.a = t3.a) on true;
                QUERY PLAN
-------------------------------------------
 Nested Loop Left Join
   ->  Function Scan on generate_series t1
   ->  Hash Join
         Hash Cond: (t2.a = t3.a)
         ->  Seq Scan on t t2
         ->  Hash
               ->  Seq Scan on t t3
(7 rows)

I think the t2/t3 join can actually be removed.

And now I have a vague feeling that PHVs should not impose any
constraints on removing self joins.  But I'm not sure.

It would be great if we could have Tom's perspective on this.  So I've
added Tom in the cc list.

FWIW, attached is a band-aid fix for the Assert failure issue in 1), in
case we want to fix it first before we discuss this topic further.

Thanks
Richard
Attachment
On Fri, Nov 10, 2023 at 10:21 AM Richard Guo <guofenglinux@gmail.com> wrote:
> FWIW, attached is a band-aid fix for the Assert failure issue in 1), in
> case we want to fix it first before we discuss this topic further.

I've pushed this, thank you.  I think we should keep the source tree
in working order, while waiting for SJ removal in lateral to be fixed.

------
Regards,
Alexander Korotkov




On Fri, Nov 10, 2023 at 4:20 PM Richard Guo <guofenglinux@gmail.com> wrote:
Looking closer at the codes, I think we still have some loose ends
regarding how SJE handles PHVs.

...

2) Currently remove_self_joins_one_group() checks PHVs as below

    /* there isn't any other place to eval PHV */
    if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
        bms_is_subset(phinfo->ph_needed, joinrelids))
        break;

I'm wondering if we can relax this restriction because it seems to me
that a PHV evaluated/needed at or below the self join should not have
problem if we remove the self join.

After some more thought, I think PHVs should not impose any constraints
on removing self joins.  If the removed rel is contained in the relids
that a PHV is evaluated/needed at or laterally references, it can just
be replaced with the rel that is kept.

Attached is a patch to remove such constraints.  Any comments or
feedback are welcome.

Thanks
Richard
Attachment
On 20/11/2023 15:49, Richard Guo wrote:
>     I'm wondering if we can relax this restriction because it seems to me
>     that a PHV evaluated/needed at or below the self join should not have
>     problem if we remove the self join.
> 
> 
> After some more thought, I think PHVs should not impose any constraints
> on removing self joins.  If the removed rel is contained in the relids
> that a PHV is evaluated/needed at or laterally references, it can just
> be replaced with the rel that is kept.
> 
> Attached is a patch to remove such constraints.  Any comments or
> feedback are welcome.

Carrying out excavations, I found that it was Teodor Sigaev who added 
this limitation in v.20 of the patch in an earlier stage of the development.
I also had analyzed this part of code and came to the conclusion that it 
doesn't needed. Only a paranoid approach and the absence of an 
alternative opinion are the reasons why this code is still exists.
Your patch is ok.
I think the comment, 'PHVs should not impose ...' looks redundant. It 
may be enough to have it in the commit comment.

-- 
regards,
Andrei Lepikhov
Postgres Professional





On Wed, Nov 22, 2023 at 10:32 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
Your patch is ok.

Thanks for looking into this patch.
 
I think the comment, 'PHVs should not impose ...' looks redundant. It
may be enough to have it in the commit comment.

Hmm, I'm not sure.  I'm thinking that when people read this part of the
code, they might be confused as to why we don't consider PHVs.  And this
comment will give them some clues.  This is just like how we explain why
we need not worry about LATERAL nor PHVs in is_simple_values().

Thanks
Richard
Richard Guo <guofenglinux@gmail.com> writes:
> After some more thought, I think PHVs should not impose any constraints
> on removing self joins.  If the removed rel is contained in the relids
> that a PHV is evaluated/needed at or laterally references, it can just
> be replaced with the rel that is kept.

I experimented with trying to break this patch using this test case:

create table t (id int primary key, f1 text, f2 text);
create table i (f0 int);

explain verbose
select * from i left join (
  (select *, 1 as x from t tss1) t1 join
  (select *, 2 as y from t tss2) t2
  on t1.id = t2.id
) on i.f0 = t1.id;

It seems to successfully remove the self-join between tss1 and tss2,
but there is something wrong.  If you examine the PlannerInfo just
after remove_useless_self_joins, you will notice that the two
PlaceHolderVars (for x and y) in the placeholder_list both have

         :phrels (b 7)
         :phnullingrels (b)

where beforehand x had

         :phrels (b 6)
         :phnullingrels (b 5)

and y had

         :phrels (b 7)
         :phnullingrels (b 5)

It's correct to move both phrels locations to rel 7 (the surviving
self-joined rel) but what became of the reference to nullingrel 5?
That seems clearly wrong, since we have not removed the left join.

Much worse, if you look around elsewhere in the data structure,
particularly at the processed_tlist, you find instances of the
PlaceHolderVars that have not been updated and still have the
old values such as ":phrels (b 6)".

It's not apparent to me why the cross-checks we have in setrefs.c
and elsewhere don't detect a problem with this.  But it seems
clear that remove_useless_self_joins is still several bricks
shy of a load in terms of fully updating the data structure for a
join removal.  Probably with some more work to add complication
to this test case, we could demonstrate an observable failure.

BTW, why is it that it seems to prefer to remove the first of
the two self-joined rels, rather than the second?  That seems
jarringly bizarre.

            regards, tom lane




On Tue, Nov 28, 2023 at 1:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
> After some more thought, I think PHVs should not impose any constraints
> on removing self joins.  If the removed rel is contained in the relids
> that a PHV is evaluated/needed at or laterally references, it can just
> be replaced with the rel that is kept.

I experimented with trying to break this patch using this test case:

create table t (id int primary key, f1 text, f2 text);
create table i (f0 int);

explain verbose
select * from i left join (
  (select *, 1 as x from t tss1) t1 join
  (select *, 2 as y from t tss2) t2
  on t1.id = t2.id
) on i.f0 = t1.id;

Thanks for the query!
 
It seems to successfully remove the self-join between tss1 and tss2,
but there is something wrong.  If you examine the PlannerInfo just
after remove_useless_self_joins, you will notice that the two
PlaceHolderVars (for x and y) in the placeholder_list both have

         :phrels (b 7)
         :phnullingrels (b)

where beforehand x had

         :phrels (b 6)
         :phnullingrels (b 5)

and y had

         :phrels (b 7)
         :phnullingrels (b 5)

It's correct to move both phrels locations to rel 7 (the surviving
self-joined rel) but what became of the reference to nullingrel 5?
That seems clearly wrong, since we have not removed the left join.

What you noticed before remove_useless_self_joins seems not right to me.
The PHVs in placeholder_list are supposed to have empty phnullingrels by
convention, as explained in find_placeholder_info.

* By convention, phinfo->ph_var->phnullingrels is always empty, since the
* PlaceHolderInfo represents the initially-calculated state of the
* PlaceHolderVar.
 
Much worse, if you look around elsewhere in the data structure,
particularly at the processed_tlist, you find instances of the
PlaceHolderVars that have not been updated and still have the
old values such as ":phrels (b 6)".

Good catch!  We fail to handle PlaceHolderVar case in sje_walker and
replace_varno_walker, which is wrong.  Fixed in v3 patch.
 
It's not apparent to me why the cross-checks we have in setrefs.c
and elsewhere don't detect a problem with this.  But it seems
clear that remove_useless_self_joins is still several bricks
shy of a load in terms of fully updating the data structure for a
join removal.  Probably with some more work to add complication
to this test case, we could demonstrate an observable failure.

Fair point.  I have the same concern that we still have other loose ends
in SJE updating necessary data structures in self join removal.  And I
noticed that Andres expressed the same concern in [1].  I think we
should come up with some solution to detect/avoid such problems, but I
imagine that that should be a seperate patch.
 
BTW, why is it that it seems to prefer to remove the first of
the two self-joined rels, rather than the second?  That seems
jarringly bizarre.

Hmm, I'm not sure either.  Alexander and Andrei, could you please share
your insights?

[1] https://www.postgresql.org/message-id/20231127180705.svzrlahdmqvosqfz%40awork3.anarazel.de

Thanks
Richard
Attachment

On Tue, Nov 28, 2023 at 3:03 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Tue, Nov 28, 2023 at 1:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
BTW, why is it that it seems to prefer to remove the first of
the two self-joined rels, rather than the second?  That seems
jarringly bizarre.

Hmm, I'm not sure either.  Alexander and Andrei, could you please share
your insights?

BTW, while reading the codes, I noticed this commit of
remove_self_joins_recurse.

* ... To avoid complexity, limit the max power of this set by a GUC.

But where is the GUC?   I guess that it refers to
self_join_search_limit, which has been removed during development.

So we should revise this commit to at least remove any mention of the
GUC.  Maybe it'd better to add a new commit explaining why we are not
concerned about cases where the number of self joins is too large.

Thanks
Richard
On 28/11/2023 15:03, Richard Guo wrote:
> On Tue, Nov 28, 2023 at 3:03 PM Richard Guo <guofenglinux@gmail.com 
> <mailto:guofenglinux@gmail.com>> wrote:
>     On Tue, Nov 28, 2023 at 1:42 AM Tom Lane <tgl@sss.pgh.pa.us
>     <mailto:tgl@sss.pgh.pa.us>> wrote:
>         BTW, why is it that it seems to prefer to remove the first of
>         the two self-joined rels, rather than the second?  That seems
>         jarringly bizarre.
>     Hmm, I'm not sure either.  Alexander and Andrei, could you please share
>     your insights?

I was thinking about that choice and, in earlier versions, removed outer 
relations. I did not find a difference in removing a specific side of 
the JOIN. Rewriting the patch right before the commit, I found out that 
remove_useless_joins removed the inner join and chose to do the same, 
like a convention.

> BTW, while reading the codes, I noticed this commit of
> remove_self_joins_recurse.
> 
> * ... To avoid complexity, limit the max power of this set by a GUC.
> 
> But where is the GUC?   I guess that it refers to
> self_join_search_limit, which has been removed during development.
> 
> So we should revise this commit to at least remove any mention of the
> GUC.  Maybe it'd better to add a new commit explaining why we are not
> concerned about cases where the number of self joins is too large.

Maybe. But actually, we just did some benchmarking to see the influence. 
Moreover, Alexander's code on preliminary sorting the relations and the 
fact that inheritance isn't still expanded were reasons we aren't afraid 
of significant degradation in practical cases.

-- 
regards,
Andrei Lepikhov
Postgres Professional




On 28/11/2023 14:03, Richard Guo wrote:
> 
> On Tue, Nov 28, 2023 at 1:42 AM Tom Lane <tgl@sss.pgh.pa.us 
> <mailto:tgl@sss.pgh.pa.us>> wrote:
> 
>     Richard Guo <guofenglinux@gmail.com <mailto:guofenglinux@gmail.com>>
>     writes:
>      > After some more thought, I think PHVs should not impose any
>     constraints
>      > on removing self joins.  If the removed rel is contained in the
>     relids
>      > that a PHV is evaluated/needed at or laterally references, it can
>     just
>      > be replaced with the rel that is kept.
> 
>     I experimented with trying to break this patch using this test case:
> 
>     create table t (id int primary key, f1 text, f2 text);
>     create table i (f0 int);
> 
>     explain verbose
>     select * from i left join (
>        (select *, 1 as x from t tss1) t1 join
>        (select *, 2 as y from t tss2) t2
>        on t1.id <http://t1.id> = t2.id <http://t2.id>
>     ) on i.f0 = t1.id <http://t1.id>;
> 
> 
> Thanks for the query!
> 
>     It seems to successfully remove the self-join between tss1 and tss2,
>     but there is something wrong.  If you examine the PlannerInfo just
>     after remove_useless_self_joins, you will notice that the two
>     PlaceHolderVars (for x and y) in the placeholder_list both have
> 
>               :phrels (b 7)
>               :phnullingrels (b)
> 
>     where beforehand x had
> 
>               :phrels (b 6)
>               :phnullingrels (b 5)
> 
>     and y had
> 
>               :phrels (b 7)
>               :phnullingrels (b 5)
> 
>     It's correct to move both phrels locations to rel 7 (the surviving
>     self-joined rel) but what became of the reference to nullingrel 5?
>     That seems clearly wrong, since we have not removed the left join.
> 
> 
> What you noticed before remove_useless_self_joins seems not right to me.
> The PHVs in placeholder_list are supposed to have empty phnullingrels by
> convention, as explained in find_placeholder_info.
> 
> * By convention, phinfo->ph_var->phnullingrels is always empty, since the
> * PlaceHolderInfo represents the initially-calculated state of the
> * PlaceHolderVar.
> 
>     Much worse, if you look around elsewhere in the data structure,
>     particularly at the processed_tlist, you find instances of the
>     PlaceHolderVars that have not been updated and still have the
>     old values such as ":phrels (b 6)".
> 
> 
> Good catch!  We fail to handle PlaceHolderVar case in sje_walker and
> replace_varno_walker, which is wrong.  Fixed in v3 patch.
> 
>     It's not apparent to me why the cross-checks we have in setrefs.c
>     and elsewhere don't detect a problem with this.  But it seems
>     clear that remove_useless_self_joins is still several bricks
>     shy of a load in terms of fully updating the data structure for a
>     join removal.  Probably with some more work to add complication
>     to this test case, we could demonstrate an observable failure.
> 
> 
> Fair point.  I have the same concern that we still have other loose ends
> in SJE updating necessary data structures in self join removal.  And I
> noticed that Andres expressed the same concern in [1].  I think we
> should come up with some solution to detect/avoid such problems, but I
> imagine that that should be a seperate patch.

Agree. PlannerInfo is a hot structure where many new features try to 
save some data. The history of this patch has shown, for example, 
JoinDomains. Finding field jd_relids, which could contain the link to 
the removing relation, was tricky.
I still have no idea how to check it internally except by attaching a 
comment to the PlannerInfo structure.

-- 
regards,
Andrei Lepikhov
Postgres Professional





On Tue, Nov 28, 2023 at 3:03 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Tue, Nov 28, 2023 at 1:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Much worse, if you look around elsewhere in the data structure,
particularly at the processed_tlist, you find instances of the
PlaceHolderVars that have not been updated and still have the
old values such as ":phrels (b 6)".

Good catch!  We fail to handle PlaceHolderVar case in sje_walker and
replace_varno_walker, which is wrong.  Fixed in v3 patch.

I noticed that this issue exists on master even without the
Don't-constrain-SJE-due-to-PHVs patch, and it can be illustrated with a
query from regression test.

-- Test that placeholders are updated correctly after join removal
explain (costs off)
select * from (values (1)) x
left join (select coalesce(y.q1, 1) from int8_tbl y
    right join sj j1 inner join sj j2 on j1.a = j2.a
    on true) z
on true;

In this query it successfully removes the self-join between j1 (6) and
j2 (7).  But in the processed_tlist, you can still find PlaceHolderVar
with ":phrels (b 5 6 7 9)".  So I think we need to handle PlaceHolderVar
case in sje_walker and replace_varno_walker, whether or not we
incorporate the Don't-constrain-SJE-due-to-PHVs patch.

Also, it seems to me that sje_walker is redundant.  Currently it is used
to walk the query tree for varno replacement, a task that can be
fulfilled with replace_varno_walker.

0001 adds the handling of PlaceHolderVar case in replace_varno_walker,
and meanwhile retires sje_walker.

0002 is the original Don't-constrain-SJE-due-to-PHVs patch.

0003 is a fix of the comment for remove_self_joins_recurse, as described
upthread.

Thanks
Richard
Attachment
On Tue, Nov 28, 2023 at 1:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
 >          :phnullingrels (b 5)

 > It's correct to move both phrels locations to rel 7 (the surviving
 > self-joined rel) but what became of the reference to nullingrel 5?
 > That seems clearly wrong, since we have not removed the left join.

I must confess I couldn't reproduce this issue. I watched only the value 
':phnullingrels (b)' everywhere.

On 5/12/2023 09:35, Richard Guo wrote:
> 
> On Tue, Nov 28, 2023 at 3:03 PM Richard Guo <guofenglinux@gmail.com 
> <mailto:guofenglinux@gmail.com>> wrote:
> 
>     On Tue, Nov 28, 2023 at 1:42 AM Tom Lane <tgl@sss.pgh.pa.us
>     <mailto:tgl@sss.pgh.pa.us>> wrote:
> 
>         Much worse, if you look around elsewhere in the data structure,
>         particularly at the processed_tlist, you find instances of the
>         PlaceHolderVars that have not been updated and still have the
>         old values such as ":phrels (b 6)".
> 
> 
>     Good catch!  We fail to handle PlaceHolderVar case in sje_walker and
>     replace_varno_walker, which is wrong.  Fixed in v3 patch.

Yes, it was my blunder. I didn't recognize that phinfo->ph_var points to 
a copy of the PlaceHolderVar and thought it was enough to pass through 
the root->placeholder_list only. Adding replacement code into the walker 
is the proper correction.

> I noticed that this issue exists on master even without the
> Don't-constrain-SJE-due-to-PHVs patch, and it can be illustrated with a
> query from regression test.
> 
> -- Test that placeholders are updated correctly after join removal
> explain (costs off)
> select * from (values (1)) x
> left join (select coalesce(y.q1, 1) from int8_tbl y
>      right join sj j1 inner join sj j2 on j1.a = j2.a
>      on true) z
> on true;
> Also, it seems to me that sje_walker is redundant.  Currently it is used
> to walk the query tree for varno replacement, a task that can be
> fulfilled with replace_varno_walker.

Agree.

 >>     It's not apparent to me why the cross-checks we have in setrefs.c
 >>    and elsewhere don't detect a problem with this.  But it seems
 >>    clear that remove_useless_self_joins is still several bricks
 >>    shy of a load in terms of fully updating the data structure for a
 >>    join removal.  Probably with some more work to add complication
 >>    to this test case, we could demonstrate an observable failure.
 > Fair point.  I have the same concern that we still have other loose > 
ends
 > in SJE updating necessary data structures in self join removal.  And I
 > noticed that Andres expressed the same concern in [1].  I think we
 > should come up with some solution to detect/avoid such problems, but I
 > imagine that that should be a separate patch.

It is an excellent task to ponder. At least, we should answer later why 
the setrefs.c didn't detect that, and could we do something here. Maybe 
all we could do - is add comments to the PlannerInfo structure.

According to the patches:
0001 - looks good for me.
0002 - I don't understand why to use 'explain' in VERBOSE mode in tests. 
What do you try to detect here?
0003 - ok, but too short. Maybe squash all these patches into one?

I think these corrections are good enough for commit.

-- 
regards,
Andrei Lepikhov
Postgres Professional





On Thu, Dec 7, 2023 at 10:55 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
According to the patches:
0001 - looks good for me.
0002 - I don't understand why to use 'explain' in VERBOSE mode in tests.
What do you try to detect here?

The VERBOSE mode is used to observe that PHVs are adjusted correctly and
put in the right targetlist.
 
0003 - ok, but too short. Maybe squash all these patches into one?

It seems to me that these three patches are addressing separate issues
and are independent of each other.  For instance, the issue addressed by
0001 exists on master even without other changes, and 0002 is an
improvement that does not depend on others, while 0003 is a fix of an
incorrect comment.  I don't think it's a good idea to squash them into
one.
 
I think these corrections are good enough for commit.

Thank you for reviewing these patches.  Alexander, what are your
thoughts on these patches?

Thanks
Richard
On Tue, Dec 12, 2023 at 8:41 AM Richard Guo <guofenglinux@gmail.com> wrote:
> On Thu, Dec 7, 2023 at 10:55 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
>> According to the patches:
>> 0001 - looks good for me.
>> 0002 - I don't understand why to use 'explain' in VERBOSE mode in tests.
>> What do you try to detect here?
>
> The VERBOSE mode is used to observe that PHVs are adjusted correctly and
> put in the right targetlist.
>
>> 0003 - ok, but too short. Maybe squash all these patches into one?
>
> It seems to me that these three patches are addressing separate issues
> and are independent of each other.  For instance, the issue addressed by
> 0001 exists on master even without other changes, and 0002 is an
> improvement that does not depend on others, while 0003 is a fix of an
> incorrect comment.  I don't think it's a good idea to squash them into
> one.
>
>> I think these corrections are good enough for commit.
>
> Thank you for reviewing these patches.  Alexander, what are your
> thoughts on these patches?

Looks good to me.  I'm going to push this if there are no objections.

------
Regards,
Alexander Korotkov



On Sat, Dec 23, 2023 at 1:27 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Tue, Dec 12, 2023 at 8:41 AM Richard Guo <guofenglinux@gmail.com> wrote:
> > On Thu, Dec 7, 2023 at 10:55 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
> >> According to the patches:
> >> 0001 - looks good for me.
> >> 0002 - I don't understand why to use 'explain' in VERBOSE mode in tests.
> >> What do you try to detect here?
> >
> > The VERBOSE mode is used to observe that PHVs are adjusted correctly and
> > put in the right targetlist.
> >
> >> 0003 - ok, but too short. Maybe squash all these patches into one?
> >
> > It seems to me that these three patches are addressing separate issues
> > and are independent of each other.  For instance, the issue addressed by
> > 0001 exists on master even without other changes, and 0002 is an
> > improvement that does not depend on others, while 0003 is a fix of an
> > incorrect comment.  I don't think it's a good idea to squash them into
> > one.
> >
> >> I think these corrections are good enough for commit.
> >
> > Thank you for reviewing these patches.  Alexander, what are your
> > thoughts on these patches?
>
> Looks good to me.  I'm going to push this if there are no objections.

Pushed!  Thanks to all the participants.

------
Regards,
Alexander Korotkov




On Mon, Dec 25, 2023 at 7:38 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Sat, Dec 23, 2023 at 1:27 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Tue, Dec 12, 2023 at 8:41 AM Richard Guo <guofenglinux@gmail.com> wrote:
> > Thank you for reviewing these patches.  Alexander, what are your
> > thoughts on these patches?
>
> Looks good to me.  I'm going to push this if there are no objections.

Pushed!  Thanks to all the participants.

Thanks for pushing!

Thanks
Richard