Thread: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
From
PG Bug reporting form
Date:
The following bug has been logged on the website: Bug reference: 18187 Logged by: Zuming Jiang Email address: zuming.jiang@inf.ethz.ch PostgreSQL version: 16.0 Operating system: Ubuntu 20.04 Description: My fuzzer finds a bug in Postgres 17devel, which triggers an unexpected error. --- Set up database --- create table t2 (vkey int4, c9 text, primary key(vkey)); create view t4 as select 1 as c_2, 1 as c_3; create view t5 as select 1 as c_0, case when '1' ~<=~ ref_0.c9 then 1 else 1 end as c_3 from ((t2 as ref_0 inner join t2 as ref_1 on (ref_0.vkey = ref_1.vkey)) right outer join t2 as ref_2 on (ref_1.vkey = ref_2.vkey )); ------------------------ The fuzzer generates a test case: --- Test case --- select ref_5.c_3 from (((select ref_1.c_3 as c_0 from t4 as ref_1) as subq_0 right outer join t5 as ref_5 on (subq_0.c_0 = ref_5.c_0)) right outer join t4 as ref_6 on (subq_0.c_0 = ref_6.c_2)); ------------------------ --- Expected behavior --- The test case should not trigger any error. --- Actual behavior --- The test case trigger an error: ERROR: variable not found in subplan target lists --- Postgres version --- Github commit: 3c551ebede46194237f82062b54b92e474b5c743 Version: PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit --- Platform information --- Platform: Ubuntu 20.04 Kernel: Linux 5.4.0-147-generic
Re: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
From
Andrei Lepikhov
Date:
On 8/11/2023 00:04, PG Bug reporting form wrote: > The following bug has been logged on the website: > > Bug reference: 18187 > Logged by: Zuming Jiang > Email address: zuming.jiang@inf.ethz.ch > PostgreSQL version: 16.0 > Operating system: Ubuntu 20.04 > Description: > > My fuzzer finds a bug in Postgres 17devel, which triggers an unexpected > error. > > --- Set up database --- > create table t2 (vkey int4, c9 text, primary key(vkey)); > create view t4 as select 1 as c_2, 1 as c_3; > create view t5 as > select > 1 as c_0, > case when '1' ~<=~ ref_0.c9 then 1 else 1 end as c_3 > from > ((t2 as ref_0 > inner join t2 as ref_1 > on (ref_0.vkey = ref_1.vkey)) > right outer join t2 as ref_2 > on (ref_1.vkey = ref_2.vkey )); > ------------------------ > > The fuzzer generates a test case: > > --- Test case --- > select > ref_5.c_3 > from > (((select ref_1.c_3 as c_0 from t4 as ref_1) as subq_0 > right outer join t5 as ref_5 > on (subq_0.c_0 = ref_5.c_0)) > right outer join t4 as ref_6 > on (subq_0.c_0 = ref_6.c_2)); > ------------------------ > > --- Expected behavior --- > The test case should not trigger any error. > > --- Actual behavior --- > The test case trigger an error: > > ERROR: variable not found in subplan target lists > > --- Postgres version --- > Github commit: 3c551ebede46194237f82062b54b92e474b5c743 > Version: PostgreSQL 17devel on x86_64-pc-linux-gnu, compiled by gcc (Ubuntu > 9.4.0-1ubuntu1~20.04.1) 9.4.0, 64-bit > > --- Platform information --- > Platform: Ubuntu 20.04 > Kernel: Linux 5.4.0-147-generic > Great, I would like to have such fuzzer at the stage of development ;) The issue looks like previous one, related to new SJE feature. Reduced case: CREATE TABLE t2 (vkey int4, c9 text, primary key(vkey)); SELECT * FROM ( SELECT CASE WHEN '1' = ref_0.c9 THEN 1 ELSE 1 END AS c_3 FROM t2 as ref_0 JOIN t2 AS ref_1 ON ref_0.vkey = ref_1.vkey RIGHT OUTER JOIN t2 AS ref_2 ON ref_1.vkey = ref_2.vkey) AS t5 RIGHT OUTER JOIN (SELECT 1 AS c_2) AS t4 ON t4.c_2 IS NOT NULL; The key problem lies in the 'CASE' statement. -- regards, Andrei Lepikhov Postgres Professional
Re: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
From
Zu-Ming Jiang
Date:
> Great, > I would like to have such fuzzer at the stage of development 😉 Thanks for your appreciation! I would like to open source the tool after my research paper on this tool get accepted.
Re: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
From
Richard Guo
Date:
On Wed, Nov 8, 2023 at 11:42 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
The issue looks like previous one, related to new SJE feature.
Reduced case:
CREATE TABLE t2 (vkey int4, c9 text, primary key(vkey));
SELECT * FROM (
SELECT CASE WHEN '1' = ref_0.c9 THEN 1 ELSE 1 END AS c_3
FROM t2 as ref_0
JOIN t2 AS ref_1
ON ref_0.vkey = ref_1.vkey
RIGHT OUTER JOIN t2 AS ref_2
ON ref_1.vkey = ref_2.vkey) AS t5
RIGHT OUTER JOIN (SELECT 1 AS c_2) AS t4
ON t4.c_2 IS NOT NULL;
The key problem lies in the 'CASE' statement.
I've looked into this a little bit. I think it's caused by the SJE
logic not properly removing references from PHVs. Specifically, it
fails to replace the ref_0's Vars within phv->phexpr, leading them to be
added in ref_2/ref_1 join's targetlist.
Also, I noticed that in remove_rel_from_query() we perform replace_relid
for phv->phrels twice at line 475 and 478, which seems not right to me.
475 phv->phrels = replace_relid(phv->phrels, relid, subst);
476 phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
477 phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
478 phinfo->ph_var->phrels = replace_relid(phinfo->ph_var->phrels, relid, subst);
Attached is a hotfix.
Thanks
Richard
logic not properly removing references from PHVs. Specifically, it
fails to replace the ref_0's Vars within phv->phexpr, leading them to be
added in ref_2/ref_1 join's targetlist.
Also, I noticed that in remove_rel_from_query() we perform replace_relid
for phv->phrels twice at line 475 and 478, which seems not right to me.
475 phv->phrels = replace_relid(phv->phrels, relid, subst);
476 phv->phrels = replace_relid(phv->phrels, ojrelid, subst);
477 phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, subst);
478 phinfo->ph_var->phrels = replace_relid(phinfo->ph_var->phrels, relid, subst);
Attached is a hotfix.
Thanks
Richard
Attachment
Re: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
From
Andrei Lepikhov
Date:
On 9/11/2023 10:54, Richard Guo wrote: > > On Wed, Nov 8, 2023 at 11:42 PM Andrei Lepikhov > <a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote: > > The issue looks like previous one, related to new SJE feature. > > Reduced case: > CREATE TABLE t2 (vkey int4, c9 text, primary key(vkey)); > SELECT * FROM ( > SELECT CASE WHEN '1' = ref_0.c9 THEN 1 ELSE 1 END AS c_3 > FROM t2 as ref_0 > JOIN t2 AS ref_1 > ON ref_0.vkey = ref_1.vkey > RIGHT OUTER JOIN t2 AS ref_2 > ON ref_1.vkey = ref_2.vkey) AS t5 > RIGHT OUTER JOIN (SELECT 1 AS c_2) AS t4 > ON t4.c_2 IS NOT NULL; > > The key problem lies in the 'CASE' statement. > > > I've looked into this a little bit. I think it's caused by the SJE > logic not properly removing references from PHVs. Specifically, it > fails to replace the ref_0's Vars within phv->phexpr, leading them to be > added in ref_2/ref_1 join's targetlist. > > Also, I noticed that in remove_rel_from_query() we perform replace_relid > for phv->phrels twice at line 475 and 478, which seems not right to me. > > 475 phv->phrels = replace_relid(phv->phrels, relid, subst); > 476 phv->phrels = replace_relid(phv->phrels, ojrelid, subst); > 477 phinfo->ph_lateral = replace_relid(phinfo->ph_lateral, relid, > subst); > 478 phinfo->ph_var->phrels = replace_relid(phinfo->ph_var->phrels, > relid, subst); Thanks a lot! Your patch looks correct. We really have been missing replacing the PlaceHolderVar expression references for all the development time. The test in join.sql works correctly. Should we add a reference to the bug that triggered the issue as a comment to the test? Also, to be sure, maybe add column t4.code into the list of the coalesce parameters? -- regards, Andrei Lepikhov Postgres Professional
Re: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
From
Richard Guo
Date:
On Thu, Nov 9, 2023 at 12:51 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
... Should we add a reference to the
bug that triggered the issue as a comment to the test?
Yeah, we can do that. I guess something like:
-- Check that SJE removes references from PHVs correctly (bug #18187)
-- Check that SJE removes references from PHVs correctly (bug #18187)
Also, to be sure,
maybe add column t4.code into the list of the coalesce parameters?
I don't think it's necessary. It is t3 that SJE would remove, so it's
sufficient to have t3's Vars in the PHV expression to trigger this
error.
Thanks
Richard
sufficient to have t3's Vars in the PHV expression to trigger this
error.
Thanks
Richard
Re: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
From
Andrei Lepikhov
Date:
On 9/11/2023 18:08, Richard Guo wrote: > > On Thu, Nov 9, 2023 at 12:51 PM Andrei Lepikhov > <a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote: > > ... Should we add a reference to the > bug that triggered the issue as a comment to the test? > > > Yeah, we can do that. I guess something like: > > -- Check that SJE removes references from PHVs correctly (bug #18187) > > Also, to be sure, > maybe add column t4.code into the list of the coalesce parameters? > > > I don't think it's necessary. It is t3 that SJE would remove, so it's > sufficient to have t3's Vars in the PHV expression to trigger this > error. Agree. I've considered the situations when something in the internal planner logic changes, and relations could arrive in a different order. In that case, we sort relids, and have determined behaviour. So, do we wait for Alexander's glance? -- regards, Andrei Lepikhov Postgres Professional
Re: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
From
Alexander Korotkov
Date:
On Thu, Nov 9, 2023 at 1:21 PM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote: > On 9/11/2023 18:08, Richard Guo wrote: > > > > On Thu, Nov 9, 2023 at 12:51 PM Andrei Lepikhov > > <a.lepikhov@postgrespro.ru <mailto:a.lepikhov@postgrespro.ru>> wrote: > > > > ... Should we add a reference to the > > bug that triggered the issue as a comment to the test? > > > > > > Yeah, we can do that. I guess something like: > > > > -- Check that SJE removes references from PHVs correctly (bug #18187) > > > > Also, to be sure, > > maybe add column t4.code into the list of the coalesce parameters? > > > > > > I don't think it's necessary. It is t3 that SJE would remove, so it's > > sufficient to have t3's Vars in the PHV expression to trigger this > > error. > > Agree. I've considered the situations when something in the internal > planner logic changes, and relations could arrive in a different order. > In that case, we sort relids, and have determined behaviour. So, do we > wait for Alexander's glance? LGMT, pushed! ------ Regards, Alexander Korotkov
Re: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
From
Richard Guo
Date:
On Thu, Nov 9, 2023 at 11:54 AM Richard Guo <guofenglinux@gmail.com> wrote:
I've looked into this a little bit. I think it's caused by the SJE
logic not properly removing references from PHVs.
Looking closer at the codes, I think we still have some loose ends
regarding how SJE handles PHVs.
1) When we check PHVs in remove_self_joins_one_group() to see if there
is any PHV that prevents us from removing the self join, we do not check
ph_lateral. This is dangerous because it leaves us with no guarantee of
what the Assert in remove_rel_from_query() asserts:
Assert(!bms_is_member(relid, phinfo->ph_lateral));
To illustrate this problem, look at the query below which would trigger
this Assert.
create table t (a int primary key, b int);
explain (costs off)
select * from t t1 join t t2 on t1.a = t2.a left join
lateral (select t1.a as t1a, * from generate_series(1,1) t3) s on true;
server closed the connection unexpectedly
2) Currently remove_self_joins_one_group() checks PHVs as below
/* there isn't any other place to eval PHV */
if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
bms_is_subset(phinfo->ph_needed, joinrelids))
break;
I'm wondering if we can relax this restriction because it seems to me
that a PHV evaluated/needed at or below the self join should not have
problem if we remove the self join. For instance,
explain (costs off)
select * from generate_series(1,10) t1(a) left join lateral
(select t1.a as t1a, t2.a from t t2 join t t3 on t2.a = t3.a) on true;
QUERY PLAN
-------------------------------------------
Nested Loop Left Join
-> Function Scan on generate_series t1
-> Hash Join
Hash Cond: (t2.a = t3.a)
-> Seq Scan on t t2
-> Hash
-> Seq Scan on t t3
(7 rows)
I think the t2/t3 join can actually be removed.
And now I have a vague feeling that PHVs should not impose any
constraints on removing self joins. But I'm not sure.
It would be great if we could have Tom's perspective on this. So I've
added Tom in the cc list.
FWIW, attached is a band-aid fix for the Assert failure issue in 1), in
case we want to fix it first before we discuss this topic further.
Thanks
Richard
regarding how SJE handles PHVs.
1) When we check PHVs in remove_self_joins_one_group() to see if there
is any PHV that prevents us from removing the self join, we do not check
ph_lateral. This is dangerous because it leaves us with no guarantee of
what the Assert in remove_rel_from_query() asserts:
Assert(!bms_is_member(relid, phinfo->ph_lateral));
To illustrate this problem, look at the query below which would trigger
this Assert.
create table t (a int primary key, b int);
explain (costs off)
select * from t t1 join t t2 on t1.a = t2.a left join
lateral (select t1.a as t1a, * from generate_series(1,1) t3) s on true;
server closed the connection unexpectedly
2) Currently remove_self_joins_one_group() checks PHVs as below
/* there isn't any other place to eval PHV */
if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
bms_is_subset(phinfo->ph_needed, joinrelids))
break;
I'm wondering if we can relax this restriction because it seems to me
that a PHV evaluated/needed at or below the self join should not have
problem if we remove the self join. For instance,
explain (costs off)
select * from generate_series(1,10) t1(a) left join lateral
(select t1.a as t1a, t2.a from t t2 join t t3 on t2.a = t3.a) on true;
QUERY PLAN
-------------------------------------------
Nested Loop Left Join
-> Function Scan on generate_series t1
-> Hash Join
Hash Cond: (t2.a = t3.a)
-> Seq Scan on t t2
-> Hash
-> Seq Scan on t t3
(7 rows)
I think the t2/t3 join can actually be removed.
And now I have a vague feeling that PHVs should not impose any
constraints on removing self joins. But I'm not sure.
It would be great if we could have Tom's perspective on this. So I've
added Tom in the cc list.
FWIW, attached is a band-aid fix for the Assert failure issue in 1), in
case we want to fix it first before we discuss this topic further.
Thanks
Richard
Attachment
Re: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
From
Alexander Korotkov
Date:
On Fri, Nov 10, 2023 at 10:21 AM Richard Guo <guofenglinux@gmail.com> wrote: > FWIW, attached is a band-aid fix for the Assert failure issue in 1), in > case we want to fix it first before we discuss this topic further. I've pushed this, thank you. I think we should keep the source tree in working order, while waiting for SJ removal in lateral to be fixed. ------ Regards, Alexander Korotkov
Re: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
From
Richard Guo
Date:
On Fri, Nov 10, 2023 at 4:20 PM Richard Guo <guofenglinux@gmail.com> wrote:
Looking closer at the codes, I think we still have some loose ends
regarding how SJE handles PHVs.
...
2) Currently remove_self_joins_one_group() checks PHVs as below
/* there isn't any other place to eval PHV */
if (bms_is_subset(phinfo->ph_eval_at, joinrelids) ||
bms_is_subset(phinfo->ph_needed, joinrelids))
break;
I'm wondering if we can relax this restriction because it seems to me
that a PHV evaluated/needed at or below the self join should not have
problem if we remove the self join.
After some more thought, I think PHVs should not impose any constraints
on removing self joins. If the removed rel is contained in the relids
that a PHV is evaluated/needed at or laterally references, it can just
be replaced with the rel that is kept.
Attached is a patch to remove such constraints. Any comments or
feedback are welcome.
Thanks
Richard
on removing self joins. If the removed rel is contained in the relids
that a PHV is evaluated/needed at or laterally references, it can just
be replaced with the rel that is kept.
Attached is a patch to remove such constraints. Any comments or
feedback are welcome.
Thanks
Richard
Attachment
Re: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
From
Andrei Lepikhov
Date:
On 20/11/2023 15:49, Richard Guo wrote: > I'm wondering if we can relax this restriction because it seems to me > that a PHV evaluated/needed at or below the self join should not have > problem if we remove the self join. > > > After some more thought, I think PHVs should not impose any constraints > on removing self joins. If the removed rel is contained in the relids > that a PHV is evaluated/needed at or laterally references, it can just > be replaced with the rel that is kept. > > Attached is a patch to remove such constraints. Any comments or > feedback are welcome. Carrying out excavations, I found that it was Teodor Sigaev who added this limitation in v.20 of the patch in an earlier stage of the development. I also had analyzed this part of code and came to the conclusion that it doesn't needed. Only a paranoid approach and the absence of an alternative opinion are the reasons why this code is still exists. Your patch is ok. I think the comment, 'PHVs should not impose ...' looks redundant. It may be enough to have it in the commit comment. -- regards, Andrei Lepikhov Postgres Professional
Re: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
From
Richard Guo
Date:
On Wed, Nov 22, 2023 at 10:32 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
Your patch is ok.
Thanks for looking into this patch.
I think the comment, 'PHVs should not impose ...' looks redundant. It
may be enough to have it in the commit comment.
Hmm, I'm not sure. I'm thinking that when people read this part of the
code, they might be confused as to why we don't consider PHVs. And this
comment will give them some clues. This is just like how we explain why
we need not worry about LATERAL nor PHVs in is_simple_values().
Thanks
Richard
code, they might be confused as to why we don't consider PHVs. And this
comment will give them some clues. This is just like how we explain why
we need not worry about LATERAL nor PHVs in is_simple_values().
Thanks
Richard
Re: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
From
Tom Lane
Date:
Richard Guo <guofenglinux@gmail.com> writes: > After some more thought, I think PHVs should not impose any constraints > on removing self joins. If the removed rel is contained in the relids > that a PHV is evaluated/needed at or laterally references, it can just > be replaced with the rel that is kept. I experimented with trying to break this patch using this test case: create table t (id int primary key, f1 text, f2 text); create table i (f0 int); explain verbose select * from i left join ( (select *, 1 as x from t tss1) t1 join (select *, 2 as y from t tss2) t2 on t1.id = t2.id ) on i.f0 = t1.id; It seems to successfully remove the self-join between tss1 and tss2, but there is something wrong. If you examine the PlannerInfo just after remove_useless_self_joins, you will notice that the two PlaceHolderVars (for x and y) in the placeholder_list both have :phrels (b 7) :phnullingrels (b) where beforehand x had :phrels (b 6) :phnullingrels (b 5) and y had :phrels (b 7) :phnullingrels (b 5) It's correct to move both phrels locations to rel 7 (the surviving self-joined rel) but what became of the reference to nullingrel 5? That seems clearly wrong, since we have not removed the left join. Much worse, if you look around elsewhere in the data structure, particularly at the processed_tlist, you find instances of the PlaceHolderVars that have not been updated and still have the old values such as ":phrels (b 6)". It's not apparent to me why the cross-checks we have in setrefs.c and elsewhere don't detect a problem with this. But it seems clear that remove_useless_self_joins is still several bricks shy of a load in terms of fully updating the data structure for a join removal. Probably with some more work to add complication to this test case, we could demonstrate an observable failure. BTW, why is it that it seems to prefer to remove the first of the two self-joined rels, rather than the second? That seems jarringly bizarre. regards, tom lane
Re: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
From
Richard Guo
Date:
On Tue, Nov 28, 2023 at 1:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Richard Guo <guofenglinux@gmail.com> writes:
> After some more thought, I think PHVs should not impose any constraints
> on removing self joins. If the removed rel is contained in the relids
> that a PHV is evaluated/needed at or laterally references, it can just
> be replaced with the rel that is kept.
I experimented with trying to break this patch using this test case:
create table t (id int primary key, f1 text, f2 text);
create table i (f0 int);
explain verbose
select * from i left join (
(select *, 1 as x from t tss1) t1 join
(select *, 2 as y from t tss2) t2
on t1.id = t2.id
) on i.f0 = t1.id;
Thanks for the query!
It seems to successfully remove the self-join between tss1 and tss2,
but there is something wrong. If you examine the PlannerInfo just
after remove_useless_self_joins, you will notice that the two
PlaceHolderVars (for x and y) in the placeholder_list both have
:phrels (b 7)
:phnullingrels (b)
where beforehand x had
:phrels (b 6)
:phnullingrels (b 5)
and y had
:phrels (b 7)
:phnullingrels (b 5)
It's correct to move both phrels locations to rel 7 (the surviving
self-joined rel) but what became of the reference to nullingrel 5?
That seems clearly wrong, since we have not removed the left join.
What you noticed before remove_useless_self_joins seems not right to me.
The PHVs in placeholder_list are supposed to have empty phnullingrels by
convention, as explained in find_placeholder_info.
* By convention, phinfo->ph_var->phnullingrels is always empty, since the
* PlaceHolderInfo represents the initially-calculated state of the
* PlaceHolderVar.
The PHVs in placeholder_list are supposed to have empty phnullingrels by
convention, as explained in find_placeholder_info.
* By convention, phinfo->ph_var->phnullingrels is always empty, since the
* PlaceHolderInfo represents the initially-calculated state of the
* PlaceHolderVar.
Much worse, if you look around elsewhere in the data structure,
particularly at the processed_tlist, you find instances of the
PlaceHolderVars that have not been updated and still have the
old values such as ":phrels (b 6)".
Good catch! We fail to handle PlaceHolderVar case in sje_walker and
replace_varno_walker, which is wrong. Fixed in v3 patch.
replace_varno_walker, which is wrong. Fixed in v3 patch.
It's not apparent to me why the cross-checks we have in setrefs.c
and elsewhere don't detect a problem with this. But it seems
clear that remove_useless_self_joins is still several bricks
shy of a load in terms of fully updating the data structure for a
join removal. Probably with some more work to add complication
to this test case, we could demonstrate an observable failure.
Fair point. I have the same concern that we still have other loose ends
in SJE updating necessary data structures in self join removal. And I
noticed that Andres expressed the same concern in [1]. I think we
should come up with some solution to detect/avoid such problems, but I
imagine that that should be a seperate patch.
in SJE updating necessary data structures in self join removal. And I
noticed that Andres expressed the same concern in [1]. I think we
should come up with some solution to detect/avoid such problems, but I
imagine that that should be a seperate patch.
BTW, why is it that it seems to prefer to remove the first of
the two self-joined rels, rather than the second? That seems
jarringly bizarre.
Hmm, I'm not sure either. Alexander and Andrei, could you please share
your insights?
[1] https://www.postgresql.org/message-id/20231127180705.svzrlahdmqvosqfz%40awork3.anarazel.de
Thanks
Richard
your insights?
[1] https://www.postgresql.org/message-id/20231127180705.svzrlahdmqvosqfz%40awork3.anarazel.de
Thanks
Richard
Attachment
Re: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
From
Richard Guo
Date:
On Tue, Nov 28, 2023 at 3:03 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Tue, Nov 28, 2023 at 1:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:BTW, why is it that it seems to prefer to remove the first of
the two self-joined rels, rather than the second? That seems
jarringly bizarre.Hmm, I'm not sure either. Alexander and Andrei, could you please share
your insights?
BTW, while reading the codes, I noticed this commit of
remove_self_joins_recurse.
* ... To avoid complexity, limit the max power of this set by a GUC.
But where is the GUC? I guess that it refers to
self_join_search_limit, which has been removed during development.
So we should revise this commit to at least remove any mention of the
GUC. Maybe it'd better to add a new commit explaining why we are not
concerned about cases where the number of self joins is too large.
Thanks
Richard
remove_self_joins_recurse.
* ... To avoid complexity, limit the max power of this set by a GUC.
But where is the GUC? I guess that it refers to
self_join_search_limit, which has been removed during development.
So we should revise this commit to at least remove any mention of the
GUC. Maybe it'd better to add a new commit explaining why we are not
concerned about cases where the number of self joins is too large.
Thanks
Richard
Re: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
From
Andrei Lepikhov
Date:
On 28/11/2023 15:03, Richard Guo wrote: > On Tue, Nov 28, 2023 at 3:03 PM Richard Guo <guofenglinux@gmail.com > <mailto:guofenglinux@gmail.com>> wrote: > On Tue, Nov 28, 2023 at 1:42 AM Tom Lane <tgl@sss.pgh.pa.us > <mailto:tgl@sss.pgh.pa.us>> wrote: > BTW, why is it that it seems to prefer to remove the first of > the two self-joined rels, rather than the second? That seems > jarringly bizarre. > Hmm, I'm not sure either. Alexander and Andrei, could you please share > your insights? I was thinking about that choice and, in earlier versions, removed outer relations. I did not find a difference in removing a specific side of the JOIN. Rewriting the patch right before the commit, I found out that remove_useless_joins removed the inner join and chose to do the same, like a convention. > BTW, while reading the codes, I noticed this commit of > remove_self_joins_recurse. > > * ... To avoid complexity, limit the max power of this set by a GUC. > > But where is the GUC? I guess that it refers to > self_join_search_limit, which has been removed during development. > > So we should revise this commit to at least remove any mention of the > GUC. Maybe it'd better to add a new commit explaining why we are not > concerned about cases where the number of self joins is too large. Maybe. But actually, we just did some benchmarking to see the influence. Moreover, Alexander's code on preliminary sorting the relations and the fact that inheritance isn't still expanded were reasons we aren't afraid of significant degradation in practical cases. -- regards, Andrei Lepikhov Postgres Professional
Re: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
From
Andrei Lepikhov
Date:
On 28/11/2023 14:03, Richard Guo wrote: > > On Tue, Nov 28, 2023 at 1:42 AM Tom Lane <tgl@sss.pgh.pa.us > <mailto:tgl@sss.pgh.pa.us>> wrote: > > Richard Guo <guofenglinux@gmail.com <mailto:guofenglinux@gmail.com>> > writes: > > After some more thought, I think PHVs should not impose any > constraints > > on removing self joins. If the removed rel is contained in the > relids > > that a PHV is evaluated/needed at or laterally references, it can > just > > be replaced with the rel that is kept. > > I experimented with trying to break this patch using this test case: > > create table t (id int primary key, f1 text, f2 text); > create table i (f0 int); > > explain verbose > select * from i left join ( > (select *, 1 as x from t tss1) t1 join > (select *, 2 as y from t tss2) t2 > on t1.id <http://t1.id> = t2.id <http://t2.id> > ) on i.f0 = t1.id <http://t1.id>; > > > Thanks for the query! > > It seems to successfully remove the self-join between tss1 and tss2, > but there is something wrong. If you examine the PlannerInfo just > after remove_useless_self_joins, you will notice that the two > PlaceHolderVars (for x and y) in the placeholder_list both have > > :phrels (b 7) > :phnullingrels (b) > > where beforehand x had > > :phrels (b 6) > :phnullingrels (b 5) > > and y had > > :phrels (b 7) > :phnullingrels (b 5) > > It's correct to move both phrels locations to rel 7 (the surviving > self-joined rel) but what became of the reference to nullingrel 5? > That seems clearly wrong, since we have not removed the left join. > > > What you noticed before remove_useless_self_joins seems not right to me. > The PHVs in placeholder_list are supposed to have empty phnullingrels by > convention, as explained in find_placeholder_info. > > * By convention, phinfo->ph_var->phnullingrels is always empty, since the > * PlaceHolderInfo represents the initially-calculated state of the > * PlaceHolderVar. > > Much worse, if you look around elsewhere in the data structure, > particularly at the processed_tlist, you find instances of the > PlaceHolderVars that have not been updated and still have the > old values such as ":phrels (b 6)". > > > Good catch! We fail to handle PlaceHolderVar case in sje_walker and > replace_varno_walker, which is wrong. Fixed in v3 patch. > > It's not apparent to me why the cross-checks we have in setrefs.c > and elsewhere don't detect a problem with this. But it seems > clear that remove_useless_self_joins is still several bricks > shy of a load in terms of fully updating the data structure for a > join removal. Probably with some more work to add complication > to this test case, we could demonstrate an observable failure. > > > Fair point. I have the same concern that we still have other loose ends > in SJE updating necessary data structures in self join removal. And I > noticed that Andres expressed the same concern in [1]. I think we > should come up with some solution to detect/avoid such problems, but I > imagine that that should be a seperate patch. Agree. PlannerInfo is a hot structure where many new features try to save some data. The history of this patch has shown, for example, JoinDomains. Finding field jd_relids, which could contain the link to the removing relation, was tricky. I still have no idea how to check it internally except by attaching a comment to the PlannerInfo structure. -- regards, Andrei Lepikhov Postgres Professional
Re: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
From
Richard Guo
Date:
On Tue, Nov 28, 2023 at 3:03 PM Richard Guo <guofenglinux@gmail.com> wrote:
On Tue, Nov 28, 2023 at 1:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:Much worse, if you look around elsewhere in the data structure,
particularly at the processed_tlist, you find instances of the
PlaceHolderVars that have not been updated and still have the
old values such as ":phrels (b 6)".Good catch! We fail to handle PlaceHolderVar case in sje_walker and
replace_varno_walker, which is wrong. Fixed in v3 patch.
I noticed that this issue exists on master even without the
Don't-constrain-SJE-due-to-PHVs patch, and it can be illustrated with a
query from regression test.
-- Test that placeholders are updated correctly after join removal
explain (costs off)
select * from (values (1)) x
left join (select coalesce(y.q1, 1) from int8_tbl y
right join sj j1 inner join sj j2 on j1.a = j2.a
on true) z
on true;
In this query it successfully removes the self-join between j1 (6) and
j2 (7). But in the processed_tlist, you can still find PlaceHolderVar
with ":phrels (b 5 6 7 9)". So I think we need to handle PlaceHolderVar
case in sje_walker and replace_varno_walker, whether or not we
incorporate the Don't-constrain-SJE-due-to-PHVs patch.
Also, it seems to me that sje_walker is redundant. Currently it is used
to walk the query tree for varno replacement, a task that can be
fulfilled with replace_varno_walker.
0001 adds the handling of PlaceHolderVar case in replace_varno_walker,
and meanwhile retires sje_walker.
0002 is the original Don't-constrain-SJE-due-to-PHVs patch.
0003 is a fix of the comment for remove_self_joins_recurse, as described
upthread.
Thanks
Richard
Don't-constrain-SJE-due-to-PHVs patch, and it can be illustrated with a
query from regression test.
-- Test that placeholders are updated correctly after join removal
explain (costs off)
select * from (values (1)) x
left join (select coalesce(y.q1, 1) from int8_tbl y
right join sj j1 inner join sj j2 on j1.a = j2.a
on true) z
on true;
In this query it successfully removes the self-join between j1 (6) and
j2 (7). But in the processed_tlist, you can still find PlaceHolderVar
with ":phrels (b 5 6 7 9)". So I think we need to handle PlaceHolderVar
case in sje_walker and replace_varno_walker, whether or not we
incorporate the Don't-constrain-SJE-due-to-PHVs patch.
Also, it seems to me that sje_walker is redundant. Currently it is used
to walk the query tree for varno replacement, a task that can be
fulfilled with replace_varno_walker.
0001 adds the handling of PlaceHolderVar case in replace_varno_walker,
and meanwhile retires sje_walker.
0002 is the original Don't-constrain-SJE-due-to-PHVs patch.
0003 is a fix of the comment for remove_self_joins_recurse, as described
upthread.
Thanks
Richard
Attachment
Re: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
From
Andrei Lepikhov
Date:
On Tue, Nov 28, 2023 at 1:42 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > :phnullingrels (b 5) > It's correct to move both phrels locations to rel 7 (the surviving > self-joined rel) but what became of the reference to nullingrel 5? > That seems clearly wrong, since we have not removed the left join. I must confess I couldn't reproduce this issue. I watched only the value ':phnullingrels (b)' everywhere. On 5/12/2023 09:35, Richard Guo wrote: > > On Tue, Nov 28, 2023 at 3:03 PM Richard Guo <guofenglinux@gmail.com > <mailto:guofenglinux@gmail.com>> wrote: > > On Tue, Nov 28, 2023 at 1:42 AM Tom Lane <tgl@sss.pgh.pa.us > <mailto:tgl@sss.pgh.pa.us>> wrote: > > Much worse, if you look around elsewhere in the data structure, > particularly at the processed_tlist, you find instances of the > PlaceHolderVars that have not been updated and still have the > old values such as ":phrels (b 6)". > > > Good catch! We fail to handle PlaceHolderVar case in sje_walker and > replace_varno_walker, which is wrong. Fixed in v3 patch. Yes, it was my blunder. I didn't recognize that phinfo->ph_var points to a copy of the PlaceHolderVar and thought it was enough to pass through the root->placeholder_list only. Adding replacement code into the walker is the proper correction. > I noticed that this issue exists on master even without the > Don't-constrain-SJE-due-to-PHVs patch, and it can be illustrated with a > query from regression test. > > -- Test that placeholders are updated correctly after join removal > explain (costs off) > select * from (values (1)) x > left join (select coalesce(y.q1, 1) from int8_tbl y > right join sj j1 inner join sj j2 on j1.a = j2.a > on true) z > on true; > Also, it seems to me that sje_walker is redundant. Currently it is used > to walk the query tree for varno replacement, a task that can be > fulfilled with replace_varno_walker. Agree. >> It's not apparent to me why the cross-checks we have in setrefs.c >> and elsewhere don't detect a problem with this. But it seems >> clear that remove_useless_self_joins is still several bricks >> shy of a load in terms of fully updating the data structure for a >> join removal. Probably with some more work to add complication >> to this test case, we could demonstrate an observable failure. > Fair point. I have the same concern that we still have other loose > ends > in SJE updating necessary data structures in self join removal. And I > noticed that Andres expressed the same concern in [1]. I think we > should come up with some solution to detect/avoid such problems, but I > imagine that that should be a separate patch. It is an excellent task to ponder. At least, we should answer later why the setrefs.c didn't detect that, and could we do something here. Maybe all we could do - is add comments to the PlannerInfo structure. According to the patches: 0001 - looks good for me. 0002 - I don't understand why to use 'explain' in VERBOSE mode in tests. What do you try to detect here? 0003 - ok, but too short. Maybe squash all these patches into one? I think these corrections are good enough for commit. -- regards, Andrei Lepikhov Postgres Professional
Re: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
From
Richard Guo
Date:
On Thu, Dec 7, 2023 at 10:55 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote:
According to the patches:
0001 - looks good for me.
0002 - I don't understand why to use 'explain' in VERBOSE mode in tests.
What do you try to detect here?
The VERBOSE mode is used to observe that PHVs are adjusted correctly and
put in the right targetlist.
put in the right targetlist.
0003 - ok, but too short. Maybe squash all these patches into one?
It seems to me that these three patches are addressing separate issues
and are independent of each other. For instance, the issue addressed by
0001 exists on master even without other changes, and 0002 is an
improvement that does not depend on others, while 0003 is a fix of an
incorrect comment. I don't think it's a good idea to squash them into
one.
and are independent of each other. For instance, the issue addressed by
0001 exists on master even without other changes, and 0002 is an
improvement that does not depend on others, while 0003 is a fix of an
incorrect comment. I don't think it's a good idea to squash them into
one.
I think these corrections are good enough for commit.
Thank you for reviewing these patches. Alexander, what are your
thoughts on these patches?
Thanks
Richard
thoughts on these patches?
Thanks
Richard
Re: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
From
Alexander Korotkov
Date:
On Tue, Dec 12, 2023 at 8:41 AM Richard Guo <guofenglinux@gmail.com> wrote: > On Thu, Dec 7, 2023 at 10:55 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote: >> According to the patches: >> 0001 - looks good for me. >> 0002 - I don't understand why to use 'explain' in VERBOSE mode in tests. >> What do you try to detect here? > > The VERBOSE mode is used to observe that PHVs are adjusted correctly and > put in the right targetlist. > >> 0003 - ok, but too short. Maybe squash all these patches into one? > > It seems to me that these three patches are addressing separate issues > and are independent of each other. For instance, the issue addressed by > 0001 exists on master even without other changes, and 0002 is an > improvement that does not depend on others, while 0003 is a fix of an > incorrect comment. I don't think it's a good idea to squash them into > one. > >> I think these corrections are good enough for commit. > > Thank you for reviewing these patches. Alexander, what are your > thoughts on these patches? Looks good to me. I'm going to push this if there are no objections. ------ Regards, Alexander Korotkov
Re: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
From
Alexander Korotkov
Date:
On Sat, Dec 23, 2023 at 1:27 AM Alexander Korotkov <aekorotkov@gmail.com> wrote: > On Tue, Dec 12, 2023 at 8:41 AM Richard Guo <guofenglinux@gmail.com> wrote: > > On Thu, Dec 7, 2023 at 10:55 AM Andrei Lepikhov <a.lepikhov@postgrespro.ru> wrote: > >> According to the patches: > >> 0001 - looks good for me. > >> 0002 - I don't understand why to use 'explain' in VERBOSE mode in tests. > >> What do you try to detect here? > > > > The VERBOSE mode is used to observe that PHVs are adjusted correctly and > > put in the right targetlist. > > > >> 0003 - ok, but too short. Maybe squash all these patches into one? > > > > It seems to me that these three patches are addressing separate issues > > and are independent of each other. For instance, the issue addressed by > > 0001 exists on master even without other changes, and 0002 is an > > improvement that does not depend on others, while 0003 is a fix of an > > incorrect comment. I don't think it's a good idea to squash them into > > one. > > > >> I think these corrections are good enough for commit. > > > > Thank you for reviewing these patches. Alexander, what are your > > thoughts on these patches? > > Looks good to me. I'm going to push this if there are no objections. Pushed! Thanks to all the participants. ------ Regards, Alexander Korotkov
Re: BUG #18187: Unexpected error: "variable not found in subplan target lists" triggered by JOIN
From
Richard Guo
Date:
On Mon, Dec 25, 2023 at 7:38 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
On Sat, Dec 23, 2023 at 1:27 AM Alexander Korotkov <aekorotkov@gmail.com> wrote:
> On Tue, Dec 12, 2023 at 8:41 AM Richard Guo <guofenglinux@gmail.com> wrote:
> > Thank you for reviewing these patches. Alexander, what are your
> > thoughts on these patches?
>
> Looks good to me. I'm going to push this if there are no objections.
Pushed! Thanks to all the participants.
Thanks for pushing!
Thanks
Richard
Thanks
Richard