Thread: "variable not found in subplan target list"
I have to run now so can't dissect it, but while running sqlsmith on the SQL/JSON patch after Justin's report, I got $SUBJECT in this query: MERGE INTO public.target_parted as target_0 USING (select subq_0.c5 as c0, subq_0.c0 as c1, ref_0.a as c2, subq_0.c1 as c3, subq_0.c9 as c4, (select c from public.prt2_m_p3 limit 1 offset 1) as c5, subq_0.c8 as c6, ref_0.a as c7, subq_0.c7 as c8, subq_0.c1 as c9, pg_catalog.system_user() as c10 from public.itest1 as ref_0 left join (select ref_1.matches as c0, ref_1.typ as c1, ref_1.colname as c2, (select slotname from public.iface limit 1 offset 44) as c3, ref_1.matches as c4, ref_1.op as c5, ref_1.matches as c6, ref_1.value as c7, ref_1.op as c8, ref_1.op as c9, ref_1.typ as c10 from public.brinopers_multi as ref_1 where cast(null as polygon) <@ (select polygon from public.tab_core_types limit 1 offset 22) ) as subq_0 on (cast(null as macaddr8) >= cast(null as macaddr8)) where subq_0.c10 > subq_0.c2 limit 49) as subq_1 ON target_0.b = subq_1.c2 WHEN MATCHED AND (cast(null as box) |>> cast(null as box)) or (cast(null as lseg) ?-| (select s from public.lseg_tbl limit 1 offset 6) ) THEN DELETE WHEN NOT MATCHED AND (EXISTS ( select 21 as c0, subq_2.c0 as c1 from public.itest14 as sample_0 tablesample system (3.6) inner join public.num_exp_sqrt as sample_1 tablesample bernoulli (0.3) on (cast(null as "char") <= cast(null as "char")), lateral (select sample_1.id as c0 from public.a as ref_2 where (cast(null as lseg) <@ cast(null as line)) or ((select b3 from public.bit_defaults limit 1 offset 80) <> (select b3 from public.bit_defaults limit 1 offset 4) ) limit 158) as subq_2 where (cast(null as name) !~ (select t from public.test_tsvector limit 1 offset 5) ) and ((select bool from public.tab_core_types limit 1 offset 61) < (select pg_catalog.bool_or(v) from public.rtest_view1) ))) or (18 is NULL) THEN INSERT VALUES ( pg_catalog.int4um( cast(public.func_with_bad_set() as int4)), 13) WHEN MATCHED AND ((24 is not NULL) or (true)) or (cast(null as "timestamp") <= cast(null as timestamptz)) THEN UPDATE set b = target_0.b Ugh. I got no more SQL/JSON related crashes so far. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "No hay ausente sin culpa ni presente sin disculpa" (Prov. francés)
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > I have to run now so can't dissect it, but while running sqlsmith on the > SQL/JSON patch after Justin's report, I got $SUBJECT in this query: Reproduces in HEAD and v15 too (once you replace pg_catalog.system_user with some function that exists in v15). So it's not the fault of the JSON patch, nor of my outer-join hacking which had been my first thought. regards, tom lane
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > I have to run now so can't dissect it, but while running sqlsmith on the > SQL/JSON patch after Justin's report, I got $SUBJECT in this query: I reduced this down to MERGE INTO public.target_parted as target_0 USING public.itest1 as ref_0 ON target_0.b = ref_0.a WHEN NOT MATCHED THEN INSERT VALUES (42, 13); The critical moving part seems to just be that the MERGE target is a partitioned table ... but surely somebody tested that before? regards, tom lane
I wrote: > I reduced this down to > MERGE INTO public.target_parted as target_0 > USING public.itest1 as ref_0 > ON target_0.b = ref_0.a > WHEN NOT MATCHED > THEN INSERT VALUES (42, 13); > The critical moving part seems to just be that the MERGE target > is a partitioned table ... but surely somebody tested that before? Oh, it's not just any partitioned table: regression=# \d+ target_parted Partitioned table "public.target_parted" Column | Type | Collation | Nullable | Default | Storage | Compression | Stats target | Description --------+---------+-----------+----------+---------+---------+-------------+--------------+------------- a | integer | | | | plain | | | b | integer | | | | plain | | | Partition key: LIST (a) Number of partitions: 0 The planner is reducing the scan of target_parted to a dummy scan, as is reasonable, but it forgets to provide ctid as an output from that scan; then the parent join node is unhappy because it does have a ctid output. So it looks like the problem is some shortcut we take while creating the dummy scan. I suppose that without the planner bug, this'd fail at runtime for lack of a partition to put (42,13) into. Because of that, the case isn't really interesting for production, which may explain the lack of reports. regards, tom lane
I wrote: > The planner is reducing the scan of target_parted to > a dummy scan, as is reasonable, but it forgets to > provide ctid as an output from that scan; then the > parent join node is unhappy because it does have > a ctid output. So it looks like the problem is some > shortcut we take while creating the dummy scan. Oh, actually the problem is in distribute_row_identity_vars, which is supposed to handle this case, but it thinks it doesn't have to back-fill the rel's reltarget. Wrong. Now that I see the problem, I wonder if we can't reproduce a similar symptom without MERGE, which would mean that v14 has the issue too. The attached seems to fix it, but I'm going to look for a non-MERGE test case before pushing. regards, tom lane diff --git a/src/backend/optimizer/util/appendinfo.c b/src/backend/optimizer/util/appendinfo.c index 9d377385f1..c1b1557570 100644 --- a/src/backend/optimizer/util/appendinfo.c +++ b/src/backend/optimizer/util/appendinfo.c @@ -21,6 +21,7 @@ #include "nodes/nodeFuncs.h" #include "optimizer/appendinfo.h" #include "optimizer/pathnode.h" +#include "optimizer/planmain.h" #include "parser/parsetree.h" #include "utils/lsyscache.h" #include "utils/rel.h" @@ -994,9 +995,10 @@ distribute_row_identity_vars(PlannerInfo *root) * certainly process no rows. Handle this edge case by re-opening the top * result relation and adding the row identity columns it would have used, * as preprocess_targetlist() would have done if it weren't marked "inh". - * (This is a bit ugly, but it seems better to confine the ugliness and - * extra cycles to this unusual corner case.) We needn't worry about - * fixing the rel's reltarget, as that won't affect the finished plan. + * Then re-run build_base_rel_tlists() to ensure that the added columns + * get propagated to the relation's reltarget. (This is a bit ugly, but + * it seems better to confine the ugliness and extra cycles to this + * unusual corner case.) */ if (root->row_identity_vars == NIL) { @@ -1006,6 +1008,8 @@ distribute_row_identity_vars(PlannerInfo *root) add_row_identity_columns(root, result_relation, target_rte, target_relation); table_close(target_relation, NoLock); + build_base_rel_tlists(root, root->processed_tlist); + /* There are no ROWID_VAR Vars in this case, so we're done. */ return; }
So I'm back home and found a couple more weird errors in the log: MERGE INTO public.idxpart2 as target_0 USING (select 1 from public.xmltest2 as ref_0 inner join public.prt1_l_p1 as sample_0 inner join fkpart4.droppk as ref_1 on (sample_0.a = ref_1.a ) on (true) limit 50) as subq_0 left join information_schema.transforms as ref_2 left join public.transition_table_status as sample_1 on (ref_2.transform_type is not NULL) on (true) ON target_0.a = sample_1.level WHEN MATCHED THEN UPDATE set a = target_0.a; ERROR: mismatching PartitionPruneInfo found at part_prune_index 0 DETALLE: plan node relids (b 1), pruneinfo relids (b 36) This one is probably my fault, will look later. select pg_catalog.pg_stat_get_buf_fsync_backend() as c9 from public.tenk2 as ref_0 where (ref_0.stringu2 is NULL) and (EXISTS ( select 1 from fkpart5.fk1 as ref_1 where pg_catalog.current_date() < (select pg_catalog.max(filler3) from public.mcv_lists))) ; ERROR: subplan "InitPlan 1 (returns $1)" was not initialized CONTEXTO: parallel worker select 1 as c0 from (select subq_0.c9 as c5, subq_0.c8 as c9 from public.iso8859_5_inputs as ref_0, lateral (select ref_1.ident as c2, ref_0.description as c8, ref_1.used_bytes as c9 from pg_catalog.pg_backend_memory_contexts as ref_1 where true ) as subq_0 where subq_0.c2 is not NULL) as subq_1 inner join pg_catalog.pg_class as sample_0 on (subq_1.c5 = public.int8alias1in( cast(case when subq_1.c9 is not NULL then null end as cstring))) where true; ERROR: could not find commutator for operator 53286 There were quite a few of those "variable not found" ones, both mentioning singular "targetlist" and others that said "targetlists". I reran them with your patch and they no longer error out, so I guess it's all the same bug. -- Álvaro Herrera PostgreSQL Developer — https://www.EnterpriseDB.com/ "I must say, I am absolutely impressed with what pgsql's implementation of VALUES allows me to do. It's kind of ridiculous how much "work" goes away in my code. Too bad I can't do this at work (Oracle 8/9)." (Tom Allison) http://archives.postgresql.org/pgsql-general/2007-06/msg00016.php
Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > So I'm back home and found a couple more weird errors in the log: > ERROR: mismatching PartitionPruneInfo found at part_prune_index 0 > DETALLE: plan node relids (b 1), pruneinfo relids (b 36) This one reproduces for me. > select > pg_catalog.pg_stat_get_buf_fsync_backend() as c9 > from > public.tenk2 as ref_0 > where (ref_0.stringu2 is NULL) > and (EXISTS ( > select 1 from fkpart5.fk1 as ref_1 > where pg_catalog.current_date() < (select pg_catalog.max(filler3) from public.mcv_lists))) ; > ERROR: subplan "InitPlan 1 (returns $1)" was not initialized > CONTEXTO: parallel worker Hmph, I couldn't reproduce that, not even with other settings of debug_parallel_query. Are you running it with non-default planner parameters? > select 1 as c0 > ... > ERROR: could not find commutator for operator 53286 I got a slightly different error: ERROR: missing support function 1(195306,195306) in opfamily 1976 where regression=# select 195306::regtype; regtype ------------ int8alias1 (1 row) So that one is related to the intentionally-somewhat-broken int8 opclass configuration that equivclass.sql leaves behind. I've always had mixed emotions about whether leaving that set up that way was a good idea or not. In principle nothing really bad should happen, but it can lead to confusing errors like this one. Maybe it'd be better to roll that back? regards, tom lane
On Wed, Mar 29, 2023 at 3:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > So I'm back home and found a couple more weird errors in the log: > > > ERROR: mismatching PartitionPruneInfo found at part_prune_index 0 > > DETALLE: plan node relids (b 1), pruneinfo relids (b 36) > > This one reproduces for me. I've looked into this one and the attached patch fixes it for me. Turns out set_plan_refs()'s idea of when the entries from PlannerInfo.partPruneInfos are transferred into PlannerGlobal.partPruneInfo was wrong. Though, I wonder if we need to keep ec386948948 that introduced the notion of part_prune_index around if the project that needed it [1] has moved on to an entirely different approach altogether, one that doesn't require hacking up the pruning code. -- Thanks, Amit Langote EDB: http://www.enterprisedb.com [1] https://commitfest.postgresql.org/42/3478/
Attachment
On 2023-Mar-29, Amit Langote wrote: > On Wed, Mar 29, 2023 at 3:39 AM Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Alvaro Herrera <alvherre@alvh.no-ip.org> writes: > > > So I'm back home and found a couple more weird errors in the log: > > > > > ERROR: mismatching PartitionPruneInfo found at part_prune_index 0 > > > DETALLE: plan node relids (b 1), pruneinfo relids (b 36) > > > > This one reproduces for me. > > I've looked into this one and the attached patch fixes it for me. > Turns out set_plan_refs()'s idea of when the entries from > PlannerInfo.partPruneInfos are transferred into > PlannerGlobal.partPruneInfo was wrong. Thanks for the patch. I've pushed it to github for CI testing, and if there are no problems I'll put it in. > Though, I wonder if we need to keep ec386948948 that introduced the > notion of part_prune_index around if the project that needed it [1] > has moved on to an entirely different approach altogether, one that > doesn't require hacking up the pruning code. Hmm, that's indeed tempting. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Hi Amit, On 2023-Mar-30, Alvaro Herrera wrote: > On 2023-Mar-29, Amit Langote wrote: > > Though, I wonder if we need to keep ec386948948 that introduced the > > notion of part_prune_index around if the project that needed it [1] > > has moved on to an entirely different approach altogether, one that > > doesn't require hacking up the pruning code. > > Hmm, that's indeed tempting. We have an open item about this, and I see no reason not to do it. I checked, and putting things back is just a matter of reverting 589bb816499e and ec386948948, cleaning up some trivial pgindent-induced conflicts, and bumping catversion once more. Would you like to do that yourself, or do you prefer that I do it? Ideally, we'd do it before beta1. -- Álvaro Herrera Breisgau, Deutschland — https://www.EnterpriseDB.com/
On 2023-May-02, Alvaro Herrera wrote: > We have an open item about this, and I see no reason not to do it. I > checked, and putting things back is just a matter of reverting > 589bb816499e and ec386948948, cleaning up some trivial pgindent-induced > conflicts, and bumping catversion once more. Would you like to do that > yourself, or do you prefer that I do it? Ideally, we'd do it before > beta1. I have pushed the revert now. -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/
Hi Alvaro,
On Thu, May 4, 2023 at 19:44 Alvaro Herrera <alvherre@alvh.no-ip.org> wrote:
On 2023-May-02, Alvaro Herrera wrote:
> We have an open item about this, and I see no reason not to do it. I
> checked, and putting things back is just a matter of reverting
> 589bb816499e and ec386948948, cleaning up some trivial pgindent-induced
> conflicts, and bumping catversion once more. Would you like to do that
> yourself, or do you prefer that I do it? Ideally, we'd do it before
> beta1.
I have pushed the revert now.
Thanks for taking care of it.
(Wouldn’t have been able to get to it till Monday myself.)
Thanks, Amit Langote
EDB: http://www.enterprisedb.com
EDB: http://www.enterprisedb.com