Thread: BUG #18634: Wrong varnullingrels with merge ... when not matched by source

BUG #18634: Wrong varnullingrels with merge ... when not matched by source

From
PG Bug reporting form
Date:
The following bug has been logged on the website:

Bug reference:      18634
Logged by:          Alexander Lakhin
Email address:      exclusion@gmail.com
PostgreSQL version: 17.0
Operating system:   Ubuntu 22.04
Description:

The following script:
CREATE TABLE t (a int);
INSERT INTO t VALUES(1), (2);
CREATE VIEW v AS SELECT a FROM t WHERE EXISTS (SELECT 1 FROM t);

MERGE INTO v USING (SELECT * FROM generate_series(1,1)) AS s(a)
  ON s.a = v.a WHEN NOT MATCHED BY SOURCE THEN DELETE;

produces:
ERROR:  XX000: wrong varnullingrels (b) (expected (b 4)) for Var 5/1
LOCATION:  search_indexed_tlist_for_var, setrefs.c:2847

At the same time,
MERGE INTO v USING (SELECT * FROM generate_series(1,1)) AS s(a)
  ON s.a = v.a WHEN MATCHED THEN DELETE;
planned and executed with no error.


PG Bug reporting form <noreply@postgresql.org> writes:
> The following script:
> CREATE TABLE t (a int);
> INSERT INTO t VALUES(1), (2);
> CREATE VIEW v AS SELECT a FROM t WHERE EXISTS (SELECT 1 FROM t);

> MERGE INTO v USING (SELECT * FROM generate_series(1,1)) AS s(a)
>   ON s.a = v.a WHEN NOT MATCHED BY SOURCE THEN DELETE;

> produces:
> ERROR:  XX000: wrong varnullingrels (b) (expected (b 4)) for Var 5/1
> LOCATION:  search_indexed_tlist_for_var, setrefs.c:2847

I haven't run this fully to ground, but what it looks like
is that preprocess_targetlist is generating row identity
Vars that lack required varnullingrels.  I don't understand
though why this only seems to affect MERGE.

            regards, tom lane



On Fri, Sep 27, 2024 at 6:41 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> PG Bug reporting form <noreply@postgresql.org> writes:
> > The following script:
> > CREATE TABLE t (a int);
> > INSERT INTO t VALUES(1), (2);
> > CREATE VIEW v AS SELECT a FROM t WHERE EXISTS (SELECT 1 FROM t);
>
> > MERGE INTO v USING (SELECT * FROM generate_series(1,1)) AS s(a)
> >   ON s.a = v.a WHEN NOT MATCHED BY SOURCE THEN DELETE;
>
> > produces:
> > ERROR:  XX000: wrong varnullingrels (b) (expected (b 4)) for Var 5/1
> > LOCATION:  search_indexed_tlist_for_var, setrefs.c:2847
>
> I haven't run this fully to ground, but what it looks like
> is that preprocess_targetlist is generating row identity
> Vars that lack required varnullingrels.  I don't understand
> though why this only seems to affect MERGE.

It looks like that preprocess_targetlist will add any vars used in
parse->mergeJoinCondition that belong to the source relation to the
processed tlist.  This logic was introduced to support WHEN NOT
MATCHED BY SOURCE actions (see 0294df2f1).  For such actions, the
source relation is on the nullable side of the outer join.  But when
adding the vars used in the join condition that belong to source
relation to the tlist, we fail to mark them as nullable by the join.

I think we can check the jointype of the join between the target and
the source relation when adding the vars in mergeJoinCondition.  If it
is JOIN_LEFT, we mark the vars that belong to source as nullable by
this join.

With this routine, ISTM we'd need a way for preprocess_targetlist to
access the JoinExpr that we build in transform_MERGE_to_join.

Thanks
Richard



Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> On Fri, 27 Sept 2024 at 13:52, Richard Guo <guofenglinux@gmail.com> wrote:
>> I think we can check the jointype of the join between the target and
>> the source relation when adding the vars in mergeJoinCondition.  If it
>> is JOIN_LEFT, we mark the vars that belong to source as nullable by
>> this join.

> Another option is to do it in transform_MERGE_to_join(). That feels
> safer, because the jointree might have been modified by the time we
> reach preprocess_targetlist().

Yeah, I think it's critical that these Vars be already correctly
marked before we engage in all the slicing-and-dicing that
prepjointree et al will do.  As an example, it seems not impossible
for join removal to make wrong decisions if they aren't.

> Something like the attached.

Could use some comments ... but actually, now I'm confused about why
any of this is the right thing at all:

+     * Similarly, any non-target Vars in the join condition will be added to
+     * the targetlist by preprocess_targetlist(), and so must be marked as
+     * nullable by the join, for LEFT and FULL joins.

Why do we need these Vars in the tlist?  If they're for re-evaluating
the join condition, isn't the already-nulled form of them the wrong
thing?

            regards, tom lane



On Fri, Sep 27, 2024 at 11:43 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Could use some comments ... but actually, now I'm confused about why
> any of this is the right thing at all:
>
> +        * Similarly, any non-target Vars in the join condition will be added to
> +        * the targetlist by preprocess_targetlist(), and so must be marked as
> +        * nullable by the join, for LEFT and FULL joins.
>
> Why do we need these Vars in the tlist?  If they're for re-evaluating
> the join condition, isn't the already-nulled form of them the wrong
> thing?

I have the same concern.  I think we should NOT mark the vars in
mergeJoinCondition as nullable, as mergeJoinCondition acts as join
quals rather than filter quals at that outer join.  Instead, we should
mark them nullable when they are pulled out and ready to be added to
the targetlist, if they are really needed in the targetlist.

Thanks
Richard



On Sun, Sep 29, 2024 at 3:49 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
> On Sat, 28 Sept 2024 at 01:40, Richard Guo <guofenglinux@gmail.com> wrote:
> > I have the same concern.  I think we should NOT mark the vars in
> > mergeJoinCondition as nullable, as mergeJoinCondition acts as join
> > quals rather than filter quals at that outer join.  Instead, we should
> > mark them nullable when they are pulled out and ready to be added to
> > the targetlist, if they are really needed in the targetlist.
>
> Actually, the marking is done after building the join node, so it's
> only marking a copy of the join condition, for use above the join. The
> original condition inside the join node remains unmarked, so I think
> it's right.

Ah yes, you are right.  I thought the join node we build for MERGE
would use the marked mergeJoinCondition as join quals, but that's not
the case.

> I spent some time trying to figure out why none of the existing tests
> hit this error, and I think the reason is that all the previous tests
> involved a plan where the ModifyTable node is directly on top of the
> join node, so the top targetlist was the join node's targetlist, and
> therefore wasn't marked. But in the example here, there is a one-time
> filter Result node between the ModifyTable node and the join node,
> which means the ModifyTable node pulls from the Result node, whose
> output is marked as nullable, because it's above the join. That makes
> the error somewhat rare, though maybe there are other cases that can
> lead to a plan node being inserted between the ModifyTable node and
> the join node.
>
> It feels a bit unsatisfactory that this wasn't detectable with a
> ModifyTable node directly on top of the join node, making the bug hard
> to spot, but I don't know whether it would be feasible to do anything
> about that.

For an outer join, any vars appearing in its targetlist (and qpqual)
should be marked nullable if they are from the nullable side, because
they are logically above the join.  However, when we fix up the
targetlist and qpqual, we don't have enough info available to identify
the nullingrel bits added by the outer join.  So we have to use
superset matches rather than exact matches.

This is why we don't hit this error in cases where the ModifyTable
node is directly on top of the join node, even though we fail to mark
the vars in targetlist correctly.

In Alexander's case, there is a Result node in between.  When we fix
up the targetlist for the Result node, we perform exact matches for
the nullingrel bits.  That's how this issue is revealed.

Yeah, it's a bit unsatisfying that we can only perform superset
matches rather than exact matches for the Vars in the targetlist and
qpqual of an outer join.  Maybe we can record the ojrelid of the outer
join in Join node, and then match these Vars with input Vars'
nullingrels plus this ojrelid.  But when the outer joins are
re-ordered according to identity 3, it becomes very tricky to figure
out what the correct ojrelid(s) we should use for the outer join.

Thanks
Richard



On Mon, 30 Sept 2024 at 03:16, Richard Guo <guofenglinux@gmail.com> wrote:
>
> On Sun, Sep 29, 2024 at 3:49 PM Dean Rasheed <dean.a.rasheed@gmail.com> wrote:
>
> > I spent some time trying to figure out why none of the existing tests
> > hit this error, and I think the reason is that all the previous tests
> > involved a plan where the ModifyTable node is directly on top of the
> > join node, so the top targetlist was the join node's targetlist, and
> > therefore wasn't marked. But in the example here, there is a one-time
> > filter Result node between the ModifyTable node and the join node,
> > which means the ModifyTable node pulls from the Result node, whose
> > output is marked as nullable, because it's above the join. That makes
> > the error somewhat rare, though maybe there are other cases that can
> > lead to a plan node being inserted between the ModifyTable node and
> > the join node.
> >
> > It feels a bit unsatisfactory that this wasn't detectable with a
> > ModifyTable node directly on top of the join node, making the bug hard
> > to spot, but I don't know whether it would be feasible to do anything
> > about that.
>
> For an outer join, any vars appearing in its targetlist (and qpqual)
> should be marked nullable if they are from the nullable side, because
> they are logically above the join.  However, when we fix up the
> targetlist and qpqual, we don't have enough info available to identify
> the nullingrel bits added by the outer join.  So we have to use
> superset matches rather than exact matches.
>
> This is why we don't hit this error in cases where the ModifyTable
> node is directly on top of the join node, even though we fail to mark
> the vars in targetlist correctly.
>

OK, that makes sense, I think.

I have pushed and back-patched both fixes.

Regards,
Dean