Thread: Another multi-row VALUES bug

Another multi-row VALUES bug

From

Dean Rasheed

Date:

23 November 2022, 12:43:47

I was thinking some more about the recent fix to multi-row VALUES
handling in the rewriter (b8f2687fdc), and I realised that there is
another bug in the way DEFAULT values are handled:

In RewriteQuery(), the code assumes that in a multi-row INSERT query,
the VALUES RTE will be the only thing in the query's fromlist. That's
true for the original query, but it's not necessarily the case for
product queries, if the rule action performs a multi-row insert,
leading to a new VALUES RTE that the DEFAULT-handling code might fail
to process. For example:

CREATE TABLE foo(a int);
INSERT INTO foo VALUES (1);

CREATE TABLE foo_log(t timestamptz DEFAULT now(), a int, c text);
CREATE RULE foo_r AS ON UPDATE TO foo
  DO ALSO INSERT INTO foo_log VALUES (DEFAULT, old.a, 'old'),
                                     (DEFAULT, new.a, 'new');

UPDATE foo SET a = 2 WHERE a = 1;

ERROR:  unrecognized node type: 43

There's a similar example to this in the regression tests, but it
doesn't test DEFAULT-handling.

It's also possible for the current code to cause the same VALUES RTE
to be rewritten multiple times, when recursing into product queries
(if the rule action doesn't add any more stuff to the query's
fromlist). That turns out to be harmless, because the second time
round it will no longer contain any defaults, but it's technically
incorrect, and certainly a waste of cycles.

So I think what the code needs to do is examine the targetlist, and
identify the VALUES RTE that the current query is using as a source,
and rewrite just that RTE (so any original VALUES RTE is rewritten at
the top level, and any VALUES RTEs from rule actions are rewritten
while recursing, and none are rewritten more than once), as in the
attached patch.

While at it, I noticed an XXX code comment questioning whether any of
this applies to MERGE. The answer is "no", because MERGE actions don't
allow multi-row inserts, so I think it's worth updating that comment
to make that clearer.

Regards,
Dean

Attachment

multi-row-values.patch

Re: Another multi-row VALUES bug

From

Tom Lane

Date:

23 November 2022, 15:30:45

Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> In RewriteQuery(), the code assumes that in a multi-row INSERT query,
> the VALUES RTE will be the only thing in the query's fromlist. That's
> true for the original query, but it's not necessarily the case for
> product queries, if the rule action performs a multi-row insert,
> leading to a new VALUES RTE that the DEFAULT-handling code might fail
> to process. For example:

> CREATE TABLE foo(a int);
> INSERT INTO foo VALUES (1);

> CREATE TABLE foo_log(t timestamptz DEFAULT now(), a int, c text);
> CREATE RULE foo_r AS ON UPDATE TO foo
>   DO ALSO INSERT INTO foo_log VALUES (DEFAULT, old.a, 'old'),
>                                      (DEFAULT, new.a, 'new');

> UPDATE foo SET a = 2 WHERE a = 1;

> ERROR:  unrecognized node type: 43

Ugh.

> So I think what the code needs to do is examine the targetlist, and
> identify the VALUES RTE that the current query is using as a source,
> and rewrite just that RTE (so any original VALUES RTE is rewritten at
> the top level, and any VALUES RTEs from rule actions are rewritten
> while recursing, and none are rewritten more than once), as in the
> attached patch.

Hmm ... this patch does not feel any more principled or future-proof
than what it replaces, because now instead of making assumptions
about what's in the jointree, you're making assumptions about what's
in the targetlist.  I wonder if there is some other way to identify
the target VALUES RTE.

Looking at the parsetree in gdb, I see that in this example the
VALUES RTE is still the first entry in the fromlist, it's just not
the only one there.  So I wonder whether it'd be sufficient to do

-            if (list_length(parsetree->jointree->fromlist) == 1)
+            if (parsetree->jointree->fromlist != NIL)

I'm not 100% sure that product-query rewriting would always produce
a FROM-list in this order, but I think it might be true.

Another idea is to identify the VALUES RTE before we start rewriting,
and pass that information on.  That should be pretty bulletproof,
but of course more invasive.

Or ... maybe we should perform this particular step before we build
product queries?  Just because we stuck it into QueryRewrite
originally doesn't mean that's the right place.

            regards, tom lane

Re: Another multi-row VALUES bug

From

Dean Rasheed

Date:

23 November 2022, 18:43:58

On Wed, 23 Nov 2022 at 15:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> > So I think what the code needs to do is examine the targetlist, and
> > identify the VALUES RTE that the current query is using as a source,
> > and rewrite just that RTE (so any original VALUES RTE is rewritten at
> > the top level, and any VALUES RTEs from rule actions are rewritten
> > while recursing, and none are rewritten more than once), as in the
> > attached patch.
>
> Hmm ... this patch does not feel any more principled or future-proof
> than what it replaces, because now instead of making assumptions
> about what's in the jointree, you're making assumptions about what's
> in the targetlist.

True, but it's consistent with what rewriteValuesRTE() does -- it has
to examine the targetlist to work out how items in the VALUES lists
are mapped to attributes of the target relation.

> I wonder if there is some other way to identify
> the target VALUES RTE.
>
> Looking at the parsetree in gdb, I see that in this example the
> VALUES RTE is still the first entry in the fromlist, it's just not
> the only one there.  So I wonder whether it'd be sufficient to do
>
> -            if (list_length(parsetree->jointree->fromlist) == 1)
> +            if (parsetree->jointree->fromlist != NIL)
>
> I'm not 100% sure that product-query rewriting would always produce
> a FROM-list in this order, but I think it might be true.

No, the test case using rule r3 is a counter-example. In that case,
the product query has 2 VALUES RTEs, both of which appear in the
fromlist, and it's the second one that needs rewriting when it
recurses into the product query.

In fact, looking at what rewriteRuleAction() does, the relevant VALUES
RTE will be the last or last-but-one entry in the fromlist, depending
on whether the rule action refers to OLD. Relying on a particular
ordering of the fromlist seems quite fragile though.

> Another idea is to identify the VALUES RTE before we start rewriting,
> and pass that information on.  That should be pretty bulletproof,
> but of course more invasive.
>
> Or ... maybe we should perform this particular step before we build
> product queries?  Just because we stuck it into QueryRewrite
> originally doesn't mean that's the right place.

Hmm, I'm not quite sure how that would work. Possibly we could
identify the VALUES RTE while building the product query, but that
looks pretty messy.

Regards,
Dean

Re: Another multi-row VALUES bug

From

Tom Lane

Date:

23 November 2022, 18:56:53

Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> On Wed, 23 Nov 2022 at 15:30, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Hmm ... this patch does not feel any more principled or future-proof
>> than what it replaces, because now instead of making assumptions
>> about what's in the jointree, you're making assumptions about what's
>> in the targetlist.

> True, but it's consistent with what rewriteValuesRTE() does -- it has
> to examine the targetlist to work out how items in the VALUES lists
> are mapped to attributes of the target relation.

That argument seems a little circular, because rewriteValuesRTE
is taking it on faith that it's told the correct RTE to modify.

>> I'm not 100% sure that product-query rewriting would always produce
>> a FROM-list in this order, but I think it might be true.

> No, the test case using rule r3 is a counter-example. In that case,
> the product query has 2 VALUES RTEs, both of which appear in the
> fromlist, and it's the second one that needs rewriting when it
> recurses into the product query.

Ah, right.  I wonder if somehow we could just make one pass over
all the VALUES RTEs, and process each one as needed?  The problem
is to identify the relevant target relation, I guess.

            regards, tom lane

Re: Another multi-row VALUES bug

From

Dean Rasheed

Date:

28 November 2022, 10:29:41

On Wed, 23 Nov 2022 at 18:56, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> I wonder if somehow we could just make one pass over
> all the VALUES RTEs, and process each one as needed?  The problem
> is to identify the relevant target relation, I guess.
>

I have been thinking about that some more, but I think it would be
pretty difficult to achieve.

Part of the problem is that the targetlist processing and VALUES RTE
processing are quite closely coupled (because of things like GENERATED
ALWAYS columns). Both rewriteTargetListIU() and rewriteValuesRTE()
rely on being passed the VALUES RTE that the targetlist is reading
from, and rewriteValuesRTE() then relies on extra information returned
by rewriteTargetListIU().

Also, there's the way that DEFAULTs from updatable views work, which
means that the DEFAULTs in a VALUES RTE won't necessarily all come
from the same target relation.

So I think it would be much harder to do the VALUES RTE processing
anywhere other than where it's being done right now, and even if it
could be done elsewhere, it would be a very invasive change, and
therefore hard to back-patch.

That, of course, leaves the problem of identifying the right VALUES
RTE to process.

A different way to do this, without relying on the contents of the
targetlist, is to note that, while processing a product query, what we
really want to do is ignore any VALUES RTEs from the original query,
since they will have already been processed. There should then never
be more than one VALUES RTE left to process -- the one from the rule
action.

This can be done by exploiting the fact that in product queries, the
rtable always consists of the rtable from the original query followed
by the rtable from the rule action, so we just need to ignore the
right number of RTEs at the start of the rtable. Of course that would
break if we ever changed the way rewriteRuleAction() worked, but at
least it only depends on that one other place in the code, which has
been stable for a long time, so the risk of future breakage seems
managable.

Regards,
Dean

Attachment

multi-row-values-v2.patch

Re: Another multi-row VALUES bug

From

Tom Lane

Date:

28 November 2022, 18:52:39

Dean Rasheed <dean.a.rasheed@gmail.com> writes:
> A different way to do this, without relying on the contents of the
> targetlist, is to note that, while processing a product query, what we
> really want to do is ignore any VALUES RTEs from the original query,
> since they will have already been processed. There should then never
> be more than one VALUES RTE left to process -- the one from the rule
> action.

> This can be done by exploiting the fact that in product queries, the
> rtable always consists of the rtable from the original query followed
> by the rtable from the rule action, so we just need to ignore the
> right number of RTEs at the start of the rtable. Of course that would
> break if we ever changed the way rewriteRuleAction() worked, but at
> least it only depends on that one other place in the code, which has
> been stable for a long time, so the risk of future breakage seems
> managable.

This looks like a good solution.  I didn't actually test the patch,
but it passes an eyeball check.  I like the fact that we can verify
that we find only one candidate VALUES RTE.

            regards, tom lane