Thread: Another multi-row VALUES bug
I was thinking some more about the recent fix to multi-row VALUES handling in the rewriter (b8f2687fdc), and I realised that there is another bug in the way DEFAULT values are handled: In RewriteQuery(), the code assumes that in a multi-row INSERT query, the VALUES RTE will be the only thing in the query's fromlist. That's true for the original query, but it's not necessarily the case for product queries, if the rule action performs a multi-row insert, leading to a new VALUES RTE that the DEFAULT-handling code might fail to process. For example: CREATE TABLE foo(a int); INSERT INTO foo VALUES (1); CREATE TABLE foo_log(t timestamptz DEFAULT now(), a int, c text); CREATE RULE foo_r AS ON UPDATE TO foo DO ALSO INSERT INTO foo_log VALUES (DEFAULT, old.a, 'old'), (DEFAULT, new.a, 'new'); UPDATE foo SET a = 2 WHERE a = 1; ERROR: unrecognized node type: 43 There's a similar example to this in the regression tests, but it doesn't test DEFAULT-handling. It's also possible for the current code to cause the same VALUES RTE to be rewritten multiple times, when recursing into product queries (if the rule action doesn't add any more stuff to the query's fromlist). That turns out to be harmless, because the second time round it will no longer contain any defaults, but it's technically incorrect, and certainly a waste of cycles. So I think what the code needs to do is examine the targetlist, and identify the VALUES RTE that the current query is using as a source, and rewrite just that RTE (so any original VALUES RTE is rewritten at the top level, and any VALUES RTEs from rule actions are rewritten while recursing, and none are rewritten more than once), as in the attached patch. While at it, I noticed an XXX code comment questioning whether any of this applies to MERGE. The answer is "no", because MERGE actions don't allow multi-row inserts, so I think it's worth updating that comment to make that clearer. Regards, Dean
Attachment
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > In RewriteQuery(), the code assumes that in a multi-row INSERT query, > the VALUES RTE will be the only thing in the query's fromlist. That's > true for the original query, but it's not necessarily the case for > product queries, if the rule action performs a multi-row insert, > leading to a new VALUES RTE that the DEFAULT-handling code might fail > to process. For example: > CREATE TABLE foo(a int); > INSERT INTO foo VALUES (1); > CREATE TABLE foo_log(t timestamptz DEFAULT now(), a int, c text); > CREATE RULE foo_r AS ON UPDATE TO foo > DO ALSO INSERT INTO foo_log VALUES (DEFAULT, old.a, 'old'), > (DEFAULT, new.a, 'new'); > UPDATE foo SET a = 2 WHERE a = 1; > ERROR: unrecognized node type: 43 Ugh. > So I think what the code needs to do is examine the targetlist, and > identify the VALUES RTE that the current query is using as a source, > and rewrite just that RTE (so any original VALUES RTE is rewritten at > the top level, and any VALUES RTEs from rule actions are rewritten > while recursing, and none are rewritten more than once), as in the > attached patch. Hmm ... this patch does not feel any more principled or future-proof than what it replaces, because now instead of making assumptions about what's in the jointree, you're making assumptions about what's in the targetlist. I wonder if there is some other way to identify the target VALUES RTE. Looking at the parsetree in gdb, I see that in this example the VALUES RTE is still the first entry in the fromlist, it's just not the only one there. So I wonder whether it'd be sufficient to do - if (list_length(parsetree->jointree->fromlist) == 1) + if (parsetree->jointree->fromlist != NIL) I'm not 100% sure that product-query rewriting would always produce a FROM-list in this order, but I think it might be true. Another idea is to identify the VALUES RTE before we start rewriting, and pass that information on. That should be pretty bulletproof, but of course more invasive. Or ... maybe we should perform this particular step before we build product queries? Just because we stuck it into QueryRewrite originally doesn't mean that's the right place. regards, tom lane
On Wed, 23 Nov 2022 at 15:30, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > > So I think what the code needs to do is examine the targetlist, and > > identify the VALUES RTE that the current query is using as a source, > > and rewrite just that RTE (so any original VALUES RTE is rewritten at > > the top level, and any VALUES RTEs from rule actions are rewritten > > while recursing, and none are rewritten more than once), as in the > > attached patch. > > Hmm ... this patch does not feel any more principled or future-proof > than what it replaces, because now instead of making assumptions > about what's in the jointree, you're making assumptions about what's > in the targetlist. True, but it's consistent with what rewriteValuesRTE() does -- it has to examine the targetlist to work out how items in the VALUES lists are mapped to attributes of the target relation. > I wonder if there is some other way to identify > the target VALUES RTE. > > Looking at the parsetree in gdb, I see that in this example the > VALUES RTE is still the first entry in the fromlist, it's just not > the only one there. So I wonder whether it'd be sufficient to do > > - if (list_length(parsetree->jointree->fromlist) == 1) > + if (parsetree->jointree->fromlist != NIL) > > I'm not 100% sure that product-query rewriting would always produce > a FROM-list in this order, but I think it might be true. No, the test case using rule r3 is a counter-example. In that case, the product query has 2 VALUES RTEs, both of which appear in the fromlist, and it's the second one that needs rewriting when it recurses into the product query. In fact, looking at what rewriteRuleAction() does, the relevant VALUES RTE will be the last or last-but-one entry in the fromlist, depending on whether the rule action refers to OLD. Relying on a particular ordering of the fromlist seems quite fragile though. > Another idea is to identify the VALUES RTE before we start rewriting, > and pass that information on. That should be pretty bulletproof, > but of course more invasive. > > Or ... maybe we should perform this particular step before we build > product queries? Just because we stuck it into QueryRewrite > originally doesn't mean that's the right place. Hmm, I'm not quite sure how that would work. Possibly we could identify the VALUES RTE while building the product query, but that looks pretty messy. Regards, Dean
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > On Wed, 23 Nov 2022 at 15:30, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Hmm ... this patch does not feel any more principled or future-proof >> than what it replaces, because now instead of making assumptions >> about what's in the jointree, you're making assumptions about what's >> in the targetlist. > True, but it's consistent with what rewriteValuesRTE() does -- it has > to examine the targetlist to work out how items in the VALUES lists > are mapped to attributes of the target relation. That argument seems a little circular, because rewriteValuesRTE is taking it on faith that it's told the correct RTE to modify. >> I'm not 100% sure that product-query rewriting would always produce >> a FROM-list in this order, but I think it might be true. > No, the test case using rule r3 is a counter-example. In that case, > the product query has 2 VALUES RTEs, both of which appear in the > fromlist, and it's the second one that needs rewriting when it > recurses into the product query. Ah, right. I wonder if somehow we could just make one pass over all the VALUES RTEs, and process each one as needed? The problem is to identify the relevant target relation, I guess. regards, tom lane
On Wed, 23 Nov 2022 at 18:56, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > I wonder if somehow we could just make one pass over > all the VALUES RTEs, and process each one as needed? The problem > is to identify the relevant target relation, I guess. > I have been thinking about that some more, but I think it would be pretty difficult to achieve. Part of the problem is that the targetlist processing and VALUES RTE processing are quite closely coupled (because of things like GENERATED ALWAYS columns). Both rewriteTargetListIU() and rewriteValuesRTE() rely on being passed the VALUES RTE that the targetlist is reading from, and rewriteValuesRTE() then relies on extra information returned by rewriteTargetListIU(). Also, there's the way that DEFAULTs from updatable views work, which means that the DEFAULTs in a VALUES RTE won't necessarily all come from the same target relation. So I think it would be much harder to do the VALUES RTE processing anywhere other than where it's being done right now, and even if it could be done elsewhere, it would be a very invasive change, and therefore hard to back-patch. That, of course, leaves the problem of identifying the right VALUES RTE to process. A different way to do this, without relying on the contents of the targetlist, is to note that, while processing a product query, what we really want to do is ignore any VALUES RTEs from the original query, since they will have already been processed. There should then never be more than one VALUES RTE left to process -- the one from the rule action. This can be done by exploiting the fact that in product queries, the rtable always consists of the rtable from the original query followed by the rtable from the rule action, so we just need to ignore the right number of RTEs at the start of the rtable. Of course that would break if we ever changed the way rewriteRuleAction() worked, but at least it only depends on that one other place in the code, which has been stable for a long time, so the risk of future breakage seems managable. Regards, Dean
Attachment
Dean Rasheed <dean.a.rasheed@gmail.com> writes: > A different way to do this, without relying on the contents of the > targetlist, is to note that, while processing a product query, what we > really want to do is ignore any VALUES RTEs from the original query, > since they will have already been processed. There should then never > be more than one VALUES RTE left to process -- the one from the rule > action. > This can be done by exploiting the fact that in product queries, the > rtable always consists of the rtable from the original query followed > by the rtable from the rule action, so we just need to ignore the > right number of RTEs at the start of the rtable. Of course that would > break if we ever changed the way rewriteRuleAction() worked, but at > least it only depends on that one other place in the code, which has > been stable for a long time, so the risk of future breakage seems > managable. This looks like a good solution. I didn't actually test the patch, but it passes an eyeball check. I like the fact that we can verify that we find only one candidate VALUES RTE. regards, tom lane