Thread: BUG #9227: Error on SELECT ROW OVERLAPS ROW with single ROW argument
BUG #9227: Error on SELECT ROW OVERLAPS ROW with single ROW argument
From
pythonesque@gmail.com
Date:
The following bug has been logged on the website: Bug reference: 9227 Logged by: Joshua Yanovski Email address: pythonesque@gmail.com PostgreSQL version: 9.3.2 Operating system: Ubuntu 12.0.4 Description: # SELECT ROW (1) OVERLAPS ROW (2); ERROR: XX000: unrecognized node type: 656 LOCATION: transformExprRecurse, parse_expr.c:359 I'm not sure why this even parses, as it's not documented to do so. I do notice, however, that in makeOverlaps in gram.y creates a recursive list when largs or args has only one argument, which seems wrong to me.
This becomes clearer when I perform this in a PREPARE: # PREPARE foo AS SELECT ROW(1) OVERLAPS ROW(2); ERROR: 54001: stack depth limit exceeded HINT: Increase the configuration parameter "max_stack_depth" (currently 2048kB), after ensuring the platform's stack depth limit is adequate. LOCATION: check_stack_depth, postgres.c:3090 On Fri, Feb 14, 2014 at 7:55 PM, <pythonesque@gmail.com> wrote: > The following bug has been logged on the website: > > Bug reference: 9227 > Logged by: Joshua Yanovski > Email address: pythonesque@gmail.com > PostgreSQL version: 9.3.2 > Operating system: Ubuntu 12.0.4 > Description: > > # SELECT ROW (1) OVERLAPS ROW (2); > ERROR: XX000: unrecognized node type: 656 > LOCATION: transformExprRecurse, parse_expr.c:359 > > I'm not sure why this even parses, as it's not documented to do so. I do > notice, however, that in makeOverlaps in gram.y creates a recursive list > when largs or args has only one argument, which seems wrong to me. > > > -- Josh
pythonesque@gmail.com writes: > # SELECT ROW (1) OVERLAPS ROW (2); > ERROR: XX000: unrecognized node type: 656 > LOCATION: transformExprRecurse, parse_expr.c:359 > I'm not sure why this even parses, as it's not documented to do so. I do > notice, however, that in makeOverlaps in gram.y creates a recursive list > when largs or args has only one argument, which seems wrong to me. Huh, yeah: if (list_length(largs) == 1) largs = lappend(largs, largs); else if (list_length(largs) != 2) ereport(ERROR, (errcode(ERRCODE_SYNTAX_ERROR), errmsg("wrong number of parameters on left side of OVERLAPS expression"), parser_errposition(location))); (and similarly for rargs). What this is evidently trying to do is convert (foo) overlaps ... to (foo, foo) overlaps ... which is perhaps sensible on the grounds that a point in time can be considered as a zero-width interval. However, as you say, this behavior is undocumented, and what's more I can find no support for it in any version of the SQL standard. My guess is that this code might've originally worked as intended, but it got broken when Neil Conway rewrote our List implementation, which was a darn long time ago now. A quick browse in our git history shows that OVERLAPS support was introduced in commit 64568100787a5d03d036e70b32147385a35245e2 of 2000-03-14, and the self-append-the-same-list behavior was in that version though it looked rather different. Neil's work dates to 2004. [ thinks some more ... ] Actually, even back when Lists didn't have headers, it's not immediately clear to me that this code could've worked. In the current code we end up with a List that includes itself, which surely ain't gonna work. While it would be a reasonably simple change to make this work as originally intended, I'm strongly tempted to just rip it out instead, and only support the SQL-mandated syntax. If anyone was using this undocumented feature, you'd think they'd have complained sometime in the last ten years. And making it work would really entail adding documentation and a regression test case, so it'd be substantially more effort than just killing the "list_length == 1" case. regards, tom lane
On Mon, Feb 17, 2014 at 4:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > While it would be a reasonably simple change to make this work as > originally intended, I'm strongly tempted to just rip it out instead, > and only support the SQL-mandated syntax. If anyone was using this > undocumented feature, you'd think they'd have complained sometime in > the last ten years. And making it work would really entail adding > documentation and a regression test case, so it'd be substantially > more effort than just killing the "list_length == 1" case. > > regards, tom lane For what it's worth, that seems reasonable to me as well. I would never have found this if I weren't actually looking through gram.y anyway. -- Josh
On Mon, Feb 17, 2014 at 5:07 PM, Joshua Yanovski <pythonesque@gmail.com> wrote: > On Mon, Feb 17, 2014 at 4:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> While it would be a reasonably simple change to make this work as >> originally intended, I'm strongly tempted to just rip it out instead, >> and only support the SQL-mandated syntax. If anyone was using this >> undocumented feature, you'd think they'd have complained sometime in >> the last ten years. And making it work would really entail adding >> documentation and a regression test case, so it'd be substantially >> more effort than just killing the "list_length == 1" case. >> >> regards, tom lane > > For what it's worth, that seems reasonable to me as well. I would > never have found this if I weren't actually looking through gram.y > anyway. > > -- > Josh Attaching a patch to just remove the n == 1 case. -- Josh
Attachment
Sorry, the MIME type seems to have gotten screwed up, hopefully this will work better. On Mon, Feb 17, 2014 at 9:33 PM, Joshua Yanovski <pythonesque@gmail.com> wrote: > On Mon, Feb 17, 2014 at 5:07 PM, Joshua Yanovski <pythonesque@gmail.com> wrote: >> On Mon, Feb 17, 2014 at 4:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> While it would be a reasonably simple change to make this work as >>> originally intended, I'm strongly tempted to just rip it out instead, >>> and only support the SQL-mandated syntax. If anyone was using this >>> undocumented feature, you'd think they'd have complained sometime in >>> the last ten years. And making it work would really entail adding >>> documentation and a regression test case, so it'd be substantially >>> more effort than just killing the "list_length == 1" case. >>> >>> regards, tom lane >> >> For what it's worth, that seems reasonable to me as well. I would >> never have found this if I weren't actually looking through gram.y >> anyway. >> >> -- >> Josh > > Attaching a patch to just remove the n == 1 case. > > -- > Josh -- Josh
Attachment
Joshua Yanovski <pythonesque@gmail.com> writes: > On Mon, Feb 17, 2014 at 4:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> While it would be a reasonably simple change to make this work as >> originally intended, I'm strongly tempted to just rip it out instead, >> and only support the SQL-mandated syntax. If anyone was using this >> undocumented feature, you'd think they'd have complained sometime in >> the last ten years. And making it work would really entail adding >> documentation and a regression test case, so it'd be substantially >> more effort than just killing the "list_length == 1" case. > Attaching a patch to just remove the n == 1 case. It occurred to me that there's an additional reason not to make the code work like Lockhart seems to have envisioned: if it did duplicate the argument like that, that'd lead to double evaluation of any volatile function that's present in the argument. So I've gone ahead and committed this, along with some further fooling around to make the error messages nicer. regards, tom lane
Great, thanks. Yeah, I was thinking about that too--I am not sure if there are any other examples of a time where Postgres deliberately duplicates an argument like that (maybe there could be a check for it to be a constexpr or something? But that information isn't available at this point in the analysis process). On Tue, Feb 18, 2014 at 9:49 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > Joshua Yanovski <pythonesque@gmail.com> writes: >> On Mon, Feb 17, 2014 at 4:33 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote: >>> While it would be a reasonably simple change to make this work as >>> originally intended, I'm strongly tempted to just rip it out instead, >>> and only support the SQL-mandated syntax. If anyone was using this >>> undocumented feature, you'd think they'd have complained sometime in >>> the last ten years. And making it work would really entail adding >>> documentation and a regression test case, so it'd be substantially >>> more effort than just killing the "list_length == 1" case. > >> Attaching a patch to just remove the n == 1 case. > > It occurred to me that there's an additional reason not to make the > code work like Lockhart seems to have envisioned: if it did duplicate > the argument like that, that'd lead to double evaluation of any volatile > function that's present in the argument. So I've gone ahead and committed > this, along with some further fooling around to make the error messages > nicer. > > regards, tom lane -- Josh
Joshua Yanovski <pythonesque@gmail.com> writes: > Great, thanks. Yeah, I was thinking about that too--I am not sure if > there are any other examples of a time where Postgres deliberately > duplicates an argument like that (maybe there could be a check for it > to be a constexpr or something? But that information isn't available > at this point in the analysis process). Yeah, BETWEEN is like that. I'd like to fix it sometime, but it's kind of problematic because of the risk of losing index optimizations (which only understand x >= y and x <= z, not a hypothetical combined node). I think there are no other examples (CASE used to be one, but we fixed that long ago), but haven't trawled the grammar to be sure. regards, tom lane
I wrote: > Joshua Yanovski <pythonesque@gmail.com> writes: >> Great, thanks. Yeah, I was thinking about that too--I am not sure if >> there are any other examples of a time where Postgres deliberately >> duplicates an argument like that (maybe there could be a check for it >> to be a constexpr or something? But that information isn't available >> at this point in the analysis process). > Yeah, BETWEEN is like that. I'd like to fix it sometime, but it's > kind of problematic because of the risk of losing index optimizations > (which only understand x >= y and x <= z, not a hypothetical combined > node). Actually, it suddenly strikes me that there's a pretty simple answer to that. Have the parser generate a node representing BETWEEN, with three arguments. In the planner, *if* the first argument is non-volatile, replace the BETWEEN with "x >= y AND x <= z"; otherwise, leave it alone, and execute it as-is. This transformation is semantically correct and will still expose index-optimizable comparisons in all cases of interest (since a volatile expression isn't indexable). Moreover we get rid of the double evaluation risk for volatile first arguments, as well as the incredible inefficiency of the BETWEEN SYMMETRIC cases. There are some other issues still to be thought about, since the parser is currently willing to cast "x" differently in the two comparisons --- but frankly I think any case where that matters is probably erroneous SQL code in the first place. (See the thread referenced in the comment in the grammar for more info.) regards, tom lane
That sounds like it should work. I was also wondering whether it might be possible to do this more generally with some kind of "let" internal node: (let var = x in (var >= y and var <= z) I don't know enough about the planner or the SQL standard to know whether this would work, but it does seem a little nicer than special-casing BETWEEN to me. Is the issue there that the extra layer of indirection would cause problems or create artificial optimization boundaries? On Tue, Feb 18, 2014 at 10:39 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote: > I wrote: >> Joshua Yanovski <pythonesque@gmail.com> writes: >>> Great, thanks. Yeah, I was thinking about that too--I am not sure if >>> there are any other examples of a time where Postgres deliberately >>> duplicates an argument like that (maybe there could be a check for it >>> to be a constexpr or something? But that information isn't available >>> at this point in the analysis process). > >> Yeah, BETWEEN is like that. I'd like to fix it sometime, but it's >> kind of problematic because of the risk of losing index optimizations >> (which only understand x >= y and x <= z, not a hypothetical combined >> node). > > Actually, it suddenly strikes me that there's a pretty simple answer to > that. Have the parser generate a node representing BETWEEN, with three > arguments. In the planner, *if* the first argument is non-volatile, > replace the BETWEEN with "x >= y AND x <= z"; otherwise, leave it alone, > and execute it as-is. This transformation is semantically correct and > will still expose index-optimizable comparisons in all cases of interest > (since a volatile expression isn't indexable). Moreover we get rid of the > double evaluation risk for volatile first arguments, as well as the > incredible inefficiency of the BETWEEN SYMMETRIC cases. > > There are some other issues still to be thought about, since the > parser is currently willing to cast "x" differently in the two comparisons > --- but frankly I think any case where that matters is probably erroneous > SQL code in the first place. (See the thread referenced in the comment in > the grammar for more info.) > > regards, tom lane -- Josh
Joshua Yanovski <pythonesque@gmail.com> writes: > That sounds like it should work. I was also wondering whether it > might be possible to do this more generally with some kind of "let" > internal node: > (let var = x in (var >= y and var <= z) > I don't know enough about the planner or the SQL standard to know > whether this would work, but it does seem a little nicer than > special-casing BETWEEN to me. Hmm ... interesting idea, but I don't think it answers any of the fundamental issues about comparison semantics (ie, how much do we care if "var" is promoted differently in the two clauses). Also, as a grammar translation target for BETWEEN it would be a pretty bad choice, because now views that had been defined in a perfectly SQL-compliant manner would print out with very-PG-specific syntax. The existing translation is nasty because of the double eval issue, but at least what we print is standard SQL. regards, tom lane
> Hmm ... interesting idea, but I don't think it answers any of the > fundamental issues about comparison semantics (ie, how much do we > care if "var" is promoted differently in the two clauses). My first thought was that it essentially would work as you describe above--if the expression is volatile, evaluate the argument first and then evaluate the expression, otherwise replace the parameter within the subexpression. But then I realized that you didn't actually say that is what the BETWEEN node would be doing :) I suppose any extra logic would have to be present in some translation function to deal with the let node, so effectively it would be special-cased anyway. > > Also, as a grammar translation target for BETWEEN it would be a pretty > bad choice, because now views that had been defined in a perfectly > SQL-compliant manner would print out with very-PG-specific syntax. > The existing translation is nasty because of the double eval issue, > but at least what we print is standard SQL. > > regards, tom lane That's a good point, I suppose I'll just have to wait for this feature to be added to the SQL standard :) -- Josh