Thread: Regress tests reveal *serious* psql bug
Now that we can run the regress tests using the new psql, I was dismayed to discover that the arrays regress test fails with it. The perfectly valid query SELECT arrtest.a[1:3], arrtest.b[1:1][1:2][1:2], arrtest.c[1:2], arrtest.d[1:1][1:2] FROM arrtest; fails with ERROR: parser: parse error at or near "]" Turning on postmaster -d reveals that what is arriving at the backend is SELECT arrtest.a[1], arrtest.b[1][1][1], arrtest.c[1], arrtest.d[1][1]] FROM arrtest Needless to say, this transformation of the query is several miles to the south of acceptable. Over to you, Peter... regards, tom lane
I looked into the cause of the current failure of the array regress test, enough to realize that there is both a garden-variety code bug and a serious definitional problem there. The issue is that the new psql converts SELECT arrtest.a[1:3], arrtest.b[1:1][1:2][1:2], arrtest.c[1:2], arrtest.d[1:1][1:2] FROM arrtest; into SELECT arrtest.a[1], arrtest.b[1][1][1], arrtest.c[1], arrtest.d[1][1]] FROM arrtest --- or at least it tries to do so; on one machine I have handy, psql actually dumps core while running the array test. (It looks like that is because line mainloop.c:259 underestimates the amount of memory it needs to malloc for the changed string, but I haven't worked through the details. I suspect the extra ']' is the result of an off-by-one kind of bug in this same block of code.) The reason *why* it is doing this is that it thinks that ":3" and so forth are variables that it ought to substitute for, and since it has no definition for them, it happily substitutes empty strings. After fixing the outright bugs, we could make the array test work by changing "[1:3]" to "[1\:3]" and so forth, but I think that that is the wrong way to deal with it. I believe that psql's variable feature needs to be redefined, instead. I certainly don't feel that the regress tests are graven on stone tablets; but when an allegedly unrelated feature change breaks one, I think we need to treat that as a danger signal. If psql variables break a regress test, they will likely break existing user applications as well. The case at hand is particularly nasty because if psql is allowed to continue to behave this way, it will silently transform valid queries into other valid queries with different results. ("array[1:3]" and "array[1]" don't mean the same thing.) I don't think that's acceptable. I suggest that psql's variable facility needs to be redefined so that it is not possible to trigger it accidentally in scripts that don't even know what psql variables are. A minimum requirement is that psql should *not* substitute for :x unless x is the name of a psql variable that the user has explicitly defined. I would also suggest tightening up the allowed names of variables; conventional practice is that variable names have to start with a letter, and I think psql ought to follow that convention. (That wouldn't in itself stop the array-subscript problem, though, since an array subscript could be a simple field reference.) It might even be a good idea to require psql variables to contain only upper-case letters, so that they'd be less likely to be confused with SQL field names (which are usually lower case or at least mixed case) --- but I'm not convinced about that one. regards, tom lane
> The reason *why* it is doing this is that it thinks that ":3" and so > forth are variables that it ought to substitute for, and since it has > no definition for them, it happily substitutes empty strings. > > After fixing the outright bugs, we could make the array test work by > changing "[1:3]" to "[1\:3]" and so forth, but I think that that is the > wrong way to deal with it. I believe that psql's variable feature needs > to be redefined, instead. I know this is the only regression problem you found, and I am glad it is isolated to psql, and a design problem that can be easily addressed. I think the requriement that all variables begin with a letter is a good idea. We recommended the : in the first place because it is the standard for embedded SQL variable handling. I am sure Peter can address this. > I would also suggest tightening up the allowed names of variables; > conventional practice is that variable names have to start with a > letter, and I think psql ought to follow that convention. (That > wouldn't in itself stop the array-subscript problem, though, since > an array subscript could be a simple field reference.) > -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
On 2000-01-11, Bruce Momjian mentioned: > I think the requriement that all variables begin with a letter is a good > idea. But it doesn't fix anything really. I left those open to be used for some special purpose like addressing the fields of the last query of something. Perhaps I should take them out for now though. > We recommended the : in the first place because it is the standard for > embedded SQL variable handling. So the array syntax should be changed? > > I am sure Peter can address this. Addressed it will be ... ;) -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
Wow that sucks. Shame on me. On 2000-01-11, Tom Lane mentioned: > psql converts > > SELECT arrtest.a[1:3], ... > into ... > SELECT arrtest.a[1], In some earlier developmental stage I had the variable delimiter runtime definable since it became clear that the traditional $ wouldn't work. Later on someone pointed out that the SQL syntax for this is the colon deal. (And you were in that discussion, if I am not completely off.) Actually the colon deal only applies to embedded SQL, so ecpg should have the same problem, but I'm not familiar with it, so I don't know how it copes. The fact is that (besides the garden-variety bugs) this is indeed a problem of definition. I'm not sure if the following is valid by any standard or even makes sense, but how do you interpret something like this: SELECT arrtest.biggest_value_pos, arrtest.a[1:biggest_value_pos] ... ; There's no way you can disambiguate this unless you redefine everything. > A minimum requirement is that psql should *not* substitute for :x unless > x is the name of a psql variable that the user has explicitly defined. Then psql becomes no better than csh, and that's certainly one of the worse things one could do. Putting blame on other people's shoulders I would suggest changing the array syntax to arr[1][2] like everyone else has, but that would be short sighted. The best idea I have to get this working _now_ would be to once again make the variable delimiter run-time configurable so that you can set it to : or $ or # or whatever your query doesn't use, completely off by default If I'm going to hack around in that code, one related question: what should the deal be regarding variable interpolation into quoted strings? Yes/No/Maybe? -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
Peter Eisentraut <peter_e@gmx.net> writes: >> We recommended the : in the first place because it is the standard for >> embedded SQL variable handling. > So the array syntax should be changed? Bzzt, wrong answer. I'm open to alternative answers on this, but breaking existing application scripts is not one of the acceptable alternatives. *Especially* not if there's no obvious failure report, as there will not be if we don't change psql's behavior. regards, tom lane
> Peter Eisentraut <peter_e@gmx.net> writes: > >> We recommended the : in the first place because it is the standard for > >> embedded SQL variable handling. > > > So the array syntax should be changed? > > Bzzt, wrong answer. > > I'm open to alternative answers on this, but breaking existing > application scripts is not one of the acceptable alternatives. > > *Especially* not if there's no obvious failure report, as there > will not be if we don't change psql's behavior. I think we can live with requiring a variable name to start with an alphabetic or underscore. SELECT a[1:2] is clear and SELECT a[1:myvar] expands to SELECT a[1]. I think we can live with this since having a variable as an array element was never possible before 7.0. We could get fancy and not expand variables inside brackets. Of course, quoted strings have to be skipped. In fact, I think it should be an error to reference a variable that is not defined. This will catch accidental references too. If you reference a variable that does not exist like :myvar, it passes the literal :myvar to the backend. -- Bruce Momjian | http://www.op.net/~candle maillist@candle.pha.pa.us | (610) 853-3000+ If your life is a hard drive, | 830 Blythe Avenue + Christ can be your backup. | Drexel Hill, Pennsylvania19026
Peter Eisentraut <peter_e@gmx.net> writes: > In some earlier developmental stage I had the variable delimiter runtime > definable since it became clear that the traditional $ wouldn't work. > Later on someone pointed out that the SQL syntax for this is the colon > deal. (And you were in that discussion, if I am not completely off.) Yah. I think at the time we were only thinking of colon as a user- definable operator (well, it's also a predefined operator, but not a very essential one). I plead guilty to forgetfulness in not remembering that it was also an essential component of the array grammar. Still, there it is. I think this raises the bar to the point where we must have a transparent backward-compatible approach to psql variables. I was not all that thrilled about blowing off colon as an operator, and blowing off array subscripts *too* is just too far above my threshold of pain. > Actually the colon deal only applies to embedded SQL, so ecpg should have > the same problem, but I'm not familiar with it, so I don't know how it > copes. Good question. Michael? > The fact is that (besides the garden-variety bugs) this is indeed a > problem of definition. I'm not sure if the following is valid by any > standard or even makes sense, but how do you interpret something like > this: > SELECT arrtest.biggest_value_pos, arrtest.a[1:biggest_value_pos] ... ; > There's no way you can disambiguate this unless you redefine everything. Huh? There was nothing in the least ambiguous about it, until you redefined psql. It's still not ambiguous in any other pgsql interface, except possibly ecpg... >> A minimum requirement is that psql should *not* substitute for :x unless >> x is the name of a psql variable that the user has explicitly defined. > Then psql becomes no better than csh, and that's certainly one of the > worse things one could do. csh isn't one of my favorite programming languages either, but failure to detect undefined substitution variables is a pretty venial sin compared to silently transforming *valid* queries into wrong queries. The former will not bring villagers with pitchforks to your doorstep, but the latter will. Furthermore, if you are trying to help the substitution-variable programmer detect his mistakes, then silently substituting (wrong) empty values is not my idea of helpfulness. You could maybe make a defensible case for rejecting the whole query with ":x is not defined". What you have chosen is the worst of all possible worlds, because it breaks existing scripts that are ignorant of the new feature without doing anything particularly helpful for people who *are* using the new feature. Finally, if the would-be user of psql variables misspells :foo as :fop, I think he's much more likely to realize he's made a mistake if psql passes :fop as-is to the backend rather than quietly discarding it. > Putting blame on other people's shoulders I would suggest changing the > array syntax to arr[1][2] like everyone else has, Say what? That's the syntax for a 2-D array, not the syntax for an array slice. > The best idea I have to get this working _now_ would be to once again make > the variable delimiter run-time configurable so that you can set it to > : or $ or # or whatever your query doesn't use, completely off by default If it's off by default, that would eliminate the backwards-compatibility problem --- but I still think the potential dual use of whatever character you happen to pick would just be a gotcha waiting to bite anyone who uses the feature. We still ought to think about tightening up the substitution rules so they are less likely to trap the unwary. > If I'm going to hack around in that code, one related question: what > should the deal be regarding variable interpolation into quoted > strings? Yes/No/Maybe? No bloody way, IMHO --- that increases the odds of unwanted substitutions by many orders of magnitude. If you want to allow substitutions into quoted strings, there should be a very special syntax for it, perhaps along the lines of 'a quoted ':foo' string' which could translate to 'a quoted foobar string' if :foo expands to foobar. regards, tom lane
Bruce Momjian <pgman@candle.pha.pa.us> writes: > I think we can live with requiring a variable name to start with an > alphabetic or underscore. > SELECT a[1:2] > is clear and > SELECT a[1:myvar] > expands to SELECT a[1]. No go --- SELECT a[1:b] where b is a field name is a valid query currently. I don't really see exactly what the benefit is of assuming that :foo should expand to nothing rather than :foo, in the absence of an actual definition for :foo. My feeling is that a safe solution would be (a) psql doesn't do variables at all without an explicitly enabling command line switch; and (b) if psql sees:foo where foo is not defined, it spits out an error message. Accepting undeclared variables went out with Basic and Fortran; why are we intent on re-inventing a concept that's so obviously dangerous? > In fact, I think it should be an error to reference a variable that is > not defined. This will catch accidental references too. If you > reference a variable that does not exist like :myvar, it passes the > literal :myvar to the backend. That's two different answers, not one... but I could live with either one of them... regards, tom lane
Tom Lane wrote: > Bruce Momjian <pgman@candle.pha.pa.us> writes: > > I think we can live with requiring a variable name to start with an > > alphabetic or underscore. > > > SELECT a[1:2] > > > is clear and > > > SELECT a[1:myvar] > > > expands to SELECT a[1]. > > No go --- SELECT a[1:b] where b is a field name is a valid query > currently. The colon in array syntax is quite a special case. It should be relatively easy to figure out whether you are in a construct of the form [<token>:<token>]. And then there should be no problem in figuring out that in [1:b] b refers to a column and in [1::b] ':b' is a variable. Do colons apear anywhere else? Btw, i agree that variables should start with a letter and by default variables have to be declared. Adriaan
On 2000-01-12, Tom Lane mentioned: > Accepting undeclared variables went out with Basic and Fortran; > why are we intent on re-inventing a concept that's so obviously > dangerous? They came back with Perl, Python, Tcl, what-have-you. But okay, I'm beaten into submission. -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden
On 2000-01-12, Tom Lane mentioned: > I think this raises the bar to the point where we must > have a transparent backward-compatible approach to psql variables. > I was not all that thrilled about blowing off colon as an operator, > and blowing off array subscripts *too* is just too far above my > threshold of pain. To clear something out here: I'm with you all the way. I didn't make up that syntax, and I too was forgetful about the array issue. If y'all think that a variable must be defined to be substituted and that that will fix things to a reasonable state, thus it shall be. My concern was more that this would only work around this particular problem, while being short sighted. Just want to make sure we have a consensus. > > If I'm going to hack around in that code, one related question: what > > should the deal be regarding variable interpolation into quoted > > strings? Yes/No/Maybe? > > No bloody way, IMHO --- that increases the odds of unwanted I'll take that as a No. ;) -- Peter Eisentraut Sernanders väg 10:115 peter_e@gmx.net 75262 Uppsala http://yi.org/peter-e/ Sweden