Thread: Regress tests reveal *serious* psql bug

Regress tests reveal *serious* psql bug

From
Tom Lane
Date:
Now that we can run the regress tests using the new psql,
I was dismayed to discover that the arrays regress test
fails with it.  The perfectly valid query

SELECT arrtest.a[1:3],         arrtest.b[1:1][1:2][1:2],         arrtest.c[1:2],          arrtest.d[1:1][1:2]  FROM
arrtest;

fails with

ERROR:  parser: parse error at or near "]"

Turning on postmaster -d reveals that what is arriving at
the backend is

SELECT arrtest.a[1],         arrtest.b[1][1][1],         arrtest.c[1],          arrtest.d[1][1]]  FROM arrtest

Needless to say, this transformation of the query
is several miles to the south of acceptable.

Over to you, Peter...
        regards, tom lane


Re: Regress tests reveal *serious* psql bug

From
Tom Lane
Date:
I looked into the cause of the current failure of the array regress
test, enough to realize that there is both a garden-variety code bug
and a serious definitional problem there.  The issue is that the new
psql converts

SELECT arrtest.a[1:3],         arrtest.b[1:1][1:2][1:2],         arrtest.c[1:2],          arrtest.d[1:1][1:2]  FROM
arrtest;

into

SELECT arrtest.a[1],         arrtest.b[1][1][1],         arrtest.c[1],          arrtest.d[1][1]]  FROM arrtest

--- or at least it tries to do so; on one machine I have handy, psql
actually dumps core while running the array test.  (It looks like that
is because line mainloop.c:259 underestimates the amount of memory it
needs to malloc for the changed string, but I haven't worked through the
details.  I suspect the extra ']' is the result of an off-by-one kind of
bug in this same block of code.)

The reason *why* it is doing this is that it thinks that ":3" and so
forth are variables that it ought to substitute for, and since it has
no definition for them, it happily substitutes empty strings.

After fixing the outright bugs, we could make the array test work by
changing "[1:3]" to "[1\:3]" and so forth, but I think that that is the
wrong way to deal with it.  I believe that psql's variable feature needs
to be redefined, instead.

I certainly don't feel that the regress tests are graven on stone
tablets; but when an allegedly unrelated feature change breaks one,
I think we need to treat that as a danger signal.  If psql variables
break a regress test, they will likely break existing user applications
as well.

The case at hand is particularly nasty because if psql is allowed to
continue to behave this way, it will silently transform valid queries
into other valid queries with different results.  ("array[1:3]" and
"array[1]" don't mean the same thing.)  I don't think that's acceptable.

I suggest that psql's variable facility needs to be redefined so that
it is not possible to trigger it accidentally in scripts that don't
even know what psql variables are.

A minimum requirement is that psql should *not* substitute for :x unless
x is the name of a psql variable that the user has explicitly defined.

I would also suggest tightening up the allowed names of variables;
conventional practice is that variable names have to start with a
letter, and I think psql ought to follow that convention.  (That
wouldn't in itself stop the array-subscript problem, though, since
an array subscript could be a simple field reference.)

It might even be a good idea to require psql variables to contain only
upper-case letters, so that they'd be less likely to be confused with
SQL field names (which are usually lower case or at least mixed case)
--- but I'm not convinced about that one.
        regards, tom lane


Re: [HACKERS] Re: Regress tests reveal *serious* psql bug

From
Bruce Momjian
Date:
> The reason *why* it is doing this is that it thinks that ":3" and so
> forth are variables that it ought to substitute for, and since it has
> no definition for them, it happily substitutes empty strings.
> 
> After fixing the outright bugs, we could make the array test work by
> changing "[1:3]" to "[1\:3]" and so forth, but I think that that is the
> wrong way to deal with it.  I believe that psql's variable feature needs
> to be redefined, instead.

I know this is the only regression problem you found, and I am glad it
is isolated to psql, and a design problem that can be easily addressed. 
I think the requriement that all variables begin with a letter is a good
idea.

We recommended the : in the first place because it is the standard for
embedded SQL variable handling.

I am sure Peter can address this.

> I would also suggest tightening up the allowed names of variables;
> conventional practice is that variable names have to start with a
> letter, and I think psql ought to follow that convention.  (That
> wouldn't in itself stop the array-subscript problem, though, since
> an array subscript could be a simple field reference.)
> 


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [HACKERS] Re: Regress tests reveal *serious* psql bug

From
Peter Eisentraut
Date:
On 2000-01-11, Bruce Momjian mentioned:

> I think the requriement that all variables begin with a letter is a good
> idea.

But it doesn't fix anything really. I left those open to be used for some
special purpose like addressing the fields of the last query of
something. Perhaps I should take them out for now though.

> We recommended the : in the first place because it is the standard for
> embedded SQL variable handling.

So the array syntax should be changed?

> 
> I am sure Peter can address this.

Addressed it will be ... ;)

-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden




Re: Regress tests reveal *serious* psql bug

From
Peter Eisentraut
Date:
Wow that sucks. Shame on me.

On 2000-01-11, Tom Lane mentioned:

> psql converts
> 
> SELECT arrtest.a[1:3],
...
> into
...
> SELECT arrtest.a[1],

In some earlier developmental stage I had the variable delimiter runtime
definable since it became clear that the traditional $ wouldn't work.
Later on someone pointed out that the SQL syntax for this is the colon
deal. (And you were in that discussion, if I am not completely off.)  
Actually the colon deal only applies to embedded SQL, so ecpg should have
the same problem, but I'm not familiar with it, so I don't know how it
copes.

The fact is that (besides the garden-variety bugs) this is indeed a
problem of definition. I'm not sure if the following is valid by any
standard or even makes sense, but how do you interpret something like
this:

SELECT arrtest.biggest_value_pos, arrtest.a[1:biggest_value_pos] ... ;

There's no way you can disambiguate this unless you redefine everything.

> A minimum requirement is that psql should *not* substitute for :x unless
> x is the name of a psql variable that the user has explicitly defined.

Then psql becomes no better than csh, and that's certainly one of the
worse things one could do.


Putting blame on other people's shoulders I would suggest changing the
array syntax to arr[1][2] like everyone else has, but that would be short
sighted.

The best idea I have to get this working _now_ would be to once again make
the variable delimiter run-time configurable so that you can set it to
: or $ or # or whatever your query doesn't use, completely off by default

If I'm going to hack around in that code, one related question: what
should the deal be regarding variable interpolation into quoted
strings? Yes/No/Maybe?

-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden




Re: [HACKERS] Re: Regress tests reveal *serious* psql bug

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
>> We recommended the : in the first place because it is the standard for
>> embedded SQL variable handling.

> So the array syntax should be changed?

Bzzt, wrong answer.

I'm open to alternative answers on this, but breaking existing
application scripts is not one of the acceptable alternatives.

*Especially* not if there's no obvious failure report, as there
will not be if we don't change psql's behavior.
        regards, tom lane


Re: [HACKERS] Re: Regress tests reveal *serious* psql bug

From
Bruce Momjian
Date:
> Peter Eisentraut <peter_e@gmx.net> writes:
> >> We recommended the : in the first place because it is the standard for
> >> embedded SQL variable handling.
> 
> > So the array syntax should be changed?
> 
> Bzzt, wrong answer.
> 
> I'm open to alternative answers on this, but breaking existing
> application scripts is not one of the acceptable alternatives.
> 
> *Especially* not if there's no obvious failure report, as there
> will not be if we don't change psql's behavior.

I think we can live with requiring a variable name to start with an
alphabetic or underscore.
SELECT a[1:2]

is clear and
SELECT a[1:myvar]

expands to SELECT a[1].  I think we can live with this since having a
variable as an array element was never possible before 7.0.  We could
get fancy and not expand variables inside brackets.  Of course, quoted
strings have to be skipped.

In fact, I think it should be an error to reference a variable that is
not defined.  This will catch accidental references too.  If you
reference a variable that does not exist like :myvar, it passes the
literal :myvar to the backend.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Regress tests reveal *serious* psql bug

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> In some earlier developmental stage I had the variable delimiter runtime
> definable since it became clear that the traditional $ wouldn't work.
> Later on someone pointed out that the SQL syntax for this is the colon
> deal. (And you were in that discussion, if I am not completely off.)  

Yah.  I think at the time we were only thinking of colon as a user-
definable operator (well, it's also a predefined operator, but not a
very essential one).  I plead guilty to forgetfulness in not remembering
that it was also an essential component of the array grammar.  Still,
there it is.  I think this raises the bar to the point where we must
have a transparent backward-compatible approach to psql variables.
I was not all that thrilled about blowing off colon as an operator,
and blowing off array subscripts *too* is just too far above my
threshold of pain.

> Actually the colon deal only applies to embedded SQL, so ecpg should have
> the same problem, but I'm not familiar with it, so I don't know how it
> copes.

Good question.  Michael?

> The fact is that (besides the garden-variety bugs) this is indeed a
> problem of definition. I'm not sure if the following is valid by any
> standard or even makes sense, but how do you interpret something like
> this:
> SELECT arrtest.biggest_value_pos, arrtest.a[1:biggest_value_pos] ... ;
> There's no way you can disambiguate this unless you redefine everything.

Huh?  There was nothing in the least ambiguous about it, until you
redefined psql.  It's still not ambiguous in any other pgsql interface,
except possibly ecpg...

>> A minimum requirement is that psql should *not* substitute for :x unless
>> x is the name of a psql variable that the user has explicitly defined.

> Then psql becomes no better than csh, and that's certainly one of the
> worse things one could do.

csh isn't one of my favorite programming languages either, but failure
to detect undefined substitution variables is a pretty venial sin
compared to silently transforming *valid* queries into wrong queries.
The former will not bring villagers with pitchforks to your doorstep,
but the latter will.

Furthermore, if you are trying to help the substitution-variable
programmer detect his mistakes, then silently substituting (wrong)
empty values is not my idea of helpfulness.  You could maybe make a
defensible case for rejecting the whole query with ":x is not defined".
What you have chosen is the worst of all possible worlds, because it
breaks existing scripts that are ignorant of the new feature without
doing anything particularly helpful for people who *are* using the new
feature.

Finally, if the would-be user of psql variables misspells :foo as
:fop, I think he's much more likely to realize he's made a mistake if
psql passes :fop as-is to the backend rather than quietly discarding it.

> Putting blame on other people's shoulders I would suggest changing the
> array syntax to arr[1][2] like everyone else has,

Say what?  That's the syntax for a 2-D array, not the syntax for
an array slice.

> The best idea I have to get this working _now_ would be to once again make
> the variable delimiter run-time configurable so that you can set it to
> : or $ or # or whatever your query doesn't use, completely off by default

If it's off by default, that would eliminate the backwards-compatibility
problem --- but I still think the potential dual use of whatever
character you happen to pick would just be a gotcha waiting to bite
anyone who uses the feature.  We still ought to think about tightening
up the substitution rules so they are less likely to trap the unwary.

> If I'm going to hack around in that code, one related question: what
> should the deal be regarding variable interpolation into quoted
> strings? Yes/No/Maybe?

No bloody way, IMHO --- that increases the odds of unwanted
substitutions by many orders of magnitude.  If you want to allow
substitutions into quoted strings, there should be a very special
syntax for it, perhaps along the lines of    'a quoted ':foo' string'
which could translate to 'a quoted foobar string' if :foo expands
to foobar.
        regards, tom lane


Re: [HACKERS] Re: Regress tests reveal *serious* psql bug

From
Tom Lane
Date:
Bruce Momjian <pgman@candle.pha.pa.us> writes:
> I think we can live with requiring a variable name to start with an
> alphabetic or underscore.

>     SELECT a[1:2]

> is clear and

>     SELECT a[1:myvar]

> expands to SELECT a[1].

No go --- SELECT a[1:b] where b is a field name is a valid query
currently.

I don't really see exactly what the benefit is of assuming that
:foo should expand to nothing rather than :foo, in the absence
of an actual definition for :foo.  My feeling is that a safe solution
would be (a) psql doesn't do variables at all without an explicitly enabling     command line switch; and (b) if psql
sees:foo where foo is not defined, it spits out an     error message.
 
Accepting undeclared variables went out with Basic and Fortran;
why are we intent on re-inventing a concept that's so obviously
dangerous?

> In fact, I think it should be an error to reference a variable that is
> not defined.  This will catch accidental references too.  If you
> reference a variable that does not exist like :myvar, it passes the
> literal :myvar to the backend.

That's two different answers, not one... but I could live with either
one of them...
        regards, tom lane


Re: [HACKERS] Re: Regress tests reveal *serious* psql bug

From
Adriaan Joubert
Date:
Tom Lane wrote:

> Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > I think we can live with requiring a variable name to start with an
> > alphabetic or underscore.
>
> >       SELECT a[1:2]
>
> > is clear and
>
> >       SELECT a[1:myvar]
>
> > expands to SELECT a[1].
>
> No go --- SELECT a[1:b] where b is a field name is a valid query
> currently.

The colon in array syntax is quite a special case. It should be relatively
easy to figure out whether you are in a construct of the form
[<token>:<token>]. And then there should be no problem in figuring out that
in
[1:b] b refers to a column and in [1::b] ':b' is a variable. Do colons
apear anywhere else?

Btw, i agree that variables should start with a letter and by default
variables have to be declared.

Adriaan



Re: [HACKERS] Re: Regress tests reveal *serious* psql bug

From
Peter Eisentraut
Date:
On 2000-01-12, Tom Lane mentioned:

> Accepting undeclared variables went out with Basic and Fortran;
> why are we intent on re-inventing a concept that's so obviously
> dangerous?

They came back with Perl, Python, Tcl, what-have-you. But okay, I'm beaten
into submission.

-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden




Re: Regress tests reveal *serious* psql bug

From
Peter Eisentraut
Date:
On 2000-01-12, Tom Lane mentioned:

> I think this raises the bar to the point where we must
> have a transparent backward-compatible approach to psql variables.
> I was not all that thrilled about blowing off colon as an operator,
> and blowing off array subscripts *too* is just too far above my
> threshold of pain.

To clear something out here: I'm with you all the way. I didn't make up
that syntax, and I too was forgetful about the array issue. If y'all think
that a variable must be defined to be substituted and that that will fix
things to a reasonable state, thus it shall be. My concern was more that
this would only work around this particular problem, while being short
sighted. Just want to make sure we have a consensus.

> > If I'm going to hack around in that code, one related question: what
> > should the deal be regarding variable interpolation into quoted
> > strings? Yes/No/Maybe?
> 
> No bloody way, IMHO --- that increases the odds of unwanted

I'll take that as a No. ;)


-- 
Peter Eisentraut                  Sernanders väg 10:115
peter_e@gmx.net                   75262 Uppsala
http://yi.org/peter-e/            Sweden