Thread: Re: [GENERAL] Count(*) throws error

Re: [GENERAL] Count(*) throws error

From
"Simon Riggs"
Date:
On Wed, 2007-07-11 at 17:42 -0400, Tom Lane wrote:
> "Jasbinder Singh Bali" <jsbali@gmail.com> writes:
> > I'm using the following statement in my plpgsql function
> 
> > SELECT INTO no_rows COUNT(*) FROM tbl_concurrent;
> 
> > I have decalred no_rows int4 and initialized it to zero
> 
> > Running the function throws the following error:
> 
> > ERROR:  syntax error at or near "(" at character 13
> > QUERY:  SELECT   $1 (*) FROM tbl_concurrent
> 
> I'll bet a nickel you have a local variable named "count" in that
> function, and plpgsql is blindly trying to substitute its value into
> the SQL query.  The replacement of "COUNT" by " $1 " in the query
> text is the tip-off.

I came across a boat load of these the other day. Seems fairly naff that
we substitute variables blindly.

Seems like we could be slightly more friendly without too much bother:
at least only substitute after the VALUES clause in INSERT. We really
shouldn't substitute "var = var" to "$n = $n" either; am I right in
thinking the latter would happen silently and cause potential error?

--  Simon Riggs EnterpriseDB  http://www.enterprisedb.com



Re: [GENERAL] Count(*) throws error

From
Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> Seems like we could be slightly more friendly without too much bother:
> at least only substitute after the VALUES clause in INSERT.

Surely you jest.
        regards, tom lane


Re: [GENERAL] Count(*) throws error

From
Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> Seems like we could be slightly more friendly without too much bother:

Actually, rather than get into that sort of AI-complete project,
it strikes me that the most useful response is user education.  There
ought to be a section in the plpgsql documentation explaining exactly
how the substitution works and showing a couple examples of the pitfalls.
I'll try to draft something up.
        regards, tom lane


Re: [GENERAL] Count(*) throws error

From
"Simon Riggs"
Date:
On Wed, 2007-07-11 at 18:13 -0400, Tom Lane wrote:
> "Simon Riggs" <simon@2ndquadrant.com> writes:
> > Seems like we could be slightly more friendly without too much bother:
> > at least only substitute after the VALUES clause in INSERT.
> 
> Surely you jest.

No. There are a places where parameters clearly aren't allowed, so
making the substitutions in those places can easily be prevented. The
remainder of the problem is as hard as you think, but getting half way
there seems very easy.

--  Simon Riggs EnterpriseDB  http://www.enterprisedb.com



Re: [GENERAL] Count(*) throws error

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> "Simon Riggs" <simon@2ndquadrant.com> writes:
>> Seems like we could be slightly more friendly without too much bother:
>
> Actually, rather than get into that sort of AI-complete project,

Is it really AI-complete? ISTM the *only* place where a parameter is allowed
is where the parser inserts a ColumnRef node? 

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



Re: [GENERAL] Count(*) throws error

From
Tom Lane
Date:
"Simon Riggs" <simon@2ndquadrant.com> writes:
> On Wed, 2007-07-11 at 18:13 -0400, Tom Lane wrote:
>> "Simon Riggs" <simon@2ndquadrant.com> writes:
>>> Seems like we could be slightly more friendly without too much bother:
>>> at least only substitute after the VALUES clause in INSERT.
>> 
>> Surely you jest.

> No. There are a places where parameters clearly aren't allowed, so
> making the substitutions in those places can easily be prevented. The
> remainder of the problem is as hard as you think, but getting half way
> there seems very easy.

It's not nearly as easy as you think.  Even for the limited case of
the column list for an INSERT, consider
DECLARE i int; j int;...INSERT INTO mytable (arraycol[i]) VALUES (j);

Considering that the current plpgsql parser doesn't have any knowledge
*at all* of the syntactic structure of individual SQL commands, even
teaching it to recognize an INSERT column list correctly would be a huge
addition of code.  The payback for that seems darn small considering
that (1) that's only a small portion of the trouble cases and (2) it is
very easy for users to recognize a failure in this context, whereas many
of the other trouble cases aren't mechanically recognizable at all.

Ultimately we have to do a better job of teaching users to recognize
this issue for themselves; which is why I think it's basically a
documentation problem.

The other problem with trying to inject a small amount of smarts is that
it complicates explaining the system.  I think the docs have to explain
*exactly* how the substitution works (which they fail to do right now,
but I intend to make 'em do so today), and putting in any special cases
will make that explanation longer and more confusing.
        regards, tom lane


Re: [GENERAL] Count(*) throws error

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> Actually, rather than get into that sort of AI-complete project,

> Is it really AI-complete? ISTM the *only* place where a parameter is allowed
> is where the parser inserts a ColumnRef node? 

The first problem is that we have to make the decision far upstream of that.

The second is that whether we *can* insert a parameter symbol is not the
same as whether we *should*.  Even if we made that work, we'd only be
fixing the cases that currently result in trivial-for-users-to-recognize
syntactic errors.  The deeper semantic errors arising from substituting
a parameter where the user meant to refer to a similarly-named column
can only be dealt with through user education.
        regards, tom lane


Re: [GENERAL] Count(*) throws error

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> "Simon Riggs" <simon@2ndquadrant.com> writes:
>> On Wed, 2007-07-11 at 18:13 -0400, Tom Lane wrote:
>>> "Simon Riggs" <simon@2ndquadrant.com> writes:
>>>> Seems like we could be slightly more friendly without too much bother:
>>>> at least only substitute after the VALUES clause in INSERT.
>>> 
>>> Surely you jest.
>
>> No. There are a places where parameters clearly aren't allowed, so
>> making the substitutions in those places can easily be prevented. The
>> remainder of the problem is as hard as you think, but getting half way
>> there seems very easy.
>
> It's not nearly as easy as you think.  Even for the limited case of
> the column list for an INSERT, consider
>
>     DECLARE i int; j int;
>     ...
>     INSERT INTO mytable (arraycol[i]) VALUES (j);

huh, i hadn't actually seen that before -- it doesn't seem very useful. But
I've certainly seen and used "UPDATE SET arraycol[i] = val".


> Considering that the current plpgsql parser doesn't have any knowledge
> *at all* of the syntactic structure of individual SQL commands, even
> teaching it to recognize an INSERT column list correctly would be a huge
> addition of code.  

It does eventually call the SQL parser though doesn't it? I was thinking the
way to do it would be to delay substituting the formal parameters to later,
after the parsing. 

So instead of substituting them as the tokens are lexed, instead suck in the
tokens, run the parser -- which we currently do anyways just to check the
syntax -- then walk the tree looking for ColumnRefs where the name matches a
variable name. Then keep around that parse tree instead of just the series of
lex tokens to later call analyze on and execute.

> The other problem with trying to inject a small amount of smarts is that
> it complicates explaining the system.  

Well that's quite true, but then that's the basic difficulty with any part of
plpgsql.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



Re: [GENERAL] Count(*) throws error

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> So instead of substituting them as the tokens are lexed, instead suck in the
> tokens, run the parser -- which we currently do anyways just to check the
> syntax -- then walk the tree looking for ColumnRefs where the name matches a
> variable name. Then keep around that parse tree instead of just the series of
> lex tokens to later call analyze on and execute.

Hmm.  That sounds cool but I think it actually has a pretty substantial
disadvantage --- there is then no easily user-readable representation of
the query that shows *which* occurrences got substituted.  With the
textual replacement method you have a string you can look at, though
it's true that we don't always show it to the user if we are not aware
there's a problem.

Again, I'm trying to look at the big picture of both syntactic and
semantic errors.  If we solve only the syntactic end of it I think we'd
actually be worse off, because then users would be even more lost when
they hit a semantic error (unwanted substitution).
        regards, tom lane


Re: [GENERAL] Count(*) throws error

From
Richard Huxton
Date:
Tom Lane wrote:
> Again, I'm trying to look at the big picture of both syntactic and
> semantic errors.  If we solve only the syntactic end of it I think we'd
> actually be worse off, because then users would be even more lost when
> they hit a semantic error (unwanted substitution).

The only real solution is to have some clear syntactic sugar denoting 
where variables are to be interpolated:  SELECT * FROM foo WHERE (bar + 1) = bar   -- Which is the var?  SELECT * FROM
fooWHERE (bar + 1) = $bar  -- OK  SELECT * FROM foo WHERE (bar + 1) = {bar} -- OK
 
It's not clear to plpgsql because it's *not clear*.

In any other namespace-conflict situation I can think of it's always 
inner-definition-is-visible. This would of course solve the problem, but  only by preventing you from substituting in
variablesthat conflict 
 
with columns. Unless you generate a warning at function compile-time 
that doesn't seem much better.

--   Richard Huxton  Archonet Ltd