Thread: Exposing keywords to clients

Exposing keywords to clients

From
"Dave Page"
Date:
Hi,

The attached patch implements a new function, pg_get_keywords(), which
returns a set of records describing the keywords recognised by the
server. This allows clients such as pgAdmin to get quoting rules
correct, and helps with other tasks such as syntax highlighting where
we need to support multiple server versions.

Example output (edited of course):

postgres=# select * from pg_get_keywords();
       word        |       category
-------------------+-----------------------
 all               | Reserved
 binary            | Type or function name
 xmlserialize      | Column name
 zone              | Unreserved
(372 rows)

I wasn't sure about the best way to describe the categories -
obviously they need to be non-translatable (for client software to
interpret), but human readable is also nice. I'm happy to hear
alternate suggestions.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com

Attachment

Re: Exposing keywords to clients

From
Alvaro Herrera
Date:
Dave Page wrote:
> Hi,
>
> The attached patch implements a new function, pg_get_keywords(), which
> returns a set of records describing the keywords recognised by the
> server. This allows clients such as pgAdmin to get quoting rules
> correct, and helps with other tasks such as syntax highlighting where
> we need to support multiple server versions.

FWIW pg_dump has fmtId() which does something related.

I think it's a bit bogus to be using the list as compiled client-side,
precisely due to the theoretical chance that it could change from one
server version to the next, but it's probably not very likely that we
ever remove a keyword from the server grammar.  And highlighting a
keyword that's not really a keyword is unlikely to be a problem in
practice -- in fact it makes it obvious that the user is likely to be in
trouble later when they upgrade.


> postgres=# select * from pg_get_keywords();
>        word        |       category
> -------------------+-----------------------
>  all               | Reserved
>  binary            | Type or function name
>  xmlserialize      | Column name
>  zone              | Unreserved
> (372 rows)
>
> I wasn't sure about the best way to describe the categories -
> obviously they need to be non-translatable (for client software to
> interpret), but human readable is also nice. I'm happy to hear
> alternate suggestions.

Perhaps use a separate string for machine parse (say R, T, C, U), and
let the string be translatable.


--
Alvaro Herrera                                http://www.CommandPrompt.com/
The PostgreSQL Company - Command Prompt, Inc.

Re: Exposing keywords to clients

From
Tom Lane
Date:
Alvaro Herrera <alvherre@commandprompt.com> writes:
> FWIW pg_dump has fmtId() which does something related.

> I think it's a bit bogus to be using the list as compiled client-side,
> precisely due to the theoretical chance that it could change from one
> server version to the next, but it's probably not very likely that we
> ever remove a keyword from the server grammar.

Actually, it's 100% intentional that pg_dump does it that way --- I
would not support modifying it to use this function (even if it existed
in the back branches).  The reason is exactly that pg_dump wants to
generate output that is correct for its own PG version, not that of the
server it's dumping from.

The tradeoffs are probably different for pgAdmin, but it is important to
realize that either way might be the best thing for a particular case.

            regards, tom lane

Re: Exposing keywords to clients

From
"Dave Page"
Date:
On Sat, May 3, 2008 at 12:24 AM, Alvaro Herrera
<alvherre@commandprompt.com> wrote:
> Dave Page wrote:
> > Hi,
> >
> > The attached patch implements a new function, pg_get_keywords(), which
> > returns a set of records describing the keywords recognised by the
> > server. This allows clients such as pgAdmin to get quoting rules
> > correct, and helps with other tasks such as syntax highlighting where
> > we need to support multiple server versions.
>
> FWIW pg_dump has fmtId() which does something related.
>
> I think it's a bit bogus to be using the list as compiled client-side,
> precisely due to the theoretical chance that it could change from one
> server version to the next, but it's probably not very likely that we
> ever remove a keyword from the server grammar.  And highlighting a
> keyword that's not really a keyword is unlikely to be a problem in
> practice -- in fact it makes it obvious that the user is likely to be in
> trouble later when they upgrade.

Yeah, we currently lift a copy of keywords.c into the pgAdmin source
and use that in our own qtIdent() function, but it's clearly only
correct for the version of Postgres we pull it from, whilst pgAdmin
supports back to 7.3 in current releases. It's also a pain because we
try to support some of the derivative servers like those offered by
Greenplum and EnterpriseDB which may have additional keywords (though
that obviously is not a problem for this community).

> > postgres=# select * from pg_get_keywords();
> >        word        |       category
> > -------------------+-----------------------
> >  all               | Reserved
> >  binary            | Type or function name
> >  xmlserialize      | Column name
> >  zone              | Unreserved
> > (372 rows)
> >
> > I wasn't sure about the best way to describe the categories -
> > obviously they need to be non-translatable (for client software to
> > interpret), but human readable is also nice. I'm happy to hear
> > alternate suggestions.
>
> Perhaps use a separate string for machine parse (say R, T, C, U), and
> let the string be translatable.

I considered that, but wasn't sure if folks would like the redundancy.
It's trivial to do of course - any objections?

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com

Re: Exposing keywords to clients

From
Peter Eisentraut
Date:
Dave Page wrote:
> > Perhaps use a separate string for machine parse (say R, T, C, U), and
> > let the string be translatable.
>
> I considered that, but wasn't sure if folks would like the redundancy.
> It's trivial to do of course - any objections?

Is there anything useful you would do with this information?  Or would you
just quote all listed words anyway?

Re: Exposing keywords to clients

From
Tom Lane
Date:
Peter Eisentraut <peter_e@gmx.net> writes:
> Dave Page wrote:
>>> Perhaps use a separate string for machine parse (say R, T, C, U), and
>>> let the string be translatable.
>>
>> I considered that, but wasn't sure if folks would like the redundancy.
>> It's trivial to do of course - any objections?

> Is there anything useful you would do with this information?  Or would you
> just quote all listed words anyway?

I think the practical application would be to avoid quoting unreserved
keywords, as pg_dump for instance already does.  I doubt anyone would
bother distinguishing the different types of partially/wholly reserved
words.  So maybe a boolean would be sufficient --- but I have nothing
against the R/T/C/U suggestion.

A more radical alternative is just to omit unreserved words from the
view altogether.

            regards, tom lane

Re: Exposing keywords to clients

From
"Dave Page"
Date:
On Sat, May 3, 2008 at 6:12 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Peter Eisentraut <peter_e@gmx.net> writes:
> > Is there anything useful you would do with this information?  Or would you
> > just quote all listed words anyway?

Currently, yes, we just quote all listed words.

> I think the practical application would be to avoid quoting unreserved
> keywords, as pg_dump for instance already does.  I doubt anyone would
> bother distinguishing the different types of partially/wholly reserved
> words.  So maybe a boolean would be sufficient --- but I have nothing
> against the R/T/C/U suggestion.
>
> A more radical alternative is just to omit unreserved words from the
> view altogether.

Well my thinking is that it costs nothing extra bar a dozen lines of
code to include the info now in case it's useful in the future, if not
for pgAdmin, then maybe for Lightning Admin or one of the other tools.
If a need for it arises in the future and we haven't included it now
we'll either want to add a second function or break compatibility both
of which strike me as a lot more objectionable than doing it all now.

Attached is an updated patch, giving the following output. The catdesc
column can be translated.

postgres=# select * from pg_get_keywords();
       word        | catcode |        catdesc
-------------------+---------+-----------------------
 abort             | U       | Unreserved
 all               | R       | Reserved
 bigint            | C       | Column name
 binary            | T       | Type or function name

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com

Attachment

Re: Exposing keywords to clients

From
Tom Lane
Date:
"Dave Page" <dpage@pgadmin.org> writes:
> Attached is an updated patch, giving the following output. The catdesc
> column can be translated.

Documentation has got a couple of problems:

> +     contains the keyword, the <structfield>catcode</> column contains a
> +     category code of 'U' for unknown, 'C' for column name, 'T' for type or
> +     function name or 'U' for unreserved. The <structfield>catdesc</>
> +     column contains a localised stribg describing the category.

I wouldn't document the "unknown" case at all, much less document it
incorrectly, and you forgot the "reserved" case, and "stribg"?

            regards, tom lane

Re: Exposing keywords to clients

From
Tom Lane
Date:
"Dave Page" <dpage@pgadmin.org> writes:
> Attached is an updated patch, giving the following output.

Oh, one other thing: dropping externs into random modules unrelated to
their source module is completely awful programming style, because there
is nothing preventing incompatible declarations.  Put those externs in
keywords.h instead.  I suspect you have ignored a compiler warning
about not declaring pg_get_keywords itself, too --- it should be
extern'd in builtins.h.

            regards, tom lane

Re: Exposing keywords to clients

From
"Dave Page"
Date:
On Sat, May 3, 2008 at 9:06 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> "Dave Page" <dpage@pgadmin.org> writes:
> > Attached is an updated patch, giving the following output.
>
> Oh, one other thing: dropping externs into random modules unrelated to
> their source module is completely awful programming style, because there
> is nothing preventing incompatible declarations.  Put those externs in
> keywords.h instead.

OK.

> I suspect you have ignored a compiler warning
> about not declaring pg_get_keywords itself, too --- it should be
> extern'd in builtins.h.

No, no warning (I'm using VC++ today) - but fixed anyway.

Update attached, including corrected docs. Note to self - proof read
docs *after* putting the kids to bed in future.

--
Dave Page
EnterpriseDB UK: http://www.enterprisedb.com

Attachment

Re: Exposing keywords to clients

From
Nikhils
Date:
Hi,

> > Attached is an updated patch, giving the following output.
>
> Oh, one other thing: dropping externs into random modules unrelated to
> their source module is completely awful programming style, because there
> is nothing preventing incompatible declarations.  Put those externs in
> keywords.h instead.

OK.

> I suspect you have ignored a compiler warning
> about not declaring pg_get_keywords itself, too --- it should be
> extern'd in builtins.h.

No, no warning (I'm using VC++ today) - but fixed anyway.

Update attached, including corrected docs. Note to self - proof read
docs *after* putting the kids to bed in future.

Here are some comments from me:

* doc/src/sgml/func.sgml

a) Changed "localised" to "localized" to be consistent with the references elsewhere in the same file.

* src/backend/utils/adt/misc.c

b) I wonder if we need the default case in the switch statement at all, since we are scanning the statically populated ScanKeywords array with proper category values for each entry.
 
c) There was a warning during compilation since we were assigning a const pointer to a char pointer
             values[0] = ScanKeywords[funcctx->call_cntr].name;

* src/include/catalog/pg_proc.h

d) oid 2700 has been claimed by another function in the meanwhile. Modified it to 2701.
DATA(insert OID = 2701 ( pg_get_keywords    PGNSP PGUID 12 10 400 f f t t s 0 2249

e) I was wondering why pronargs is set to 0 above. But I see other functions doing the same, so its ok I guess for such kinds of usages.

PFA, version 4 of this patch with a,c and d taken care of.

Regards,
Nikhils
--
EnterpriseDB http://www.enterprisedb.com
Attachment

Re: Exposing keywords to clients

From
Tom Lane
Date:
Nikhils <nikkhils@gmail.com> writes:
>>> Attached is an updated patch, giving the following output.

Applied with minor revisions.

> Here are some comments from me:

> a) Changed "localised" to "localized" to be consistent with the references
> elsewhere in the same file.

What I didn't like about the docs was the classification of the function
as a "session information function".  There's certainly nothing
session-dependent about it.  I ended up putting it under "System Catalog
Information Functions", which isn't really quite right either but it's
closer than the other one.

> b) I wonder if we need the default case in the switch statement at all,
> since we are scanning the statically populated ScanKeywords array with
> proper category values for each entry.

I left it in for safety, but made it just return nulls --- there's no
point in wasting more code space on it than necessary, and certainly
no point in asking translators to translate a useless string.

> c) There was a warning during compilation since we were assigning a const
> pointer to a char pointer
>              values[0] = ScanKeywords[funcctx->call_cntr].name;

Yeah, I couldn't see a good way around that either --- we could pstrdup
but that seems overkill, and adjusting the API of BuildTupleFromCStrings
to take const char ** doesn't look appetizing either.

> d) oid 2700 has been claimed by another function in the meanwhile. Modified
> it to 2701.
> DATA(insert OID = 2701 ( pg_get_keywords    PGNSP PGUID 12 10 400 f f t t s
> 0 2249

2701 was claimed too.  You need to use the unused_oids script to find
unique free OIDs.

            regards, tom lane