Thread: Keyword classifications

Keyword classifications

From

Tom Lane

Date:

31 December 2015, 23:14:02

I did a bit of initial poking at the problem complained of in bug #13840,
namely that if you use "none" as the value of a reloption, you get a
syntax error when trying to reload a dump containing the table or index
declaration.  The submitter blames this on pg_dump but it is surely not
pg_dump's fault; it's just printing what pg_get_indexdef() gave it.
And what pg_get_indexdef() prints is just verbatim what is in each array
element of pg_class.reloptions.

Now, one line of thought here is that flatten_reloptions() is out of its
mind to not be worrying about quoting the reloption values.  And perhaps
it is, but I think if we go that direction, we may be fighting similar
fires for awhile to come.  psql's describe.c, for example, doesn't worry
about quoting anything when printing reloptions, and there's likely
similar code in third-party clients.  Also, a solution like this would
do nothing for existing dump files.

The other line of thought is that we're already making an effort to allow
any keyword to appear as the value of a def_arg, and maybe we should try
to make that work 100% instead of only 90%.  The existing code in
gram.y is:

def_arg:    func_type                       { $$ = (Node *)$1; }           | reserved_keyword              { $$ = (Node
*)makeString(pstrdup($1));}           | qual_all_Op                   { $$ = (Node *)$1; }           | NumericOnly
            { $$ = (Node *)$1; }           | Sconst                        { $$ = (Node *)makeString($1); }       ;
 

so we already allow any keyword that can be a type name, as well as
all fully reserved keywords.  The only thing that's missing is
col_name_keyword (which, not coincidentally, includes NONE).  Now,
if you just try to add a production for col_name_keyword like the one
for reserved_keyword, you get a bunch of reduce-reduce errors because
col_name_keyword includes several words that can also be type names, and
Bison can't decide whether those should go through the func_type or
col_name_keyword paths.  But some experimentation suggests that we could
fix that by subdividing col_name_keyword into two categories, one being
the keywords that can also be type names (BIGINT, BIT, etc) and one
being those that can't (BETWEEN, COALESCE, etc).  Everywhere
col_name_keyword is used now, just mention both sub-categories.
And in def_arg, add a production for only the second sub-category.

I think such a categorization would actually be cleaner than what we have
now; if you read the comments for col_name_keyword, they're pretty squishy
about whether these keywords can be type or function names or not, and
this subdivision would make that a little clearer.  Interestingly, it
also seems that the grammar tables become slightly smaller, suggesting
that Bison also finds this more regular.

However, we have a problem, which is that in the distant past somebody
had the bright idea of exposing the keyword categories to userspace via
pg_get_keywords().  That means we have to either add a new category code
to pg_get_keywords' output, or have it print the same code for both of
these new categories, neither of which seems exactly ideal.

Another question is whether we'd risk back-patching such a change.
It'd be relatively self-contained in terms of our own code, but I'm
wondering whether there's any third-party code with dependencies
on what keyword categories exist.

Thoughts?
        regards, tom lane

Re: Keyword classifications

From

Tom Lane

Date:

01 January 2016, 22:06:13

I wrote:
> Now, one line of thought here is that flatten_reloptions() is out of its
> mind to not be worrying about quoting the reloption values.  And perhaps
> it is, but I think if we go that direction, we may be fighting similar
> fires for awhile to come.  psql's describe.c, for example, doesn't worry
> about quoting anything when printing reloptions, and there's likely
> similar code in third-party clients.  Also, a solution like this would
> do nothing for existing dump files.

> The other line of thought is that we're already making an effort to allow
> any keyword to appear as the value of a def_arg, and maybe we should try
> to make that work 100% instead of only 90%.

After further thought I believe that the right thing to do is pursue both
these lines of attack.  Adding quoting in flatten_reloptions() seems like
a safely back-patchable fix for the original complaint, and it's really
necessary anyway for reloption values that don't look like either an
identifier or a number.  The grammar allows any arbitrary string constant
to be the original form of a reloption, and we have no good reason to
assume that extension modules will constrain their custom reloptions to
be one or the other.  (I'm thinking we'd better be prepared to
double-quote the option names, too, just in case.)

The grammar fixes seem like a good thing to do in the long run, too,
but there's little need to risk back-patching it since accepting
col_name_keywords without quoting would be mostly a convenience issue.
        regards, tom lane

Re: Keyword classifications

From

Michael Paquier

Date:

04 January 2016, 09:23:18

On Sat, Jan 2, 2016 at 4:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wrote:
>> Now, one line of thought here is that flatten_reloptions() is out of its
>> mind to not be worrying about quoting the reloption values.  And perhaps
>> it is, but I think if we go that direction, we may be fighting similar
>> fires for awhile to come.  psql's describe.c, for example, doesn't worry
>> about quoting anything when printing reloptions, and there's likely
>> similar code in third-party clients.  Also, a solution like this would
>> do nothing for existing dump files.
>
>> The other line of thought is that we're already making an effort to allow
>> any keyword to appear as the value of a def_arg, and maybe we should try
>> to make that work 100% instead of only 90%.
>
> After further thought I believe that the right thing to do is pursue both
> these lines of attack.  Adding quoting in flatten_reloptions() seems like
> a safely back-patchable fix for the original complaint, and it's really
> necessary anyway for reloption values that don't look like either an
> identifier or a number.  The grammar allows any arbitrary string constant
> to be the original form of a reloption, and we have no good reason to
> assume that extension modules will constrain their custom reloptions to
> be one or the other.  (I'm thinking we'd better be prepared to
> double-quote the option names, too, just in case.)
>
> The grammar fixes seem like a good thing to do in the long run, too,
> but there's little need to risk back-patching it since accepting
> col_name_keywords without quoting would be mostly a convenience issue.

A different angle of attack is to flatten the argument quotes directly
in reloptions.c:
http://www.postgresql.org/message-id/CAB7nPqTpdGLqLTxuGhBC2GabGNiFRAtLjFbxu=aGy1rX_DgwUg@mail.gmail.com
But you did not like that :p
-- 
Michael

Re: Keyword classifications

From

Simon Riggs

Date:

04 January 2016, 11:11:17

On 1 January 2016 at 19:06, Tom Lane <tgl@sss.pgh.pa.us> wrote:

I wrote:
> Now, one line of thought here is that flatten_reloptions() is out of its
> mind to not be worrying about quoting the reloption values. And perhaps
> it is, but I think if we go that direction, we may be fighting similar
> fires for awhile to come. psql's describe.c, for example, doesn't worry
> about quoting anything when printing reloptions, and there's likely
> similar code in third-party clients. Also, a solution like this would
> do nothing for existing dump files.

> The other line of thought is that we're already making an effort to allow
> any keyword to appear as the value of a def_arg, and maybe we should try
> to make that work 100% instead of only 90%.

After further thought I believe that the right thing to do is pursue both
these lines of attack. Adding quoting in flatten_reloptions() seems like
a safely back-patchable fix for the original complaint, and it's really
necessary anyway for reloption values that don't look like either an
identifier or a number. The grammar allows any arbitrary string constant
to be the original form of a reloption, and we have no good reason to
assume that extension modules will constrain their custom reloptions to
be one or the other. (I'm thinking we'd better be prepared to
double-quote the option names, too, just in case.)

The grammar fixes seem like a good thing to do in the long run, too,
but there's little need to risk back-patching it since accepting
col_name_keywords without quoting would be mostly a convenience issue.

All seems reasonable.

Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Keyword classifications

From

Alvaro Herrera

Date:

04 January 2016, 16:43:43

Tom Lane wrote:

> But some experimentation suggests that we could
> fix that by subdividing col_name_keyword into two categories, one being
> the keywords that can also be type names (BIGINT, BIT, etc) and one
> being those that can't (BETWEEN, COALESCE, etc).  Everywhere
> col_name_keyword is used now, just mention both sub-categories.
> And in def_arg, add a production for only the second sub-category.
> 
> I think such a categorization would actually be cleaner than what we have
> now; if you read the comments for col_name_keyword, they're pretty squishy
> about whether these keywords can be type or function names or not, and
> this subdivision would make that a little clearer.  Interestingly, it
> also seems that the grammar tables become slightly smaller, suggesting
> that Bison also finds this more regular.

Sounds reasonable.  We already have four categories, so one more
shouldn't be a problem, and if Bison likes it then all the better.

So you're talking about this:
* Many of these keywords will in fact be recognized as type or function* names too; but they have special productions
forthe purpose, and so* can't be treated as "generic" type or function names.

and you're saying that this comment will be moved to the first of these
new categories and s/Many of these/These/.  In other words, the "special
productions" will remain, which this is the stuff under SimpleTypename,
minus GenericType.

-- 
Álvaro Herrera                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: Keyword classifications

From

Tom Lane

Date:

05 January 2016, 08:53:08

Michael Paquier <michael.paquier@gmail.com> writes:
> On Sat, Jan 2, 2016 at 4:06 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> The grammar fixes seem like a good thing to do in the long run, too,
>> but there's little need to risk back-patching it since accepting
>> col_name_keywords without quoting would be mostly a convenience issue.

> A different angle of attack is to flatten the argument quotes directly
> in reloptions.c:
> http://www.postgresql.org/message-id/CAB7nPqTpdGLqLTxuGhBC2GabGNiFRAtLjFbxu=aGy1rX_DgwUg@mail.gmail.com
> But you did not like that :p

It seemed pretty messy.  There is nothing very wrong with the convention
that pg_class.reloptions is an array of "name=value" entries with both
name and value being taken literally.  The only thing that rule excludes
is that the option name cannot include an "=", which is a restriction that
bothers me not at all.

The dumped form of reloptions needs to meet the grammar restrictions on
what can be in WITH, but that's really a separate question.

The bug we had was that pg_dump and ruleutils.c weren't considering that
the rules might be different for the two representations.  Yeah, you could
fix it by insisting that the rules be identical, but I don't really find
that cleaner (and it could not be a back-patchable fix for existing
databases, anyway).
        regards, tom lane