Thread: Re: [COMMITTERS] pgsql: Arrange for quote_identifier() and pg_dump to not quote keywords

"Tom Lane" <tgl@postgresql.org> writes:

> For the moment, lie about WITH's status in the table so it will still get
> quoted --- this is because of the expectation that WITH will become reserved
> when the SQL recursive-queries patch gets done.

Out of curiosity I'm checking what consequences some other future grammer
changes might have for us. Today I checked out full spec compliant GROUP BY
syntax including ROLLUP, CUBE, and GROUPING SETS.

There are two conclusions of note:

1) ROLLUP and CUBE would have to be col_name_keyword keywords. 
  That could be an annoyance for the cube contrib package because it defines  a few constructors like cube(float8[]).
Youcould still have a type named  "cube" but the functions would have to be renamed. Personally I always  found "cube"
astrange name anyways, I think of this data type more as a  n-space vector than a cube anyways.
 
  quote_identifier would start quoting "cube" and "rollup" everywhere. My  first inclination was that it's probably not
necessaryto start  preemptively quoting them this release because people are more likely to  use them as column names
thanfunction names anyways. But perhaps that's  not true given the contrib module.
 

2) Assuming we keep our extension of allowing arbitrary expressions in GROUP  BY lists then there is a conflict between
ourundecorated row constructor  '(' expr_list ')' and the spec which allows parenthesized sublists in the  grouping
list.
  I'm not sure this is a real problem though. As near as I can tell the  semantics of grouping by a ROW(a,b) and
groupingby columns (a,b) as a  grouping set element are basically the same anyways. So I think we can just  accept any
arbitraryexpression including row constructors as what the spec  calls an "ordinary grouping set".
 




For what it's worth here's the grammar I get by basically copying the grammar
straight out of the spec and then cleaning up the conflicts including making
ordinary_grouping_set a straight expr_list as described above:

opt_group_set_clause:           DISTINCT           | ALL           | /*EMPTY*/       ;

group_clause:           GROUP_P BY opt_group_set_clause grouping_element_list           | /*EMPTY*/       ;

grouping_element_list:   grouping_element   | grouping_element_list ',' grouping_element   ;

grouping_element:   a_expr   | rollup_list   | cube_list   | grouping_sets_specification   | empty_grouping_set   ;

rollup_list:   ROLLUP '(' expr_list ')'   ;

cube_list:   CUBE '(' expr_list ')'   ;

grouping_sets_specification:   GROUPING SETS '(' grouping_set_list ')'   ;

grouping_set_list:   grouping_set   | grouping_set_list ',' grouping_set   ;

grouping_set:   a_expr   | rollup_list   | cube_list   | grouping_sets_specification   | empty_grouping_set   ;

empty_grouping_set: '(' ')'   ;

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



"Gregory Stark" <stark@enterprisedb.com> writes:

> "Tom Lane" <tgl@postgresql.org> writes:
>
>> For the moment, lie about WITH's status in the table so it will still get
>> quoted --- this is because of the expectation that WITH will become reserved
>> when the SQL recursive-queries patch gets done.
>
> Out of curiosity I'm checking what consequences some other future grammer
> changes might have for us.

And checking OLAP window functions it at first glance at least it seems we
would have to make ROWS, WINDOW, RANGE, and PARTITION reserved keywords.

Again, I just sucked in the spec's grammar and fixed it up to fit in our
grammar. I did make PARTITION BY and ORDER BY take arbitrary expressions as is
our general approach.

It's possible there may be clever ways around these conflicts, in particular
it seems like it might be possible to make PARTITION not a keyword given the
nearby BY. However it appears in parentheses which looks like it makes things
a bit tricky.

Here's the relevant grammar snippet, and I attached the diff for gram.y to
parse both window clause and rollup/cube/grouping sets.

window_clause:
    WINDOW window_definition_list
    | /* EMPTY */
    ;

window_definition_list:
    window_definition
    | window_definition ',' window_definition
    ;

window_definition:
    ColId AS window_specification
    ;

window_specification:
    '(' window_specification_details ')'
    ;

window_specification_details:
    opt_window_name
    window_partition_clause
    window_order_clause
    window_frame_clause
    ;

opt_window_name:
    ColId
    | /*EMPTY*/ ;
    ;

window_partition_clause:
    PARTITION BY expr_list
    | /* EMPTY */
    ;

window_order_clause:
    ORDER BY sortby_list
    | /* EMPTY */
    ;

window_frame_clause:
    window_frame_units window_frame_extent opt_window_frame_exclusion
    | /* EMPTY */
    ;

window_frame_units:
    ROWS
    | RANGE
    ;

window_frame_extent:
    window_frame_start
    | window_frame_between
    ;

window_frame_start:
    UNBOUNDED PRECEDING
    | window_frame_preceding
    | CURRENT_P ROW
    ;

window_frame_preceding:
    Iconst PRECEDING
    ;

window_frame_between: BETWEEN window_frame_bound AND window_frame_bound
    ;

window_frame_bound:
    window_frame_start
    | UNBOUNDED FOLLOWING
    | window_frame_following
    ;

window_frame_following:
    Iconst FOLLOWING
    ;

opt_window_frame_exclusion:
    /*EMPTY*/
    | EXCLUDE CURRENT_P ROW
    | EXCLUDE GROUP_P
    | EXCLUDE TIES
    | EXCLUDE NO OTHERS
    ;


--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Attachment
"Gregory Stark" <stark@enterprisedb.com> writes:

> "Tom Lane" <tgl@postgresql.org> writes:
>
>> For the moment, lie about WITH's status in the table so it will still get
>> quoted --- this is because of the expectation that WITH will become reserved
>> when the SQL recursive-queries patch gets done.
>
> Out of curiosity I'm checking what consequences some other future grammer
> changes might have for us. 

Oh, and I forgot OVER (assuming we want to actually use the windows, not just
define them). It looks like OVER and also PARTITION can get by as
type_func_name_keyword.

So that gives us:

col_name_keyword:   CUBE   ROLLUP

type_func_name_keyword:   OVER   PARTITION

reserved_keyword:   ROWS   WINDOW   RANGE

unreserved_keyword:   GROUPING   SETS   PRECEDING   FOLLOWING   OTHERS   UNBOUNDED   TIES   EXCLUDE


I'm thinking it may make sense to lie about all of these in quote_identifier
so that someone who upgrades from 8.2 to 8.3 can then upgrade to 8.4. If we
add reserved words in one step then there's no way to upgrade except to rename
things before dumping. Any comments?

Does anyone else have any pet features from the spec which introduce new
syntax they would like to see in Postgres soon?

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com




Gregory Stark wrote:
> I'm thinking it may make sense to lie about all of these in quote_identifier
> so that someone who upgrades from 8.2 to 8.3 can then upgrade to 8.4. If we
> add reserved words in one step then there's no way to upgrade except to rename
> things before dumping. Any comments?
>
>
>   

I'm confused. Won't pg_dump quote the keywords it knows its postgres 
version will need quoted? We can't expect it to have knowledge of future 
requirements for quoting, unless someone really does invent time travel 
(in which case someone could just go to the future and bring back 
version 107.6 and save us all the trouble).

cheers

andrew


Andrew Dunstan <andrew@dunslane.net> writes:
> I'm confused. Won't pg_dump quote the keywords it knows its postgres 
> version will need quoted? We can't expect it to have knowledge of future 
> requirements for quoting, unless someone really does invent time travel 
> (in which case someone could just go to the future and bring back 
> version 107.6 and save us all the trouble).

Yeah.  I'm disinclined to pre-emptively quote things for pie-in-the-sky
patches.  WITH is already a grammar keyword, so it's not a big deal to
tweak things to quote it, but adding a dozen keywords that have zero
functionality in the grammar is another thing entirely.

Also, the fact that this particular form of the grammar requires
reserving the keywords does not prove that there is no way to have the
features without that.  I'd want to see us try a little harder first.
        regards, tom lane


"Andrew Dunstan" <andrew@dunslane.net> writes:

> Gregory Stark wrote:
>> I'm thinking it may make sense to lie about all of these in quote_identifier
>> so that someone who upgrades from 8.2 to 8.3 can then upgrade to 8.4. If we
>> add reserved words in one step then there's no way to upgrade except to rename
>> things before dumping. Any comments?
>
> I'm confused. Won't pg_dump quote the keywords it knows its postgres version
> will need quoted? We can't expect it to have knowledge of future requirements
> for quoting

Well as far as standard syntax from the spec we can. But you're right, as
Heikki pointed out offline, the approved upgrade path is to use the pg_dump
from the target version to dump the old database.

It may still be worth telling one version of Postgres about anticipated
keywords prior to a release which adds them so that people who don't follow
instructions and try a dump generated with an old pg_dump have a fighting
chance. Besides, what's a person to do if they have a dump lying around?
Restore it in an old database so they can re-dump it with the new pg_dump?

That said, this would only bite someone who is using columns named "cube" or
"rows" *and* uses the old pg_dump or tries to restore a dump they already
have without re-dumping with the new pg_dump. It may just not be worth
worrying about.

>  unless someone really does invent time travel (in which case someone could
> just go to the future and bring back version 107.6 and save us all the
> trouble).

That would revolutionize the IP law field....

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com




Gregory Stark wrote:
> It may still be worth telling one version of Postgres about anticipated
> keywords prior to a release which adds them so that people who don't follow
> instructions and try a dump generated with an old pg_dump have a fighting
> chance. Besides, what's a person to do if they have a dump lying around?
> Restore it in an old database so they can re-dump it with the new pg_dump?
>
>   


That is in fact the only thing we fairly much guarantee to work.

cheers

andrew


"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Yeah.  I'm disinclined to pre-emptively quote things for pie-in-the-sky
> patches.  WITH is already a grammar keyword, so it's not a big deal to
> tweak things to quote it, but adding a dozen keywords that have zero
> functionality in the grammar is another thing entirely.

Sure.

> Also, the fact that this particular form of the grammar requires
> reserving the keywords does not prove that there is no way to have the
> features without that.  I'd want to see us try a little harder first.

At least some of them are unavoidable conflicts:

select a,b,count(*) from tab GROUP BY cube(a,b)
select a,b,count(*) from tab GROUP BY rollup(a,b)

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com



Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> Also, the fact that this particular form of the grammar requires
>> reserving the keywords does not prove that there is no way to have the
>> features without that.  I'd want to see us try a little harder first.

> At least some of them are unavoidable conflicts:

> select a,b,count(*) from tab GROUP BY cube(a,b)
> select a,b,count(*) from tab GROUP BY rollup(a,b)

In principle this case could be handled by having parse analysis
treat a FuncCall node at the top level of GROUP BY specially when
the function name is CUBE etc.

I'm not saying that might not be uglier than having the grammar
recognize it, just pointing out that there's usually more than
one way to skin a cat.
        regards, tom lane