Re: creating index names automatically? - Mailing list pgsql-hackers

From Robert Haas
Subject Re: creating index names automatically?
Date
Msg-id 603c8f070912241647i64c493can4a6dc3ca38d802f9@mail.gmail.com
In response to Re: creating index names automatically?  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: creating index names automatically?  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Thu, Dec 24, 2009 at 12:07 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> It compiles without warnings for me. There's only one production that
>> allows exactly one word between INDEX and ON.
>
> In that case you broke something.  I'm too tired to work out exactly
> what.

Heh.  Well, I almost certainly did, since it wasn't a complete patch
and I didn't test it, but I am not sure that proves that the idea was
bad.  Upthread Greg said:

> I suppose we could fix this by specifying a precedence and then
> explicitly checking if you're trying to make an index named
> concurrently and fixing it up later.

And your response was:

> No, not really.  Past the grammar there is no way to tell concurrently
> from "concurrently", ie, if we did it like that then you couldn't even
> use double quotes to get around it.

But it is merely an accident of the way the grammar happens to be
built that CONCURRENTLY and "concurrently" happen to evaluate to
equivalent values.  It's easy to make a set of productions that treat
them differently, which is what I did here.  It doesn't even require
precedence.  AIUI, there are four constructs that we wish to support:

1. CREATE INDEX ON table (columns);
2. CREATE INDEX CONCURRENTLY ON table (columns);
3. CREATE INDEX index_name ON table (columns);
4. CREATE INDEX CONCURRENTLY index_name ON table (columns);

If we create these as four separate productions, then after shifting
CREATE INDEX CONCURRENTLY and seeing that the next token is ON, we
don't know whether to reduce CONCURRENTLY to index_name or shift.  But
if we unify (2) and (3) into a single production and sort it out when
we reduce the whole statement, then we end up with:

1. CREATE INDEX ON table (columns);
2/3. CREATE INDEX tricky_index_name ON table (columns);
4. CREATE INDEX CONCURRENTLY index_name ON table (columns);

Unless I'm missing something, this eliminates the problem.  Now, after
shifting CREATE INDEX CONCURRENTLY, if the next token is ON, we reduce
(matching case 2/3); otherwise, we shift again (hoping to match case
4).  The remaining problem is to define tricky_index_name in a way
that allows us to distinguish CONCURRENTLY from "concurrently", which
is easy enough to do.
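
To make the unified-production idea concrete, here is a rough sketch in
Python.  It is a hand-written stand-in for the bison grammar, not the
patch itself: the token kinds WORD and QUOTED, the helper names, and the
returned dict are all assumptions made for illustration.  The point is
only that the lexer keeps CONCURRENTLY and "concurrently" distinct, and
that the 2/3 case is sorted out after the whole prefix has been seen.

```python
import re

# Assumed token kinds: WORD is an unquoted word (case-folded),
# QUOTED is a double-quoted identifier (case preserved).
TOKEN_RE = re.compile(r'\s*(?:"([^"]+)"|(\w+)|([(),;]))')

ON = ("WORD", "on")
CONCURRENTLY = ("WORD", "concurrently")


def tokenize(sql):
    """Split a statement into (kind, text) tokens."""
    sql = sql.strip()
    tokens, pos = [], 0
    while pos < len(sql):
        m = TOKEN_RE.match(sql, pos)
        if m is None:
            raise ValueError(f"unexpected input at offset {pos}")
        quoted, word, punct = m.groups()
        if quoted is not None:
            tokens.append(("QUOTED", quoted))
        elif word is not None:
            tokens.append(("WORD", word.lower()))
        else:
            tokens.append(("PUNCT", punct))
        pos = m.end()
    return tokens


def is_name(tok):
    """True for anything usable as a name here; unquoted ON is excluded."""
    return tok[0] in ("WORD", "QUOTED") and tok != ON


def parse_create_index(sql):
    """Parse only the CREATE INDEX prefix up to and including the table name."""
    toks = tokenize(sql)
    if toks[:2] != [("WORD", "create"), ("WORD", "index")]:
        raise ValueError("expected CREATE INDEX")
    # Pad so that peeking ahead never runs off the end of the list.
    rest = toks[2:] + [("EOF", "")] * 3

    if rest[0] == ON:
        # Case 1: no name, not concurrent.
        concurrently, name, table = False, None, rest[1]
    elif is_name(rest[0]) and rest[1] == ON:
        # Case 2/3: exactly one word between INDEX and ON.  Decide only now
        # whether it was the unquoted keyword or a real index name.
        concurrently = rest[0] == CONCURRENTLY
        name = None if concurrently else rest[0][1]
        table = rest[2]
    elif rest[0] == CONCURRENTLY and is_name(rest[1]) and rest[2] == ON:
        # Case 4: unquoted CONCURRENTLY followed by an index name.
        concurrently, name, table = True, rest[1][1], rest[3]
    else:
        raise ValueError("syntax error")

    if not is_name(table):
        raise ValueError("expected table name")
    return {"concurrently": concurrently, "name": name, "table": table[1]}
```

With this sketch, CREATE INDEX CONCURRENTLY ON t (a) comes back as a
concurrent build with no name, while CREATE INDEX "concurrently" ON t (a)
comes back as an ordinary build of an index named concurrently, so
double quotes still work as an escape hatch.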

Still another way to solve this problem would be to create a
production called unreserved_keywords_except_concurrently, so that
index_name could be defined not to include CONCURRENTLY without quotes
as one of the possibilities.  But I think this way is cleaner.

Having said all this, I don't really object to the alternate proposal
of creating a set of words that are reserved as relation names but not
as column names, either, especially if it would allow us to make some
other existing keywords less-reserved.  But I don't really understand
the justification for thinking that CONCURRENTLY is OK to make more
reserved, but, say, EXPLAIN would not be OK.  This is one, pretty
marginal production - there's nothing else in the grammar that even
uses CONCURRENTLY, let alone needs it to be reserved.  The whole
problem here comes from what seems like a pretty poor choice about
where to put the word CONCURRENTLY.  It would have been a lot more
robust to put this in a section of the statement where any additional
verbiage was inevitably going to be introduced by a keyword, like just
before or after the storage parameters.

I think what we should learn from this case, as well as the recent
changes to EXPLAIN, COPY, and VACUUM syntax, is that adding options to
commands by creating keywords is not very scalable, and that putting
the modifier immediately after the command name is an especially poor
placement.  Without explicit delimiters, it's easy to get parser
conflicts, and as the number of options grows (even to a relatively
modest value like 2 or 3), the fact that they have to appear in a
fixed order becomes a real pain.
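
For contrast, here is a minimal Python sketch of what an explicitly
delimited option list buys you, in the style of EXPLAIN (ANALYZE,
VERBOSE).  The function name parse_options, the known-option set passed
in, and the "true" default are assumptions for illustration, not
anything taken from the backend.  Because the list is bracketed and
comma-separated, options can come in any order and adding one needs no
new grammar keyword.

```python
def parse_options(text, known):
    """Parse "(name [value], ...)" into a dict, in any order."""
    text = text.strip()
    if not (text.startswith("(") and text.endswith(")")):
        raise ValueError("option list must be parenthesized")
    options = {}
    for item in text[1:-1].split(","):
        parts = item.split()
        if not parts or len(parts) > 2:
            raise ValueError(f"malformed option: {item!r}")
        name = parts[0].lower()
        if name not in known:
            raise ValueError(f"unrecognized option: {name}")
        if name in options:
            raise ValueError(f"option given twice: {name}")
        # A bare option name is treated as a boolean that is switched on.
        options[name] = parts[1].lower() if len(parts) == 2 else "true"
    return options
```

Here parse_options("(ANALYZE, VERBOSE)", {"analyze", "verbose"}) and the
same call with the two options swapped produce the same dict, which is
exactly the property the fixed-order keyword syntax lacks.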

...Robert

