Re: Dollar in identifiers - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Dollar in identifiers
Msg-id 29572.998060418@sss.pgh.pa.us
In response to Re: Dollar in identifiers  (Jan Wieck <JanWieck@Yahoo.com>)
List pgsql-hackers
I've been thinking some more about this dollar-sign business.  There
are a couple of points that haven't been made yet.  If you'll allow
me to recap:

It seems like there are two reasonable paths we could take:

1. Keep $ as an operator character.  If we go this way, I think we
should allow a single $ as an operator name too (by removing $ from
the set of "self" characters in scan.l, so that it lexes as an Op).

2. Make $ an identifier character.  Remove it from the set of allowed
operator characters, and instead allow it as second-or-later character
in identifiers.  (It cannot be allowed as first character, else it's
totally ambiguous whether $12 is meant to be a parameter or identifier.)

Option 2 improves Oracle compatibility, at the price of breaking
backwards compatibility for applications that presently use $ as part
of multi-character operator names.  (But does anyone know of any?)

An important thing to think about here is the effect on lexing of
parameter symbols ($digits).  Option 1 does not complicate parameter
lexing; $digits will still be read as a parameter since it's a longer
token than could be formed by taking the $ as an Op.  However, this
option doesn't make things any better either: in particular, we still
have the lexing ambiguity of multicharacter operator vs. parameter.
"x+$12" will be read as x +$ 12, though more likely x + $12 was meant.

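To make the option 1 behavior concrete, here is a minimal sketch of a
longest-match lexer in Python.  It is not the real scan.l: the token
rules, the simplified operator-character set, and the names OP_CHARS,
TOKEN_RULES and tokenize are all assumptions made for illustration.  It
only shows how a multicharacter operator can swallow the $ of an
adjacent parameter.

```python
import re

# Toy rules for option 1 ($ stays an operator character).
# Assumption: a simplified operator-character set, not the real scan.l one.
OP_CHARS = r"[-+*/<>=~!@#%^&|?$]"
TOKEN_RULES = [
    ("PARAM", re.compile(r"\$\d+")),
    ("IDENT", re.compile(r"[A-Za-z_][A-Za-z_0-9]*")),
    ("NUMBER", re.compile(r"\d+")),
    ("OP", re.compile(OP_CHARS + "+")),
]

def tokenize(text):
    """Split text into (kind, lexeme) pairs using longest-match."""
    tokens = []
    pos = 0
    while pos < len(text):
        if text[pos].isspace():
            pos += 1
            continue
        best = None
        for kind, pattern in TOKEN_RULES:
            m = pattern.match(text, pos)
            # Keep the longest match; on a tie the earlier rule wins.
            if m and (best is None or m.end() > best[1]):
                best = (kind, m.end())
        if best is None:
            raise ValueError(f"cannot lex at position {pos}: {text[pos:]!r}")
        kind, end = best
        tokens.append((kind, text[pos:end]))
        pos = end
    return tokens

print(tokenize("x+$12"))    # the operator +$ absorbs the dollar sign
print(tokenize("x + $12"))  # with whitespace, $12 is a parameter
print(tokenize("$"))        # a lone $ lexes as an operator
```

In this sketch "$12" on its own still wins as a parameter because it is
longer than the one-character operator "$", but in "x+$12" the operator
rule starts matching at "+" and takes the "$" with it.
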
With $-as-identifier, it'd no longer be possible for adjacent operators
and parameters to be confused.  Instead we have a new ambiguity with
adjacent parameters and identifiers/keywords.  Presently "select$1from"
is read as SELECT param FROM, but with $-as-identifier it'd be read as
a single identifier.  But the interesting point is that this'd make
parameters work a lot more like identifiers.  People don't expect to
be able to write identifiers adjacent to other identifiers with no
whitespace.  They do expect to be able to write them adjacent to
operators.

In fact, with $-as-identifier we'd have this useful property: given a
lexically-recognizable identifier, substitution of a parameter token
for the identifier does not require insertion of any whitespace to
keep the parameter lexically recognizable.  Some of you will recall
plpgsql bugs associated with the fact that the current lexer behavior
does not have this property.  (The other direction doesn't work 100%;
for example, "select $1from" is lexable, but "select foofrom" isn't,
since foofrom is read as one identifier.  But that direction is much
less interesting in practice.)
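
The $-as-identifier behavior can be sketched the same way.  The toy
lexer below is an assumption-laden illustration in Python, not the
proposed scan.l change: the operator set is simplified, and TOKEN_RE,
tokenize and texts are hypothetical names.  It removes $ from the
operator characters and allows it as a second-or-later identifier
character, then checks the substitution property described above.

```python
import re

# Toy rules for option 2 ($ is an identifier character, not an operator one).
# Assumption: simplified token classes; parentheses, strings etc. are omitted.
TOKEN_RE = re.compile("|".join([
    r"(?P<PARAM>\$\d+)",
    r"(?P<IDENT>[A-Za-z_][A-Za-z_0-9$]*)",  # $ allowed, but not as first char
    r"(?P<NUMBER>\d+)",
    r"(?P<OP>[-+*/<>=~!@#%^&|?]+)",         # no $ here
    r"(?P<SPACE>\s+)",
]))

def tokenize(text):
    """Split text into (kind, lexeme) pairs, skipping whitespace."""
    tokens = []
    pos = 0
    while pos < len(text):
        m = TOKEN_RE.match(text, pos)
        if m is None:
            raise ValueError(f"cannot lex at position {pos}: {text[pos:]!r}")
        if m.lastgroup != "SPACE":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens

def texts(text):
    """Return only the lexemes, for easy comparison."""
    return [lexeme for _, lexeme in tokenize(text)]

# Operators and parameters can no longer be confused.
print(tokenize("x+$12"))

# The new ambiguity: this is now one identifier.
print(tokenize("select$1from"))

# Substituting a parameter for an identifier needs no extra whitespace.
print(texts("a+foo*b"), texts("a+$1*b"))

# The reverse substitution can still merge tokens.
print(texts("select $1from"), texts("select foofrom"))
```

Because the four token classes here start with disjoint characters,
first-match alternation is enough; no longest-match tie-breaking between
operators and parameters is needed in this sketch.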

In short, $-as-identifier makes the lexer behavior noticeably cleaner
than it is now.

I started out firmly in the "keep $ an operator character" camp.  But
after thinking this through I'm sitting on the fence: both options seem
about equally attractive to me.

Comments?
        regards, tom lane

