Unicode string literals versus the world - Mailing list pgsql-hackers

From Tom Lane
Subject Unicode string literals versus the world
Msg-id 1927.1239400465@sss.pgh.pa.us
Responses Re: Unicode string literals versus the world  (Sam Mason <sam@samason.me.uk>)
Re: Unicode string literals versus the world  (Andrew Dunstan <andrew@dunslane.net>)
Re: Unicode string literals versus the world  (Marko Kreen <markokr@gmail.com>)
Re: Unicode string literals versus the world  (Peter Eisentraut <peter_e@gmx.net>)
List pgsql-hackers
So I started to look at what might be involved in teaching plpgsql about
standard_conforming_strings, and was soon dismayed by the sheer epic
nature of its failure to act like the core lexer.  It was shaky enough
before, but the recent introduction of Unicode strings and identifiers
into the core has left plpgsql hopelessly behind.

I can see two basic approaches to making things work: copy-and-paste
practically all of parser/scan.l into plpgsql's lexer (certainly all of
it that involves exclusive states); or throw out plpgsql's lexer
altogether in favor of somehow using the core lexer directly.  Neither
one looks very attractive.

It gets worse though: I have seldom seen such a badly designed piece of
syntax as the Unicode string syntax --- see
http://developer.postgresql.org/pgdocs/postgres/sql-syntax-lexical.html#SQL-SYNTAX-STRINGS-UESCAPE

You scan the string, and then after that they tell you what the escape
character is!?  Not to mention the obvious ambiguity with & as an
operator.
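
To make the ordering problem concrete, here is a minimal sketch in Python (not PostgreSQL's lexer; the helper name decode_unicode_literal and its regexes are assumptions made up for illustration). It follows the documented rules for U&'...' literals: backslash is the default escape, followed by four hex digits or + and six hex digits, and an optional trailing UESCAPE clause replaces the escape character. Note that the body has to be scanned to its closing quote before the escape character can be known.

```python
import re

# Hypothetical helper for illustration only; this is not PostgreSQL code.
_UESCAPE = re.compile(r"\s*UESCAPE\s*'(.)'", re.IGNORECASE)
_HEX4 = re.compile(r"[0-9A-Fa-f]{4}")
_HEX6 = re.compile(r"[0-9A-Fa-f]{6}")


def decode_unicode_literal(sql: str) -> str:
    """Decode a single U&'...' literal, with an optional UESCAPE clause."""
    if sql[:3].upper() != "U&'":
        raise ValueError("not a U&'...' literal")

    # Pass 1: find the closing quote. Only a doubled quote ('') is special,
    # because the escape character has not been announced yet.
    i = 3
    body = []
    while True:
        if i >= len(sql):
            raise ValueError("unterminated literal")
        if sql[i] == "'":
            if sql[i + 1:i + 2] == "'":
                body.append("'")
                i += 2
                continue
            i += 1
            break
        body.append(sql[i])
        i += 1

    # Only after the string has ended do we learn the escape character.
    m = _UESCAPE.match(sql, i)
    esc = m.group(1) if m else "\\"

    # Pass 2: reinterpret the body using that escape character.
    raw = "".join(body)
    out = []
    j = 0
    while j < len(raw):
        if raw[j] != esc:
            out.append(raw[j])
            j += 1
        elif raw[j + 1:j + 2] == esc:
            out.append(esc)  # doubled escape character stands for itself
            j += 2
        elif raw[j + 1:j + 2] == "+" and _HEX6.fullmatch(raw[j + 2:j + 8]):
            out.append(chr(int(raw[j + 2:j + 8], 16)))
            j += 8
        elif _HEX4.fullmatch(raw[j + 1:j + 5]):
            out.append(chr(int(raw[j + 1:j + 5], 16)))
            j += 5
        else:
            raise ValueError("invalid Unicode escape")
    return "".join(out)
```

With this sketch, the same body means different things depending on what follows it: U&'\0041' decodes to A, while U&'\0041' UESCAPE '!' decodes to the five literal characters \0041.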

If we let this go into 8.4, our previous rounds with security holes
caused by careless string parsing will look like a day at the beach.
No frontend that isn't fully cognizant of the Unicode string syntax is
going to parse such things correctly --- it's going to be trivial for
a bad guy to confuse a quoting mechanism as to what's an escape and what
isn't.
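
As a hedged illustration of that confusion, the following Python sketch compares two hypothetical scanners (both function names are invented here and neither is taken from any real driver). One treats backslash as an escape, as a quoting layer written for old-style backslash-escaped strings might. The other treats only a doubled quote as special, which is all a scanner can do inside a U& literal before it has seen the UESCAPE clause. They disagree about where the same literal ends.

```python
def end_backslash_aware(sql: str, start: int) -> int:
    """Return the index just past the closing quote of the literal that
    opens at sql[start], treating backslash as an escape character.
    Hypothetical stand-in for a frontend unaware of U& syntax."""
    i = start + 1
    while i < len(sql):
        if sql[i] == "\\":
            i += 2  # skip the backslash and whatever it escapes
        elif sql[i] == "'":
            if sql[i + 1:i + 2] == "'":
                i += 2  # doubled quote, still inside the string
            else:
                return i + 1
        else:
            i += 1
    raise ValueError("unterminated literal")


def end_quote_only(sql: str, start: int) -> int:
    """Same, but only a doubled quote is special. Backslash is an ordinary
    character, as it is once UESCAPE names a different escape."""
    i = start + 1
    while i < len(sql):
        if sql[i] == "'":
            if sql[i + 1:i + 2] == "'":
                i += 2
            else:
                return i + 1
        else:
            i += 1
    raise ValueError("unterminated literal")


sql = r"SELECT U&'x\' UESCAPE '!'"
start = sql.index("'")

# The quote-only scanner sees the literal 'x\' followed by a UESCAPE clause.
literal_quote_only = sql[start:end_quote_only(sql, start)]

# The backslash-aware scanner thinks \' is an escaped quote, so it swallows
# the UESCAPE keyword and stops at the next quote instead.
literal_backslash = sql[start:end_backslash_aware(sql, start)]
```

Whatever text follows the point where the two scanners diverge is string content to one of them and live SQL to the other, which is the kind of disagreement an attacker can exploit.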

I think we need to give very serious consideration to ripping out that
"feature".
        regards, tom lane

