I've been looking at SQL99 while reviewing a book, and stumbled across
some (new to me) behavior for double-quoted identifiers. The SQL99 way
to embed a double-quote into a quoted identifier is to put in two
adjacent double-quotes (much like is done for embedding single-quotes
into string literals in SQL9x).
I've modified (but not yet committed) the pieces in parser (scan.l) and
pg_dump (common.c) to accept and emit appropriate stuff for the embedded
double-quote case. An example is
create table "hi""there" (i int); \d List of relations Name | Type | Owner----------|-------|----------
hi"there| table | lockhart(1 row)
Currently, pg_dump escapes this by embedding a backslash/double-quote
pair, and I'm proposing that it emit two adjacent double-quotes instead
(btw, scan.l does not seem to accept this for input at the moment ;).
Any objections to committing this? Are there other cases which must be
considered?
On another somewhat related point:
String literals can contain escaped characters, which postgres removes
early in the parsing stage. These escapes are re-inserted *every time
the string is returned in a query*. imho this is the wrong behavior,
since the escapes were present in the input only to get around SQL9x
syntax for allowable input characters (or to get around some postgres
oddity). This is not an issue when sending strings back out from the
server, except for perhaps the special case of the null character. And
it's pretty silly to enter escaped strings, remove the escapes, then
re-escape them for all user-visible interactions with the string. Except
for perhaps string comparisons, there is hardly any point in bothering
to unescape the string internally!
I propose that we move the responsibility for re-escaping literal
strings to pg_dump, which is the only utility with the charter to
generate strings which are fully symmetric with input strings. We can
provide libpq with an "escape routine" to assist other apps with this
task if they need it.
Comments?
- Thomas