Re: 7.2 - changed array_out() - quotes vs no quotes - Mailing list pgsql-hackers

From Tom Lane
Subject Re: 7.2 - changed array_out() - quotes vs no quotes
Date
Msg-id 23619.1013182746@sss.pgh.pa.us
In response to Re: 7.2 - changed array_out() - quotes vs no quotes  (David Gould <dg@nextbus.com>)
List pgsql-hackers
David Gould <dg@nextbus.com> writes:
> Yes. I think it is not excessive to insist that types have stable,
> predictable representations. The other types do, why should arrays be
> even more special?

The representation is stable and predictable.  You're simply hoping to
avoid building smarts into your parser for it.  Unfortunately, some
degree of smarts is *necessary* if you are going to deal with array
items containing arbitrary text.  I can hardly believe that a client
program that can deal with backslash-escapes is going to have trouble
removing quotes.
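
To illustrate the point, here is a rough Python sketch of a client-side
splitter for a one-dimensional array literal.  It is not taken from any
PostgreSQL client library; the function name parse_text_array is made up
for this example, and it assumes the rules discussed in this thread:
backslash escapes the next character, double quotes are optional, and
unquoted leading whitespace is insignificant.  Nested arrays and trailing
whitespace are not handled.

```python
def parse_text_array(literal):
    # Hypothetical helper: split a one-dimensional array literal into
    # a list of element strings.
    if not (literal.startswith("{") and literal.endswith("}")):
        raise ValueError("expected a literal wrapped in braces")
    body = literal[1:-1]
    if body == "":
        return []

    elements = []
    current = []          # characters of the element being built
    in_quotes = False     # currently inside a double-quoted section
    was_quoted = False    # the current element has used quotes
    i = 0
    while i < len(body):
        ch = body[i]
        if ch == "\\" and i + 1 < len(body):
            # A backslash makes the next character literal.
            current.append(body[i + 1])
            i += 2
            continue
        if ch == '"':
            # Quotes only delimit; they are not part of the value.
            in_quotes = not in_quotes
            was_quoted = True
        elif ch == "," and not in_quotes:
            elements.append("".join(current))
            current, was_quoted = [], False
        elif ch.isspace() and not in_quotes and not current and not was_quoted:
            # Unquoted leading whitespace is insignificant (assumed rule).
            pass
        else:
            current.append(ch)
        i += 1

    if in_quotes:
        raise ValueError("unterminated double quote")
    elements.append("".join(current))
    return elements


print(parse_text_array('{a,"b,c",d\\,e}'))
print(parse_text_array('{"  foo",  bar}'))
```

Once the backslash branch exists, the quote handling adds only the two
flags and one extra branch.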

> Or, you don't even need the quotes, you could just promise never
> to insert white space and to always escape embedded commas and curlys.

No, we can't, because that would break applications that rely on the
existing rules for array input: leading whitespace is insignificant
unless quoted.  Besides, weren't you complaining because the quotes
disappeared?  The above variant would still break your code.

> So a dumb client could simply split on un-escaped commas and be done.

I hardly think that a client that can tell the difference between an
escaped comma and an un-escaped one qualifies as "dumb".

We could perhaps dispense with quotes on output if we escaped leading
spaces.  For example, instead of
        "  foo"
emit
        \  foo
I don't think this is a step forward in readability, though.  And
increased reliance on backslashes instead of double quotes won't really
make anyone's life easier --- for example, you'd have to remember to
double them when sending the same value back to the SQL parser.
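
A small Python sketch of the two output styles and of the doubling
problem.  The helper names (quoted_form, backslash_form,
sql_string_literal) are invented for this illustration, and the sketch
assumes the SQL parser treats backslash as an escape character inside
string literals.  backslash_form deliberately handles only the
leading-space case shown above, not commas, braces, or embedded
backslashes.

```python
def quoted_form(value):
    # Wrap the element in double quotes, escaping embedded
    # backslashes and double quotes.
    return '"' + value.replace("\\", "\\\\").replace('"', '\\"') + '"'


def backslash_form(value):
    # Escape only the first leading space, so the remaining spaces
    # are no longer "leading" whitespace.
    return "\\" + value if value.startswith(" ") else value


def sql_string_literal(text):
    # Assumption: backslash is an escape inside SQL string literals,
    # so every backslash must be doubled, and single quotes doubled too.
    return "'" + text.replace("\\", "\\\\").replace("'", "''") + "'"


value = "  foo"
print(sql_string_literal("{" + quoted_form(value) + "}"))
print(sql_string_literal("{" + backslash_form(value) + "}"))
```

With the quoted form the array text passes into the SQL literal
unchanged; with the backslash form the client has to double the
backslash first.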

>> The only way I could see to make the behavior totally predictable at
>> the datatype level (while not being broken) is to always quote every
>> array element.

> Fine with me. That is what it did before.

No, it has never done that.  In particular, I do not wish to change the
longstanding no-quotes behavior for arrays of integers.  That *would*
break other people's code.  (One of the things I hoped to accomplish
with this change is to extend the same no-quotes behavior to floats and
numerics.)

> But to slip a client visible change late in a beta cycle to a specific
> format that has been stable since UC Berkeley freed the code,

It's been broken since Berkeley, too; the fact that no one complained
till a month or two ago just indicates how little arrays are used, IMHO.
I doubt you'd be any less annoyed no matter when in the development
cycle we'd done this.

I do agree that it'd be better if this had been called out in the
release notes.  We don't currently have any process for ensuring that
minor incompatibilities get noted in the release notes.  Bruce makes up
the notes based on scanning the CVS logs after the fact, and if he
misses the significance of an entry, it's missed.  Maybe we can do
better than that --- adding an entry to a release-notes-to-be file when
the change is made might be more reliable.

It's also true that the SGML documentation is sadly deficient on this
point; but then, its discussion of arrays is overly terse in just about
every respect.  Someone want to volunteer to expand it?

> Seriously, one point of a database is to insulate client applications
> from the exact representation and layout of the data. Which is not
> accomplished by making arbitrary changes to simple things like strings
> that make them take yards and yards of code to parse.

Properly parsing arrays of text values is going to require dealing with
backslash-escapes in any case; seems to me that that's what will take
"yards and yards" of code.  Stripping off optional quotes is trivial
by comparison.  On the other hand, parsing arrays of integers is pretty
trivial since you know there are no escapable characters anywhere.
I don't favor pushing complexity out of the one case and into the other.
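
For comparison, a sketch of the integer case in Python.  It assumes a
one-dimensional array with no NULL entries, and parse_int_array is a
made-up name.  Since no element can contain a comma, quote, or
backslash, a plain split is enough:

```python
def parse_int_array(literal):
    # Hypothetical helper: strip the surrounding braces and split on
    # commas; int() tolerates whitespace around each item.
    body = literal.strip()[1:-1]
    return [int(item) for item in body.split(",")] if body else []


print(parse_int_array("{1,2,3}"))
```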

I'm willing to consider the output-no-quotes-at-all approach if people
think that's a superior solution.  Comments anyone?
        regards, tom lane

