Thread: 7.2 - changed array_out() - quotes vs no quotes
Somewhere after 7.2b2 (it looks like for 7.2b4) a change was made to array_out() in: src/backend/utils/adt/arrayfuncs.c,v 1.72 2001 /11/29 21:02:41 tgl "Fix array_out's failure to backslash backslashes, per bug# 524. Also, remove brain-dead rule that double quotes areneeded if and only if the datatype is pass-by-reference; neither direction of the implication holds water. Instead,examine the actual data string to see if it contains any characters that force us to quote it. Add some documentationabout quoting of array values, which was previously explained nowhere AFAICT." The older code quoted any "pass by ref" types, ie varlena's as opposed to ints or floats. Which was perhaps clumsy, but at least was predictable. That is, if it was a char or text type you could assume it was quoted, eg { "foo", "bar", "foo bar", "foo.bar", "foo,bar" }. The new code goes to the trouble of scanning the string for embedded commas, curlys, backslashes and whitespace. If it finds some it arranges for quotes, otherwise it suppresses the quotes anywhere it is possible not to use them. There is now no easy way for a client to know what the output will look like as it now depends on the specific data values, eg: { foo, bar, "foo bar", foo.bar, "foo,bar", "foo\"bar" } This pretty much breaks any client that does not have a scanner at least as intelligent as the array_in() scanner. Which includes most perl programs, and pretty much all shell scripts using psql and arrays. Since Jdbc did not support arrays well until very recently, I suspect it broke a few hand written java array scanners too. It did ours. Finally, I simply don't see the need for this change. There is no pressing world shortage of double-quote characters, so there seems little reason to conserve them. Especially at the cost of breaking existing clients and complicating future client code (exercise: write a shell function to scan the new style of array result correctly). I view this change as a bug and would like to see it backed out. <rant> We monitor the hackers list, test with most betas, read the release notes carefully. And were completely blindsided when our production apps fell over after we upgraded from 7.2b2 to the release. Grrrr.... So right now we have reverted to our previous version and deferred our our upgrade until we decide whether to patch out this change (and maintain another patch) or to rewrite some of our client code (at least we have source so we have this option, some don't). I would really like to hear that I have persuaded you that it is a bug and will be fixed in a future release. Alternatively that you plan not to slipstream client visible formatting changes into late betas. Especially without any mention in the release notes. I really wish one could upgrade from release to release of postgres without a db reload, or rewriting client software, and especially not without both. Grrrr.... Grrrr.... </rant> Regards -dg -- David Gould dg@nextbus.com (or davidg@dnai.com) Criminal: A person with predatory instincts who has not sufficient capital to form a corporation. - Howard Scott
David Gould <dg@nextbus.com> writes: > Somewhere after 7.2b2 (it looks like for 7.2b4) a change was made to > array_out() in: I will take the blame for that. > I view this change as a bug and would like to see it backed out. The old behavior was certainly broken and I will not accept a proposal to back out the change entirely. What you are really saying is that you'd prefer the choice of quotes or no quotes to be driven by the datatype rather than by the data value. That's a legitimate gripe, but where shall we get the knowledge of whether the datatype might sometimes emit strings that need quoting? Using the pass-by-reference flag is *completely* wrong; the fact that it chanced not to fail in your application does not make it less wrong. The only way I could see to make the behavior totally predictable at the datatype level (while not being broken) is to always quote every array element. However, that would likely break some other people's oversimplified parsers, so I'm not convinced it's a net win. Perhaps you should fix your application code rather than relying on a never-documented behavior. regards, tom lane
On Fri, Feb 08, 2002 at 12:29:46AM -0500, Tom Lane wrote: > David Gould <dg@nextbus.com> writes: > > Somewhere after 7.2b2 (it looks like for 7.2b4) a change was made to > > array_out() in: > > I will take the blame for that. Well, it would have been nice if it had been in the release notes... > > I view this change as a bug and would like to see it backed out. > > The old behavior was certainly broken and I will not accept a proposal > to back out the change entirely. I am not defending the old behaviour, just the fact that it _was_ predictable. It was simple to know which types are pass by ref and it is simple to allow for it. I am not saying that pass by ref is the right criteria, just that predictable is good. > What you are really saying is that you'd prefer the choice of quotes or > no quotes to be driven by the datatype rather than by the data value. Yes. I think it is not excessive to insist that types have stable, predicatable representations. The other types do, why should arrays be even more special? > That's a legitimate gripe, but where shall we get the knowledge of whether > the datatype might sometimes emit strings that need quoting? Using the Hmmm, maybe the type designers should deal with this. And if they don't, well quote em. Or, you don't even need the quotes, you could just promise never to insert white space and to always escape embedded commas and curlys. So a dumb client could simply split on un-escaped commas and be done. I could live with that as long as it never changed again, at least without some warning. > pass-by-reference flag is *completely* wrong; the fact that it chanced > not to fail in your application does not make it less wrong. "chanced not to fail". Nice. > The only way I could see to make the behavior totally predictable at > the datatype level (while not being broken) is to always quote every > array element. However, that would likely break some other people's Fine with me. That is what it did before. > oversimplified parsers, so I'm not convinced it's a net win. Perhaps > you should fix your application code rather than relying on a > never-documented behavior. "relying on a never-documented behavior" ..., so it is now my fault there is no documentation? Heh. Besides, this isn't even strictly true, in the html documentation (doc/arrays.html "chapter 6") that ships with 7.2 there is a nice example: SELECT schedule[1:2][1:1] FROM sal_emp WHERE name = 'Bill'; schedule -------------------- {{"meeting"},{""}} (1 row) which shows quotes around the strings. Likewise, the doc quotes strings in every example of an insert. Further, it says: "Observe that to write an array value, we enclose the element values within curly braces and separate them by commas. Ifyou know C, this is not unlike the syntax for initializing structures." Which suggests that the syntax should be similar to C. Which requires strings to be quoted. And, since the "undocumented behaviour" has been there since the dawn of time, or at least since: src/backend/utils/adt/arrayfuncs.c Revision 1.1, Tue Jul 9 06:22:03 1996 UTC (5 years, 7 months ago) by scrappy it is likely that _not_ _changing_ _it_ would not break any of the last six years worth of existing "other people's oversimplified parsers". ... breathes ... I don't mean to be rude about this, I have a lot of respect for postgres, all of the postgres developers and the overall process. But to slip a client visible change late in a beta cycle to a specific format that has been stable since UC Berkeley freed the code, and then to suggest that is just a some silly user relying on "never-documented behavior" is almost comical. Seriously, one point of a database is to insulate client applications from the exact representation and layout of the data. Which is not accomplished by making arbitrary changes to simple things like strings that make them take a yards and yards of code to parse. I think a clearly explained rule about what types are quoted (and either qoted or not quoted, not sometimes this and sometimes that) would be a nice addition to the documentation, especially if the code then was made to match it. And it was in the release notes. But, as it stands, I still see a bug. -dg -- David Gould dg@nextbus.com (or davidg@dnai.com) 'Some people, when confronted with a problem, think "I know, I'll useregular expressions". Now they have two problems.' -- Jamie Zawinski
David Gould <dg@nextbus.com> writes: > Yes. I think it is not excessive to insist that types have stable, > predicatable representations. The other types do, why should arrays be > even more special? The representation is stable and predictable. You're simply hoping to avoid building smarts into your parser for it. Unfortunately, some degree of smarts are *necessary* if you are going to deal with array items containing arbitrary text. I can hardly believe that a client program that can deal with backslash-escapes is going to have trouble removing quotes. > Or, you don't even need the quotes, you could just promise never > to insert white space and to always escape embedded commas and curlys. No, we can't, because that would break applications that rely on the existing rules for array input: leading whitespace is insignificant unless quoted. Besides, weren't you complaining because the quotes disappeared? The above variant would still break your code. > So a dumb client could simply split on un-escaped commas and be done. I hardly think that a client that can tell the difference between an escaped comma and an un-escaped one qualifies as "dumb". We could perhaps dispense with quotes on output if we escaped leading spaces. For example, instead of" foo" emit\ foo I don't think this is a step forward in readability, though. And increased reliance on backslashes instead of double quotes won't really make anyone's life easier --- for example, you'd have to remember to double them when sending the same value back to the SQL parser. >> The only way I could see to make the behavior totally predictable at >> the datatype level (while not being broken) is to always quote every >> array element. > Fine with me. That is what it did before. No, it has never done that. In particular, I do not wish to change the longstanding no-quotes behavior for arrays of integers. That *would* break other people's code. (One of the things I hoped to accomplish with this change is to extend the same no-quotes behavior to floats and numerics.) > But to slip a client visible change late in a beta cycle to a specific > format that has been stable since UC Berkeley freed the code, It's been broken since Berkeley, too; the fact that no one complained till a month or two ago just indicates how little arrays are used, IMHO. I doubt you'd be any less annoyed no matter when in the development cycle we'd done this. I do agree that it'd be better if this had been called out in the release notes. We don't currently have any process for ensuring that minor incompatibilities get noted in the release notes. Bruce makes up the notes based on scanning the CVS logs after the fact, and if he misses the significance of an entry, it's missed. Maybe we can do better than that --- adding an entry to a release-notes-to-be file when the change is made might be more reliable. It's also true that the SGML documentation is sadly deficient on this point; but then, its discussion of arrays is overly terse in just about every respect. Someone want to volunteer to expand it? > Seriously, one point of a database is to insulate client applications > from the exact representation and layout of the data. Which is not > accomplished by making arbitrary changes to simple things like strings > that make them take a yards and yards of code to parse. Properly parsing arrays of text values is going to require dealing with backslash-escapes in any case; seems to me that that's what will take "yards and yards" of code. Stripping off optional quotes is trivial by comparison. On the other hand, parsing arrays of integers is pretty trivial since you know there are no escapable characters anywhere. I don't favor pushing complexity out of the one case and into the other. I'm willing to consider the output-no-quotes-at-all approach if people think that's a superior solution. Comments anyone? regards, tom lane
Tom Lane wrote: <snip> > > I'm willing to consider the output-no-quotes-at-all approach if people > think that's a superior solution. Comments anyone? Keeping the quotes seems better, purely because of readability. i.e. " xyzzy " is better than ,\ xyzzy\ , or similar. Regards and best wishes, Justin Clift > > regards, tom lane > > ---------------------------(end of broadcast)--------------------------- > TIP 3: if posting/reading through Usenet, please send an appropriate > subscribe-nomail command to majordomo@postgresql.org so that your > message can get through to the mailing list cleanly -- "My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there." - Indira Gandhi
The issue is that changing a user interface will break people's code. It is clear that a user interface change was made and it is also clear that it broke people's code. In any organization that is a bug. I did this job of gating server user interfaces for commercial postgres based systems for several years. It is not always a nice job. The need to carefully ensure that interfaces were (almost) always backward compatible was given, as was the necessity of highlighting any unavoidable changes in interfaces so that the users would be impacted in the smallest possible way. In this case, the container aspect of the datatype requires more studied parsing that most others with or without the changes. Also, the array handling routines in jdbc were only completed in 7.2, so we can assume that java users had to do their own parsing of the representation of this container type. The counter argument that there were no complaints is not particularly relevent since few people have been using the beta versions and have been waiting for 7.2 to stabilize since it requires a significant effort to do the initdb. It is safe to say that the masses will be a bit slowere on the adoption of an initdb release than others. You all have been stellar at supporting postgresql and responding to users in very helpful and immediate ways. We have benefited directly from that. Keep up the good work by not changing user interfaces for anything other than unavoidable circumstances. elein Tom Lane wrote: > David Gould <dg@nextbus.com> writes: > >>Yes. I think it is not excessive to insist that types have stable, >>predicatable representations. The other types do, why should arrays be >>even more special? >> > > The representation is stable and predictable. You're simply hoping to > avoid building smarts into your parser for it. Unfortunately, some > degree of smarts are *necessary* if you are going to deal with array > items containing arbitrary text. I can hardly believe that a client > program that can deal with backslash-escapes is going to have trouble > removing quotes. > > >>Or, you don't even need the quotes, you could just promise never >>to insert white space and to always escape embedded commas and curlys. >> > > No, we can't, because that would break applications that rely on the > existing rules for array input: leading whitespace is insignificant > unless quoted. Besides, weren't you complaining because the quotes > disappeared? The above variant would still break your code. > > >>So a dumb client could simply split on un-escaped commas and be done. >> > > I hardly think that a client that can tell the difference between an > escaped comma and an un-escaped one qualifies as "dumb". > > We could perhaps dispense with quotes on output if we escaped leading > spaces. For example, instead of > " foo" > emit > \ foo > I don't think this is a step forward in readability, though. And > increased reliance on backslashes instead of double quotes won't really > make anyone's life easier --- for example, you'd have to remember to > double them when sending the same value back to the SQL parser. > > >>>The only way I could see to make the behavior totally predictable at >>>the datatype level (while not being broken) is to always quote every >>>array element. >>> > >>Fine with me. That is what it did before. >> > > No, it has never done that. In particular, I do not wish to change the > longstanding no-quotes behavior for arrays of integers. That *would* > break other people's code. (One of the things I hoped to accomplish > with this change is to extend the same no-quotes behavior to floats and > numerics.) > > >>But to slip a client visible change late in a beta cycle to a specific >>format that has been stable since UC Berkeley freed the code, >> > > It's been broken since Berkeley, too; the fact that no one complained > till a month or two ago just indicates how little arrays are used, IMHO. > I doubt you'd be any less annoyed no matter when in the development > cycle we'd done this. > > I do agree that it'd be better if this had been called out in the > release notes. We don't currently have any process for ensuring that > minor incompatibilities get noted in the release notes. Bruce makes up > the notes based on scanning the CVS logs after the fact, and if he > misses the significance of an entry, it's missed. Maybe we can do > better than that --- adding an entry to a release-notes-to-be file when > the change is made might be more reliable. > > It's also true that the SGML documentation is sadly deficient on this > point; but then, its discussion of arrays is overly terse in just about > every respect. Someone want to volunteer to expand it? > > >>Seriously, one point of a database is to insulate client applications >>from the exact representation and layout of the data. Which is not >>accomplished by making arbitrary changes to simple things like strings >>that make them take a yards and yards of code to parse. >> > > Properly parsing arrays of text values is going to require dealing with > backslash-escapes in any case; seems to me that that's what will take > "yards and yards" of code. Stripping off optional quotes is trivial > by comparison. On the other hand, parsing arrays of integers is pretty > trivial since you know there are no escapable characters anywhere. > I don't favor pushing complexity out of the one case and into the other. > > I'm willing to consider the output-no-quotes-at-all approach if people > think that's a superior solution. Comments anyone? > > regards, tom lane > > -- -------------------------------------------------------- elein@nextbus.com (510)420-3120 www.nextbus.comspinning to infinity, hallelujah --------------------------------------------------------