Thread: charin(), text_char() should return something else for empty input
I have been chasing Domingo Alvarez Duarte's report of funny behavior when assigning an empty string to a "char" variable in plpgsql. What it comes down to is that text-to-char conversion does not behave very well for zero-length input.

charin() returns a null character, leading to the following bizarreness:

regression=# select 'z' || (''::"char") || 'q';
 ?column?
----------
 z
(1 row)

regression=# select length('z' || (''::"char") || 'q');
 length
--------
      3
(1 row)

The concatenation result is 'z\0q', which doesn't print nicely :-(.

text_char() produces a completely random result, eg:

regression=# select ''::text::"char";
 ?column?
----------
 ~
(1 row)

and could even coredump in the worst case, since it tries to fetch the first character of the text input no matter whether there is one or not.

I propose that both of these operations should return a space character for an empty input string. This is by analogy to space-padding as you'd get with char(1). Any objections?

regards, tom lane
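To make the text_char() hazard and the space-character proposal concrete, here is a minimal sketch in C. It is not PostgreSQL's actual code: SimpleText is a hypothetical stand-in for a length-counted text value, and both function names are invented for illustration.

```c
#include <stddef.h>

/* Hypothetical length-counted text value (an assumption for this sketch,
 * not PostgreSQL's real text representation). The data is NOT
 * null-terminated, so a zero-length value has no valid first byte. */
typedef struct
{
    size_t      len;   /* number of valid bytes in data */
    const char *data;  /* may be NULL or point at garbage when len == 0 */
} SimpleText;

/* The problematic pattern: fetches the first byte without checking the
 * length, so the result for empty input is whatever happens to be there. */
char
text_to_char_unchecked(const SimpleText *t)
{
    return t->data[0];
}

/* The proposal from the message above: empty input yields a space,
 * by analogy with char(1) space-padding. */
char
text_to_char_space(const SimpleText *t)
{
    return (t->len > 0) ? t->data[0] : ' ';
}
```

With this sketch, an empty value converts to ' ' through text_to_char_space(), while calling text_to_char_unchecked() on it would be undefined behavior.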
I wrote:
> I propose that both of these operations should return a space character
> for an empty input string. This is by analogy to space-padding as you'd
> get with char(1). Any objections?

An alternative approach is to make charin and text_char map empty strings to the null character (\0), and conversely make charout and char_text map the null character to empty strings. charout already acts that way, in effect, since it has to produce a null-terminated C string. This way would have the advantage that there would still be a reversible dump and reload representation for a "char" field containing '\0', whereas space-padding would cause such a field to become ' ' after reload. But it's a little strange if you think that "char" ought to behave the same as char(1).

Comments?

regards, tom lane
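The dump-and-reload argument can be sketched in a few lines of C. This is an illustration only: char_in, char_out, char_in_space and round_trips are hypothetical names, not the real charin/charout functions, and the sketch assumes the external representation is a null-terminated C string as the message describes.

```c
/* Input conversion under the alternative proposal: take the first byte of
 * a C string. For "" that byte is the terminator, so the result is '\0'
 * without any special case. */
char
char_in(const char *s)
{
    return s[0];
}

/* Input conversion under the space-padding proposal, for comparison. */
char
char_in_space(const char *s)
{
    return (s[0] != '\0') ? s[0] : ' ';
}

/* Output conversion: one byte plus a terminator. A '\0' value therefore
 * comes out as the empty string. */
void
char_out(char c, char out[2])
{
    out[0] = c;
    out[1] = '\0';
}

/* Returns 1 if value c survives output followed by the given input
 * conversion (a stand-in for dump and reload), 0 otherwise. */
int
round_trips(char c, char (*in) (const char *))
{
    char buf[2];

    char_out(c, buf);
    return in(buf) == c;
}
```

Here round_trips('\0', char_in) reports success, while round_trips('\0', char_in_space) does not, because the reloaded value becomes ' '.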
At 02:37 PM 5/28/01 -0400, Tom Lane wrote:
>I wrote:
>> I propose that both of these operations should return a space character
>> for an empty input string. This is by analogy to space-padding as you'd
>> get with char(1). Any objections?
>
>An alternative approach is to make charin and text_char map empty
>strings to the null character (\0), and conversely make charout and
>char_text map the null character to empty strings. charout already
>acts that way, in effect, since it has to produce a null-terminated
>C string. This way would have the advantage that there would still
>be a reversible dump and reload representation for a "char" field
>containing '\0', whereas space-padding would cause such a field to
>become ' ' after reload. But it's a little strange if you think that
>"char" ought to behave the same as char(1).
>
>Comments?

I personally wouldn't expect "char" to behave exactly as "char(1)", because I understand it to be a one-byte variable which holds a single (not zero or one) character.

Mapping '' to ' ' doesn't make a lot of sense to me. It isn't what I'd expect.

I think the behavior you describe in this note is better.

- Don Baccus, Portland OR <dhogaza@pacifier.com>
  Nature photos, on-line guides, Pacific Northwest
  Rare Bird Alert Service and other goodies at
  http://donb.photo.net.
Don Baccus <dhogaza@pacifier.com> writes:
> Mapping '' to ' ' doesn't make a lot of sense to me. It isn't what
> I'd expect.
> I think the behavior you describe in this note is better.

I'm coming to that conclusion as well. If you look closely, both charin() and charout() act that way already; so the second proposal boils down to making the text <=> char conversion functions act in accordance with the way that char's I/O conversions already act. That seems a less drastic change than altering both I/O and conversion behavior.

regards, tom lane
Re: Re: charin(), text_char() should return something else for empty input
From: ncm@zembu.com (Nathan Myers)
On Mon, May 28, 2001 at 02:37:32PM -0400, Tom Lane wrote:
> I wrote:
> > I propose that both of these operations should return a space character
> > for an empty input string. This is by analogy to space-padding as you'd
> > get with char(1). Any objections?
>
> An alternative approach is to make charin and text_char map empty
> strings to the null character (\0), and conversely make charout and
> char_text map the null character to empty strings. charout already
> acts that way, in effect, since it has to produce a null-terminated
> C string. This way would have the advantage that there would still
> be a reversible dump and reload representation for a "char" field
> containing '\0', whereas space-padding would cause such a field to
> become ' ' after reload. But it's a little strange if you think that
> "char" ought to behave the same as char(1).

Does the standard require any particular behavior with NUL characters? I'd like to see PG move toward treating them as ordinary control characters. I realize that at best it will take a long time to get there. C is irretrievably mired in the "NUL is a terminator" swamp, but SQL isn't C.

Nathan Myers
ncm@zembu.com
Re: Re: charin(), text_char() should return something else for empty input
From: Peter Eisentraut
Nathan Myers writes:
> Does the standard require any particular behavior with NUL
> characters?

The standard describes the behaviour of the character types in terms of character sets. This decouples glyphs, encoding, and storage. So theoretically you could (AFAICT) define a character set that encodes some meaningful character with code zero, but the implementation is not required to handle this zero byte internally; it could catch it during input and represent it with an escape code.

The standard also defines some possible "built-in" character sets, such as LATIN1 and UTF16. Most of these do not naturally contain a character that is encoded with the zero byte. In the case of the ISO8BIT/ASCII_FULL charset, the standard explicitly says that the zero byte is not contained in the character set.

In general, I don't see a point in accepting a zero byte in character strings. If you want to store binary data there are binary data types (or effort could be invested in them).

--
Peter Eisentraut peter_e@gmx.net http://funkturm.homeip.net/~peter
Peter Eisentraut <peter_e@gmx.net> writes:
> In general, I don't see a point in accepting a zero byte in character
> strings. If you want to store binary data there are binary data types (or
> effort could be invested in them).

If we were starting in a green field then I'd think it worthwhile to maintain null-byte-cleanness for the textual datatypes. At this point, though, the amount of pain involved seems to vastly outweigh the value. The major problem is that I/O conventions not based on null-terminated strings would break all existing user-defined datatypes. (Changing our own code is one thing, breaking users' code is something else.) There are minor-by-comparison problems like not being able to use strcoll() for locale-sensitive comparisons anymore...

I agree with Peter that spending some extra effort on bytea and/or similar types is probably a more profitable answer.

regards, tom lane
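The strcoll() point can be illustrated with a small C sketch. The helper names cstring_length, cstring_collate and counted_equal are hypothetical, chosen only for this example. The sketch relies on the standard behavior that strlen() and strcoll() stop at the first zero byte, so anything stored after an embedded '\0' is invisible to them.

```c
#include <stddef.h>
#include <string.h>

/* Length as seen by null-terminated conventions: counting stops at the
 * first zero byte, even if the buffer holds more data after it. */
size_t
cstring_length(const char *buf)
{
    return strlen(buf);
}

/* Locale-sensitive comparison through strcoll(), which also treats the
 * first zero byte as the end of each string. */
int
cstring_collate(const char *a, const char *b)
{
    return strcoll(a, b);
}

/* Byte-wise equality on explicitly counted buffers. This sees every byte,
 * including embedded zero bytes, but is not locale-aware. */
int
counted_equal(const char *a, size_t alen, const char *b, size_t blen)
{
    return alen == blen && memcmp(a, b, alen) == 0;
}
```

For the three-byte values 'z\0q' and 'z\0r', cstring_length() sees a length of 1 for both and cstring_collate() treats them as equal, whereas counted_equal() over all three bytes reports a difference.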