Re: The "char" type versus non-ASCII characters - Mailing list pgsql-hackers

From Tom Lane
Subject Re: The "char" type versus non-ASCII characters
Msg-id 2954046.1638733913@sss.pgh.pa.us
In response to Re: The "char" type versus non-ASCII characters  (Chapman Flack <chap@anastigmatix.net>)
Responses Re: The "char" type versus non-ASCII characters
List pgsql-hackers
Chapman Flack <chap@anastigmatix.net> writes:
> On 12/05/21 12:01, Tom Lane wrote:
>> regression=# select '\'::bytea;
>> ERROR:  invalid input syntax for type bytea
>> 
>> which would be incompatible with "char"'s existing behavior.  But as
>> long as we don't do that, I'd be okay with having high-bit-set char
>> values map to backslash-followed-by-three-octal-digits, which is
>> what bytea escape format would produce.

> Is that a proposal to change nothing about the current treatment
> of values < 128, or just to avoid rejecting bare '\'?

I intended to change nothing about charin's treatment of ASCII
characters, nor anything about bytea's behavior.  I don't think
we should relax the error checks in the latter.  That does mean
that backslash becomes a problem for the idea of transparent
conversion from char to bytea or vice versa.  We could think
about emitting backslash as '\\' in charout, I suppose.  I'm
not really convinced though that bytea compatibility is worth
changing a case that's non-problematic today.
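
To make the mapping discussed above concrete, here is a minimal sketch in C of an output routine that leaves ASCII values alone (including a bare backslash) and prints high-bit-set values as a backslash followed by three octal digits. The name char_out_sketch and the caller-supplied 5-byte buffer are assumptions made for illustration; this is not PostgreSQL's actual charout.

```c
#include <stdio.h>

/*
 * Hypothetical sketch only, not the real charout().
 * Writes the text form of a one-byte "char" value into buf,
 * which must have room for at least 5 bytes, and returns buf.
 */
static char *
char_out_sketch(unsigned char c, char *buf)
{
    if (c & 0x80)
    {
        /* High bit set: backslash plus exactly three octal digits. */
        snprintf(buf, 5, "\\%03o", (unsigned int) c);
    }
    else
    {
        /*
         * ASCII is emitted unchanged.  A backslash stays a single
         * backslash, so this one value is not valid bytea input.
         * A zero byte yields an empty string.
         */
        buf[0] = (char) c;
        buf[1] = '\0';
    }
    return buf;
}
```

Under these assumptions, the byte 0xEE comes out as \356, while a backslash comes out as a lone backslash, which is the case bytea input rejects.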

> If there's a way to factor out and reuse the good parts of byteain,
> that would mean '\\' would also be accepted to mean a backslash,
> and the \r \n \t usual escapes would be accepted too, and \ooo and
> \xhh.

Uh, what?

regression=# select '\n'::bytea;
ERROR:  invalid input syntax for type bytea

But I doubt that sharing code here would be worth the trouble.
The vast majority of byteain is concerned with managing the
string length, which is a nonissue for charin.
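
As a rough illustration of why length management does not arise on the input side, here is a sketch in C of a one-byte parser that accepts a backslash followed by three octal digits and otherwise takes the first byte of the string. The names char_in_sketch and is_octal are hypothetical, and the exact acceptance rule (string length of exactly 4, first digit limited to 0-3) is an assumption, not something settled in this thread.

```c
#include <string.h>

/* Hypothetical helper: true if ch is an octal digit. */
static int
is_octal(char ch)
{
    return ch >= '0' && ch <= '7';
}

/*
 * Hypothetical sketch only, not the real charin().
 * The result is always a single byte, so no output buffer
 * needs to be sized or grown.
 */
static unsigned char
char_in_sketch(const char *str)
{
    /* Exactly "\ooo" with a value that fits in one byte (000..377). */
    if (strlen(str) == 4 && str[0] == '\\' &&
        str[1] >= '0' && str[1] <= '3' &&
        is_octal(str[2]) && is_octal(str[3]))
    {
        return (unsigned char) (((str[1] - '0') << 6) |
                                ((str[2] - '0') << 3) |
                                (str[3] - '0'));
    }

    /*
     * Anything else keeps the existing rule: use the first byte.
     * A bare backslash is therefore still accepted, and an empty
     * string yields the zero byte.
     */
    return (unsigned char) str[0];
}
```

With this rule, the input \356 decodes to the byte 0xEE, while a bare backslash and inputs such as \n still yield a backslash, so no ASCII case changes.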

> I think it ends up being no more complexity at all, because a single
> octet in bytea-hex form looks like \xhh, which is exactly what
> a single \xhh in bytea-escape form looks like.

I'm confused by this statement too.  AFAIK the alternatives in
bytea are \xhh or \ooo:

regression=# select '\xEE'::bytea;
 bytea 
-------
 \xee
(1 row)

regression=# set bytea_output to escape;
SET
regression=# select '\xEE'::bytea;
 bytea 
-------
 \356
(1 row)

            regards, tom lane


