
Re: [rfc] unicode escapes for extended strings - Mailing list pgsql-hackers

From	Marko Kreen
Subject	Re: [rfc] unicode escapes for extended strings
Date	April 18, 2009 12:29:09
Msg-id	e51f66da0904180529t2bf46458ga9df7909ab2aca78@mail.gmail.com
In response to	Re: [rfc] unicode escapes for extended strings (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: [rfc] unicode escapes for extended strings
List	pgsql-hackers

On 4/18/09, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Sam Mason <sam@samason.me.uk> writes:
>  > On Fri, Apr 17, 2009 at 07:01:47PM +0200, Martijn van Oosterhout wrote:
>  >> On Fri, Apr 17, 2009 at 07:07:31PM +0300, Marko Kreen wrote:
>  >>> Btw, is there any good reason why we don't reject \000, \x00
>  >>> in text strings?
>  >>
>  >> Why forbid nulls in text strings?
>
>  > As far as I know, PG assumes, like most C code, that strings don't
>  > contain embedded NUL characters.
>
>
> Yeah; we should reject them because nothing will behave very sensibly
>  with them, eg
>
>  regression=# select E'abc\000xyz';
>   ?column?
>  ----------
>   abc
>  (1 row)
>
>  The point has come up before, and I kinda thought we *had* changed the
>  lexer to reject \000.  I see we haven't though.  Curiously, this
>  does fail:
>
>  regression=# select U&'abc\0000xyz';
>  ERROR:  invalid byte sequence for encoding "SQL_ASCII": 0x00
>  HINT:  This error can also happen if the byte sequence does not match the encoding expected by the server, which is controlled by "client_encoding".
>
>  though that's not quite the message I'd have expected to see.

I think that's because our verifier actually *does* reject \0;
the only problem is that \0 does not set the saw_high_bit flag,
so the verifier simply does not get executed.
But U& always executes it.

unicode=# SELECT e'\xc3\xa4';
 ?column?
----------
 ä
(1 row)

unicode=# SELECT e'\xc3\xa4\x00';
ERROR:  invalid byte sequence for encoding "UTF8": 0x00
HINT:  This error can also happen if the byte sequence does not match
the encoding expected by the server, which is controlled by
"client_encoding".
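
The behaviour in the two sessions above can be modelled with a toy sketch.
This is not PostgreSQL's lexer code: the function names, the always_verify
parameter and the restriction to \xHH escapes are all assumptions made for
illustration. It only shows how a verifier that rejects 0x00 can be skipped
when no escape sets the high-bit flag.

```python
import re

# Toy model only; NOT PostgreSQL's actual lexer. All names are hypothetical.
_HEX_ESCAPE = re.compile(rb"\\x([0-9a-fA-F]{2})")


def verify(data: bytes) -> None:
    """Stand-in for the encoding verifier: rejects NUL and invalid UTF-8."""
    if b"\x00" in data:
        raise ValueError('invalid byte sequence for encoding "UTF8": 0x00')
    data.decode("utf-8")  # raises UnicodeDecodeError (a ValueError) if invalid


def lex_extended_string(body: bytes, always_verify: bool = False) -> bytes:
    """Decode \\xHH escapes in an e'...' body (only that escape form is modelled)."""
    saw_high_bit = False
    out = bytearray()
    pos = 0
    for m in _HEX_ESCAPE.finditer(body):
        out += body[pos:m.start()]
        value = int(m.group(1), 16)
        if value >= 0x80:
            # Only bytes with the high bit set raise the flag; 0x00 does not.
            saw_high_bit = True
        out.append(value)
        pos = m.end()
    out += body[pos:]

    # The verifier runs only if the flag was set (or always, as assumed for U&).
    if saw_high_bit or always_verify:
        verify(bytes(out))

    # Mimic C string handling: anything after the first NUL is invisible.
    return bytes(out).split(b"\x00", 1)[0]
```

In this model, lex_extended_string(rb"abc\x00xyz") skips the verifier and
returns the truncated b"abc", while lex_extended_string(rb"\xc3\xa4\x00")
raises ValueError because \xc3 set the flag. Passing always_verify=True
rejects the first input as well, which is the U& case.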

Heh.

--
marko
