Re: chr() is still too loose about UTF8 code points - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: chr() is still too loose about UTF8 code points
Date	May 16, 2014 17:52:51
Msg-id	16802.1400262763@sss.pgh.pa.us
In response to	Re: chr() is still too loose about UTF8 code points (Noah Misch <noah@leadboat.com>)
Responses	Re: chr() is still too loose about UTF8 code points
List	pgsql-hackers

Noah Misch <noah@leadboat.com> writes:
> On Fri, May 16, 2014 at 11:05:08AM -0400, Tom Lane wrote:
>> I think this probably means we need to change chr() to reject code points
>> above 10ffff.  Should we back-patch that, or just do it in HEAD?

> The compatibility risks resemble those associated with the fixes for bug
> #9210, so I recommend HEAD only:

> http://www.postgresql.org/message-id/flat/20140220043940.GA3064539@tornado.leadboat.com

While I'd be willing to ignore that risk so far as code points above
10ffff go, if we want pg_utf8_islegal to be happy then we will also
have to reject surrogate-pair code points.  It's not beyond the realm
of possibility that somebody is intentionally generating such code
points with chr(), despite the dump/reload hazard.  So now I agree
that this is sounding more like a major-version-only behavioral change.
        regards, tom lane