Re: Implementing full UTF-8 support (aka supporting 0x00) - Mailing list pgsql-hackers

From	Craig Ringer
Subject	Re: Implementing full UTF-8 support (aka supporting 0x00)
Date	August 4, 2016 00:22:32
Msg-id	CAMsr+YF8ua27YmYJkOD_o+DwU_mUDjEZss9Nua9s-Wo2Qs2MOw@mail.gmail.com
In response to	Re: Implementing full UTF-8 support (aka supporting 0x00) (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses	Re: Implementing full UTF-8 support (aka supporting 0x00)
List	pgsql-hackers

On 4 August 2016 at 05:00, Thomas Munro <thomas.munro@enterprisedb.com> wrote:

> On Thu, Aug 4, 2016 at 5:16 AM, Craig Ringer <craig@2ndquadrant.com> wrote:
> > On 3 August 2016 at 22:54, Álvaro Hernández Tortosa <aht@8kdata.com> wrote:
> >> What would it take to support it? Isn't the varlena header propagated
> >> everywhere, which could help infer the real length of the string? Any
> >> pointers or suggestions would be welcome.
> >
> >
> > One of the bigger pain points is that our interaction with C library
> > collation routines for sorting uses NULL-terminated C strings. strcoll,
> > strxfrm, etc.
>
> That particular bit of the problem would go away if this ever happened:
>
> https://wiki.postgresql.org/wiki/Todo:ICU
>
> ucol_strcoll takes explicit lengths (though optionally accepts -1 for
> null terminated mode).
>
> http://userguide.icu-project.org/strings#TOC-Using-C-Strings:-NUL-Terminated-vs.-Length-Parameters

Yep, it does. But we've made little to no progress on integration of ICU support and AFAIK nobody's working on it right now.
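
To make the difference concrete, here is a minimal sketch in plain C. It is not PostgreSQL code, and both helper names are hypothetical. The first helper goes through strcoll, which stops reading each argument at the first 0x00 byte. The second takes explicit lengths, but as a simplifying assumption it compares raw bytes with memcmp, so it stands in for a length-aware interface and does not do locale-aware collation the way ICU would.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: compare two buffers the way a NUL-terminated
 * collation API sees them. strcoll() reads each argument only up to
 * the first 0x00 byte, so anything after an embedded NUL is ignored.
 */
int compare_as_c_strings(const char *a, const char *b)
{
    return strcoll(a, b);
}

/*
 * Hypothetical helper: compare two buffers using explicit lengths.
 * This is a bytewise comparison (memcmp), NOT a locale-aware
 * collation; it only illustrates that a length-taking interface
 * can see bytes that follow an embedded 0x00.
 */
int compare_with_lengths(const char *a, size_t alen,
                         const char *b, size_t blen)
{
    size_t n = alen < blen ? alen : blen;
    int cmp = memcmp(a, b, n);

    if (cmp != 0)
        return cmp;

    /* Common prefix is equal: the shorter buffer sorts first. */
    return (alen > blen) - (alen < blen);
}
```

Given the 5-byte buffers "abc\0x" and "abc\0y", compare_as_c_strings reports them as equal because both look like "abc", while compare_with_lengths called with length 5 for each orders the first before the second.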

I wonder how MySQL implements their collation and encoding support?

Craig Ringer http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
