Re: Windows UTF-8, non-ICU collation trouble - Mailing list pgsql-hackers

From	Thomas Munro
Subject	Re: Windows UTF-8, non-ICU collation trouble
Date	December 6, 2019 06:56:08
Msg-id	CA+hUKGLOaGoj8ZmwUyaMwysvXdE+LZcTYhsrmPq3r6wBK37EUA@mail.gmail.com
In response to	Windows UTF-8, non-ICU collation trouble (Noah Misch <noah@leadboat.com>)
Responses	Re: Windows UTF-8, non-ICU collation trouble
List	pgsql-hackers

On Fri, Dec 6, 2019 at 7:34 PM Noah Misch <noah@leadboat.com> wrote:
> We use system UTF-16 collation to implement UTF-8 collation on Windows.  The
> PostgreSQL security team received a report, from Timothy Kuun, that this
> collation does not uphold the "symmetric law" and "transitive law" that we
> require for btree operator classes.  The attached test program demonstrates
> this.  http://www.delphigroups.info/2/62/478610.html quotes reports of that
> problem going back eighteen years.  Most code points are unaffected.  Indexing
> an affected code point using such a collation can cause btree index scans to not
> find a row they should find and can make a UNIQUE or PRIMARY KEY constraint
> admit a duplicate.  The security team determined that this doesn't qualify as a
> security vulnerability, but it's still a bug.

Huh.  Does this still apply in modern times?  I thought that, as of
Windows 10, they had adopted[1] CLDR data to drive that, the same
definitions used by (or somewhere in the process of being adopted by)
GNU, Illumos, FreeBSD etc.  Basically, everyone gave up on trying to
own this rat's nest of a problem and deferred to the experts.  If you
can still get index-busting behaviour out of modern Windows
collations, wouldn't that be a bug that someone could file against
SQL Server, Windows etc. and get fixed?

[1] https://blogs.msdn.microsoft.com/shawnste/2015/08/29/locale-data-in-windows-10-cldr/
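
For readers unfamiliar with the "symmetric law" and "transitive law"
mentioned in the quoted report, here is a minimal sketch of how such
laws can be checked for any three-way comparator.  It is not the test
program attached to the original message, and it does not call the
Windows collation API.  The names find_law_violations, codepoint_cmp
and cyclic_cmp are hypothetical, and cyclic_cmp is a deliberately
broken comparator invented purely for illustration.  The formulation
of the laws used here is one common way to state them, assumed rather
than taken from the report.

```python
from itertools import product


def sign(n):
    """Reduce a comparator result to -1, 0 or 1."""
    return (n > 0) - (n < 0)


def find_law_violations(items, cmp):
    """Return (law, values) pairs where cmp breaks an ordering law.

    Hypothetical helper for illustration only.
    """
    violations = []
    # Symmetric law (as assumed here): cmp(a, b) and cmp(b, a) must
    # have opposite signs, or both be zero.
    for a, b in product(items, repeat=2):
        if sign(cmp(a, b)) != -sign(cmp(b, a)):
            violations.append(("symmetric", (a, b)))
    # Transitive law: a <= b and b <= c must imply a <= c.
    for a, b, c in product(items, repeat=3):
        if cmp(a, b) <= 0 and cmp(b, c) <= 0 and cmp(a, c) > 0:
            violations.append(("transitive", (a, b, c)))
    return violations


def codepoint_cmp(a, b):
    """Well-behaved comparator: plain code point order."""
    return (a > b) - (a < b)


def cyclic_cmp(a, b):
    """Deliberately broken comparator (invented for this sketch):
    rock < paper < scissors < rock."""
    order = {"rock": 0, "paper": 1, "scissors": 2}
    d = (order[a] - order[b]) % 3
    if d == 0:
        return 0
    return 1 if d == 1 else -1


if __name__ == "__main__":
    print(find_law_violations(["a", "b", "c"], codepoint_cmp))
    for law, values in find_law_violations(
            ["rock", "paper", "scissors"], cyclic_cmp):
        print(law, values)
```

A comparator that fails either check cannot give a btree a consistent
sort order, which is how an index scan can miss a row it should find
or a unique index can admit a duplicate.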
