Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5) - Mailing list pgsql-bugs

From	Tom Lane
Subject	Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
Date	March 22, 2016 07:11:04
Msg-id	19477.1458619852@sss.pgh.pa.us
In response to	Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5) (Peter Geoghegan <pg@heroku.com>)
Responses	Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
List	pgsql-bugs

Peter Geoghegan <pg@heroku.com> writes:
> At one point, Robert wrote a small self-contained tool to show OS
> strxfrm() blobs:
> http://www.postgresql.org/message-id/CA+TgmoaOCyQpo8HK9yr6VTuyknWWvqgo7JeXi2kb=gpNveKR+g@mail.gmail.com

> It would be great if you showed us the output for your test case
> strings, both on an affected and on an unaffected system.

On RHEL6, I get

./strxfrm-binary de_DE.UTF-8 'eai' 'e aí'
"eai" -> 100c140108080801020202 (11 bytes)
"e aí" -> 100c140108080901020202010235 (14 bytes)

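This seems a bit problematic: compared bytewise, the blob for "eai"
sorts first (the blobs first differ at the seventh byte, 08 vs. 09),
yet these strings sort in the other order ("e aí" before "eai")
according to sort(1) as well as Postgres' own sorting code.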
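
To make the disagreement concrete, here is a minimal sketch (not Robert's
tool, and not anything in the Postgres tree) that compares the two blobs
from the RHEL6 output bytewise, the way memcmp() would. The hex constants
are copied from that output; the names BLOB_FIRST, BLOB_SECOND,
first_difference and blob_order are hypothetical and exist only for this
illustration. It does not call strxfrm() itself, so it says nothing about
any other system.

```python
# Hex blobs copied verbatim from the RHEL6 strxfrm-binary output above.
BLOB_FIRST = bytes.fromhex("100c140108080801020202")         # blob for "eai"
BLOB_SECOND = bytes.fromhex("100c140108080901020202010235")  # blob for the second string


def first_difference(a: bytes, b: bytes) -> int:
    """Index of the first differing byte, or -1 if one blob is a prefix of the other."""
    for i, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return i
    return -1


def blob_order(a: bytes, b: bytes) -> int:
    """Return -1, 0 or 1 for an unsigned bytewise comparison.

    Python compares bytes objects lexicographically by unsigned value,
    which matches memcmp() with a shorter-prefix-sorts-first tie-break.
    """
    return (a > b) - (a < b)


if __name__ == "__main__":
    print("first differing byte index:", first_difference(BLOB_FIRST, BLOB_SECOND))
    print("blob order (first vs. second):", blob_order(BLOB_FIRST, BLOB_SECOND))
```

A negative blob order means the strxfrm() output puts "eai" ahead of the
second string, which is the opposite of the order reported for sort(1).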

It's possible I've copied-and-pasted these multibyte characters wrong.
But if I haven't, this says that the strxfrm-based optimization is
unusably broken on a very large fraction of reasonably-modern
installations.  Quite aside from casting aspersions on the glibc guys,
how did we fail to notice this in our own testing?

            regards, tom lane
