Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5) - Mailing list pgsql-bugs

From Robert Haas
Subject Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
Date
Msg-id CA+TgmoYkvRZ6BfjRj3b68p2iuwXF5auaR2DLkpFaJj+LpZZu0A@mail.gmail.com
Whole thread Raw
In response to Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)  (Peter Geoghegan <pg@heroku.com>)
Responses Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)
List pgsql-bugs
On Tue, Mar 22, 2016 at 5:09 PM, Peter Geoghegan <pg@heroku.com> wrote:
> On Mon, Mar 21, 2016 at 9:04 PM, Peter Geoghegan <pg@heroku.com> wrote:
>> Can you look at generating a textual representation of the strxfrm()
>> blobs in question, using Robert's tool?:
>>
>> http://www.postgresql.org/message-id/CA+TgmoaOCyQpo8HK9yr6VTuyknWWvqgo7JeXi2kb=gpNveKR+g@mail.gmail.com
>
> I played with this tool myself, on an affected CentOS 6.7 VM:
>
> [vagrant@localhost ~]$ ldd --version
> ldd (GNU libc) 2.12
> Copyright (C) 2010 Free Software Foundation, Inc.
> This is free software; see the source for copying conditions.  There is NO
> warranty; not even for MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.
> Written by Roland McGrath and Ulrich Drepper.
>
> I now think that we have this backwards: This isn't a bug in glibc's
> strxfrm(); it's a bug in glibc's strcoll(). Minimal testcase with
> modified tool, simplified to use ascii-safe strings:
>
> [vagrant@localhost ~]$ ./a.out de_DE.UTF-8 'xxx' 'x xx'
> "xxx" -> 2323230108080801020202 (11 bytes)
> "x xx" -> 2323230108080801020202010235 (14 bytes)
> strcmp(arg1, arg2) result: -1
> strcoll(arg1, arg2) result: 6
>
> If we assume for the sake of argument that this is a strxfrm() bug and
> strcoll() is a reliable source of truth, then I find it very curious
> that Germany's Austrian neighbors differ on this point about how text
> should be collated:
>
> [vagrant@localhost ~]$ ./a.out de_AT.UTF-8 'xxx' 'x xx'
> "xxx" -> 2323230108080801020202 (11 bytes)
> "x xx" -> 2323230108080801020202010235 (14 bytes)
> strcmp(arg1, arg2) result: -1
> strcoll(arg1, arg2) result: -1
>
> This surely adds doubt to the idea that strxfrm() in particular is broken.
>
> I find something else inconsistent with the strxfrm() theory: even the
> de_DE collation gives strxfrm()/strcoll() self-consistent answers when
> we move the rhs argument's space to the far side of its center 'x'
> char:
>
> [vagrant@localhost ~]$ ./a.out de_DE.UTF-8 'xxx' 'xx x'
> "xxx" -> 2323230108080801020202 (11 bytes)
> "xx x" -> 2323230108080801020202010335 (14 bytes)
> strcmp(arg1, arg2) result: -1
> strcoll(arg1, arg2) result: -1
>
> It seems very unlikely that this is because of a legitimate
> consideration that strcoll() makes about how German should be collated
> (one that strxfrm() fails to make, say).
>
> This is probably a worse situation for affected Postgres systems,
> though, because now they have no scope to turn the faulty part of the
> system off. I have a hard time believing that it's a good idea to
> trust strcoll() to be wrong in a consistent way that has collatable
> type opclasses at least follow "Notes to Operator Class Implementors".
> I'd like to hear more opinions on that, though, because it's a tricky
> thing to reason about.

Well, if we implement a compatibility GUC that shuts off our
dependency on strxfrm(), people can go back to having 9.5 be no more
broken than 9.4 was.  I vote we do that and go home.
Behavior-changing GUCs suck, but it seems clear that Tom is not going
to sit still for any solution that involves blaming the glibc vendor
no matter how well-justified that approach might be; and I don't have
a better idea.  I was a little worried that it was too much to hope
for that all libc vendors on earth would ship a strxfrm()
implementation that was actually consistent with strcoll(), and here
we are.  It's a good thing that operating systems manage to make
read() and getpid() several orders of magnitude more reliable than
strxfrm() and strcoll(), or we'd probably all be running Windows or
VMS or something now.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: BUG #14034: Select for update with inner select doesn't return value after committing by other transaction.
Next
From: Tom Lane
Date:
Subject: Re: Missing rows with index scan when collation is not "C" (PostgreSQL 9.5)