Re: BUG #16570: Collation not working - Mailing list pgsql-bugs

From Daniel Verite
Subject Re: BUG #16570: Collation not working
Date
Msg-id 118c09c5-3334-4570-8644-7c5300fbc0f2@manitou-mail.org
Whole thread Raw
In response to Re: BUG #16570: Collation not working  (Arnaud Perrier <arnaud.perrier@gmail.com>)
Responses Re: BUG #16570: Collation not working
List pgsql-bugs
    Arnaud Perrier wrote:

> 5/ SELECT * FROM TBL ORDER BY TEXT1 COLLATE digitslast, TEXT2 COLLATE
> digitslast;
> 5/ KO = TEXT1 = digits after letters + TEXT2 = digits before letters

I'm seeing wrong results too, and I don't really understand what's going
on with this collation.

Here's a simpler test with the current dev branch (14devel)
and ICU 60.2-3ubuntu.

=# CREATE COLLATION digitslast (provider = icu, locale =
'en@colReorder=latn-digit');

=# select * from (values('1'),('a')) as list(x) order by x collate
digitslast;
 x
---
 a
 1

The digit is sorted after the letter, OK.

But then why would an inequality test report the opposite?

=# select 'a' < '1' collate digitslast;
 ?column?
----------
 f

I'm wondering whether there is a discrepancy between the binary
sort keys and results of direct comparisons by ucol_strcoll()
for this collation.
Maybe that could explain why the second level of ORDER BY
produces results that are inconsistent with the first level.

With icu_ext we can visualize the sort keys and see that key('a')
if lower than key('1').

=# select icu_sort_key('a' collate digitslast) as sort_key_a,
  icu_sort_key('1' collate digitslast) as sort_key_1

   sort_key_a  |  sort_key_1
--------------+--------------
 \x0f01050105 | \x4901050105

But then a direct comparison says the opposite, just like the
'<' operator as above.

=# select icu_compare('a', '1', 'en@colReorder=latn-digit');
 icu_compare
-------------
       1

icu_compare() is a simple interface on top of ucol_strcollUTF8()
or ucol_strcoll(), and 1 as a result maps to UCOL_GREATER,  so unless
I'm missing something, this is looking more like an ICU bug than a
Postgres bug.


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite



pgsql-bugs by date:

Previous
From: Tom Lane
Date:
Subject: Re: Index not used without explicit typecast
Next
From: Jan Kort
Date:
Subject: Re: Index not used without explicit typecast