Arnaud Perrier wrote:
> 5/ SELECT * FROM TBL ORDER BY TEXT1 COLLATE digitslast, TEXT2 COLLATE
> digitslast;
> 5/ KO = TEXT1 = digits after letters + TEXT2 = digits before letters
I'm seeing wrong results too, and I don't really understand what's going
on with this collation.
Here's a simpler test with the current dev branch (14devel)
and ICU 60.2-3ubuntu.
=# CREATE COLLATION digitslast (provider = icu, locale =
'en@colReorder=latn-digit');
=# select * from (values('1'),('a')) as list(x) order by x collate
digitslast;
x
---
a
1
The digit is sorted after the letter, OK.
But then why would an inequality test report the opposite?
=# select 'a' < '1' collate digitslast;
?column?
----------
f
I'm wondering whether there is a discrepancy between the binary
sort keys and results of direct comparisons by ucol_strcoll()
for this collation.
Maybe that could explain why the second level of ORDER BY
produces results that are inconsistent with the first level.
With icu_ext we can visualize the sort keys and see that key('a')
if lower than key('1').
=# select icu_sort_key('a' collate digitslast) as sort_key_a,
icu_sort_key('1' collate digitslast) as sort_key_1
sort_key_a | sort_key_1
--------------+--------------
\x0f01050105 | \x4901050105
But then a direct comparison says the opposite, just like the
'<' operator as above.
=# select icu_compare('a', '1', 'en@colReorder=latn-digit');
icu_compare
-------------
1
icu_compare() is a simple interface on top of ucol_strcollUTF8()
or ucol_strcoll(), and 1 as a result maps to UCOL_GREATER, so unless
I'm missing something, this is looking more like an ICU bug than a
Postgres bug.
Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite