Thread: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

From

PG Bug reporting form

Date:

19 July 2018, 19:03:05

The following bug has been logged on the website:

Bug reference:      15285
Logged by:          Roman Lytovchenko
Email address:      roman.lytovchenko@gmail.com
PostgreSQL version: 10.4
Operating system:   fedora
Description:

How to reproduce:
CREATE COLLATION digitslast (provider = icu, locale =
'en@colReorder=latn-digit');
CREATE TABLE t (b CHAR(4) NOT NULL COLLATE digitslast);
insert into t select '0000' from generate_series (0, 1000) as f(x);
insert into t select '0001' from generate_series (0, 1000) as f(x);
insert into t select 'ABCD' from generate_series (0, 1000) as f(x);

create index i on t(b);
select * from t where b = '0000' ;
-- 0 rows, and this is a bug

explain analyze select * from t where b = '0000' ;
Index Only Scan using i on t  (cost=0.28..41.80 rows=1001 width=5) (actual
time=0.045..0.045 rows=0 loops=1)
  Index Cond: (b = '0000'::bpchar)
  Heap Fetches: 0
Planning time: 0.146 ms
Execution time: 0.080 ms

drop index i;
select * from t where b = '0000' ;
-- 1001 rows

So, select version();
PostgreSQL 10.4 on x86_64-pc-linux-gnu, compiled by gcc (GCC) 8.0.1 20180324
(Red Hat 8.0.1-0.20), 64-bit
$cat /proc/version 
Linux version 3.10.0-514.10.2.el7.x86_64 (builder@kbuilder.dev.centos.org)
(gcc version 4.8.5 20150623 (Red Hat 4.8.5-11) (GCC) ) #1 SMP Fri Mar 3
00:04:05 UTC 2017

Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

From

Peter Geoghegan

Date:

19 July 2018, 19:44:32

On Thu, Jul 19, 2018 at 9:03 AM, PG Bug reporting form
<noreply@postgresql.org> wrote:
> How to reproduce:
> CREATE COLLATION digitslast (provider = icu, locale =
> 'en@colReorder=latn-digit');
> CREATE TABLE t (b CHAR(4) NOT NULL COLLATE digitslast);
> insert into t select '0000' from generate_series (0, 1000) as f(x);
> insert into t select '0001' from generate_series (0, 1000) as f(x);
> insert into t select 'ABCD' from generate_series (0, 1000) as f(x);

I can confirm the bug on the master branch:

pg@~[25013]=# select bt_index_parent_check('i');
ERROR:  item order invariant violated for index "i"
DETAIL:  Lower index tid=(3,3) (points to index tid=(4,23)) higher
index tid=(3,4) (points to index tid=(5,98)) page lsn=0/169CD78.

It looks like a problem with char(n) abbreviated keys not agreeing
with B-Tree support function 1 for the same opclass. "ABCD" appears
before "0000" and "0001" in the index, which seems like the expected
behavior:

pg@~[25013]=# select * from bt_page_items('i', 3);
 itemoffset │   ctid   │ itemlen │ nulls │ vars │          data
────────────┼──────────┼─────────┼───────┼──────┼─────────────────────────
          1 │ (1,0)    │       8 │ f     │ f    │
          2 │ (2,109)  │      16 │ f     │ t    │ 0b 41 42 43 44 00 00 00
          3 │ (4,23)   │      16 │ f     │ t    │ 0b 41 42 43 44 00 00 00
          4 │ (5,98)   │      16 │ f     │ t    │ 0b 30 30 30 30 00 00 00
          5 │ (6,12)   │      16 │ f     │ t    │ 0b 30 30 30 30 00 00 00
          6 │ (7,152)  │      16 │ f     │ t    │ 0b 30 30 30 30 00 00 00
          7 │ (8,66)   │      16 │ f     │ t    │ 0b 30 30 30 31 00 00 00
          8 │ (9,206)  │      16 │ f     │ t    │ 0b 30 30 30 31 00 00 00
          9 │ (10,120) │      16 │ f     │ t    │ 0b 30 30 30 31 00 00 00
(9 rows)

(This is the root page.)

It appears that the main support function 1 routine disagrees with the
CREATE INDEX sort order, which is wrong. I'll try to isolate the
problem a bit further.
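
For anyone wanting to repeat these checks, here is a minimal sketch. It assumes the reproduction script from the report has been run (so table t and index i exist) and that the contrib modules amcheck and pageinspect are installed. Reading the leading 0b in the data column as a one-byte varlena header is an assumption about the on-disk layout on this platform.

```sql
-- Assumes table t and index i from the report already exist.
CREATE EXTENSION IF NOT EXISTS amcheck;
CREATE EXTENSION IF NOT EXISTS pageinspect;

-- Check B-Tree invariants; raises an ERROR if a violation is found.
SELECT bt_index_check('i');

-- Stricter variant that also checks parent/child page relationships.
SELECT bt_index_parent_check('i');

-- Look up the root block number, then dump the items on that page.
SELECT root FROM bt_metap('i');
SELECT itemoffset, ctid, data
  FROM bt_page_items('i', 3);  -- 3 was the root block in the output above

-- Decode the payload bytes that follow the assumed header byte (0b).
SELECT convert_from('\x41424344'::bytea, 'UTF8') AS first_key,   -- 'ABCD'
       convert_from('\x30303030'::bytea, 'UTF8') AS later_key;   -- '0000'
```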

--
Peter Geoghegan

Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

From

Peter Geoghegan

Date:

20 July 2018, 02:26:49

On Thu, Jul 19, 2018 at 9:44 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> It appears that the main support function 1 routine disagrees with the
> CREATE INDEX sort order, which is wrong. I'll try to isolate the
> problem a bit further.

As far as I can tell, this is an ICU bug. ucol_strcollUTF8() is buggy
with this digitslast collation, which ucol_nextSortKeyPart() fails to
be bug-compatible with. Other similar customized collations (e.g.
'en-u-kf-upper') work fine. (Ugh, that's familiar in an unpleasant
way.)

I'm using libicu60. What version are you using, Roman?

I tried to find something that matches this on the ICU bug tracker.
This might be a match: https://ssl.icu-project.org/trac/ticket/12518

-- 
Peter Geoghegan

Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

From

Thomas Munro

Date:

21 July 2018, 04:39:12

On Fri, Jul 20, 2018 at 11:26 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> On Thu, Jul 19, 2018 at 9:44 AM, Peter Geoghegan <pg@bowt.ie> wrote:
>> It appears that the main support function 1 routine disagrees with the
>> CREATE INDEX sort order, which is wrong. I'll try to isolate the
>> problem a bit further.
>
> As far as I can tell, this is an ICU bug. ucol_strcollUTF8() is buggy
> with this digitslast collation, which ucol_nextSortKeyPart() fails to
> be bug-compatible with. Other similar customized collations (e.g.
> 'en-u-kf-upper') work fine. (Ugh, that's familiar in an unpleasant
> way.)
>
> I'm using libicu60. What version are you using, Roman?
>
> I tried to find something that matches this on the ICU bug tracker.
> This might be a match: https://ssl.icu-project.org/trac/ticket/12518

FWIW I see the same result with icu 61.1 and 62.1_1 from FreeBSD ports.

-- 
Thomas Munro
http://www.enterprisedb.com

Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

From

Jehan-Guillaume de Rorthais

Date:

09 June 2020, 22:29:33

Hello,

I didn't find any other discussion related to this bug, either on pgsql-bugs
or pgsql-hackers. Hopefully, this is the best thread to give an update.

On Sat, 21 Jul 2018 13:39:12 +1200
Thomas Munro <thomas.munro@enterprisedb.com> wrote:

> On Fri, Jul 20, 2018 at 11:26 AM, Peter Geoghegan <pg@bowt.ie> wrote:
> > On Thu, Jul 19, 2018 at 9:44 AM, Peter Geoghegan <pg@bowt.ie> wrote:  
> >> It appears that the main support function 1 routine disagrees with the
> >> CREATE INDEX sort order, which is wrong. I'll try to isolate the
> >> problem a bit further.  
> >
> > As far as I can tell, this is an ICU bug. ucol_strcollUTF8() is buggy
> > with this digitslast collation, which ucol_nextSortKeyPart() fails to
> > be bug-compatible with. Other similar customized collations (e.g.
> > 'en-u-kf-upper') work fine. (Ugh, that's familiar in an unpleasant
> > way.)
> >
> > I'm using libicu60. What version are you using, Roman?
> >
> > I tried to find something that matches this on the ICU bug tracker.
> > This might be a match: https://ssl.icu-project.org/trac/ticket/12518  
> 
> FWIW I see the same result with icu 61.1 and 62.1_1 from FreeBSD ports.

Some colleagues hit this bug as well last week and reported it to me. I can
reproduce this bug with ICU current master branch, version post 67.1.

I wrote a regression test for icu4c and posted it on ICU-12518. See:
https://unicode-org.atlassian.net/browse/ICU-12518

As Peter wrote, ucol_strcollUTF8 (and ucol_strcoll) functions are affected. A
quick and dirty patch to replace ucol_strcoll* by ucol_getSortKey/strcmp
everywhere fixed the bug for my tests.

After playing with ICU regression tests, I found functions ucol_strcollIter
and ucol_nextSortKeyPart are safe. I'll do some performance tests and report
here.

In the meantime, I've been working on various workarounds. The only one I found
is to use "fr-u-kr-latn-digit-kn" instead of "fr-u-kr-latn-digit".
Unfortunately, the two collations are not equivalent, but I believe it might be
useful in many cases.
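
A minimal sketch of trying this workaround against the original reproduction, assuming the amcheck extension is installed. The names digitslast_kn, t_kn and i_kn are hypothetical; only the locale string comes from the message above. The "kn" setting turns on numeric ordering, which is why the two collations are not equivalent.

```sql
-- Hypothetical object names; the locale string is the one proposed above.
CREATE COLLATION digitslast_kn
    (provider = icu, locale = 'fr-u-kr-latn-digit-kn');

CREATE TABLE t_kn (b char(4) NOT NULL COLLATE digitslast_kn);
INSERT INTO t_kn SELECT '0000' FROM generate_series(0, 1000);
INSERT INTO t_kn SELECT '0001' FROM generate_series(0, 1000);
INSERT INTO t_kn SELECT 'ABCD' FROM generate_series(0, 1000);
CREATE INDEX i_kn ON t_kn (b);

-- If the workaround holds, this should count 1001 rows rather than 0.
SELECT count(*) FROM t_kn WHERE b = '0000';

-- Cross-check the index itself (needs CREATE EXTENSION amcheck).
SELECT bt_index_parent_check('i_kn');

-- Behavioural difference: with kn, runs of digits compare by numeric value,
-- so 'a9' is expected to sort before 'a10' here.
SELECT x
  FROM (VALUES ('a10'), ('a9')) AS v(x)
 ORDER BY x COLLATE digitslast_kn;
```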

I've been working on a second workaround: creating a type (a char variant for
our use case), its operators and opfamily. All operators and function 1 rely
on ucol_getSortKey. Most of the workaround works well but, surprisingly, the
sort order is only enforced if the field is in the first position:

  * this works: "ORDER BY f1 COLLATE digitslast"
  * this fails: "ORDER BY f2, f1 COLLATE digitslast"

I haven't had time to investigate this last point further.

Regards,

Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

From

Jehan-Guillaume de Rorthais

Date:

12 June 2020, 16:40:55

On Wed, 10 Jun 2020 00:29:33 +0200
Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:
[...]
> After playing with ICU regression tests, I found functions ucol_strcollIter
> and ucol_nextSortKeyPart are safe. I'll do some performance tests and report
> here.

I did some benchmarks. See attachment for the script and its header to
reproduce.

It sorts 935895 French phrases of 0 to 122 chars, with an average of 49.
Performance tests were done on current master HEAD (buggy) and using the patch
in attachment, relying on ucol_strcollIter.

My preliminary test with ucol_getSortKey was catastrophic, as we might
expect. 15-17x slower than the current HEAD. So I removed it from actual tests.
I didn't try with ucol_nextSortKeyPart though.

Using ucol_strcollIter performs ~20% slower than HEAD on UTF8 databases, but
this might be acceptable. Here are the numbers:

   DB Encoding   HEAD  strcollIter   ratio
   UTF8          2.74         3.27   1.19x
   LATIN1        5.34         5.40   1.01x

I plan to add a regression test soon.

> In the meantime, I've been working on various workarounds. The only one I
> found is to use "fr-u-kr-latn-digit-kn" instead of "fr-u-kr-latn-digit".
> Unfortunately, the two collations are not equivalent, but I believe it might
> be useful in many cases.
> 
> I've been working on a second workaround: creating a type (a char variant for
> our use case), its operators and opfamily. All operators and function 1 rely
> on ucol_getSortKey. Most of the workaround works well but, surprisingly, the
> sort order is only enforced if the field is in the first position:
> 
>   * this works: "ORDER BY f1 COLLATE digitslast"
>   * this fails: "ORDER BY f2, f1 COLLATE digitslast"

I fixed this: I hadn't declared my opclass as the default for the type I created.
I'm not sure whether people would like to see or discuss this user workaround here.

Regards,

Attachment

Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

From

Jehan-Guillaume de Rorthais

Date:

12 June 2020, 22:43:22

On Fri, 12 Jun 2020 18:40:55 +0200
Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:

> On Wed, 10 Jun 2020 00:29:33 +0200
> Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:
> [...]
> > After playing with ICU regression tests, I found functions ucol_strcollIter
> > and ucol_nextSortKeyPart are safe. I'll do some performance tests and report
> > here.
> 
> I did some benchmarks. See attachment for the script and its header to
> reproduce.
> 
> It sorts 935895 French phrases of 0 to 122 chars, with an average of 49.
> Performance tests were done on current master HEAD (buggy) and using the patch
> in attachment, relying on ucol_strcollIter.
> 
> My preliminary test with ucol_getSortKey was catastrophic, as we might
> expect. 15-17x slower than the current HEAD. So I removed it from actual
> tests. I didn't try with ucol_nextSortKeyPart though.
> 
> Using ucol_strcollIter performs ~20% slower than HEAD on UTF8 databases, but
> this might be acceptable. Here are the numbers:
> 
>    DB Encoding   HEAD  strcollIter   ratio
>    UTF8          2.74         3.27   1.19x
>    LATIN1        5.34         5.40   1.01x
> 
> I plan to add a regression test soon.

Please, find in attachment the second version of the patch, with a
regression test.

Regards,

Attachment

v2-0001-Replace-buggy-ucol_strcoll-funcs-with-ucol_strcollIt.patch

Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

From

Peter Geoghegan

Date:

12 August 2020, 04:14:03

On Fri, Jun 12, 2020 at 3:43 PM Jehan-Guillaume de Rorthais
<jgdr@dalibo.com> wrote:
> Please, find in attachment the second version of the patch, with a
> regression test.

Is it possible to fix this by making the existing
HAVE_UCOL_STRCOLLUTF8 test more conservative about ICU version? IOW,
by making it only use ucol_strcollUTF8() on versions that are known to
not be affected by this bug?

This related ICU bug describes an issue affecting only versions 53/54:

https://unicode-org.atlassian.net/browse/ICU-11388

Why not just broaden the existing HAVE_UCOL_STRCOLLUTF8 workaround to
recognize that the related functionality is broken on more versions of
ICU than initially suspected?

-- 
Peter Geoghegan

Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

From

Jehan-Guillaume de Rorthais

Date:

18 August 2020, 16:02:46

Hi Peter,

On Tue, 11 Aug 2020 21:14:03 -0700
Peter Geoghegan <pg@bowt.ie> wrote:

> On Fri, Jun 12, 2020 at 3:43 PM Jehan-Guillaume de Rorthais
> <jgdr@dalibo.com> wrote:
> > Please, find in attachment the second version of the patch, with a
> > regression test.
> 
> Is it possible to fix this by making the existing
> HAVE_UCOL_STRCOLLUTF8 test more conservative about ICU version? IOW,
> by making it only use ucol_strcollUTF8() on versions that are known to
> not be affected by this bug?

I might be missing something, but according to my tests back in June, the bug
exists in both ucol_strcollUTF8()/ucol_strcoll() and is still affecting the very
last version of ICU (67.1).

That's why my patch replaces both functions altogether using ucol_strcollIter
as replacement.

> This related ICU bug describes an issue affecting only versions 53/54:
> 
> https://unicode-org.atlassian.net/browse/ICU-11388

This bug is related to ICU4J, not ICU4C. As far as I understand, this was
related to a bad variable type when porting the code to Java. Am I missing
something?

Regards,

Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

From

"Daniel Verite"

Date:

20 August 2020, 11:43:22

    Jehan-Guillaume de Rorthais wrote:

> I might be missing something, but according to my tests back in
> June, the bug exists in both ucol_strcollUTF8()/ucol_strcoll() and
> is still affecting the very last version of ICU (67.1).

Yes. This is what I've seen as well when investigating bug #16570
a couple weeks ago, before you pointed out it was the same bug:

https://www.postgresql.org/message-id/16570-58cc04e1a6ef3c3f%40postgresql.org

In that thread I've tried with ICU-60.2-3ubuntu but the results are identical
with 67.1.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite

Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

From

Peter Geoghegan

Date:

01 September 2020, 23:01:22

On Tue, Aug 18, 2020 at 9:02 AM Jehan-Guillaume de Rorthais
<jgdr@dalibo.com> wrote:
> I might be missing something, but according to my tests back in June, the bug
> exists in both ucol_strcollUTF8()/ucol_strcoll() and is still affecting the very
> last version of ICU (67.1).
>
> That's why my patch replaces both functions altogether using ucol_strcollIter
> as replacement.

I see. I misunderstood.

> > This related ICU bug describes an issue affecting only versions 53/54:
> >
> > https://unicode-org.atlassian.net/browse/ICU-11388
>
> This bug is related to ICU4J, not ICU4C. As far as I understand, this was
> related to a bad variable type when porting the code to Java. Am I missing
> something?

That was based on a comment from TracBot on the bug tracker page you
linked to. Clearly it's totally unrelated, though. I jumped the gun.

-- 
Peter Geoghegan

Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

From

Peter Eisentraut

Date:

02 September 2020, 12:06:18

On 2020-06-10 00:29, Jehan-Guillaume de Rorthais wrote:
> In the meantime, I've been working on various workarounds. The only one I found
> is to use "fr-u-kr-latn-digit-kn" instead of "fr-u-kr-latn-digit".
> Unfortunately, the two collations are not equivalent, but I believe it might be
> useful in many cases.

What precisely is broken in the ICU library?  All the examples so far 
refer to kr-latn-digit.  Are all reorderings broken, or something 
specifically related to latn and/or digit?  Are any collation 
customizations other than reorderings affected?

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

From

Jehan-Guillaume de Rorthais

Date:

02 September 2020, 12:55:50

On Wed, 2 Sep 2020 14:06:18 +0200
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

> On 2020-06-10 00:29, Jehan-Guillaume de Rorthais wrote:
> > In the meantime, I've been working on various workarounds. The only one I
> > found is to use "fr-u-kr-latn-digit-kn" instead of "fr-u-kr-latn-digit".
> > Unfortunately, the two collations are not equivalent, but I believe it
> > might be useful in many cases.
> 
> What precisely is broken in the ICU library? 

Using ucol_strcoll/ucol_strcollUTF8 with a custom collation sorting digits after
latn.

> All the examples so far refer to kr-latn-digit.  Are all reorderings broken,
> or something specifically related to latn and/or digit? 

I don't know. So far, I only found a couple of reports (mine included) using
kr-latn-digit in different languages. And as I wrote, kr-latn-digit-kn doesn't
seem affected. So all reorderings might not be broken.

But I have no strong facts about this, just tests.

> Are any collation customizations other than reorderings affected?

I didn't poke around to try some other random customizations. The answer lies
somewhere in the ICU codebase. I suppose we'll be able to answer this question
as soon as the bug is explained.

However, the bugs reported here are all about sorting: wrong result order and/or
wrong results because of a badly sorted index.

Maybe Daniel has some more experience feedback with other customizations as he
seems to work extensively with ICU and PostgreSQL?

Regards,

Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

From

"Daniel Verite"

Date:

03 September 2020, 07:41:51

    Jehan-Guillaume de Rorthais wrote:

> Maybe Daniel has some more experience feedback with other customizations

No, I've just tried various other reorderings, and didn't find any other that
seems to have the same bug as latn-digit.
My tests consisted of indexing a large corpus of text and running the
index through amcheck.
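
A minimal sketch of this kind of test, not the exact procedure used. The names greekfirst, corpus and corpus_w_idx are hypothetical, the reordering is just one arbitrary example, and the generated strings are only a stand-in for a real text corpus, which would cover far more characters.

```sql
-- Hypothetical names throughout; swap in the reordering under test.
CREATE EXTENSION IF NOT EXISTS amcheck;
CREATE COLLATION greekfirst (provider = icu, locale = 'en-u-kr-grek-latn');

-- Stand-in data: hex strings mixing digits and Latin letters.
CREATE TABLE corpus (w text COLLATE greekfirst);
INSERT INTO corpus
SELECT md5(x::text) FROM generate_series(1, 100000) AS f(x);

CREATE INDEX corpus_w_idx ON corpus (w);

-- Raises an ERROR (for example "item order invariant violated") if the order
-- built by CREATE INDEX disagrees with the comparison function.
SELECT bt_index_parent_check('corpus_w_idx');
```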


Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite

Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

From

Peter Eisentraut

Date:

03 September 2020, 08:26:03

On 2020-09-03 09:41, Daniel Verite wrote:
>     Jehan-Guillaume de Rorthais wrote:
> 
>> Maybe Daniel has some more experience feedback with other customizations
> 
> No, I've just tried various other reorderings, and didn't find any other that
> seems to have the same bug as latn-digit.
> My tests consisted of indexing a large corpus of text and running the
> index through amcheck.

In this case I'm tempted to just leave it alone and write it off as a 
bug in ICU.  We could potentially inspect the collator object at CREATE 
COLLATION time and issue warnings if we find something we know to be buggy.

I don't think we want to make our code uglier and slower for normal uses 
to work around a bug in some niche feature in ICU.
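
Short of a server-side check, existing databases can at least be searched for the suspect reordering from SQL. This is a rough sketch, assuming a release in which the ICU locale string is stored in pg_collation.collcollate (later releases keep it in a different column). The pattern also matches the latn-digit-kn variant, which was reported as unaffected earlier in this thread.

```sql
-- ICU collations whose locale string mentions the latn-digit reordering.
SELECT c.collname, c.collcollate
  FROM pg_collation c
 WHERE c.collprovider = 'i'
   AND c.collcollate ~* 'latn-digit';

-- Objects (columns, indexes, ...) recorded as depending on such a collation.
SELECT pg_describe_object(d.refclassid, d.refobjid, d.refobjsubid) AS collation_name,
       pg_describe_object(d.classid, d.objid, d.objsubid)          AS dependent_object
  FROM pg_depend d
  JOIN pg_collation c
    ON d.refclassid = 'pg_collation'::regclass
   AND d.refobjid = c.oid
 WHERE c.collprovider = 'i'
   AND c.collcollate ~* 'latn-digit'
 ORDER BY 1, 2;
```

Any index listed by the second query is a candidate for checking with amcheck.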

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

From

Jehan-Guillaume de Rorthais

Date:

03 September 2020, 08:57:27

On Thu, 3 Sep 2020 10:26:03 +0200
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

> On 2020-09-03 09:41, Daniel Verite wrote:
> >     Jehan-Guillaume de Rorthais wrote:
> >   
> >> Maybe Daniel has some more experience feedback with other customizations  
> > 
> > No, I've just tried various other reorderings, and didn't find any other
> > that seems to have the same bug as latn-digit.
> > My tests consisted of indexing a large corpus of text and running the
> > index through amcheck.  
> 
> In this case I'm tempted to just leave it alone and write it off as a 
> bug in ICU.  We could potentially inspect the collator object at CREATE 
> COLLATION time and issue warnings if we find something we know to be buggy.
> 
> I don't think we want to make our code uglier and slower

It's not that much uglier, only slower. And maybe we could wrap the logic inside
some dedicated func/macro checking for versions, etc.

> for normal uses to work around a bug in some niche feature in ICU.

Well, indeed, I was wondering in another thread if we should fix it or
document it.

However, raising a WARNING doesn't seem enough, as we would effectively let
the user create a buggy collation and maybe a corrupted index on top of it. *If*
we choose this way, I would vote for an ERROR.

However, as I wrote earlier, we have no hard evidence latn-digit is the only
buggy customization with ICU. Even if the probability is very low, we
might have to pile up some more tests about versions, customizations, etc. For
instance, we would have to exclude latn-digit, but not latn-digit-kn, for
some ICU versions, etc, etc... until proven otherwise. Code maintenance for
each new version of ICU might become boring.

But maybe I am being silly while planning for some unknown things and ICU is only
affected for latn-digit?

I really have no strong feeling right now about the best solution to adopt.
However, I feel the least we could do would be to document it somewhere with a
lot of strong emphasis.

Regards,

Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

From

"Daniel Verite"

Date:

03 September 2020, 09:29:15

    Jehan-Guillaume de Rorthais wrote:

> I really have no strong feeling right now about the best solution to adopt.
> However, I feel the least we could do would be to document it somewhere with a
> lot of strong emphasis.

Right now https://www.postgresql.org/docs/devel/collation.html
includes this example:

<quote>
CREATE COLLATION digitslast (provider = icu, locale = 'en-u-kr-latn-digit');
CREATE COLLATION digitslast (provider = icu, locale =
'en@colReorder=latn-digit');

    Sort digits after Latin letters. (The default is digits before letters.)
</quote>

Now that we know that this collation is problematic, we could remove
this example, even if we don't want to go as far as documenting
ICU bugs. In fact bug reports used the same name "digitslast", so
I wonder if people tried this straight from our doc.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: https://www.manitou-mail.org
Twitter: @DanielVerite

Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

From

Jehan-Guillaume de Rorthais

Date:

07 September 2020, 13:27:39

On Thu, 03 Sep 2020 13:49:24 -0400
Tom Lane <tgl@sss.pgh.pa.us> wrote:

Something broke in this answer, so I am trying to hook it back to the
appropriate thread.

> "Daniel Verite" <daniel@manitou-mail.org> writes:
> > Now that we know that this collation is problematic, we could remove
> > this example, even if we don't want to go as far as documenting
> > ICU bugs. In fact bug reports used the same name "digitslast", so
> > I wonder if people tried this straight from our doc.  
> 
> If we aren't going to try to work around the bug, I agree that
> removing that example (or replacing it with a less buggy one?)
> is a good idea.

OK.
Please, find a patch in attachment. It removes the buggy collation from the doc and
adapts the existing ones to keep an example of a combination of rules.

> I tend to agree with Peter that trying to work around a bug that
> isn't ours and that we don't fully understand is not going to
> be very productive.  What is the argument, other than observation
> of a small number of test cases, that these other subroutines
> don't have bugs of their own?

What about adding it as a "known bug"/"will not fix" in
https://wiki.postgresql.org/wiki/Todo and linking to it from the doc in a note block? I
strongly feel most users do not know where to find such a list of bugs in the
PostgreSQL ecosystem.

Regards,

Attachment

v1-0001-doc-remove-buggy-ICU-collation-from-documentation.patch

Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

From

Peter Eisentraut

Date:

08 September 2020, 14:36:23

On 2020-09-07 15:27, Jehan-Guillaume de Rorthais wrote:
> Please, find a patch in attachment. It removes the buggy collation from the doc and
> adapts the existing ones to keep an example of a combination of rules.

I agree with this patch in principle.  But perhaps we could keep another 
reordering example, maybe latin/greek?

> What about adding it as a "known bug"/"will not fix" in
> https://wiki.postgresql.org/wiki/Todo and linking to it from the doc in a note block? I
> strongly feel most users do not know where to find such a list of bugs in the
> PostgreSQL ecosystem.

If you feel more users will make use of the Todo list in the wiki, feel 
free to add something there.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

From

Jehan-Guillaume de Rorthais

Date:

09 September 2020, 16:20:02

On Tue, 8 Sep 2020 16:36:23 +0200
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

> On 2020-09-07 15:27, Jehan-Guillaume de Rorthais wrote:
> > Please, find a patch in attachment. It removes the buggy collation from the doc
> > and adapts the existing ones to keep an example of a combination of rules.
> 
> I agree with this patch in principle.  But perhaps we could keep another 
> reordering example, maybe latin/greek?

Please, find in attachment a patch implementing your suggestion.

Regards,

Attachment

v2-0001-doc-remove-buggy-ICU-collation-from-documentation.patch

Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows

From

Peter Eisentraut

Date:

10 September 2020, 13:45:57

On 2020-09-09 18:20, Jehan-Guillaume de Rorthais wrote:
> On Tue, 8 Sep 2020 16:36:23 +0200
> Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:
> 
>> On 2020-09-07 15:27, Jehan-Guillaume de Rorthais wrote:
>>> Please, find a patch in attachment. It removes the buggy collation from the doc
>>> and adapts the existing ones to keep an example of a combination of rules.
>>
>> I agree with this patch in principle.  But perhaps we could keep another
>> reordering example, maybe latin/greek?
> 
> Please, find in attachment a patch implementing your suggestion.

Committed and backpatched.

-- 
Peter Eisentraut              http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services