Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows - Mailing list pgsql-bugs

From Jehan-Guillaume de Rorthais
Subject Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows
Msg-id 20200903105727.064665ce@firost
In response to Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows  (Peter Eisentraut <peter.eisentraut@2ndquadrant.com>)
Responses Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows
List pgsql-bugs
On Thu, 3 Sep 2020 10:26:03 +0200
Peter Eisentraut <peter.eisentraut@2ndquadrant.com> wrote:

> On 2020-09-03 09:41, Daniel Verite wrote:
> >     Jehan-Guillaume de Rorthais wrote:
> >   
> >> Maybe Daniel has some more experience feedback with other customizations  
> > 
> > No, I've just tried various other reorderings, and didn't find any other
> > that seems to have the same bug as latn-digit.
> > My tests consisted of indexing a large corpus of text and running the
> > index through amcheck.  
> 
> In this case I'm tempted to just leave it alone and write it off as a 
> bug in ICU.  We could potentially inspect the collator object at CREATE 
> COLLATION time and issue warnings if we find something we know to be buggy.
> 
> I don't think we want to make our code uglier and slower

It's not that much uglier, only slower. And maybe we could wrap the logic
inside a dedicated function or macro that checks versions, etc.

> for normal uses to work around a bug in some niche feature in ICU.

Well, indeed, I was wondering in another thread if we should fix it or
document it.

However, raising a WARNING doesn't seem enough, as we would effectively let
the user create a buggy collation and maybe a corrupted index on top of it.
*If* we choose this way, I would vote for an ERROR.

However, as I wrote earlier, we have no hard evidence that latn-digit is the
only buggy ICU customization. Even if the probability is very low, we might
have to pile up more tests on versions, customizations, etc. For instance, we
would have to exclude latn-digit, but not latn-digit-kn, for some ICU
versions, and so on, until proven otherwise. Maintaining this code for each
new ICU version might become tedious.
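
To make the maintenance concern concrete, here is a rough sketch of what such
a check could look like. It is written in Python only for brevity (a real
check would live in the backend's C code), and every name in it (KnownBug,
KNOWN_BUGS, is_known_buggy, check_collation) is hypothetical. The version
bounds are placeholders meaning "all versions until proven otherwise", since
the affected ICU range is not known.

```python
from typing import List, NamedTuple, Optional, Tuple

Version = Tuple[int, int]  # (major, minor) of the ICU library


class KnownBug(NamedTuple):
    customization: str           # exact customization string to reject
    min_icu: Version             # first affected ICU version, inclusive
    max_icu: Optional[Version]   # last affected version, None = no known fix


# PLACEHOLDER data: the affected version range is unknown, so this entry
# matches every ICU version. Each new ICU release could require revisiting it.
KNOWN_BUGS: List[KnownBug] = [
    KnownBug("latn-digit", (0, 0), None),
]


def is_known_buggy(customization: str, icu_version: Version) -> bool:
    """Return True if this exact customization is listed for this ICU version."""
    for bug in KNOWN_BUGS:
        # Exact match on purpose: "latn-digit-kn" must not be caught
        # by the "latn-digit" entry.
        if customization != bug.customization:
            continue
        if icu_version < bug.min_icu:
            continue
        if bug.max_icu is not None and icu_version > bug.max_icu:
            continue
        return True
    return False


def check_collation(customization: str, icu_version: Version) -> None:
    """Refuse (ERROR rather than WARNING) a collation known to be buggy."""
    if is_known_buggy(customization, icu_version):
        raise ValueError(
            f"customization {customization!r} is known to be buggy "
            f"with ICU {icu_version[0]}.{icu_version[1]}"
        )
```

With this table, check_collation("latn-digit", (67, 1)) raises, while
check_collation("latn-digit-kn", (67, 1)) passes. Every further buggy
customization or fixed ICU release would mean another row or bound to
maintain, which is the part that worries me.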

But maybe I am being silly, planning for unknowns, and ICU is only affected
for latn-digit?

I really have no strong feeling right now about the best solution to adopt.
However, I feel the least we should do is document it somewhere, with strong
emphasis.

Regards,


