Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows - Mailing list pgsql-bugs

From Jehan-Guillaume de Rorthais
Subject Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows
Date
Msg-id 20200613004322.3bcc37f6@firost
In response to Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows (Jehan-Guillaume de Rorthais <jgdr@dalibo.com>)
Responses Re: BUG #15285: Query used index over field with ICU collation in some cases wrongly return 0 rows
List pgsql-bugs
On Fri, 12 Jun 2020 18:40:55 +0200
Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:

> On Wed, 10 Jun 2020 00:29:33 +0200
> Jehan-Guillaume de Rorthais <jgdr@dalibo.com> wrote:
> [...]
> > After playing with ICU regression tests, I found functions ucol_strcollIter
> > and ucol_nextSortKeyPart are safe. I'll do some performance tests and report
> > here.
> 
> I did some benchmarks. See attachment for the script and its header to
> reproduce.
> 
> It sorts 935895 French phrases of 0 to 122 characters, with an average of 49.
> Performance tests were done on the current master HEAD (buggy) and with the
> attached patch, which relies on ucol_strcollIter.
> 
> My preliminary test with ucol_getSortKey was catastrophic, as we might
> expect: 15-17x slower than the current HEAD, so I removed it from the actual
> tests. I didn't try ucol_nextSortKeyPart though.
> 
> Using ucol_strcollIter is ~20% slower than HEAD on UTF8 databases, but
> this might be acceptable. Here are the numbers:
> 
>    DB Encoding   HEAD  strcollIter   ratio
>    UTF8          2.74         3.27   1.19x
>    LATIN1        5.34         5.40   1.01x
> 
> I plan to add a regression test soon.

Please find attached the second version of the patch, with a regression test.

Regards,

