Re: Support POSITION with nondeterministic collations - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: Support POSITION with nondeterministic collations
Date
Msg-id 6107daa2-5cf7-4cf2-a526-626be1d15b18@eisentraut.org
Whole thread Raw
In response to Re: Support POSITION with nondeterministic collations  ("Euler Taveira" <euler@eulerto.com>)
List pgsql-hackers
On 12.02.25 05:31, Euler Taveira wrote:
> On Mon, Dec 2, 2024, at 6:09 AM, Peter Eisentraut wrote:
>> On 26.08.24 08:09, Peter Eisentraut wrote:
>> > This patch allows using text position search functions with
>> > nondeterministic collations.  These functions are
>> >
>> > - position, strpos
>> > - replace
>> > - split_part
>> > - string_to_array
>> > - string_to_table
>> >
>> > which all use common internal infrastructure.
>>
>> > Some exploratory testing could be useful here.  The present test
>> > coverage was already quite helpful during development, but there is
>> > always the possibility that something was overlooked.
> 
> I took a look at this patch.
> 
>           * Most callers will require "greedy" semantics, meaning that 
> we need
>           * to find the longest such substring, not the shortest.  For 
> callers
>           * don't don't need greedy semantics, we can finish on the first
> 
> s/don't don't/that don't/ ?

fixed

> -   Assert(len1 > 0);
>      Assert(len2 > 0);
> 
> Is there a reason to remove this assert?

len1 is the length of the "haystack".  The previous code did not call 
text_position_setup() with an empty haystack, due to early exits at

/* Empty needle always matches at position 1 */

and

/* Otherwise, can't match if haystack is shorter than needle */

but the second one does not apply for nondeterministic collations, so we 
have to deal with with zero-length haystacks.

>       * (With nondeterministic collations, the search was already
>       * multibyte-aware, so we don't need this.)
> 
> s/was/is/

fixed

> The commit title could be changed to reflect that you are adding support for
> multiple functions. The POSITION gives the impression that it is only 
> for the
> position() function. Something like
> 
>      Support position search functions with nondeterministic collations

fixed

> I did a couple of tests (some are shown below) and I didn't find issues.

Thanks for the extra tests.  I have committed this with the adjustments 
mentioned above.




pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: Missing [NO] INDENT flag in XMLSerialize backward parsing
Next
From: Rucha Kulkarni
Date:
Subject: Doubts regarding pg_freespacemap extension