Re: BUG #19341: REPLACE() fails to match final character when using nondeterministic ICU collation - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #19341: REPLACE() fails to match final character when using nondeterministic ICU collation
Msg-id 436944.1764708826@sss.pgh.pa.us
In response to Re: BUG #19341: REPLACE() fails to match final character when using nondeterministic ICU collation  (Laurenz Albe <laurenz.albe@cybertec.at>)
Responses Re: BUG #19341: REPLACE() fails to match final character when using nondeterministic ICU collation
List pgsql-bugs
Laurenz Albe <laurenz.albe@cybertec.at> writes:
> On Tue, 2025-12-02 at 12:25 -0500, Tom Lane wrote:
>> You need to rearrange the loop logic so that we won't attempt to
>> increment test_end that last time through.  Perhaps a for-loop
>> isn't the best way to write it.

> Right.  The attached patch v3 turns it into a while loop to avoid
> the problem.

Looking at the code overall, I wonder if the outer loop doesn't have
the same issue.  The comments claim that we should be able to handle
zero-length matches, but if the overall haystack is of length zero,
we will fail to check for such a match.
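To make the outer-loop concern concrete, here is a minimal C sketch of a scan guarded by `hptr < haystack_end`. It is not the actual PostgreSQL code. The function name `positions_examined` is hypothetical, and single-byte characters are assumed so the loop advances with `hptr++` instead of `pg_mblen()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical outer loop guarded by "hptr < haystack_end".
 * Single-byte characters are assumed, so it advances with hptr++.
 * Returns how many candidate start positions the body examines. */
static size_t positions_examined(const char *haystack,
                                 const char *haystack_end)
{
    size_t n = 0;
    const char *hptr = haystack;

    assert(haystack <= haystack_end);   /* the stated starting condition */
    while (hptr < haystack_end)
    {
        n++;                            /* a match attempt at hptr goes here */
        hptr++;
    }
    return n;                           /* haystack_end itself is never tried */
}
```

For a haystack of L bytes there are L + 1 boundaries at which a zero-length match could begin, but this loop body runs only L times. `haystack_end` itself is never tried, and for L = 0 nothing is tried at all.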

Also, since we have haystack <= haystack_end as a starting condition,
I think both loops could omit the initial test.  I'd be inclined
to code them like

    test_ptr = start_point;     /* wherever this loop starts */
    for (;;)
    {
        /* ... look for a match at test_ptr ... */
        if (test_ptr >= haystack_end)
            break;
        test_ptr += pg_mblen(test_ptr);
    }
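A self-contained C sketch of that loop shape applied to both the outer and the inner loop follows. Everything in it is hypothetical: `utf8_mblen()` stands in for `pg_mblen()` (well-formed UTF-8 is assumed), `coll_equal()` stands in for a nondeterministic collation comparison using ASCII case folding, and `find_match()` is an illustrative name, not the PostgreSQL function.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical stand-in for pg_mblen(): byte length of the UTF-8
 * character starting at p. Assumes well-formed UTF-8. */
static int utf8_mblen(const char *p)
{
    unsigned char c = (unsigned char) *p;

    if (c < 0x80) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    return 4;
}

/* Hypothetical stand-in for a nondeterministic collation comparison:
 * ASCII case-insensitive equality of two byte ranges. */
static int coll_equal(const char *a, size_t alen, const char *b, size_t blen)
{
    if (alen != blen)
        return 0;
    for (size_t i = 0; i < alen; i++)
        if (tolower((unsigned char) a[i]) != tolower((unsigned char) b[i]))
            return 0;
    return 1;
}

/* Finds the first range [*match_start, *match_end) of the haystack that
 * compares equal to the needle. Both loops examine the current position
 * (haystack_end included) before the end test, and only advance a
 * pointer that is known to be < haystack_end. */
static int find_match(const char *haystack, const char *haystack_end,
                      const char *needle, size_t needle_len,
                      const char **match_start, const char **match_end)
{
    const char *hptr = haystack;

    assert(haystack <= haystack_end);   /* the stated starting condition */
    for (;;)
    {
        const char *test_end = hptr;

        for (;;)
        {
            if (coll_equal(hptr, (size_t) (test_end - hptr), needle, needle_len))
            {
                *match_start = hptr;
                *match_end = test_end;
                return 1;
            }
            if (test_end >= haystack_end)
                break;
            test_end += utf8_mblen(test_end);
        }
        if (hptr >= haystack_end)
            break;
        hptr += utf8_mblen(hptr);
    }
    return 0;
}
```

Because each position is examined before the `>=` test, `haystack_end` is tried as an end point, so a match ending on the final character is found; a zero-length haystack still gets one comparison; and no increment runs once a pointer has reached the end.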

On the other hand ... is that comment really right about zero-length
match being possible?  If it is, the API for this function is in
need of redesign, because callers that try to find "the next match"
would go into an infinite loop re-finding the same zero-length
match over and over.
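The infinite-loop concern can be illustrated with a hypothetical caller that counts non-overlapping matches. One common workaround, the approach regex engines typically take and not something proposed in this thread, is to force progress by one character after a zero-length match. `find_from()` and `count_matches()` are illustrative names, and the bytewise matcher (where only an empty needle yields a zero-length match) stands in for a collation-aware one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical bytewise matcher: first match at or after "from".
 * An empty needle matches with zero length at "from" itself. */
static int find_from(const char *from, const char *end,
                     const char *needle, size_t needle_len,
                     const char **mstart, const char **mend)
{
    assert(from <= end);
    for (const char *p = from;; p++)
    {
        if ((size_t) (end - p) >= needle_len &&
            (needle_len == 0 || memcmp(p, needle, needle_len) == 0))
        {
            *mstart = p;
            *mend = p + needle_len;
            return 1;
        }
        if (p >= end)
            return 0;               /* p is only incremented while p < end */
    }
}

/* Counts non-overlapping matches. Resuming at the end of a zero-length
 * match would re-find the same match forever, so the caller steps one
 * byte (one character, given single-byte text) past it instead. */
static size_t count_matches(const char *haystack, const char *end,
                            const char *needle, size_t needle_len)
{
    size_t count = 0;
    const char *from = haystack;
    const char *ms, *me;

    while (find_from(from, end, needle, needle_len, &ms, &me))
    {
        count++;
        if (me > ms)
            from = me;              /* normal case: resume after the match */
        else if (ms < end)
            from = ms + 1;          /* zero-length match: force progress */
        else
            break;                  /* zero-length match at the very end */
    }
    return count;
}
```

Without the `else if` branch, `from` would stay at `ms` after a zero-length match and `find_from()` would return the same match on every iteration, which is the non-terminating behavior described above.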

            regards, tom lane


