Re: Support LIKE with nondeterministic collations - Mailing list pgsql-hackers

From: Heikki Linnakangas
Subject: Re: Support LIKE with nondeterministic collations
Msg-id: bce1da9a-2975-462f-8946-c58c878ba82c@iki.fi
In response to: Re: Support LIKE with nondeterministic collations  ("Daniel Verite" <daniel@manitou-mail.org>)
List: pgsql-hackers

On 04/11/2024 10:26, Peter Eisentraut wrote:
> On 29.10.24 18:15, Jacob Champion wrote:
>> libfuzzer is unhappy about the following code in MatchText:
>>
>>> +            while (p1len > 0)
>>> +            {
>>> +                if (*p1 == '\\')
>>> +                {
>>> +                    found_escape = true;
>>> +                    NextByte(p1, p1len);
>>> +                }
>>> +                else if (*p1 == '_' || *p1 == '%')
>>> +                    break;
>>> +                NextByte(p1, p1len);
>>> +            }
>>
>> If the pattern ends with a backslash, we'll call NextByte() twice,
>> p1len will wrap around to INT_MAX, and we'll walk off the end of the
>> buffer. (I fixed it locally by duplicating the ERROR case that's
>> directly above this.)
> 
> Thanks.  Here is an updated patch with that fixed.
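
For reference, the guard described above could look roughly like the standalone sketch below. It is not the code from the updated patch: scan_literal_prefix() and its -1 return value are hypothetical, NextByte() is redefined locally so the example compiles on its own, and the real MatchText would raise an ERROR where this returns -1.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Local single-byte stand-in so the sketch is self-contained. */
#define NextByte(p, plen) ((p)++, (plen)--)

/*
 * Hypothetical helper: scan the literal prefix of a LIKE pattern, stopping
 * at the first unescaped '_' or '%'.  Returns the number of bytes scanned,
 * or -1 if the pattern ends with a lone backslash (where the backend would
 * report an error instead).
 */
static int
scan_literal_prefix(const char *p1, int p1len, bool *found_escape)
{
    const char *start = p1;

    assert(p1 != NULL && found_escape != NULL);
    *found_escape = false;

    while (p1len > 0)
    {
        if (*p1 == '\\')
        {
            *found_escape = true;
            NextByte(p1, p1len);
            /* Nothing left to escape: stop before the second NextByte(). */
            if (p1len == 0)
                return -1;
        }
        else if (*p1 == '_' || *p1 == '%')
            break;
        NextByte(p1, p1len);
    }
    return (int) (p1 - start);
}
```

With this check, a pattern such as 'abc\' is rejected before the second NextByte() can step past the end of the buffer.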

Sadly the algorithm is O(n^2) with non-deterministic collations. Is there 
any way this could be optimized? We make no claims about how expensive any 
functions or operators are, so I suppose a slow implementation is 
nevertheless better than throwing an error.

Let's at least add some CHECK_FOR_INTERRUPTS(). For example, this takes 
a very long time and is uninterruptible:

  SELECT repeat('x', 100000) LIKE '%xxxy%' COLLATE ignore_accents;
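
To illustrate the idea, here is a minimal standalone sketch of a quadratic scan that polls for cancellation once per start offset. It is not the patch's code: memcmp() stands in for the collation-aware comparison, interrupt_pending and interrupt_requested() are hypothetical stand-ins for the backend's interrupt machinery, and the -1 return value exists only so the sketch can run outside the server. In the backend the loop would simply call CHECK_FOR_INTERRUPTS(), which raises an error rather than returning a status.

```c
#include <assert.h>
#include <signal.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the backend's pending-interrupt state. */
static volatile sig_atomic_t interrupt_pending = 0;

/* Hypothetical stand-in for CHECK_FOR_INTERRUPTS(). */
static bool
interrupt_requested(void)
{
    return interrupt_pending != 0;
}

/*
 * Naive substring search: try every start offset and compare the needle
 * there, so the cost is O(haylen * needlelen).
 * Returns 1 if found, 0 if not found, -1 if interrupted.
 */
static int
naive_contains(const char *hay, size_t haylen,
               const char *needle, size_t needlelen)
{
    assert(hay != NULL && needle != NULL);

    if (needlelen > haylen)
        return 0;

    for (size_t start = 0; start + needlelen <= haylen; start++)
    {
        /* One check per outer iteration keeps the loop cancellable. */
        if (interrupt_requested())
            return -1;

        /* memcmp() is an assumption standing in for a collation-aware compare. */
        if (memcmp(hay + start, needle, needlelen) == 0)
            return 1;
    }
    return 0;
}
```

Checking once per outer iteration is cheap relative to the comparison itself, and it bounds the time between a cancel request and the loop noticing it.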

-- 
Heikki Linnakangas
Neon (https://neon.tech)



