Re: strpos behavior change around empty substring in PG12 - Mailing list pgsql-hackers

From Shay Rojansky
Subject Re: strpos behavior change around empty substring in PG12
Date
Msg-id CADT4RqDQUhGuVsa-1fUP0s25fd+vJW8Mrh9mQPfP6hAP8iDqbw@mail.gmail.com
Whole thread Raw
In response to Re: strpos behavior change around empty substring in PG12  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Thanks for the quick turnaround!

Tom Lane <tgl@sss.pgh.pa.us> schrieb am Mo., 28. Okt. 2019, 16:57:
Robert Haas <robertmhaas@gmail.com> writes:
> On Mon, Oct 28, 2019 at 11:02 AM Shay Rojansky <roji@roji.org> wrote:
>> Before PG12, select strpos('test', '') returns 1 (empty substring found at first position of the string), whereas starting with PG12 it returns 0 (empty substring not found).

> It looks to me like this got broken here:

> commit 9556aa01c69a26ca726d8dda8e395acc7c1e30fc
> Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
> Date:   Fri Jan 25 16:25:05 2019 +0200
>     Use single-byte Boyer-Moore-Horspool search even with multibyte encodings.

> Not sure what happened exactly.

I think the problem is lack of clarity about the edge cases.
The patch added this short-circuit right at the top of text_position():

+   if (VARSIZE_ANY_EXHDR(t1) < 1 || VARSIZE_ANY_EXHDR(t2) < 1)
+       return 0;

and as this example shows, that's the Wrong Thing.  Fortunately,
it also seems easily fixed.

                        regards, tom lane

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Join Correlation Name
Next
From: Tom Lane
Date:
Subject: Re: Remove one use of IDENT_USERNAME_MAX