Re: strpos behavior change around empty substring in PG12 - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: strpos behavior change around empty substring in PG12
Date
Msg-id 868105f6-c3f7-74fe-4578-54f89eb64753@iki.fi
Whole thread Raw
In response to Re: strpos behavior change around empty substring in PG12  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On 28/10/2019 17:57, Tom Lane wrote:
> Robert Haas <robertmhaas@gmail.com> writes:
>> On Mon, Oct 28, 2019 at 11:02 AM Shay Rojansky <roji@roji.org> wrote:
>>> Before PG12, select strpos('test', '') returns 1 (empty substring found at first position of the string), whereas
startingwith PG12 it returns 0 (empty substring not found).
 
> 
>> It looks to me like this got broken here:
> 
>> commit 9556aa01c69a26ca726d8dda8e395acc7c1e30fc
>> Author: Heikki Linnakangas <heikki.linnakangas@iki.fi>
>> Date:   Fri Jan 25 16:25:05 2019 +0200
>>      Use single-byte Boyer-Moore-Horspool search even with multibyte encodings.
> 
>> Not sure what happened exactly.
> 
> I think the problem is lack of clarity about the edge cases.
> The patch added this short-circuit right at the top of text_position():
> 
> +   if (VARSIZE_ANY_EXHDR(t1) < 1 || VARSIZE_ANY_EXHDR(t2) < 1)
> +       return 0;
> 
> and as this example shows, that's the Wrong Thing.  Fortunately,
> it also seems easily fixed.

Tom fixed this in commit bd1ef5799b; thanks!

To be sure, I also checked the SQL standard for what POSITION('' IN 
'test') is supposed to return. It agrees that 1 is correct:

 > If CHAR_LENGTH(CVE1) is 0 (zero), then the result is 1 (one).

- Heikki




pgsql-hackers by date:

Previous
From: vignesh C
Date:
Subject: Re: Typos and inconsistencies in code
Next
From: Tom Lane
Date:
Subject: Re: Join Correlation Name