Re: Backpatching of "Teach the regular expression functions to do case-insensitive matching" - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Backpatching of "Teach the regular expression functions to do case-insensitive matching"
Date
Msg-id BANLkTinX+SKFkrFfmDAF_MjgGf88YSoeCQ@mail.gmail.com
In response to Re: Backpatching of "Teach the regular expression functions to do case-insensitive matching"  (Andres Freund <andres@anarazel.de>)
Responses Re: Backpatching of "Teach the regular expression functions to do case-insensitive matching"  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Fri, May 6, 2011 at 9:22 AM, Andres Freund <andres@anarazel.de> wrote:
> On Friday, May 06, 2011 04:30:01 AM Robert Haas wrote:
>> On Thu, May 5, 2011 at 5:21 AM, Andres Freund <andres@anarazel.de> wrote:
>> > In my opinion this is actually a bug in < 9.0. As it's a (imo) low-impact
>> > fix that's constrained to two files, it seems sensible to backpatch it now
>> > that the solution has proven itself in the field?
>> > The issue is hard to find and has come up several times in the field. And
>> > it has been slightly embarrassing more than once ;)
>> Can you share some more details about your experiences?
> About the embarrassing or hard to find part?
>
> One of the hard to find parts involved a search (constraining word order
> after a tsearch search) where slightly fewer than usual search results were
> returned in production.
> Nobody had noticed during testing that case insensitive search worked for most
> things except multibyte chars as the tested case was something like: SELECT
> 'ÖFFENTLICHKEIT' ~* 'Öffentlichkeit' and the regex condition was only relevant
> when searching for multiple words.
>
> One of the embarrassing examples was that I suggested moving away from a
> solution using several ILIKE rules to one case-insensitive regular expression.
> Totally forgetting that I knew that this was only fixed in 9.0. This turned out
> to be faster. And it turned out to be wrong. In production :-(.
>
>
> Both cases show that the problem often goes unnoticed: most of the people
> who realize that case could be a problem don't know the content, and so
> don't notice the problem until later...
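
To make the failure mode above concrete, here is a rough sketch in Python (not PostgreSQL code). It assumes, purely for illustration, that the pre-9.0 behavior can be modeled as case folding that only touches ASCII letters; the helper names are hypothetical. It shows why a test like 'ÖFFENTLICHKEIT' ~* 'Öffentlichkeit' passes anyway: the only non-ASCII letter has the same case on both sides.

```python
import re


def ascii_only_fold(text: str) -> str:
    """Lowercase A-Z only; every non-ASCII character is left untouched.

    Assumption: a simplified stand-in for case folding that ignores
    multibyte characters, not the actual PostgreSQL implementation.
    """
    return "".join(chr(ord(c) + 32) if "A" <= c <= "Z" else c for c in text)


def matches_ascii_only(pattern: str, subject: str) -> bool:
    """Case-insensitive literal match using the ASCII-only folding above."""
    folded_pattern = re.escape(ascii_only_fold(pattern))
    return re.fullmatch(folded_pattern, ascii_only_fold(subject)) is not None


def matches_unicode(pattern: str, subject: str) -> bool:
    """Case-insensitive literal match using Unicode-aware case folding."""
    return re.fullmatch(re.escape(pattern), subject, re.IGNORECASE) is not None


# The Ö is uppercase on both sides, so ASCII-only folding still matches.
print(matches_ascii_only("Öffentlichkeit", "ÖFFENTLICHKEIT"))
# Here the non-ASCII letter itself differs in case.
print(matches_ascii_only("öffentlichkeit", "ÖFFENTLICHKEIT"))
print(matches_unicode("öffentlichkeit", "ÖFFENTLICHKEIT"))
```

Under this model, only input where the non-ASCII letter differs in case exposes the difference, which fits the report that testing did not catch it.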

After mulling this over a bit more, I guess I'm a little skeptical of
back-patching this because it is clearly a behavior change.  It seems
unlikely, but not impossible, that someone is relying on the current
behavior, and changing it in a minor release might be considered
unfriendly.

On the flip side, the risk of it flat-out blowing up seems pretty
small.  For someone to invent their own version of wchar_t that uses
something other than Unicode code points would be pretty much pure
masochism, wouldn't it?

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

