Re: UTF8MatchText - Mailing list pgsql-patches

From	Andrew Dunstan
Subject	Re: UTF8MatchText
Date	May 21, 2007 13:34:32
Msg-id	46519FDF.5070302@dunslane.net
In response to	Re: UTF8MatchText (db@zigo.dhs.org)
Responses	Re: UTF8MatchText (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-patches


db@zigo.dhs.org wrote:
>> Doh, you're right ... but on third thought, what happens with a pattern
>> containing "%_"?  If % tries to advance bytewise then we'll be trying to
>> apply NextChar in the middle of a data character, and bad things ensue.
>>
>
> Right, when you have '_' after a '%' you need to make sure the '%'
> advances by full characters. In my suggestion the test for whether '_'
> (or '\') comes after the '%' is done once, and it selects which of the
> two loops to use: the one that does byte stepping or the one that uses
> NextChar.
>
> It's difficult to know for sure that we have thought about all the corner
> cases. I hope the gain is worth the effort.. :-)
>
>
>

Yes, I came to the same conclusion about how to restructure the code.
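
To make the "%_" hazard concrete, here is a small stand-alone sketch. It
is not the like.c code: utf8_char_len, is_continuation,
needs_char_stepping and misaligned_positions are names invented for this
illustration, and it assumes the text is valid UTF-8. It counts how many
of the positions tried by the '%' loop fall in the middle of a character
under each stepping mode.

```c
#include <assert.h>
#include <stdbool.h>

/* Byte length of the UTF-8 character whose lead byte is c.
 * Assumes valid UTF-8; anything unexpected is stepped over as one byte. */
int utf8_char_len(unsigned char c)
{
    if (c < 0x80)
        return 1;
    if ((c & 0xE0) == 0xC0)
        return 2;
    if ((c & 0xF0) == 0xE0)
        return 3;
    if ((c & 0xF8) == 0xF0)
        return 4;
    return 1;
}

/* True for a continuation byte (10xxxxxx), i.e. the middle of a character. */
bool is_continuation(unsigned char c)
{
    return (c & 0xC0) == 0x80;
}

/* The test done once, before the loop: does the pattern char that
 * follows '%' force the '%' to advance by whole characters? */
bool needs_char_stepping(const char *p, int plen)
{
    return plen > 0 && (*p == '_' || *p == '\\');
}

/* Walk the text as the '%' loop would and count the candidate
 * positions that start in the middle of a character. */
int misaligned_positions(const char *t, int tlen, bool char_stepping)
{
    int bad = 0;

    assert(tlen >= 0);
    while (tlen > 0)
    {
        int step = char_stepping ? utf8_char_len((unsigned char) *t) : 1;

        if (is_continuation((unsigned char) *t))
            bad++;
        if (step > tlen)
            step = tlen;        /* never run past the end of the text */
        t += step;
        tlen -= step;
    }
    return bad;
}
```

For the four-byte text "a\xC3\xA9z" (a, e-acute, z), byte stepping visits
one mid-character position and character stepping visits none. Byte
stepping is harmless when the next pattern character is a literal,
because a UTF-8 lead byte never equals a continuation byte, so the
comparison simply fails at those positions. With '_' there is no such
comparison to protect us.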

The current code contains this:

            while (tlen > 0)
            {
                /*
                 * Optimization to prevent most recursion: don't recurse
                 * unless first pattern char might match this text char.
                 */
                if (CHAREQ(t, p) || (*p == '\\') || (*p == '_'))
                {
                    int         matched = MatchText(t, tlen, p, plen);

                    if (matched != LIKE_FALSE)
                        return matched; /* TRUE or ABORT */
                }

                NextChar(t, tlen);
            }


The code appears to date from v1.23 of like.c, way back in 2001. I'm not
sure I agree with the comment, though. In the first place, the
loop-invariant tests (*p == '\\' and *p == '_', which never change as t
advances) should not be inside the loop, I think, and I'll hoist them
out as Dennis suggests. But why are we doing that CHAREQ? If it
succeeds, we'll just do it again when we recurse, I think.
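
A minimal sketch of that hoisting, assuming single-byte characters so
that NextChar reduces to t++, tlen--. This is a simplified stand-alone
matcher written for illustration, not the like.c source: match_text and
like_match are hypothetical names, and a trailing backslash in the
pattern simply fails to match here. The tests on *p are made once, and
each branch gets its own loop. The literal branch keeps the
first-character test from the quoted code; whether that test earns its
keep is exactly the open question.

```c
#include <assert.h>
#include <string.h>

#define LIKE_TRUE   1
#define LIKE_FALSE  0
#define LIKE_ABORT  (-1)    /* no match here or at any later text position */

int match_text(const char *t, int tlen, const char *p, int plen)
{
    assert(tlen >= 0 && plen >= 0);

    /* Fast path: a lone '%' matches any text. */
    if (plen == 1 && *p == '%')
        return LIKE_TRUE;

    while (tlen > 0 && plen > 0)
    {
        if (*p == '\\')
        {
            /* Escape: the next pattern byte must match literally. */
            p++, plen--;
            if (plen <= 0 || *p != *t)
                return LIKE_FALSE;
        }
        else if (*p == '%')
        {
            /* Collapse consecutive '%'; a trailing '%' matches the rest. */
            while (plen > 0 && *p == '%')
                p++, plen--;
            if (plen <= 0)
                return LIKE_TRUE;

            if (*p == '\\' || *p == '_')
            {
                /* Hoisted case 1: the first pattern char may match any
                 * text char, so recurse at every position. */
                while (tlen > 0)
                {
                    int matched = match_text(t, tlen, p, plen);

                    if (matched != LIKE_FALSE)
                        return matched;     /* TRUE or ABORT */
                    t++, tlen--;
                }
            }
            else
            {
                /* Hoisted case 2: literal first char, so recurse only
                 * where the text char equals it. */
                while (tlen > 0)
                {
                    if (*t == *p)
                    {
                        int matched = match_text(t, tlen, p, plen);

                        if (matched != LIKE_FALSE)
                            return matched; /* TRUE or ABORT */
                    }
                    t++, tlen--;
                }
            }
            /* Text exhausted with pattern left over. */
            return LIKE_ABORT;
        }
        else if (*p != '_' && *p != *t)
            return LIKE_FALSE;

        t++, tlen--;
        p++, plen--;
    }

    if (tlen > 0)
        return LIKE_FALSE;      /* pattern ended before the text did */

    /* Text ended: whatever pattern remains must be all '%'. */
    while (plen > 0 && *p == '%')
        p++, plen--;
    return (plen <= 0) ? LIKE_TRUE : LIKE_ABORT;
}

/* Convenience wrapper for NUL-terminated strings; returns 1 on a match. */
int like_match(const char *text, const char *pattern)
{
    return match_text(text, (int) strlen(text),
                      pattern, (int) strlen(pattern)) == LIKE_TRUE;
}
```

With this sketch, like_match("hello", "h%_o") takes the first loop and
succeeds, while like_match("abc", "a%d") takes the second loop and never
recurses at all.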

cheers

andrew
