Re: UTF8MatchText - Mailing list pgsql-patches

From	Andrew Dunstan
Subject	Re: UTF8MatchText
Date	May 20, 2007 10:28:48
Msg-id	46504CFD.5040505@dunslane.net
In response to	Re: UTF8MatchText (Dennis Bjorklund <db@zigo.dhs.org>)
Responses	Re: UTF8MatchText
List	pgsql-patches

Dennis Bjorklund wrote:
> Tom Lane wrote:
>> You could imagine trying to do
>> % a byte at a time (and indeed that's what I'd been thinking it did)
>> but that gets you out of sync which breaks the _ case.
>
> This is only a problem when you have a pattern like '%_', and we
> could detect that and go byte by byte when it's not present. Now we
> check (*p == '\\') || (*p == '_') in each iteration when we scan over
> characters for '%', and we could do it once and have different loops
> for the two cases.
>
> Other than this part that I think can be optimized I don't see
> anything wrong with the idea behind the patch. To make the '%' case
> fast might be an important optimization for a lot of use cases. It's
> not uncommon that '%' matches a bigger part of the string than the
> rest of the pattern.
>

Are you sure? The big remaining char-matching bottleneck will surely be
in the code that scans for a place to start matching a %. But that's
exactly where we can't use byte matching for cases where the charset
might include AB and BA as characters - the pattern might contain %BA
and the string AB. However, this isn't a danger for UTF8, which leads me
to think that we do indeed need a special case for UTF8, but for a
different improvement from that proposed in the original patch. I'll
post an updated patch shortly.
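
To make the AB/BA hazard concrete, here is a rough Python sketch. It is
not PostgreSQL code, and the helper names are hypothetical, made up for
illustration. It assumes the example means a string whose bytes run
A B A B, and it uses Shift_JIS as a stand-in for a charset where a
trailing byte can also be a leading byte. It searches byte by byte and
reports the hits that start in the middle of a character.

```python
def byte_find_all(haystack: bytes, needle: bytes) -> list:
    """Return every byte offset at which needle occurs in haystack."""
    offsets = []
    start = haystack.find(needle)
    while start != -1:
        offsets.append(start)
        start = haystack.find(needle, start + 1)
    return offsets


def char_boundaries(text: str, encoding: str) -> set:
    """Byte offsets at which each character of text begins once encoded."""
    boundaries = set()
    offset = 0
    for ch in text:
        boundaries.add(offset)
        offset += len(ch.encode(encoding))
    return boundaries


def spurious_matches(text: str, needle: str, encoding: str) -> list:
    """Byte-level hits for needle that do not start on a character boundary."""
    hits = byte_find_all(text.encode(encoding), needle.encode(encoding))
    starts = char_boundaries(text, encoding)
    return [h for h in hits if h not in starts]


# Illustrative data (an assumption, not taken from the thread):
# two two-byte Shift_JIS characters, bytes 83 81 | 83 41 ...
SJIS_TEXT = b"\x83\x81\x83\x41".decode("shift_jis")
# ... and one character whose bytes, 81 83, straddle the boundary above.
SJIS_NEEDLE = b"\x81\x83".decode("shift_jis")
```

With these inputs, spurious_matches(SJIS_TEXT, SJIS_NEEDLE, "shift_jis")
reports a hit at byte offset 1, inside the first character, even though
the needle character does not occur in the text. The same call with
"utf-8" reports nothing. In UTF-8 the continuation bytes (0x80 to 0xBF)
can never begin a character, so a well-formed pattern can only match at
a character boundary.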

cheers

andrew