
Re: like/ilike improvements - Mailing list pgsql-hackers

From	Andrew Dunstan
Subject	Re: like/ilike improvements
Date	May 25, 2007 00:21:51
Msg-id	4656563F.50608@dunslane.net
In response to	Re: like/ilike improvements (Tom Lane <tgl@sss.pgh.pa.us>)
List	pgsql-hackers


Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> Tom Lane wrote:
>>     
>>> You have to be on a first byte before you can meaningfully apply
>>> NextChar, and you have to use NextChar or else you don't count
>>> characters correctly (eg "__" must match 2 chars not 2 bytes).
>>>       
>
>   
>> Yes, I agree completely. However it looks to me like IsFirstByte will in 
>> fact always be true when we get to call NextChar for matching "_" for UTF8.
>>     
>
> If that's true, the patch is failing to achieve its goal of treating %
> bytewise ...
>   

Let's back up. % processing works by looking for a place in the text 
that might match what follows % in the pattern, and then calling itself 
recursively. For UTF8, if what follows % is _, it does that search by 
repeatedly calling NextChar; otherwise it calls NextByte. But if we're 
not processing a wildcard we have to match an actual complete UTF8 char, 
so the fact that we proceed byte-wise won't get us out of sync whenever 
we happen to encounter an _. We can't rely on that process for other 
multi-byte charsets because the suffix of one char might be the prefix 
of another, so we could get false matches. That can't happen with UTF8, 
because a UTF8 continuation byte (10xxxxxx) is never a valid first byte.

cheers

andrew
