Re: like/ilike improvements - Mailing list pgsql-hackers

From Andrew Dunstan
Subject Re: like/ilike improvements
Date
Msg-id 46531FA7.6060904@dunslane.net
In response to Re: like/ilike improvements  (Andrew Dunstan <andrew@dunslane.net>)
Responses Re: like/ilike improvements
Re: like/ilike improvements
List pgsql-hackers

Andrew Dunstan wrote:
>
>
> Tom Lane wrote:
>> Andrew Dunstan <andrew@dunslane.net> writes:
>>  
>>> ... It turns out (according to the analysis) that the only time we 
>>> actually need to use NextChar is when we are matching an "_" in a 
>>> like/ilike pattern.
>>>     
>>
>> I thought we'd determined that advancing bytewise for "%" was also
>> risky, in two cases:
>>
>> 1. Multibyte character set that is not UTF8 (more specifically, does not
>> have a guarantee that first bytes and not-first bytes are distinct)

I thought we had disposed of the idea that there was a problem with 
charsets that don't treat the first byte specially.

And Dennis said:

> Tom Lane skrev:
>> You could imagine trying to do
>> % a byte at a time (and indeed that's what I'd been thinking it did)
>> but that gets you out of sync which breaks the _ case.
>
> It is only when you have a pattern like '%_' when this is a problem 
> and we could detect this and do byte by byte when it's not. Now we 
> check (*p == '\\') || (*p == '_') in each iteration when we scan over 
> characters for '%', and we could do it once and have different loops 
> for the two cases.

That's pretty much what the patch does now: it never tries to match a 
single byte when it sees "_", whether or not preceded by "%".
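
To make the idea concrete, here is a minimal sketch of the approach, not the patch itself. It assumes valid UTF-8 input and leaves out backslash escapes and case folding; the names utf8_char_len, match_text and like_utf8 are made up for illustration and are not PostgreSQL functions. "_" always consumes one whole character, while the scan after "%" moves a byte at a time, which should be safe in UTF-8 because a literal pattern byte can only line up with the text at a character boundary.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: byte length of the UTF-8 character at s,
 * clamped to the bytes still available. Assumes valid UTF-8. */
static size_t utf8_char_len(const unsigned char *s, size_t avail)
{
    size_t n = 1;
    if ((*s & 0xE0) == 0xC0) n = 2;
    else if ((*s & 0xF0) == 0xE0) n = 3;
    else if ((*s & 0xF8) == 0xF0) n = 4;
    return n <= avail ? n : avail;
}

/* Simplified LIKE matcher: no escape handling, no case folding. */
static bool match_text(const unsigned char *t, size_t tlen,
                       const unsigned char *p, size_t plen)
{
    while (tlen > 0 && plen > 0) {
        if (*p == '%') {
            /* Absorb the run of % and _ that follows the %.
             * Each _ consumes one whole character, never one byte. */
            while (plen > 0 && (*p == '%' || *p == '_')) {
                if (*p == '_') {
                    if (tlen == 0) return false;
                    size_t n = utf8_char_len(t, tlen);
                    t += n; tlen -= n;
                }
                p++; plen--;
            }
            if (plen == 0) return true;  /* trailing % takes the rest */

            /* The next pattern byte is a literal: scan bytewise. */
            while (tlen > 0) {
                if (*t == *p && match_text(t, tlen, p, plen))
                    return true;
                t++; tlen--;
            }
            return false;
        } else if (*p == '_') {
            /* Bare _ also advances by a whole character. */
            size_t n = utf8_char_len(t, tlen);
            t += n; tlen -= n;
            p++; plen--;
        } else {
            /* Literal: plain byte comparison is enough. */
            if (*t != *p) return false;
            t++; tlen--; p++; plen--;
        }
    }
    /* Text is used up: only trailing % may remain in the pattern. */
    while (plen > 0 && *p == '%') { p++; plen--; }
    return tlen == 0 && plen == 0;
}

/* Convenience wrapper for NUL-terminated strings. */
static bool like_utf8(const char *text, const char *pattern)
{
    return match_text((const unsigned char *) text, strlen(text),
                      (const unsigned char *) pattern, strlen(pattern));
}
```

With this sketch, like_utf8("caf\xC3\xA9", "caf_") matches, because the "_" swallows both bytes of the last character, while like_utf8("\xC3\xA9", "%__") does not, since the text holds only one character.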

cheers

andrew




