
Re: LIKE optimization in UTF-8 and locale-C - Mailing list pgsql-hackers

From	Dennis Bjorklund
Subject	Re: LIKE optimization in UTF-8 and locale-C
Date	March 23, 2007 05:53:04
Msg-id	460362E6.2040208@zigo.dhs.org
In response to	Re: LIKE optimization in UTF-8 and locale-C (ITAGAKI Takahiro <itagaki.takahiro@oss.ntt.co.jp>)
List	pgsql-hackers


ITAGAKI Takahiro wrote:

>> I guess it works well for % but not for _ , the latter has to know, how
>> many bytes the current (multibyte) character covers.
> 
> Yes, % is not used in trailing bytes for all encodings, but _ is
> used in some of them. I think we can use the optimization for all
> of the server encodings except JOHAB. 

The problem with the LIKE pattern _ is that it has to know how long the 
single character it should skip over is. Say you have a UTF-8 string 
with 2 characters encoded in 3 bytes ('ÖA'), where the first character 
is 2 bytes:

0xC3 0x96 'A'

and now you want to match that with the LIKE pattern:

'_A'

How would that work in the C locale?
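
To make the question concrete, here is a minimal sketch of a byte-wise matcher of the kind a single-byte (C locale) code path would use, where _ always consumes exactly one byte. The function name like_bytes is made up for illustration, escape handling is left out, and this is not PostgreSQL's actual code.

```python
def like_bytes(text: bytes, pattern: bytes) -> bool:
    """Naive byte-wise LIKE: '_' consumes one byte, '%' any run of bytes."""
    if not pattern:
        return not text
    head, rest = pattern[0:1], pattern[1:]
    if head == b"%":
        # Try every split point, including the empty match.
        return any(like_bytes(text[i:], rest) for i in range(len(text) + 1))
    if not text:
        return False
    if head == b"_" or head == text[0:1]:
        # '_' and a literal byte both advance by exactly one byte.
        return like_bytes(text[1:], rest)
    return False


text = "ÖA".encode("utf-8")      # the bytes 0xC3 0x96 'A'
print(like_bytes(text, b"_A"))   # '_' eats 0xC3, then 0x96 is compared with 'A'
print(like_bytes(text, b"__A"))  # two '_' are needed to skip the one character
print(like_bytes(text, b"%A"))   # '%' is unaffected by character width
```

With this matcher '_A' does not match, while '__A' does, which is the wrong answer for a two-character string.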

Maybe one should simply write a special version of LIKE for the UTF-8 
encoding, since it's probably the most used encoding today. But I don't 
think you can use the C locale code path and have it work for UTF-8.
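
A UTF-8 special case could look roughly like the sketch below: literal bytes are still compared one at a time, but _ reads the lead byte to find out how many bytes to skip, and % only advances on character boundaries. The names utf8_char_len and like_utf8 are hypothetical, the input is assumed to be valid UTF-8, and escapes are ignored; it only illustrates the idea.

```python
def utf8_char_len(lead: int) -> int:
    """Length in bytes of the UTF-8 character that starts with this lead byte."""
    if lead < 0x80:
        return 1
    if lead >> 5 == 0b110:
        return 2
    if lead >> 4 == 0b1110:
        return 3
    if lead >> 3 == 0b11110:
        return 4
    raise ValueError("not a UTF-8 lead byte")


def like_utf8(text: bytes, pattern: bytes) -> bool:
    """LIKE over valid UTF-8 bytes: '_' skips one whole character."""
    if not pattern:
        return not text
    head, rest = pattern[0:1], pattern[1:]
    if head == b"%":
        i = 0
        while True:
            if like_utf8(text[i:], rest):
                return True
            if i >= len(text):
                return False
            # Advance by a whole character so we never land mid-character.
            i += utf8_char_len(text[i])
    if not text:
        return False
    if head == b"_":
        return like_utf8(text[utf8_char_len(text[0]):], rest)
    if head == text[0:1]:
        # Literal bytes can be compared one at a time.
        return like_utf8(text[1:], rest)
    return False


text = "ÖA".encode("utf-8")
print(like_utf8(text, b"_A"))    # '_' skips both bytes of the first character
print(like_utf8(text, b"__A"))   # the string has only two characters
```

The only per-character work is in the _ and % branches; everything else stays a plain byte comparison.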

/Dennis

