
Re: LIKE optimization in UTF-8 and locale-C - Mailing list pgsql-hackers

From	ITAGAKI Takahiro
Subject	Re: LIKE optimization in UTF-8 and locale-C
Date	March 23, 2007 00:25:29
Msg-id	20070323104410.635A.ITAGAKI.TAKAHIRO@oss.ntt.co.jp
In response to	Re: LIKE optimization in UTF-8 and locale-C (Hannu Krosing <hannu@skype.net>)
Responses	Re: LIKE optimization in UTF-8 and locale-C
List	pgsql-hackers


Hannu Krosing <hannu@skype.net> wrote:

> > > We've had an optimization for single-byte encodings using 
> > > pg_database_encoding_max_length() == 1 test. I'll propose to extend it
> > > in UTF-8 with locale-C case.
> > 
> > If this works for UTF8, won't it work for all the backend-legal
> > encodings?
> 
> I guess it works well for % but not for _ , the latter has to know, how
> many bytes the current (multibyte) character covers.

Yes, % is not used as a trailing byte in any encoding, but _ is
used in some of them. I think we can use the optimization for all
of the server encodings except JOHAB.

Also, I noticed that locale settings are not used in LIKE matching,
so the following is enough to check whether the byte-wise matching
functions can be used. Or am I missing something?

#define sb_match_available()    (GetDatabaseEncoding() != PG_JOHAB)
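
For the UTF-8 case, the reasoning can be illustrated with a small standalone sketch. Every byte of a multi-byte UTF-8 sequence has its high bit set, so a byte-wise scan for %, _ or \ should never land inside a character. The helper names below are hypothetical and are not taken from the PostgreSQL source; the sketch only compares a byte-wise scan with a character-wise one.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper (not a PostgreSQL function):
 * is this byte one of the LIKE metacharacters? */
static bool
is_like_meta(unsigned char c)
{
    return c == '%' || c == '_' || c == '\\';
}

/* Byte-wise scan: inspects every byte and ignores character boundaries. */
size_t
count_meta_bytewise(const char *s, size_t len)
{
    size_t n = 0;

    for (size_t i = 0; i < len; i++)
        if (is_like_meta((unsigned char) s[i]))
            n++;
    return n;
}

/* Length of the UTF-8 character that starts with lead byte c.
 * Invalid lead bytes are treated as one byte long. */
static size_t
utf8_char_len(unsigned char c)
{
    if ((c & 0x80) == 0x00) return 1;
    if ((c & 0xE0) == 0xC0) return 2;
    if ((c & 0xF0) == 0xE0) return 3;
    if ((c & 0xF8) == 0xF0) return 4;
    return 1;
}

/* Character-wise scan: inspects only the first byte of each character. */
size_t
count_meta_charwise(const char *s, size_t len)
{
    size_t n = 0;
    size_t i = 0;

    while (i < len)
    {
        unsigned char c = (unsigned char) s[i];

        if (is_like_meta(c))
            n++;
        i += utf8_char_len(c);
    }
    return n;
}
```

On valid UTF-8 input both functions should return the same count, which is the property the byte-wise matching functions rely on. In an encoding where 0x5f or 0x5c can be a trailing byte, the byte-wise count could be higher.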



Multi-byte encodings supported as a server encoding
(whether each byte is used as a trailing byte).
              | % 0x25 | _ 0x5f | \ 0x5c |
--------------+--------+--------+--------+-
EUC_JP        | unused | unused | unused |
EUC_CN        | unused | unused | unused |
EUC_KR        | unused | unused | unused |
EUC_TW        | unused | unused | unused |
JOHAB         | unused | *used* | *used* |
UTF8          | unused | unused | unused |
MULE_INTERNAL | unused | unused | unused |
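
The table above could be written down as a lookup, with the availability check derived from it. This is only a sketch: the struct, the array and the function name are made up for illustration and are not part of PostgreSQL, which identifies encodings by enum values rather than by strings.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical record: can each LIKE metacharacter byte appear as a
 * trailing byte of a multi-byte character in this encoding? */
struct trail_byte_use
{
    const char *encoding;
    bool        percent;     /* 0x25 */
    bool        underscore;  /* 0x5f */
    bool        backslash;   /* 0x5c */
};

/* Values copied from the server-encoding table in this message. */
static const struct trail_byte_use server_encodings[] = {
    {"EUC_JP",        false, false, false},
    {"EUC_CN",        false, false, false},
    {"EUC_KR",        false, false, false},
    {"EUC_TW",        false, false, false},
    {"JOHAB",         false, true,  true},
    {"UTF8",          false, false, false},
    {"MULE_INTERNAL", false, false, false},
};

/* Byte-wise LIKE matching is considered safe only when none of the three
 * metacharacter bytes can be a trailing byte. Encodings not in the table
 * are rejected, to stay on the conservative side. */
bool
bytewise_like_ok(const char *encoding)
{
    size_t count = sizeof(server_encodings) / sizeof(server_encodings[0]);

    for (size_t i = 0; i < count; i++)
    {
        const struct trail_byte_use *e = &server_encodings[i];

        if (strcmp(e->encoding, encoding) == 0)
            return !(e->percent || e->underscore || e->backslash);
    }
    return false;
}
```

With these values the check passes for every listed server encoding except JOHAB, which matches the one-line test on the database encoding proposed above.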

Just for reference, encodings supported only as a client encoding.
              | % 0x25 | _ 0x5f | \ 0x5c |
--------------+--------+--------+--------+-
SJIS          | unused | *used* | *used* |
BIG5          | unused | *used* | *used* |
GBK           | unused | *used* | *used* |
UHC           | unused | unused | unused |
GB18030       | unused | *used* | *used* |

Regards,
---
ITAGAKI Takahiro
NTT Open Source Software Center
