Re: tsearch2: language or encoding - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: tsearch2: language or encoding
Date	July 6, 2007 03:57:40
Msg-id	24929.1183705054@sss.pgh.pa.us
In response to	tsearch2: language or encoding (Tatsuo Ishii <ishii@sraoss.co.jp>)
List	pgsql-hackers

Tatsuo Ishii <ishii@sraoss.co.jp> writes:
> I'm wondering whether a tsearch configuration is bound to a language or
> to an encoding. If it's bound to a language, there's a serious design
> problem, I would think. An encoding or charset is not necessarily
> bound to a single language. We can find such examples everywhere (I'm
> not talking about Unicode here). LATIN1 includes English and several
> European languages. EUC-JP includes English and Japanese, etc. And
> we specify an encoding, not a language, as a property of char, so I
> would say the configuration should be bound to an encoding.

Surely not, because then what do you do with utf8, which (allegedly)
represents every language on earth?

As far as the word-stemming part goes, that is very clearly bound
to a language not an encoding.  There may be some other parts of
the code that really are better attached to an encoding --- Oleg,
Teodor, your thoughts?
        regards, tom lane
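
A minimal sketch of the stemming point, assuming nothing about tsearch2's
actual code: the suffix tables and the functions `stem` and `stem_bytes` are
hypothetical toys, far cruder than a real stemmer. It shows that two texts in
the same encoding (UTF-8) still need different stemming rules, so the encoding
alone cannot select the stemmer.

```python
# Toy illustration only: these suffix lists are hypothetical and much
# simpler than the rules a real stemmer applies.
SUFFIXES = {
    "english": ("ing", "ed", "s"),
    "french": ("ements", "ement", "es", "s"),
}


def stem(word, language):
    """Strip the first matching suffix listed for the given language."""
    for suffix in SUFFIXES[language]:
        # Require a stem of at least three characters to remain.
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word


def stem_bytes(data, encoding, language):
    """Decode using the encoding, then stem using the language's rules."""
    words = data.decode(encoding).lower().split()
    return [stem(w, language) for w in words]


# Both inputs are UTF-8; only the language differs.
english = "searching indexed documents".encode("utf-8")
french = "changements rapides".encode("utf-8")

print(stem_bytes(english, "utf-8", "english"))
print(stem_bytes(french, "utf-8", "french"))
# Same bytes, same encoding, wrong language: the stems come out different.
print(stem_bytes(french, "utf-8", "english"))
```

The encoding is needed only to turn bytes into characters; which suffixes to
strip is decided entirely by the language argument.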
