Re: [Fwd: Re: tsearch in core patch] - Mailing list pgsql-hackers

From	Tatsuo Ishii
Subject	Re: [Fwd: Re: tsearch in core patch]
Date	June 25, 2007 01:41:52
Msg-id	20070625.134059.26277531.t-ishii@sraoss.co.jp
In response to	Re: [Fwd: Re: tsearch in core patch] (Tom Lane <tgl@sss.pgh.pa.us>)
Responses	Re: [Fwd: Re: tsearch in core patch]
List	pgsql-hackers

> Tatsuo Ishii <ishii@sraoss.co.jp> writes:
> > Ok, probably we need to copy the English stemming rule to the one for
> > Japanese.
> 
> Pardon my ignorance here, but is the concept of stemming even relevant
> to Japanese/Chinese/Korean?  What little I know about ideographic
> languages suggests it wouldn't work well.  And surely the specific rules
> in the Snowball project's English stemmer wouldn't work.

Your understanding is correct. An English stemmer would not work for
the non-English part of Japanese text.

What I meant was the "chunks of English text" embedded in Japanese text.

> > I think same thing (commonly used English with local
> > language) can be applied to Chinese and Korean.
> 
> Well, it's not hard at all to find chunks of English text that have
> embedded bits of French, Spanish, or what-have-you, but that's not an
> argument for trying to intermix the stemmers.  I doubt that such simple
> bits of program could tell the language difference well enough to
> determine which stemming rules to apply.

For Japanese, it will be fairly simple: words in the 7-bit ASCII range
must be English (note that the most commonly used Japanese encodings,
such as EUC, do not allow mixing with ISO 8859).
--
Tatsuo Ishii
SRA OSS, Inc. Japan
