
Re: tsearch2: enable non ascii stop words with C locale - Mailing list pgsql-hackers

From	Tatsuo Ishii
Subject	Re: tsearch2: enable non ascii stop words with C locale
Date	February 13, 2007 04:50:37
Msg-id	20070213.175032.102577107.t-ishii@sraoss.co.jp
In response to	Re: tsearch2: enable non ascii stop words with C locale (Teodor Sigaev <teodor@sigaev.ru>)
Responses	Re: tsearch2: enable non ascii stop words with C locale
List	pgsql-hackers


> > I know. My guess is that the parser does not read the stop word file,
> > at least with the default configuration.
> 
> The parser should not read the stopword file: that is the dictionaries' job.

I'll come up with more detailed info explaining why the stopword file
is not read.

> > So if a character is not ASCII, it returns 0 even if p_isalpha returns
> > 1. Is this what you expect?
> No, p_islatin should return true only for latin characters, not for national ones.

Please give a precise definition of "latin" in the C locale. Are you
saying that "latin" means single-byte characters in the range 0x00-0x7f?
If so, that seems to be exactly the same as ASCII.
--
Tatsuo Ishii
SRA OSS, Inc. Japan

> > In our case, we added JAPANESE_STOP_WORD to english.stop and then ran:
> > select to_tsvector(JAPANESE_STOP_WORD)
> > which returns words even if they are in JAPANESE_STOP_WORD.
> > With the patches, the problem was solved.
> 
> Please show your configuration for lexemes/dictionaries. I suspect that you
> have the en_stem dictionary enabled for the lword lexeme type. A better way is
> to use the 'simple' dictionary (it supports stopwords the same way en_stem
> does) and set it for the nlword, word, part_hword, nlpart_hword, hword, and
> nlhword lexeme types. Note: leave en_stem unchanged for latin words.
> 
> -- 
> Teodor Sigaev                                   E-mail: teodor@sigaev.ru
