On Tue, 17 Feb 2004, Oleg Bartunov wrote:
> it's unpredictable and I still don't get your idea of pipelining, but
> in general, I have nothing against it.
Oh, well, the idea is that instead of the dictionary searching stopping at
the first dictionary in the chain that returns a lexeme, it would take
each of the lexemes returned and pass them on to the next dictionary in
the chain.
So if I specified that numbers were to be handled by my num2english
dictionary, followed by en_stem, and then tried to get a vector for "100",
num2english would return "one" and "hundred". Then both "one" and
"hundred" would each be looked up in en_stem, and the union of these
lexemes would be the final result.
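To make the proposal concrete, here is a minimal sketch of that pipelining
behavior in Python. The dictionary names (num2english, en_stem) come from
the mail above, but their toy implementations and the pipeline() helper are
hypothetical stand-ins, not actual tsearch2 code:

```python
def num2english(token):
    # Toy number dictionary: only knows "100". Returning None means
    # "not my token", so it passes through unchanged.
    known = {"100": ["one", "hundred"]}
    return known.get(token)

def en_stem(token):
    # Toy stand-in for the en_stem stemmer: just lowercases.
    return [token.lower()]

def pipeline(token, dicts):
    """Instead of stopping at the first dictionary that returns lexemes,
    feed every lexeme it returns on to the next dictionary in the chain,
    and collect the union of the results."""
    tokens = [token]
    for d in dicts:
        out = []
        for t in tokens:
            res = d(t)
            out.extend(res if res is not None else [t])
        tokens = out
    return tokens

print(pipeline("100", [num2english, en_stem]))  # ['one', 'hundred']
```

Here num2english expands "100" into two words, and each of those words is
then stemmed separately, exactly as described above.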
Similarly, if a Latin word gets piped through an ispell dictionary before
being sent to en_stem, each possible spelling would be stemmed.
> Aha, the same way as we handle complex words with hyphen - we return
> the whole word and its parts. So you need to introduce a new type of
> token in the parser and use a synonym dictionary, which in turn will
> return the symbol token and a human-readable word.
Okay, that makes sense. I'll look more into how hyphenated words are being
handled now.
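For reference, the hyphen handling Oleg describes (returning the whole
compound plus its parts) can be sketched like this. This is a hypothetical
illustration of the tokenizing behavior, not the actual parser code:

```python
def hyphen_tokens(word):
    """Mimic the parser's handling of hyphenated words: emit the whole
    compound word followed by each of its parts (hypothetical sketch)."""
    parts = word.split("-")
    if len(parts) == 1:
        return [word]          # not hyphenated: nothing extra to emit
    return [word] + parts      # whole word first, then the pieces

print(hyphen_tokens("multi-key"))  # ['multi-key', 'multi', 'key']
```

A number token could then get the same treatment: the parser emits the
raw symbol token, and a dictionary supplies the human-readable words.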