Re: making tsearch2 dictionaries - Mailing list pgsql-general

From Ben
Subject Re: making tsearch2 dictionaries
Date
Msg-id Pine.LNX.4.44.0402170923441.32605-100000@localhost.localdomain
In response to Re: making tsearch2 dictionaries  (Oleg Bartunov <oleg@sai.msu.su>)
Responses Re: making tsearch2 dictionaries  (Oleg Bartunov <oleg@sai.msu.su>)
List pgsql-general
On Tue, 17 Feb 2004, Oleg Bartunov wrote:

> If the ispell dictionary recognizes a word, that word will not be passed
> on to en_stem. We know how to add a "query spelling feature" to tsearch2,
> we're just waiting for sponsorship :) Meanwhile, you could use our trgm
> module, which implements trigram-based spelling correction. You need to
> maintain a separate table with all words of interest (say, from tsvectors)
> and search for query words in that table using the bestmatch function.

Hm, I'll take a look at this approach. I take it you think piping
dictionary output to more dictionaries in the chain is a bad idea? :)
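
For illustration, here is a minimal Python sketch of the trigram idea
described in the quote above. It is not the trgm module's code: the padding
convention, the 0.3 threshold, and the names trigrams, similarity, and
best_match are all assumptions made for this example, and a plain list
stands in for the separate word table.

```python
def trigrams(word):
    # Assumed convention: pad with two leading spaces and one trailing
    # space so that word boundaries contribute trigrams too.
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    # Shared trigrams divided by all distinct trigrams of both words.
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def best_match(query, vocabulary, threshold=0.3):
    # Hypothetical stand-in for a "best match" lookup: return the most
    # similar known word, or None if nothing reaches the threshold.
    best_word, best_score = None, 0.0
    for candidate in vocabulary:
        score = similarity(query, candidate)
        if score > best_score:
            best_word, best_score = candidate, score
    return best_word if best_score >= threshold else None


# The vocabulary would come from the words already present in the index.
known_words = ["postgres", "parser", "dictionary"]
suggestion = best_match("postgers", known_words)
```

A misspelled query word can then be swapped for the suggestion before the
real full-text query is built.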

> > > What do you want from parser ?
> >
> > I want to be able to recognize symbols, such as the degree (°) and
> > vulgar half (½) symbols.
>
> You mean '(TA)', '(TH)'? I think it's not very difficult. What would
> the token type be (parenthesis_word?)

Uh, not sure how you got (TA) and (TH)... if you view the original
message with a UTF-8 encoding, the symbols come out fine. Or maybe
you'd have better luck pointing a browser at a page like
http://homepages.comnet.co.nz/~r-mahoney/bca_text/utf8.html. I want the
parser to recognize a subset of these symbols, and I'd write another
dictionary to handle the resulting symbol token. It would return both the
symbol and its common name as lexemes, in case people spell out the symbol
instead of entering it.
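
To make the intended behaviour concrete, here is a rough Python sketch of
such a symbol dictionary. It only models the lookup, not the tsearch2
dictionary interface: the name symbol_lexemes, the table contents, and the
choice to return None for unrecognized tokens are assumptions for
illustration.

```python
# Hypothetical table mapping each recognized symbol to its common name.
SYMBOL_NAMES = {
    "\u00b0": "degree",  # degree sign
    "\u00bd": "half",    # vulgar fraction one half
}


def symbol_lexemes(token):
    # For a recognized symbol, return both the symbol and its name, so a
    # document containing the symbol matches either form in a query.
    name = SYMBOL_NAMES.get(token)
    if name is None:
        # Assumed convention: None means "not recognized here".
        return None
    return [token, name]


degree_lexemes = symbol_lexemes("\u00b0")
unknown_lexemes = symbol_lexemes("x")
```

With both lexemes indexed, a search for the spelled-out word "degree" and a
search for the symbol itself would hit the same document.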

