On Tue, 17 Feb 2004, Oleg Bartunov wrote:
> If the ispell dictionary recognizes a word, that word will not be passed
> to en_stem.
> We know how to add a "query spelling feature" to tsearch2, just waiting
> for sponsorships :) Meanwhile, you could use our trgm module, which
> implements trigram-based spelling correction. You need to maintain a
> separate table with all words of interest (say, from tsvectors) and
> search for query words in that table using the bestmatch function.
Hm, I'll take a look at this approach. I take it you think piping
dictionary output to more dictionaries in the chain is a bad idea? :)
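For my own notes, here is how I understand the trigram idea, as a rough
Python sketch. This is not the trgm module's code: the padding rule, the
0.3 threshold, and the names trigrams/similarity/best_match are all my
assumptions for illustration.

```python
def trigrams(word):
    """Return the set of 3-character substrings of a padded, lowercased word."""
    # Assumption: pad with two leading spaces and one trailing space so that
    # word boundaries contribute their own trigrams.
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Shared trigrams divided by all distinct trigrams of both words."""
    ta, tb = trigrams(a), trigrams(b)
    if not ta or not tb:
        return 0.0
    return len(ta & tb) / len(ta | tb)


def best_match(query, known_words, threshold=0.3):
    """Return the known word most similar to query, or None if nothing is close.

    known_words stands in for the separate table of words of interest.
    """
    scored = [(similarity(query, w), w) for w in known_words]
    score, word = max(scored, default=(0.0, None))
    return word if score >= threshold else None
```

A misspelled query word would then be looked up with something like
best_match("serch", words_from_tsvectors) before building the real query.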
> > > What do you want from parser ?
> >
> > I want to be able to recognize symbols, such as the degree (°) and
> > vulgar half (½) symbols.
>
> You mean '(TA)', '(TH)' ? I think it's not very difficult. What'd be
> a token type ( parenthesis_word :?)
Uh, not sure how you got (TA) and (TH)... if you look at the original
message with UTF-8 encoding, the symbols come out fine. Or, maybe
you'd just have better luck pointing a browser at a page like
http://homepages.comnet.co.nz/~r-mahoney/bca_text/utf8.html. I want the
parser to recognize a subset of these symbols. I'd then write another
dictionary to handle the symbol token and return both the symbol and its
common name as lexemes, in case people spell out the symbol instead of
entering it.
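
To make the dictionary part concrete, this is roughly the behaviour I have
in mind, sketched in Python rather than as a real tsearch2 dictionary. The
SYMBOL_NAMES table and the symbol_lexize name are made up for illustration,
and I'm assuming that returning nothing means "token not recognized".

```python
# Hypothetical mapping from a symbol to the name people might type instead.
SYMBOL_NAMES = {
    "\u00b0": "degree",  # DEGREE SIGN
    "\u00bd": "half",    # VULGAR FRACTION ONE HALF
}


def symbol_lexize(token):
    """Return [symbol, common name] for a known symbol token, else None."""
    name = SYMBOL_NAMES.get(token)
    if name is None:
        # Assumption: None signals that this dictionary does not know the token.
        return None
    # Index both forms so a search on either the symbol or the word matches.
    return [token, name]
```

So a document containing the half symbol would be indexed under both the
symbol itself and "half".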