Re: making tsearch2 dictionaries - Mailing list pgsql-general

From Oleg Bartunov
Subject Re: making tsearch2 dictionaries
Msg-id Pine.GSO.4.58.0402172046480.17553@ra.sai.msu.su
In response to Re: making tsearch2 dictionaries  (Ben <bench@silentmedia.com>)
Responses Re: making tsearch2 dictionaries  (Ben <bench@silentmedia.com>)
List pgsql-general
On Tue, 17 Feb 2004, Ben wrote:

> On Tue, 17 Feb 2004, Oleg Bartunov wrote:
>
> > If the ispell dictionary recognizes a word, that word will not be passed
> > to en_stem. We know how to add a "query spelling feature" to tsearch2, we
> > are just waiting for sponsorship :) Meanwhile, you could use our trgm
> > module, which implements trigram-based spelling correction. You need to
> > maintain a separate table with all words of interest (say, from tsvectors)
> > and search for query words in that table using the bestmatch function.
>
> Hm, I'll take a look at this approach. I take it you think piping
> dictionary output to more dictionaries in the chain is a bad idea? :)

It's unpredictable, and I still don't get your idea of pipelining, but
in general I have nothing against it.
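
For reference, the trigram-based spelling correction mentioned in the quoted
text above can be sketched roughly as follows. This is only an illustration
of the idea, not the trgm module's code: the padding convention, the Jaccard
similarity measure, and the names trigrams, similarity and best_match are all
assumptions, and the word list stands in for the separate table of words.

```python
def trigrams(word):
    """Return the set of 3-character substrings of a padded, lowercased word.

    Padding with two leading spaces and one trailing space is an assumed
    convention so that word boundaries contribute trigrams too.
    """
    padded = "  " + word.lower() + " "
    return {padded[i:i + 3] for i in range(len(padded) - 2)}


def similarity(a, b):
    """Jaccard similarity of two trigram sets: shared / total distinct."""
    ta, tb = trigrams(a), trigrams(b)
    return len(ta & tb) / len(ta | tb)


def best_match(query_word, known_words):
    """Return the known word most similar to query_word (hypothetical helper)."""
    return max(known_words, key=lambda w: similarity(query_word, w))


# Stand-in for the separate table of words collected from the tsvectors.
known_words = ["dictionary", "parser", "lexeme", "trigram"]
print(best_match("dictionery", known_words))
```

A misspelled query word is replaced by its closest known word before the
actual full-text search is run.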

>
> > > > What do you want from parser ?
> > >
> > > I want to be able to recognize symbols, such as the degree (°) and
> > > vulgar half (½) symbols.
> >
> > You mean '(TA)', '(TH)'? I think it's not very difficult. What would
> > the token type be (parenthesis_word :?)
>
> uh, not sure how you got (TA) and (TH)... if you look at the original
> message with utf-8 unicode encoding, the symbols come out fine. Or, maybe
> you'd just have better luck pointing a browser at a page like

Yup:)

> http://homepages.comnet.co.nz/~r-mahoney/bca_text/utf8.html. I want to be
> able to recognize a subset of these symbols, and I'd want another
> dictionary I'd make to handle the symbol token to return both the symbol
> and the common name as lexemes, in case people spell out the symbol
> instead of entering it.
>

Aha, the same way as we handle complex words with a hyphen: we return
the whole word and its parts. So you need to introduce a new type of token
in the parser and use a synonym dictionary, which in turn will return
the symbol token and the human-readable word.
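
A minimal sketch of that idea, written in Python rather than against the
tsearch2 dictionary interface: the mapping, the spelled-out names, and the
function symbol_lexemes are hypothetical and only show a dictionary returning
both the symbol and its common name for a recognized symbol token.

```python
# Hypothetical synonym table: symbol token -> spelled-out common name.
# The names chosen here are assumptions for illustration.
SYMBOL_NAMES = {
    "\u00b0": "degree",  # degree sign
    "\u00bd": "half",    # vulgar fraction one half
}


def symbol_lexemes(token):
    """Return [symbol, common name] for a known symbol, else None.

    Returning None stands for "not recognized", so another dictionary
    could be tried for this token.
    """
    name = SYMBOL_NAMES.get(token)
    if name is None:
        return None
    return [token, name]


print(symbol_lexemes("\u00b0"))
print(symbol_lexemes("x"))
```

With both lexemes indexed, a query that spells out the name and a query that
enters the symbol itself would match the same document.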

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
