Re: making tsearch2 dictionaries - Mailing list pgsql-general

From Oleg Bartunov
Subject Re: making tsearch2 dictionaries
Date
Msg-id Pine.GSO.4.58.0402171337160.3452@ra.sai.msu.su
In response to Re: making tsearch2 dictionaries  (Ben <bench@silentmedia.com>)
Responses Re: making tsearch2 dictionaries  (Ben <bench@silentmedia.com>)
List pgsql-general
On Mon, 16 Feb 2004, Ben wrote:

> So I noticed. ;) The dictionary's working, and I'd be happy to expand
> upon the documentation. Just point me at something to work on.
>

I think you could simply write a paper, "How I made a custom dictionary for tsearch2".
From what I've read, your dictionary could be interesting to people,
especially if you describe the motivation and usage.
Do you want '100' and 'hundred' to be fully equivalent? Then
if you search for '100' you will find documents containing 'hundred'. Interestingly,
you will also find '123', because '123' becomes 'one hundred twenty three'.
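
To make the '123' case concrete, here is a minimal sketch in C of the number-to-words step. It is not Ben's actual dictionary: the name number_to_words, the 0..999 range and the buffer interface are all assumptions chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical word tables; not taken from any real tsearch2 dictionary. */
static const char *ONES[] = {
    "zero", "one", "two", "three", "four", "five", "six", "seven",
    "eight", "nine", "ten", "eleven", "twelve", "thirteen", "fourteen",
    "fifteen", "sixteen", "seventeen", "eighteen", "nineteen"
};
static const char *TENS[] = {
    "", "", "twenty", "thirty", "forty", "fifty",
    "sixty", "seventy", "eighty", "ninety"
};

/* Append word to buf, separated by one space, never writing past size. */
static void append_word(char *buf, size_t size, const char *word)
{
    size_t len;

    assert(buf != NULL && word != NULL && size > 0);
    len = strlen(buf);
    if (len > 0 && len + 1 < size) {
        buf[len++] = ' ';
        buf[len] = '\0';
    }
    strncat(buf, word, size - strlen(buf) - 1);
}

/* Spell out n (0..999) in English. Returns 0 on success, -1 if out of range. */
int number_to_words(unsigned n, char *buf, size_t size)
{
    if (n > 999 || buf == NULL || size == 0)
        return -1;
    buf[0] = '\0';
    if (n == 0) {
        append_word(buf, size, ONES[0]);
        return 0;
    }
    if (n >= 100) {
        append_word(buf, size, ONES[n / 100]);
        append_word(buf, size, "hundred");
        n %= 100;
    }
    if (n >= 20) {
        append_word(buf, size, TENS[n / 10]);
        n %= 10;
    }
    if (n > 0)
        append_word(buf, size, ONES[n]);
    return 0;
}
```

With this sketch, 123 is spelled "one hundred twenty three", so a document containing '123' would share the word 'hundred' with a document containing '100'.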

> But, like I said, I really want to figure out a way to pipe the output
> of my dictionary through another dictionary. If I can't do that, it
> doesn't seem as useful, because "100" (handled by my dictionary) and
> "one hundred" (handled by en_stem) currently don't generate the same
> ts_vector.

What's the problem? You can configure which dictionaries are used, and in
what order, for a given token type (pg_ts_cfgmap table).
Aha, I see your problem:

www=# select * from ts_debug('one hundred');
     ts_name     | tok_type | description |  token  | dict_name | tsvector
-----------------+----------+-------------+---------+-----------+----------
 default_russian | lword    | Latin word  | one     | {en_stem} | 'one'
 default_russian | lword    | Latin word  | hundred | {en_stem} | 'hundr'

'hundred' becomes 'hundr'. You can use a synonym dictionary, which is
rather simple
(see http://www.sai.msu.su/~megera/oddmuse/index.cgi/Tsearch_V2_Notes for details).
Once a word is recognized by the synonym dictionary, it is not passed to
the next dictionary! This is how tsearch2 works with any dictionary.
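
The first-match rule can be sketched in plain C. This is only a toy model, not the tsearch2 code: dict_fn, synonym_dict, toy_stem_dict and lexize are hypothetical names, the stemmer knows just the one stem from the ts_debug output above, and the synonym entry mapping 'hundred' to itself is an assumed way of keeping the word unstemmed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* A dictionary returns a lexeme if it recognizes the token, otherwise NULL. */
typedef const char *(*dict_fn)(const char *token);

/* Hypothetical synonym dictionary with one entry: keep 'hundred' as is. */
const char *synonym_dict(const char *token)
{
    return strcmp(token, "hundred") == 0 ? "hundred" : NULL;
}

/* Toy stand-in for en_stem: stems 'hundred' and accepts every other token. */
const char *toy_stem_dict(const char *token)
{
    if (strcmp(token, "hundred") == 0)
        return "hundr";
    return token;
}

/*
 * Try each dictionary in order. The first one that recognizes the token
 * wins, and the dictionaries after it are never consulted.
 */
const char *lexize(const dict_fn *chain, size_t n, const char *token)
{
    size_t i;

    assert(chain != NULL && token != NULL);
    for (i = 0; i < n; i++) {
        const char *lexeme = chain[i](token);
        if (lexeme != NULL)
            return lexeme;
    }
    return NULL;
}
```

With the chain { synonym_dict, toy_stem_dict }, 'hundred' stays 'hundred' because the synonym dictionary claims it first. With only toy_stem_dict it becomes 'hundr'. A token the synonym dictionary does not know, such as 'one', falls through to the stemmer.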


>
> Once I figure out how to tweak the parser to parse things the way I
> want, I can expand upon those docs too. Looks like I'm going to need to
> reach waaaay back into my brain and dust off my flex knowledge for that,
> though....

What do you want from the parser?

>
> On Mon, 2004-02-16 at 10:33, Oleg Bartunov wrote:
> > btw, Ben, if you get your dictionary working, could you describe the process
> > of developing it so other people can appreciate your work? This part of
> > the tsearch2 documentation is very weak.
> >
> >     Oleg
> >
> > On Mon, 16 Feb 2004, Teodor Sigaev wrote:
> >
> > >
> > >
> > > Ben wrote:
> > > > Thanks for the replies. Just to clarify what I was doing, quasicode
> > > > looked something like:
> > > >
> > > > phrase = palloc(8);
> > > > phrase = "foo\0bar\0";
> > > > res = palloc(3);
> > > > res[0] = phrase[0];
> > > > res[1] = phrase[5];
> > > > res[2] = 0;
> > > >
> > > > That crashed. Once I changed it to:
> > > >
> > > > res = palloc(3);
> > > > res[0] = palloc(4);
> > > > res[0] = "foo\0";
> > > > res[1] = palloc(4);
> > > > res[2] = "bar\0";
> > > > res[3] = 0;
> > > >
> > > > it worked.
> > > >
> > > :)
> > > I hope you mean:
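> > > res = palloc(3 * sizeof(char *));  /* two pointers plus the terminator */
> > > res[0] = palloc(4);
> > > memcpy(res[0], "foo", 4);           /* copies 'f', 'o', 'o', '\0' */
> > > res[1] = palloc(4);
> > > memcpy(res[1], "bar", 4);
> > > res[2] = 0;                         /* NULL-terminate the array */
> > >
> > > Look at the indexes of res.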
> > >
> > >
> >
> >     Regards,
> >         Oleg
> > _____________________________________________________________
> > Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
> > Sternberg Astronomical Institute, Moscow University (Russia)
> > Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
> > phone: +007(095)939-16-83, +007(095)939-23-83

    Regards,
        Oleg
_____________________________________________________________
Oleg Bartunov, sci.researcher, hostmaster of AstroNet,
Sternberg Astronomical Institute, Moscow University (Russia)
Internet: oleg@sai.msu.su, http://www.sai.msu.su/~megera/
phone: +007(095)939-16-83, +007(095)939-23-83
