Re: a tsearch2 (8.2.4) dictionary that only filters out stopwords - Mailing list pgsql-patches

From Jan Urbański
Subject Re: a tsearch2 (8.2.4) dictionary that only filters out stopwords
Msg-id 47345276.5060803@students.mimuw.edu.pl
In response to Re: a tsearch2 (8.2.4) dictionary that only filters out stopwords  (Heikki Linnakangas <heikki@enterprisedb.com>)
List pgsql-patches
> dictionaries. In this case, you would first check against one stopword
> list, eliminating 'od', then check the ispell dictionary, and then check
> another stopword list without 'od'.

My problem is basically solved by the patch I sent earlier. I use
'{stop, pl_ispell, simple}', which has the effect of:
a) eliminating words that are stopwords but whose stems are not
stopwords (such as 'od', which gets stemmed to 'oda')
b) stemming non-stopwords properly (using an ispell dictionary)
c) indexing words that are not recognized by ispell (for instance,
'postgresql' gets indexed as 'postgresql')
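
To illustrate the chain described above, here is a rough Python model of
it. This is only a sketch, not the patch itself: the word lists are toy
data, the function names are made up for the example, and the real
dictionaries are C functions. It assumes the usual lexize convention, in
which a dictionary returns NULL (None here) for an unknown word, an empty
array for a stopword, and an array of lexemes otherwise.

```python
from typing import Callable, List, Optional

# Convention modelled here: None = "unknown, pass on to the next dictionary",
# [] = "stopword, reject", non-empty list = "accept these lexemes".
Dictionary = Callable[[str], Optional[List[str]]]

# Toy data (assumptions for illustration only, not real dictionary contents).
STOPWORDS = {"od"}
ISPELL_STEMS = {"od": "oda", "bazy": "baza"}


def stop(token: str) -> Optional[List[str]]:
    """Stopword-only dictionary: rejects stopwords, passes everything else on."""
    return [] if token in STOPWORDS else None


def pl_ispell(token: str) -> Optional[List[str]]:
    """Stand-in for an ispell dictionary: stems known words, passes on the rest."""
    stem = ISPELL_STEMS.get(token)
    return [stem] if stem is not None else None


def simple(token: str) -> Optional[List[str]]:
    """Stand-in for a catch-all dictionary: accepts any word, lowercased."""
    return [token.lower()]


def lexize_chain(token: str, chain: List[Dictionary]) -> List[str]:
    """Ask each dictionary in order; the first non-None answer wins."""
    for dictionary in chain:
        result = dictionary(token)
        if result is not None:
            return result
    # In this sketch, a token that no dictionary recognizes is dropped.
    return []


CHAIN = [stop, pl_ispell, simple]

if __name__ == "__main__":
    for word in ("od", "bazy", "postgresql"):
        print(word, "->", lexize_chain(word, CHAIN))
```

Without the leading stop dictionary, 'od' would reach the ispell stand-in
and come back as 'oda', which is exactly the case a) is meant to prevent.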

> I suggested that a while ago
> (http://archives.postgresql.org/pgsql-hackers/2007-08/msg01036.php).
> Hopefully Oleg or someone else gets around restructuring the
> dictionaries in a future release.

I'm glad to see I'm not the only one in need of a more
sophisticated way of dealing with dictionary chaining. I do understand,
however, the problems that arise when one wants to extend the dictionary
API beyond the reject/accept/pass-on scheme. For these three we have an
easy way of passing the result from lexize: it returns an empty array,
an array of stemmed lexemes, or NULL, respectively. If more complex
actions were to be taken, I'm afraid lexize would have to return
something more complex than just text[].

> I wonder if you could hack the ispell dictionary file to treat oda
> specially?

I thought about it, but it turned out that writing a custom dictionary
was easier than figuring out how ispell works internally.

Regards,
--
Jan Urbanski
GPG key ID: E583D7D2

ouden estin

