Re: contrib/tsearch - Mailing list pgsql-hackers

From Teodor Sigaev
Subject Re: contrib/tsearch
Date
Msg-id 3D7CADFE.6070209@stack.net
In response to Re: contrib/tsearch  (Oleg Bartunov <oleg@sai.msu.su>)
List pgsql-hackers
> Should we check for stop words before stemming or after ?

The current implementation supports both variants. See the dictionary
interface definition in morph.c:

typedef struct
{
        char        localename[NAMEDATALEN];
        /* init dictionary */
        void       *(*init) (void);
        /* close dictionary */
        void        (*close) (void *);
        /* find in dictionary */
        char       *(*lemmatize) (void *, char *, int *);
        int         (*is_stoplemm) (void *, char *, int);
        int         (*is_stemstoplemm) (void *, char *, int);
}       DICT;

The 'is_stoplemm' method is called before 'lemmatize', and 'is_stemstoplemm'
is called after it.

At the end of dict/porter_english.dct:
TABLE_DICT_START
        "C",
        setup_english_stemmer,
        closedown_english_stemmer,
        engstemming,
        NULL,
        is_stopengword
TABLE_DICT_END

dict/russian_stemming.dct:
TABLE_DICT_START
        "ru_RU.KOI8-R",
        NULL,
        NULL,
        ru_RUKOI8R_stem,
        ru_RUKOI8R_is_stopword,
        NULL
TABLE_DICT_END

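The last two entries of each table fill the 'is_stoplemm' and
'is_stemstoplemm' slots, in that order. So the English stemmer decides whether
a lexeme is a stop word after stemming, while the Russian one decides before
stemming.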
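
To make the call order concrete, here is a rough sketch of how a caller might
walk a DICT entry. It is not the code from morph.c: process_word, toy_stem,
toy_is_stop and the two toy tables are hypothetical names made up for this
illustration, NAMEDATALEN is given a stand-in value, and I assume the int *
argument of 'lemmatize' carries the word length in and the stem length out.

```c
#include <assert.h>
#include <stddef.h>

#define NAMEDATALEN 64          /* assumption: stand-in for the PostgreSQL constant */

typedef struct
{
    char        localename[NAMEDATALEN];
    void       *(*init) (void);
    void        (*close) (void *);
    char       *(*lemmatize) (void *, char *, int *);
    int         (*is_stoplemm) (void *, char *, int);
    int         (*is_stemstoplemm) (void *, char *, int);
} DICT;

/*
 * Hypothetical driver: returns NULL if the word is rejected as a stop word,
 * otherwise the lemma. Either stop-word hook may be NULL, in which case that
 * check is skipped.
 */
static char *
process_word(DICT *d, void *state, char *word, int *len)
{
    char       *lemma = word;

    assert(d != NULL && word != NULL && len != NULL);

    /* stop-word check BEFORE stemming (Russian-style table) */
    if (d->is_stoplemm && d->is_stoplemm(state, word, *len))
        return NULL;

    if (d->lemmatize)
        lemma = d->lemmatize(state, word, len);

    /* stop-word check AFTER stemming (English-style table) */
    if (lemma && d->is_stemstoplemm && d->is_stemstoplemm(state, lemma, *len))
        return NULL;

    return lemma;
}

/* Toy stemmer: drops one trailing 's' by shortening the length. */
static char *
toy_stem(void *state, char *word, int *len)
{
    (void) state;
    if (*len > 1 && word[*len - 1] == 's')
        (*len)--;
    return word;
}

/* Toy stop list containing only the word "a". */
static int
toy_is_stop(void *state, char *word, int len)
{
    (void) state;
    return len == 1 && word[0] == 'a';
}

/* Same stemmer and stop list, wired into different slots. */
static DICT check_after = {"C", NULL, NULL, toy_stem, NULL, toy_is_stop};
static DICT check_before = {"C", NULL, NULL, toy_stem, toy_is_stop, NULL};
```

With these toy tables the word "as" is dropped by check_after (its stem "a" is
in the stop list) but survives check_before (the unstemmed "as" is not), which
is the same difference as between the two .dct files above.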



-- 
Teodor Sigaev
teodor@stack.net



