Hunspell as filtering dictionary - Mailing list pgsql-general

From	Bibi Mansione
Subject	Hunspell as filtering dictionary
Date	November 5, 2019 14:42:17
Msg-id	CACZ67_U8Vu66-kPRj_v2icmn_wmz9_LDM8Tv_tvptKKwBXD2tQ@mail.gmail.com
Responses	Re: Hunspell as filtering dictionary
List	pgsql-general

Hi,

I am trying to create a tsvector from a French text. Here are the operations that seem logical to perform, in this order:

1. remove stopwords

2. use Hunspell to find word roots

3. unaccent

I first tried:

CREATE TEXT SEARCH CONFIGURATION fr_conf (copy='simple');

ALTER TEXT SEARCH CONFIGURATION fr_conf
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH unaccent, french_hunspell;

select * from to_tsvector('fr_conf', E'Pour découvrir et rencontrer l\'aventure.');

-- 'aventure':5 'aventurer':5 'rencontrer':3

But the verb "découvrir" is missing :(

If I try with french_hunspell only, I get it, but with the accent:

select * from to_tsvector('french_hunspell', E'Pour découvrir et rencontrer l\'aventure.');

-- 'aventure':6 'aventurer':6 'découvrir':2 'rencontrer':4

I also tried:

CREATE TEXT SEARCH CONFIGURATION fr_conf2 (copy='simple');

ALTER TEXT SEARCH CONFIGURATION fr_conf2
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH french_hunspell, unaccent;

select * from to_tsvector('fr_conf2', E'Pour découvrir et rencontrer l\'aventure.');

-- 'aventure':5 'aventurer':5 'rencontrer':3

But I guess unaccent is never called.

I believe this is because french_hunspell is not a filtering dictionary, but I might be wrong. Is there any FTS configuration, existing or otherwise, that would produce this result:

-- 'aventure':6 'aventurer':6 'decouvrir':2 'rencontrer':4
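
To check which dictionary actually handles each token, a minimal diagnostic sketch might look like the following. It assumes the fr_conf2 configuration and the unaccent and french_hunspell dictionaries used above exist in the current database. ts_debug and ts_lexize are standard PostgreSQL functions; no output is shown here because it depends on the installed Hunspell dictionary files.

```sql
-- For every token, list the dictionary chain it is mapped to, the dictionary
-- that finally recognized it, and the lexemes that dictionary produced.
SELECT alias, token, dictionaries, dictionary, lexemes
FROM ts_debug('fr_conf2', E'Pour découvrir et rencontrer l\'aventure.');

-- Probe each dictionary in isolation on the problematic word.
SELECT ts_lexize('unaccent', 'découvrir');         -- what unaccent emits
SELECT ts_lexize('french_hunspell', 'découvrir');  -- accented form
SELECT ts_lexize('french_hunspell', 'decouvrir');  -- unaccented form
```

A NULL from ts_lexize means the dictionary does not recognize the word, which would help tell whether a token is dropped by the Hunspell dictionary or never reaches unaccent at all.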

Thanks,

Bertrand
