Re: [to_tsvector] German Compound Words - Mailing list pgsql-general

From Oleg Bartunov
Subject Re: [to_tsvector] German Compound Words
Date
Msg-id CAF4Au4zS=usY=_azoYPeuiYsRbNVhATRQZCYjFWgF8W7cUrrvw@mail.gmail.com
In response to [to_tsvector] German Compound Words  ("Sven R. Kunze" <srkunze@tbz-pariv.de>)
Responses Re: [to_tsvector] German Compound Words
List pgsql-general
Have you tried ts_debug()? It shows which dictionary processes each token and which lexemes it produces:

=# select * from ts_debug('english', 'messages');
   alias   |   description   |  token   |  dictionaries  |  dictionary  | lexemes
-----------+-----------------+----------+----------------+--------------+----------
 asciiword | Word, all ASCII | messages | {english_stem} | english_stem | {messag}


On Thu, May 28, 2015 at 2:05 PM, Sven R. Kunze <srkunze@tbz-pariv.de> wrote:
Hi everybody,

What do I need to do in order to enable compound word handling in PostgreSQL's tsvector implementation?

I run PostgreSQL 9.3 on an Ubuntu 14.04 machine, have installed the hunspell-de-de package, and have already created a new dictionary as described here: http://www.postgresql.org/docs/9.3/static/textsearch-dictionaries.html#TEXTSEARCH-ISPELL-DICTIONARY

CREATE TEXT SEARCH DICTIONARY german_hunspell (
    TEMPLATE = ispell,
    DictFile = de_de,
    AffFile = de_de,
    StopWords = german
);

Furthermore, I created a new test text search configuration (copied from german) and updated every token-type mapping that used the german_stem dictionary so that it now tries german_hunspell first and then german_stem.
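The configuration step described above was not included in the message. A minimal sketch of what it might look like follows; the name german_test is a hypothetical placeholder, and the token-type list is an assumption based on the word-like token types of the default parser, so it should be checked against the output of \dF+ german in psql.

```sql
-- Hypothetical configuration name; the original message does not state it.
CREATE TEXT SEARCH CONFIGURATION german_test (COPY = german);

-- Assumed token types: the word-like ones that the copied configuration
-- maps to german_stem (verify with \dF+ german in psql).
-- Dictionaries are consulted left to right, so german_hunspell is tried
-- first and german_stem only sees tokens that german_hunspell does not recognize.
ALTER TEXT SEARCH CONFIGURATION german_test
    ALTER MAPPING FOR asciiword, asciihword, hword_asciipart,
                      word, hword, hword_part
    WITH german_hunspell, german_stem;
```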

However, to_tsvector still does not split compound words the way I expect, for example:

wasserkraft -> wasserkraft, kraft
schifffahrt -> schifffahrt, fahrt
blindflansch -> blindflansch, flansch

etc.
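One way to narrow this down is to query the dictionary on its own before going through the whole configuration. The statements below are only a sketch: german_test is a hypothetical name for the test configuration, and no particular output is claimed, since it depends on the installed dictionary and affix files.

```sql
-- Ask the dictionary directly. A NULL result means german_hunspell
-- does not recognize the word at all.
SELECT ts_lexize('german_hunspell', 'wasserkraft');

-- Show, per token, which dictionaries were consulted, which one
-- recognized the token, and the lexemes it produced.
-- german_test is a hypothetical configuration name.
SELECT alias, token, dictionaries, dictionary, lexemes
FROM ts_debug('german_test', 'wasserkraft schifffahrt blindflansch');

-- The end result as it would be stored in a tsvector column.
SELECT to_tsvector('german_test', 'wasserkraft');
```

If ts_lexize returns NULL for these words, the token falls through to german_stem, which would explain why no compound parts show up.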


What have I done wrong here?

--
Sven R. Kunze
TBZ-PARIV GmbH, Bernsdorfer Str. 210-212, 09126 Chemnitz
Tel: +49 (0)371 33714721, Fax: +49 (0)371 5347920
e-mail: srkunze@tbz-pariv.de
web: www.tbz-pariv.de

Geschäftsführer: Dr. Reiner Wohlgemuth
Sitz der Gesellschaft: Chemnitz
Registergericht: Chemnitz HRB 8543



