Re: [tsvector] to_tsvector called multiple times - Mailing list pgsql-general

From Sven R. Kunze
Subject Re: [tsvector] to_tsvector called multiple times
Msg-id 55644C7F.50103@tbz-pariv.de
In response to Re: [tsvector] to_tsvector called multiple times  ("Sven R. Kunze" <srkunze@tbz-pariv.de>)
List pgsql-general
For future reference: https://github.com/snowballstem/snowball/issues/19
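The thread subject and the quoted exchange below appear to turn on one property of stemmers: feeding a stem back into the stemmer need not return the same stem. That is why Sven suggests leaving stems alone, and why Laurenz answers that the software has no way to tell a word from a stem. The minimal Python illustration below assumes made-up suffix rules (this is not the Snowball German algorithm, whose output for these words may differ) and a hypothetical set of known stems. `toy_stem` and `stem_unless_known` are illustrative names only.

```python
def toy_stem(word: str) -> str:
    """Hypothetical suffix stripper (NOT the Snowball German algorithm):
    remove the first matching suffix, keeping at least three characters."""
    word = word.lower()
    for suffix in ("en", "em", "er", "e"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def stem_unless_known(word: str, known_stems: set[str]) -> str:
    """Leave the token alone if it is already a known stem. This needs a
    (hypothetical) list of known stems, which a plain stemmer does not have."""
    lowered = word.lower()
    return lowered if lowered in known_stems else toy_stem(lowered)


if __name__ == "__main__":
    once = toy_stem("Systeme")  # 'system'
    twice = toy_stem(once)      # 'syst': stemming again changes the result
    print(once, twice)

    # With a list of known stems, both forms end up on the same lexeme.
    print(stem_unless_known("System", {"system"}))   # 'system'
    print(stem_unless_known("Systeme", {"system"}))  # 'system'
```

The second function only works because the caller supplies `known_stems`; without such a list, the stemmer alone cannot decide whether "system" is a word to be reduced or a stem to be kept.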


On 26.05.2015 12:29, Sven R. Kunze wrote:
> Thanks. It seems I do use snowball. So I'll go ahead and post my
> issue on GitHub.
>
>
> Maybe I have difficulty understanding the relationships/dependencies
> between all these possibly available dictionary/parser/stemmer packages.
>
> What happens if I install all packages for a single language?
> (hunspell, myspell, ispell, snowball)
>
> Are they complementary? Do they replace each other?
>
>
> >>> \dFd
>                              List of text search dictionaries
>    Schema   |      Name       | Description
> ------------+-----------------+-----------------------------------------------------------
>  pg_catalog | danish_stem     | snowball stemmer for danish language
>  pg_catalog | dutch_stem      | snowball stemmer for dutch language
>  pg_catalog | english_stem    | snowball stemmer for english language
>  pg_catalog | finnish_stem    | snowball stemmer for finnish language
>  pg_catalog | french_stem     | snowball stemmer for french language
>  pg_catalog | german_stem     | snowball stemmer for german language
>  pg_catalog | hungarian_stem  | snowball stemmer for hungarian language
>  pg_catalog | italian_stem    | snowball stemmer for italian language
>  pg_catalog | norwegian_stem  | snowball stemmer for norwegian language
>  pg_catalog | portuguese_stem | snowball stemmer for portuguese language
>  pg_catalog | romanian_stem   | snowball stemmer for romanian language
>  pg_catalog | russian_stem    | snowball stemmer for russian language
>  pg_catalog | simple          | simple dictionary: just lower case and check for stopword
>  pg_catalog | spanish_stem    | snowball stemmer for spanish language
>  pg_catalog | swedish_stem    | snowball stemmer for swedish language
>  pg_catalog | turkish_stem    | snowball stemmer for turkish language
> (16 rows)
>
>
> On 26.05.2015 12:09, Albe Laurenz wrote:
>> Sven R. Kunze wrote:
>>> However, are you sure I am using snowball? Maybe I am reading the
>>> documentation wrong:
>> test=> SELECT * FROM ts_debug('german', 'system');
>>     alias   |   description   | token  | dictionaries  | dictionary  | lexemes
>> -----------+-----------------+--------+---------------+-------------+---------
>>   asciiword | Word, all ASCII | system | {german_stem} | german_stem | {syst}
>> (1 row)
>>
>> test=> \dFd german_stem
>>                  List of text search dictionaries
>>     Schema   |    Name     |             Description
>> ------------+-------------+--------------------------------------
>>   pg_catalog | german_stem | snowball stemmer for german language
>> (1 row)
>>
>>> http://www.postgresql.org/docs/9.3/static/textsearch-dictionaries.html
>>> but it seems as if it depends on which packages (ispell, hunspell,
>>> myspell, snowball + corresponding languages) my system has installed.
>>>
>>> Is there an easy way to determine which of these packages PostgreSQL
>>> uses AND what for?
>> If you use a standard PostgreSQL distribution, you will have no ispell
>> dictionary (as the documentation you quote says).
>> You can always list all dictionaries with "\dFd" in psql.
>>
>>> Sure. That might be the problem. It occurs to me that stems (if
>>> detected as such) should be left alone.
>>> In case a stem is a real German word, it should be stemmed to itself
>>> anyway. If not, it might help not to stem it in order to avoid errors.
>> Yes, but that would mean that you have a way to determine from a string
>> whether it is a word or a stem or both, and the software does not do
>> that.
>>
>> Yours,
>> Laurenz Albe
>>
>
> Regards,
>
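The question quoted above, whether ispell, hunspell, myspell and snowball dictionaries complement or replace each other, is not answered in this exchange. As the PostgreSQL text search documentation describes it, a configuration maps each token type to an ordered list of dictionaries (`\dF+ german` in psql shows that mapping). Each dictionary is consulted in turn until one recognizes the token, and a Snowball stemmer recognizes every token, so it belongs at the end of the list. MySpell and Hunspell files are loaded through the Ispell dictionary template, so they are typically alternative sources for one kind of dictionary that is placed in front of the stemmer. The Python sketch below is only a conceptual model of that lookup order, not PostgreSQL's implementation; the word list, the suffix rules and all function names are hypothetical.

```python
from typing import Callable, Optional

# Conceptual model only: a "dictionary" takes a token and returns
#   None       -> token not recognized, ask the next dictionary
#   []         -> token recognized as a stop word, discard it
#   [lexemes]  -> token recognized, emit these lexemes
Dictionary = Callable[[str], Optional[list[str]]]


def make_word_list_dict(entries: dict[str, list[str]]) -> Dictionary:
    """Hypothetical stand-in for an Ispell/MySpell/Hunspell dictionary:
    it only recognizes words that appear in its word list."""
    def lookup(token: str) -> Optional[list[str]]:
        return entries.get(token.lower())
    return lookup


def make_toy_stemmer(stopwords: set[str]) -> Dictionary:
    """Hypothetical stand-in for a Snowball stemmer: it recognizes every
    token. The suffix rules are a toy, NOT the real German algorithm."""
    def stem(token: str) -> Optional[list[str]]:
        word = token.lower()
        if word in stopwords:
            return []
        for suffix in ("en", "em", "er", "e"):
            # Keep at least three characters of the word.
            if word.endswith(suffix) and len(word) - len(suffix) >= 3:
                return [word[: -len(suffix)]]
        return [word]
    return stem


def lexize(token: str, chain: list[Dictionary]) -> list[str]:
    """Ask each dictionary in order; the first one that recognizes the
    token decides the result. Tokens nobody recognizes are dropped."""
    for dictionary in chain:
        result = dictionary(token)
        if result is not None:
            return result
    return []


if __name__ == "__main__":
    word_list = make_word_list_dict({"ging": ["gehen"]})
    stemmer = make_toy_stemmer({"und"})

    # Word list first, catch-all stemmer last: they complement each other.
    chain = [word_list, stemmer]
    print(lexize("ging", chain))    # ['gehen'] (answered by the word list)
    print(lexize("System", chain))  # ['syst'] (falls through to the stemmer)
    print(lexize("und", chain))     # [] (stop word)

    # Stemmer first: it recognizes everything, so the word list is never used.
    print(lexize("ging", [stemmer, word_list]))  # ['ging']
```

The ordering is the design point: a word-list dictionary only handles the words it knows (such as irregular forms a suffix stripper cannot reduce), while the stemmer acts as the catch-all for everything else.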


--
Sven R. Kunze
TBZ-PARIV GmbH, Bernsdorfer Str. 210-212, 09126 Chemnitz
Tel: +49 (0)371 33714721, Fax: +49 (0)371 5347920
e-mail: srkunze@tbz-pariv.de
web: www.tbz-pariv.de

Managing Director: Dr. Reiner Wohlgemuth
Registered office: Chemnitz
Registry court: Chemnitz HRB 8543


