questions about tsearch2 (for czech language) - Mailing list pgsql-general

From Pavel Stehule
Subject questions about tsearch2 (for czech language)
Date
Msg-id Pine.LNX.4.44.0312221128350.27697-100000@kix.fsv.cvut.cz
Responses Re: questions about tsearch2 (for czech language)
List pgsql-general
Hello

I am trying tsearch2 in a Czech environment. It works fine, but I have two
questions.

1. I have the words "se" and "ve" in my Czech stop-word list, but they still
appear in the result. Why? Is there a problem with my configuration?

tsearch2=# select * from ts_debug('jmenuji se Pavel Stěhule a bydlím ve Skalici.');
    ts_name    | tok_type | description |  token  |  dict_name  | tsvector
---------------+----------+-------------+---------+-------------+-----------
 default_czech | lword    | Latin word  | jmenuji | {cz_ispell} | 'jmenuji'
 default_czech | lword    | Latin word  | se      | {cz_ispell} | 'se'
 default_czech | lword    | Latin word  | Pavel   | {cz_ispell} | 'pavel'
 default_czech | word     | Word        | Stěhule | {cz_ispell} |
 default_czech | lword    | Latin word  | a       | {cz_ispell} |
 default_czech | word     | Word        | bydlím  | {cz_ispell} | 'bydlet'
 default_czech | lword    | Latin word  | ve      | {cz_ispell} | 've'
 default_czech | lword    | Latin word  | Skalici | {cz_ispell} | 'skalici'
(8 rows)

tsearch2=# select * from pg_ts_cfgmap where ts_name='default_czech';
    ts_name    |  tok_alias   |  dict_name
---------------+--------------+-------------
 default_czech | email        | {simple}
 default_czech | file         | {simple}
 default_czech | float        | {simple}
 default_czech | host         | {simple}
 default_czech | hword        | {cz_ispell}
 default_czech | int          | {simple}
 default_czech | lhword       | {cz_ispell}
 default_czech | lpart_hword  | {cz_ispell}
 default_czech | lword        | {cz_ispell}
 default_czech | nlhword      | {cz_ispell}
 default_czech | nlpart_hword | {cz_ispell}
 default_czech | nlword       | {cz_ispell}
 default_czech | part_hword   | {simple}
 default_czech | sfloat       | {simple}
 default_czech | uint         | {simple}
 default_czech | uri          | {simple}
 default_czech | url          | {simple}
 default_czech | version      | {simple}
 default_czech | word         | {cz_ispell}
(19 rows)
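
For anyone reproducing question 1, the dictionary itself can be inspected separately from the parser and the cfg map. The following is only a sketch, assuming the stock tsearch2 contrib catalog (the pg_ts_dict table and the lexize() function) and the dictionary name cz_ispell from the output above; the actual dictionary options are not shown in this message.

```sql
-- Show how the dictionary was initialised; for an ispell dictionary the
-- option string normally names the DictFile, AffFile and StopFile paths.
SELECT dict_name, dict_initoption
  FROM pg_ts_dict
 WHERE dict_name = 'cz_ispell';

-- Ask the dictionary directly about one of the words in question.
-- As far as I know, an empty array means the word was treated as a stop
-- word, and NULL means the dictionary did not recognise it at all.
SELECT lexize('cz_ispell', 'se');
```

If the option string has no StopFile entry, or the file it points to is not the list containing "se" and "ve", that would be one possible explanation for the output above.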

2. I use a small Czech dictionary. I need to keep words that are not in the
dictionary (in my sample, Stěhule) instead of having them dropped. Can I set
this somewhere? I tried adding the simple dictionary to the cfg map, but
without success:

tsearch2=# select * from ts_debug('jmenuji se Pavel Stěhule a bydlím ve Skalici.');
    ts_name    | tok_type | description |  token  |     dict_name      | tsvector
---------------+----------+-------------+---------+--------------------+-----------
 default_czech | word     | Word        | Stěhule | {cz_ispell,simple} |
 default_czech | lword    | Latin word  | a       | {cz_ispell,simple} |
 default_czech | word     | Word        | bydlím  | {cz_ispell,simple} | 'bydlet'
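
The statement used to change the cfg map is not included in the message. A change of that kind would presumably look something like the sketch below; the choice of token aliases (word and lword) is an assumption based on the rows shown in the output, not something the message states.

```sql
-- Hypothetical reconstruction: append the simple dictionary as a fallback
-- after cz_ispell for the word-like token types of the Czech configuration.
UPDATE pg_ts_cfgmap
   SET dict_name = '{cz_ispell,simple}'
 WHERE ts_name = 'default_czech'
   AND tok_alias IN ('word', 'lword');
```

Dictionaries in the list are consulted in order, so simple would only see a token that cz_ispell passes on as unrecognised.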


Thank you
Pavel Stehule

