Thread: a tsearch issue
Hello

I found an interesting issue when I checked tsearch prefix searching.

We use an ispell-based dictionary

CREATE TEXT SEARCH DICTIONARY cspell
(template=ispell, dictfile = czech, afffile=czech, stopwords=czech);
CREATE TEXT SEARCH CONFIGURATION cs (copy=english);
ALTER TEXT SEARCH CONFIGURATION cs
ALTER MAPPING FOR word, asciiword WITH cspell, simple;

Then I created a table

postgres=# create table n(a varchar);
CREATE TABLE
postgres=# insert into n values('Stěhule'),('Chromečka');
INSERT 0 2
postgres=# select * from n;
     a
───────────
 Stěhule
 Chromečka
(2 rows)

and I tested prefix searching. I found the following issue:

postgres=# select * from n where to_tsvector('cs', a) @@ to_tsquery('cs','Stě:*') ;
 a
───
(0 rows)

I expected one row. The problem is in the transformation of the word 'Stě':

postgres=# select * from ts_debug('cs','Stě:*') ;
─[ RECORD 1 ]┬──────────────────
alias        │ word
description  │ Word, all letters
token        │ Stě
dictionaries │ {cspell,simple}
dictionary   │ cspell
lexemes      │ {sto}
─[ RECORD 2 ]┼──────────────────
alias        │ blank
description  │ Space symbols
token        │ :*
dictionaries │ {}
dictionary   │ [null]
lexemes      │ [null]

An ispell dictionary cannot work well with only the first n characters of a word.

I don't know what the correct solution to this problem is. At a minimum, there should be a note on prefix search saying that it cannot work well with *spell dictionaries, or a description of this issue.

Regards

Pavel Stehule
On Fri, 2011-11-04 at 11:22 +0100, Pavel Stehule wrote:
> Hello
>
> I found an interesting issue when I checked tsearch prefix searching.
>
> We use an ispell-based dictionary
>
> CREATE TEXT SEARCH DICTIONARY cspell
> (template=ispell, dictfile = czech, afffile=czech, stopwords=czech);
> CREATE TEXT SEARCH CONFIGURATION cs (copy=english);
> ALTER TEXT SEARCH CONFIGURATION cs
> ALTER MAPPING FOR word, asciiword WITH cspell, simple;
>
> Then I created a table
>
> postgres=# create table n(a varchar);
> CREATE TABLE
> postgres=# insert into n values('Stěhule'),('Chromečka');
> INSERT 0 2
> postgres=# select * from n;
>      a
> ───────────
>  Stěhule
>  Chromečka
> (2 rows)
>
> and I tested prefix searching. I found the following issue:
>
> postgres=# select * from n where to_tsvector('cs', a) @@
> to_tsquery('cs','Stě:*') ;
>  a
> ───
> (0 rows)

Most likely you are hitting this problem:
http://archives.postgresql.org/pgsql-hackers/2011-10/msg01347.php

'Stě' may be a stopword in Czech.

> I expected one row. The problem is in the transformation of the word 'Stě':
>
> postgres=# select * from ts_debug('cs','Stě:*') ;
> ─[ RECORD 1 ]┬──────────────────
> alias        │ word
> description  │ Word, all letters
> token        │ Stě
> dictionaries │ {cspell,simple}
> dictionary   │ cspell
> lexemes      │ {sto}
> ─[ RECORD 2 ]┼──────────────────
> alias        │ blank
> description  │ Space symbols
> token        │ :*
> dictionaries │ {}
> dictionary   │ [null]
> lexemes      │ [null]
>

':*' is specific to to_tsquery only. ts_debug just invokes the parser, so this
is not the correct way to check it.

-Sushant.
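
Since ts_debug does not understand the ':*' syntax, a more direct way to see how the prefix term is handled is to look at the tsquery itself and to ask the dictionary about the token on its own. The following is only a minimal sketch, assuming the cs configuration, the cspell dictionary and the table n defined earlier in the thread; no output is shown because it depends on the installed Czech dictionary files.

```sql
-- Show the tsquery that the @@ operator actually receives for the prefix term.
SELECT to_tsquery('cs', 'Stě:*');

-- Ask the ispell dictionary alone what it returns for the bare prefix.
SELECT ts_lexize('cspell', 'Stě');

-- Compare with the lexemes that are stored for the full word.
SELECT to_tsvector('cs', 'Stěhule');

-- For comparison: the built-in 'simple' configuration does no stemming,
-- so the prefix is matched against the lowercased word as typed.
SELECT * FROM n
WHERE to_tsvector('simple', a) @@ to_tsquery('simple', 'Stě:*');
```

If the first query shows a lexeme such as 'sto':* while the third shows a different lexeme for 'Stěhule', the mismatch comes from the dictionary normalizing the prefix as if it were a complete word.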