a tsearch issue - Mailing list pgsql-hackers

From Pavel Stehule
Subject a tsearch issue
Date
Msg-id CAFj8pRD1rTJaFEnEGjEfB9BCQMPD=w46z1vvO6xK=4Z8Dvhx+Q@mail.gmail.com
Responses Re: a tsearch issue
List pgsql-hackers
Hello

I found an interesting issue while testing tsearch prefix searching.

We use an ispell-based dictionary:

CREATE TEXT SEARCH DICTIONARY cspell  (template=ispell, dictfile = czech, afffile=czech, stopwords=czech);
CREATE TEXT SEARCH CONFIGURATION cs (copy=english);
ALTER TEXT SEARCH CONFIGURATION cs  ALTER MAPPING FOR word, asciiword WITH cspell, simple;

Then I created a table

postgres=# create table n(a varchar);
CREATE TABLE
postgres=# insert into n values('Stěhule'),('Chromečka');
INSERT 0 2
postgres=# select * from n;
     a
───────────
 Stěhule
 Chromečka
(2 rows)

Then I tested prefix searching and found the following issue:

postgres=# select * from n where to_tsvector('cs', a) @@ to_tsquery('cs','Stě:*');
 a
───
(0 rows)

I expected one row. The problem is in how the word 'Stě' is transformed:

postgres=# select * from ts_debug('cs','Stě:*') ;
─[ RECORD 1 ]┬──────────────────
alias        │ word
description  │ Word, all letters
token        │ Stě
dictionaries │ {cspell,simple}
dictionary   │ cspell
lexemes      │ {sto}
─[ RECORD 2 ]┼──────────────────
alias        │ blank
description  │ Space symbols
token        │ :*
dictionaries │ {}
dictionary   │ [null]
lexemes      │ [null]
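
The same normalization can be checked without ts_debug. The queries below are a minimal sketch, assuming the cspell dictionary and the cs configuration defined earlier in this message; no output is shown because it depends on the installed Czech ispell files.

```sql
-- Ask the ispell dictionary alone how it normalizes the prefix.
select ts_lexize('cspell', 'Stě');

-- Show the tsquery that the prefix search actually uses.
select to_tsquery('cs', 'Stě:*');

-- Show the lexemes stored for the full word, for comparison.
select to_tsvector('cs', 'Stěhule');
```

If the lexeme in the tsquery is not a prefix of any lexeme in the tsvector, the @@ match fails, which would explain the empty result above.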


An ispell dictionary cannot work well with only the first n characters of a word.
I don't know what the correct solution to this problem is.

At a minimum, the prefix search documentation should carry a note that prefix
searching cannot work well with *spell dictionaries, or a description of this issue.
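
One possible workaround, given here only as a sketch and not as a fix for the dictionary behaviour, is to run prefix queries through the built-in simple configuration, which lowercases tokens without consulting ispell. The assumption is that losing ispell normalization is acceptable for prefix lookups on this column.

```sql
-- Assumption: the built-in 'simple' configuration is used on both sides,
-- so the prefix is only lowercased and never rewritten by cspell.
select *
  from n
 where to_tsvector('simple', a) @@ to_tsquery('simple', 'Stě:*');
```

Both sides of @@ have to use the same configuration, otherwise the stored lexemes and the query lexeme are normalized differently.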

Regards

Pavel Stehule

