Re: BUG #13690: Full Text Search with spanish dictionary cannot find some words - Mailing list pgsql-bugs

From Tom Lane
Subject Re: BUG #13690: Full Text Search with spanish dictionary cannot find some words
Msg-id 33158.1445358098@sss.pgh.pa.us
In response to BUG #13690: Full Text Search with spanish dictionary cannot find some words  (vtamara@pasosdeJesus.org)
Responses Re: BUG #13690: Full Text Search with spanish dictionary cannot find some words  (Artur Zakirov <a.zakirov@postgrespro.ru>)
List pgsql-bugs
vtamara@pasosdeJesus.org writes:
> The following search in english succeeds  (returns 1):

> SELECT COUNT(*) FROM cat
>   WHERE to_tsvector('english', nombre) @@ to_tsquery('english', 'politi:*');

> But fails using the spanish dictionary (returns 0):

> SELECT COUNT(*) FROM cat
>   WHERE to_tsvector('spanish', nombre) @@ to_tsquery('spanish', 'politi:*');

This is because you didn't adjust the wildcard search pattern for the
different stemming rules used in Spanish.  Look at the to_tsvector and
to_tsquery results:

regression=# SELECT to_tsvector('english', nombre) , to_tsquery('english','politi:*') from cat;
       to_tsvector       | to_tsquery
-------------------------+------------
 'politica':1 'social':2 | 'politi':*
(1 row)

regression=# SELECT to_tsvector('spanish', nombre) , to_tsquery('spanish','politi:*') from cat;
     to_tsvector      | to_tsquery
----------------------+------------
 'polit':1 'social':2 | 'politi':*
(1 row)
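
To look at the stemmer in isolation, ts_lexize can be called on individual words. This is only a sketch, assuming the stock english_stem and spanish_stem Snowball dictionaries are installed; the lexemes it returns should line up with the to_tsvector and to_tsquery output above.

```sql
-- Assumes the default Snowball dictionaries english_stem and spanish_stem.
-- Each call returns the lexeme array the dictionary produces for one word.
SELECT ts_lexize('english_stem', 'politica') AS english_word,
       ts_lexize('spanish_stem', 'politica') AS spanish_word,
       ts_lexize('spanish_stem', 'politi')   AS spanish_prefix;
```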

I don't know enough Spanish to follow the reasoning for stemming
"politica" as "polit" rather than something else; but I do see that
"politi" is not reduced to "polit", which is fairly reasonable since
that's not a word.  "politi:*" will match anything whose stemmed
version starts with "politi", but that's too long to match the stem
"polit" stored in the tsvector.
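
Two possible ways around this, sketched against the cat table and nombre column from the report. The first assumes to_tsquery leaves "polit" unchanged under the spanish configuration; the second assumes a non-stemmed prefix search is acceptable for this column. Neither has been run against the reporter's data.

```sql
-- Option 1: use a prefix no longer than the stored Spanish stem 'polit'.
SELECT COUNT(*) FROM cat
  WHERE to_tsvector('spanish', nombre) @@ to_tsquery('spanish', 'polit:*');

-- Option 2: the built-in 'simple' configuration does no stemming, so the
-- prefix is compared against the lowercased words themselves.
SELECT COUNT(*) FROM cat
  WHERE to_tsvector('simple', nombre) @@ to_tsquery('simple', 'politi:*');
```

Note that option 1 also matches any other word whose Spanish stem begins with "polit", so it is broader than the original pattern.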

            regards, tom lane
