Thread: [BUGS] TO_TSVECTOR acts differently with national characters

[BUGS] TO_TSVECTOR acts differently with national characters

From
Mart Palmas
Date:
Query:

SELECT strip(to_tsvector('simple','toop/6 foo bar')),strip(to_tsvector('simple','tüüp/6 foo bar'));

PostgreSQL 9.3.5, collation: Estonian

Results are:
'bar' 'foo' 'toop/6'
'/6' 'bar' 'foo' 'tüüp'

The string is converted to a tsvector differently when it contains national characters such as "äöüõžš".

Mart Palmas

Re: [BUGS] TO_TSVECTOR acts differently with national characters

From
Arthur Zakirov
Date:
On Tue, Aug 22, 2017 at 08:53:45AM +0000, Mart Palmas wrote:
> 
> The string is converted to vector differently, when the string contains national characters "äöüõžš".
> 

I suppose this is true for all non-ASCII characters. It could be fixed by
patching the text search parser. But some users may not be happy
about that, because it could break backward compatibility.

> Results are:
> 'bar' 'foo' 'toop/6'
> '/6' 'bar' 'foo' 'tüüp'

Do you expect the first or the second result?

Some users may not want words to be divided at the "/" character, because
"toop/6" can be a path:

=# select * from ts_debug('simple', 'toop/6');
 alias |    description    | token  | dictionaries | dictionary | lexemes
-------+-------------------+--------+--------------+------------+----------
 file  | File or path name | toop/6 | {simple}     | simple     | {toop/6}
(1 row)
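
For comparison, the same check could be run on both inputs side by side. This is only a sketch: the "input" column is a label added for illustration, and the token types reported for the non-ASCII string will depend on the server version, encoding and locale, so no output is shown here.

```sql
-- Compare how the text search parser tokenizes the ASCII and the
-- non-ASCII variant of the same string.
-- ts_debug(config, document) returns one row per token, including the
-- token type (alias) chosen by the parser.
-- "input" is just an illustrative label column, not part of ts_debug.
SELECT 'toop/6' AS input, alias, description, token, lexemes
  FROM ts_debug('simple', 'toop/6')
UNION ALL
SELECT 'tüüp/6' AS input, alias, description, token, lexemes
  FROM ts_debug('simple', 'tüüp/6');
```

If the first input comes back as a single "file" token while the second is split into several tokens, the difference comes from the parser rather than from the "simple" dictionary.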

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company