Thread: [BUGS] TO_TSVECTOR acts differently with national characters
Query:
SELECT strip(to_tsvector('simple','toop/6 foo bar')),strip(to_tsvector('simple','tüüp/6 foo bar'));
PostgreSQL 9.3.5, Collation - Estonian
Results are:
'bar' 'foo' 'toop/6'
'/6' 'bar' 'foo' 'tüüp'
The string is converted to a vector differently when it contains national characters "äöüõžš".
Mart Palmas
On Tue, Aug 22, 2017 at 08:53:45AM +0000, Mart Palmas wrote:
> The string is converted to a vector differently when it contains national characters "äöüõžš".

I suppose this is true for all non-ASCII characters. It could be fixed by patching the text search parser. But someone may not be happy about that, because it could break backward compatibility.

> Results are:
> 'bar' 'foo' 'toop/6'
> '/6' 'bar' 'foo' 'tüüp'

Do you expect the first or the second result? Someone may not want words to be divided at the "/" character, because "toop/6" can mean a path:

=# select * from ts_debug('simple', 'toop/6');
 alias |    description    | token  | dictionaries | dictionary | lexemes
-------+-------------------+--------+--------------+------------+----------
 file  | File or path name | toop/6 | {simple}     | simple     | {toop/6}
(1 row)

--
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company
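
To see where the two strings diverge, the ts_debug call from the reply can be run on both inputs side by side. This is only a sketch. The "src" column label is an illustrative name, and the second query is a possible workaround that nobody in the thread proposed: it assumes splitting at "/" is acceptable for the data at hand.

```sql
-- Compare how the default text search parser classifies each token
-- for the ASCII and the non-ASCII variant of the same string.
SELECT 'toop/6' AS src, alias, token
FROM ts_debug('simple', 'toop/6')
UNION ALL
SELECT 'tüüp/6' AS src, alias, token
FROM ts_debug('simple', 'tüüp/6');

-- Possible workaround (an assumption, not taken from the thread):
-- replace "/" with a space before parsing, so that neither string
-- can be read as a file path and both are split the same way.
SELECT strip(to_tsvector('simple', replace('toop/6 foo bar', '/', ' '))),
       strip(to_tsvector('simple', replace('tüüp/6 foo bar', '/', ' ')));
```

If "toop/6" should stay a single lexeme, as in the path case mentioned above, this workaround does not apply, and the difference can only be removed in the parser itself.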