Re: [BUGS] TO_TSVECTOR acts differently with national charcters - Mailing list pgsql-bugs

From Arthur Zakirov
Subject Re: [BUGS] TO_TSVECTOR acts differently with national charcters
Date
Msg-id 20170824190134.GA1699@arthur.localdomain
In response to [BUGS] TO_TSVECTOR acts differently with national charcters  (Mart Palmas <Mart.Palmas@datel.ee>)
List pgsql-bugs
On Tue, Aug 22, 2017 at 08:53:45AM +0000, Mart Palmas wrote:
> 
> The string is converted to vector differently when the string contains national characters "äöüõžš".
> 

I suppose this is true for all non-ASCII characters. It could be fixed by
patching the text search parser, but some users might object, because
the change could break backward compatibility.

> Results are:
> 'bar' 'foo' 'toop/6'
> '/6' 'bar' 'foo' 'tüüp'

Do you expect the first or the second result?

Some users may not want words to be split at the "/" character, because
"toop/6" can be a path:

=# select * from ts_debug('simple', 'toop/6');
 alias |    description    | token  | dictionaries | dictionary | lexemes
-------+-------------------+--------+--------------+------------+----------
 file  | File or path name | toop/6 | {simple}     | simple     | {toop/6}
(1 row)
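
To see how the parser classifies the two inputs side by side, a query along these lines could be run. This is only a sketch: it assumes a UTF-8 database, and no output is shown because the tokenization of the non-ASCII input may depend on the server version and locale. Going by the results quoted above, the "tüüp/6" input would be expected to come back as more than one token.

```sql
-- Compare the token types assigned to the ASCII and the non-ASCII variant.
-- ts_debug() returns one row per token; "alias" is the parser's token type.
select 'toop/6' as input, alias, token, lexemes
  from ts_debug('simple', 'toop/6')
union all
select 'tüüp/6' as input, alias, token, lexemes
  from ts_debug('simple', 'tüüp/6');
```

If the two inputs get different alias values, the difference comes from the parser's token classification and not from the dictionary.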

-- 
Arthur Zakirov
Postgres Professional: http://www.postgrespro.com
Russian Postgres Company

