Re: Extending range of to_tsvector et al - Mailing list pgsql-hackers

From	Tom Lane
Subject	Re: Extending range of to_tsvector et al
Date	October 1, 2012 07:11:23
Msg-id	28864.1349064678@sss.pgh.pa.us
In response to	Re: Extending range of to_tsvector et al (john knightley <john.knightley@gmail.com>)
Responses	Re: Extending range of to_tsvector et al (john knightley <john.knightley@gmail.com>)
List	pgsql-hackers

john knightley <john.knightley@gmail.com> writes:
> The OS I am using is Ubuntu 12.04, with PostgreSQL 9.1.5 installed on
> a UTF-8 locale

> A short five-line dictionary file is sufficient to test:

> raeuz
> 我们
> 𦘭𥎵
> 𪽖𫖂
> 󶒘󴮬

> line 1 "raeuz" is a Zhuang word written using English letters and shows
> up under ts_vector ok
> line 2 "我们" is an everyday Chinese word and shows up under ts_vector ok
> line 3 "𦘭𥎵" is a Zhuang word written using rather old Chinese characters
> found in Unicode 3.1, which came in about the year 2000, and shows up
> under ts_vector ok
> line 4 "𪽖𫖂" is a Zhuang word written using rather old Chinese characters
> found in Unicode 5.2, which came in about the year 2009, but does not
> show up under ts_vector
> line 5 "󶒘󴮬" is a Zhuang word written using rather old Chinese characters
> found in the Private Use Area (PUA) of the font Sawndip.ttf, but does not
> show up under ts_vector (the font can be downloaded from
> http://gdzhdb.l10n-support.com/sawndip-fonts/Sawndip.ttf)

AFAIK there is nothing in Postgres itself that would distinguish, say,
𦘭 from 𪽖.  I think this must be down to
your platform's locale definition: it probably thinks that the former is
a letter and the latter is not.  You'd have to gripe to the locale
maintainers to get that fixed.
        regards, tom lane
