Thread: Sorting in Unicode not working

Sorting in Unicode not working

From
Hitesh Bagadiya
Date:
Hi,

Our database contains Hindi as well as English characters. We
specified Unicode as the encoding in both the initdb and
createdb commands.

Unfortunately, sorting on the Hindi fields is not working. For
example, we have a person table, and the query "SELECT * FROM PERSON
ORDER BY LASTNAME" returns all the rows, but the records are not
sorted by last name.

We tried this on Postgresql 7.3.2 as well as 7.4.1 but with no
luck. The OS we tried were Mandrake 9.1 and Fedora Core 1.

Do we need to do anything special to get Hindi/Unicode sorting
working in Postgresql?

Please help,

Hitesh


Re: Sorting in Unicode not working

From
Tom Lane
Date:
Hitesh Bagadiya <bagadiya@yahoo.com> writes:
> Our database contains Hindi as well as English characters. We
> have specified the encoding to be unicode during initdb as well
> as createdb commands.
> Unfortunately sorting of the Hindi fields is not working.

You need to make sure you initdb with the right locale, not only
the right encoding.  I dunno which locale you want ... but if
sort(1) sorts the way you want then Postgres should too.

            regards, tom lane
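PostgreSQL compares text with the C library's strcoll(), so the order it produces should match what that locale's collation gives outside the database. The Python sketch below approximates this using the standard library's locale.strxfrm, which builds sort keys that compare like strcoll(). It is an illustration and was not posted in the thread. The helper name sort_like_locale is hypothetical, and any locale you pass, such as hi_IN.UTF-8, must actually be installed on the machine.

```python
import locale


def sort_like_locale(words, locale_name):
    """Sort words by the collation rules of locale_name.

    Hypothetical helper: roughly what `LC_ALL=locale_name sort` or a
    strcoll()-based ORDER BY would do. Raises locale.Error if the
    locale is not installed.
    """
    # Remember the current collation setting so it can be restored.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm() turns each string into a key that compares like strcoll().
        return sorted(words, key=locale.strxfrm)
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


if __name__ == "__main__":
    names = ["शर्मा", "Bagadiya", "अग्रवाल", "Lane"]
    for name in ("C", "hi_IN.UTF-8"):
        try:
            print(name, sort_like_locale(names, name))
        except locale.Error:
            print(name, "is not installed on this system")
```

If the hi_IN line does not show the order you expect, no database setting will fix it. The locale itself has to be installed and correct first.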

Re: Sorting in Unicode not working

From
Hitesh Bagadiya
Date:
We did set the locale to hi_IN at initdb, but sorting is still not
working. One thing is that the Linux system itself is running in the
en_US locale. Does this make any difference?

hitesh


Re: Sorting in Unicode not working

From
Holger Klawitter
Date:
> You need to make sure you initdb with the right locale, not only
> the right encoding.

So in other words, all databases inside postgres must have the same (or at
least a compatible) encoding+locale in order to allow proper sorting or other
locale-dependent things?

Mit freundlichem Gruß / With kind regards
    Holger Klawitter

Re: Sorting in Unicode not working

From
Tom Lane
Date:
Hitesh Bagadiya <bagadiya@yahoo.com> writes:
> We did set the locale to hi_IN at initdb but sorting is not
> working.

You should check that you have selected a database encoding that matches
what the locale expects.  Also double-check that you really do have that
locale selected (use pg_controldata, or in 7.4 just "show lc_collate").

            regards, tom lane
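You can check Tom's first point mechanically: does the database encoding match what the locale expects? The Python sketch below is an illustration under assumptions and was not posted in the thread. It asks the C library which character set a locale implies, using locale.nl_langinfo(locale.CODESET), which exists only on Unix-like systems. It then compares that character set with a PostgreSQL encoding name. The helper names and the alias table are hypothetical. The table assumes that PostgreSQL 7.x calls UTF-8 "UNICODE" and ISO-8859-1 "LATIN1", and it is far from a complete list.

```python
import locale

# Hypothetical, deliberately small alias table: PostgreSQL encoding names
# mapped to codeset spellings a C library might report for a locale.
PG_TO_CODESETS = {
    "UNICODE": {"UTF-8", "UTF8"},      # PostgreSQL 7.x name for UTF-8
    "UTF8": {"UTF-8", "UTF8"},         # spelling used by later releases
    "LATIN1": {"ISO-8859-1", "ISO8859-1"},
}


def locale_codeset(locale_name):
    """Return the character set the C library associates with locale_name.

    Raises locale.Error if the locale is not installed. Unix-only,
    because nl_langinfo() is not available on Windows.
    """
    previous = locale.setlocale(locale.LC_CTYPE)
    try:
        locale.setlocale(locale.LC_CTYPE, locale_name)
        return locale.nl_langinfo(locale.CODESET)
    finally:
        locale.setlocale(locale.LC_CTYPE, previous)


def encodings_match(codeset, db_encoding):
    """True if codeset is a known spelling of the PostgreSQL db_encoding."""
    return codeset.upper() in PG_TO_CODESETS.get(db_encoding.upper(), set())
```

If hi_IN.UTF-8 is installed, encodings_match(locale_codeset("hi_IN.UTF-8"), "UNICODE") should be true. A locale that reports a single-byte codeset such as ISO-8859-1 would not match a UNICODE database.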

Re: Sorting in Unicode not working

From
Tom Lane
Date:
Holger Klawitter <lists@klawitter.de> writes:
> So in other words, all databases inside postgres must have the same (or at
> least a compatible) encoding+locale

Yup.  strcoll()'s locale setting implicitly assumes a particular
encoding (at least on the platforms I'm familiar with), and so selecting
a database encoding that's incompatible with that will give you bizarre
sorting behavior.  The apparent freedom to select a per-database
encoding is really illusory in the current PG system, at least if you
have specific ideas about what you want the sort order to be.  You
pretty much have to get it right at initdb time.

There was a thread just a day or two back on pgsql-hackers about
generalizing our locale support, which would fix this problem among
others.  I'm not sure how soon it will really happen though...

            regards, tom lane
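Why does a mismatch give bizarre results? It helps to look at what strcoll() is actually handed. The short Python sketch below is an illustration, not PostgreSQL code. It stores a Hindi word as UTF-8 bytes, as a UNICODE database would. It then decodes those bytes as ISO-8859-1, which is roughly how a collation routine in a Latin-1 locale would interpret them. The six Devanagari characters become eighteen unrelated Latin-1 characters, so no Hindi ordering rules can possibly apply.

```python
# The same bytes, interpreted under two different encodings.
word = "नमस्ते"                       # six Devanagari code points

stored = word.encode("utf-8")         # bytes as kept in a UNICODE (UTF-8) database
as_latin1 = stored.decode("latin-1")  # how a Latin-1 locale would read those bytes

print(len(word), "characters as UTF-8 text")
print(len(stored), "bytes on disk")
print(len(as_latin1), "characters as seen through Latin-1:", repr(as_latin1))
```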

Re: Sorting in Unicode not working

From
Hitesh Bagadiya
Date:
Thanks for your help. The command shows that the locale is
en_US. Now I will try to figure out how to set the locale
correctly to hi_IN.

hitesh

Re: Sorting in Unicode not working

From
Hitesh Bagadiya
Date:
I set the locale to hi_IN during initdb. pg_controldata and "show
lc_collate" both report that the locale is hi_IN, but PostgreSQL is still
not returning sorted records.

What can I do next to get sorting working on PostgreSQL?

Re: Sorting in Unicode not working

From
Tom Lane
Date:
Hitesh Bagadiya <bagadiya@yahoo.com> writes:
> I set the locale to hi_IN during initdb. pg_controldata and show
> lc_collate both show that locale is hi_IN. But postgresql is not
> returning sorted records.

There's still the other point about whether the database's character
set encoding matches what the locale setting requires.

For that matter, are you certain the locale itself works?  Have you
checked that sort(1) produces the sort order you are expecting when
LC_ALL=hi_IN?

            regards, tom lane
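You can answer Tom's last question with sort(1) itself, with one caveat: sort(1) may quietly fall back to the C locale when LC_ALL names a locale that is not installed. The Python sketch below is an illustration and was not posted in the thread. It first confirms that the C library accepts the locale name, then pipes the lines through sort with LC_ALL set. The function names are hypothetical. The exact spelling of the Hindi locale (hi_IN, hi_IN.UTF-8, hi_IN.utf8) depends on the system, and `locale -a` lists what is installed.

```python
import locale
import os
import subprocess


def locale_installed(name):
    """Return True if the C library accepts name as a collation locale."""
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, name)
        return True
    except locale.Error:
        return False
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


def sort_with_locale(lines, name):
    """Run sort(1) with LC_ALL=name and return its output as a list of lines."""
    if not locale_installed(name):
        # sort(1) might silently use the C locale instead, so refuse here.
        raise ValueError(f"locale {name!r} is not installed")
    env = dict(os.environ, LC_ALL=name)
    result = subprocess.run(
        ["sort"],
        input="\n".join(lines) + "\n",
        capture_output=True,
        encoding="utf-8",
        env=env,
        check=True,
    )
    return result.stdout.splitlines()
```

For example, sort_with_locale(["शर्मा", "अग्रवाल", "Bagadiya"], "hi_IN.UTF-8") either returns the order that locale defines or raises ValueError if the locale is missing. Either outcome tells you more than a sort that silently ran in the C locale.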