Thread: PostgreSQL 9.0.3 - collate/ctype order by hu_HU.UTF8 is not correct sorting

PostgreSQL 9.0.3 - collate/ctype order by hu_HU.UTF8 is not correct sorting

From
Adam PAPAI
Date:
I have a PgSQL node on a FreeBSD 8.1-RELEASE.

(PostgreSQL) 9.0.3 client/server.

pkg_info | grep 'postgresql'
postgresql-client-9.0.3
postgresql-server-9.0.3


I have a lots of UTF8 database with hu_HU.UTF8 COLLATE and LC_CTYPE

For example:

test       | pgsql | UTF8      | hu_HU.UTF-8 | hu_HU.UTF-8

The problem is with the "ORDER BY", which does not sort the results
corretly.

        Table "public.erdekessegek"
 Column |          Type          | Modifiers
--------+------------------------+-----------
 lead   | character varying(255) |


test=# select * from erdekessegek  order by lead asc;
 lead
-------
 Adi
 Bedi
 Evi
 Zsolt
 Ádi
 Évi
(6 rows)

which is not correct.

It should be:

Adi
Ádi
Bedi
Evi
Évi
Zsolt

The "chars" wich accents always at the end of the order by but I don't
know why.

What can I do to make it work? I've read all docs on this topic and all
sais the LC_COLLATE and LC_CTYPE solves it, but it seems not.

Any advice would be helpful.

Thanks in advance,

--
Adam PAPAI
BSD Support Service
http://www.bsdsupportservice.hu
E-mail: adam.papai@bsdsupportservice.hu
Phone: +36 30 33-55-735 (Hungary)

Re: PostgreSQL 9.0.3 - collate/ctype order by hu_HU.UTF8 is not correct sorting

From
Pavel Stehule
Date:
Hello

2011/2/5 Adam PAPAI <adam.papai@bsdsupportservice.hu>:
> I have a PgSQL node on a FreeBSD 8.1-RELEASE.
>
> (PostgreSQL) 9.0.3 client/server.
>
> pkg_info | grep 'postgresql'
> postgresql-client-9.0.3
> postgresql-server-9.0.3
>
>
> I have a lots of UTF8 database with hu_HU.UTF8 COLLATE and LC_CTYPE
>
> For example:
>
> test       | pgsql | UTF8      | hu_HU.UTF-8 | hu_HU.UTF-8
>
> The problem is with the "ORDER BY", which does not sort the results
> corretly.
>
>        Table "public.erdekessegek"
>  Column |          Type          | Modifiers
> --------+------------------------+-----------
>  lead   | character varying(255) |
>
>
> test=# select * from erdekessegek  order by lead asc;
>  lead
> -------
>  Adi
>  Bedi
>  Evi
>  Zsolt
>  Ádi
>  Évi
> (6 rows)
>
> which is not correct.
>
> It should be:
>
> Adi
> Ádi
> Bedi
> Evi
> Évi
> Zsolt
>
> The "chars" wich accents always at the end of the order by but I don't
> know why.
>
> What can I do to make it work? I've read all docs on this topic and all
> sais the LC_COLLATE and LC_CTYPE solves it, but it seems not.
>

You should to initialize database with correct locale. You can't to
change locale after database is created.

/usr/local/pgsql91/bin/createdb test --locale=cs_CZ.utf-8 -e UTF-8
CREATE DATABASE test LC_COLLATE 'cs_CZ.utf-8' LC_CTYPE 'cs_CZ.utf-8';
COMMENT ON DATABASE test IS 'UTF-8';

test=# show lc_collate ;
 lc_collate
-------------
 cs_CZ.utf-8
(1 row)

test=# select * from (values('Sís'),('Šiška'),('Syrový'),('Vondra')) x
ORDER BY 1;
 column1
---------
 Sís
 Syrový
 Šiška
 Vondra
(4 rows)

Regards

Pavel Stehule

> Any advice would be helpful.
>
> Thanks in advance,
>
> --
> Adam PAPAI
> BSD Support Service
> http://www.bsdsupportservice.hu
> E-mail: adam.papai@bsdsupportservice.hu
> Phone: +36 30 33-55-735 (Hungary)
>
> --
> Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
> To make changes to your subscription:
> http://www.postgresql.org/mailpref/pgsql-general
>

Pavel Stehule wrote:
> Hello
>
> You should to initialize database with correct locale. You can't to
> change locale after database is created.
>
> /usr/local/pgsql91/bin/createdb test --locale=cs_CZ.utf-8 -e UTF-8
> CREATE DATABASE test LC_COLLATE 'cs_CZ.utf-8' LC_CTYPE 'cs_CZ.utf-8';
> COMMENT ON DATABASE test IS 'UTF-8';
>
> test=# show lc_collate ;
>  lc_collate
> -------------
>  cs_CZ.utf-8
> (1 row)
>
> test=# select * from (values('Sís'),('Šiška'),('Syrový'),('Vondra')) x
> ORDER BY 1;
>  column1
> ---------
>  Sís
>  Syrový
>  Šiška
>  Vondra
> (4 rows)
>
> Regards
>
> Pavel Stehule
>

Something is not correct for me.

I've re-initialised it again with:

--locale=hu_HU.UTF-8 --encoding=UTF-8

which generated everything LC_* to hu_HU.UTF-8.

I've created the DBs with createdb:

createdb -U pgsql "$DBNAME" -E "UTF-8" --locale="hu_HU.UTF-8"

Example:

createdb -U pgsql spiritnet -E UTF-8 --locale=hu_HU.UTF-8

spiritnet=# show lc_collate;
 lc_collate
-------------
 hu_HU.UTF-8
(1 row)

spiritnet=# show lc_ctype;
  lc_ctype
-------------
 hu_HU.UTF-8

         Name          | Owner | Encoding  |  Collation  |    Ctype
-----------------------+-------+-----------+-------------+-------------
 spiritnet             | pgsql | UTF8      | hu_HU.UTF-8 | hu_HU.UTF-8


And it STILL NOT working, the order by is absolutly incorrect. All chars
with accents, are located at the END of the list. Is it a problem only
with 9.0.3?


For me, your test shows:

"Syrový"
"Sís"
"Vondra"
"Šiška"

instead your correct order.

Any ideas?

Further information:

[root@radon /usr/local/pgsql]# pg_config
BINDIR = /usr/local/bin
DOCDIR = /usr/local/share/doc/postgresql
HTMLDIR = /usr/local/share/doc/postgresql
INCLUDEDIR = /usr/local/include
PKGINCLUDEDIR = /usr/local/include/postgresql
INCLUDEDIR-SERVER = /usr/local/include/postgresql/server
LIBDIR = /usr/local/lib
PKGLIBDIR = /usr/local/lib/postgresql
LOCALEDIR = /usr/local/share/locale
MANDIR = /usr/local/man
SHAREDIR = /usr/local/share/postgresql
SYSCONFDIR = /usr/local/etc/postgresql
PGXS = /usr/local/lib/postgresql/pgxs/src/makefiles/pgxs.mk
CONFIGURE = '--with-libraries=/usr/local/lib'
'--with-includes=/usr/local/include' '--enable-thread-safety'
'--with-openssl' '--with-libxml' '--enable-nls' '--with-gssapi'
'--prefix=/usr/local' '--mandir=/usr/local/man'
'--infodir=/usr/local/info/' '--build=amd64-portbld-freebsd8.1'
'build_alias=amd64-portbld-freebsd8.1' 'CC=cc' 'CFLAGS=-O2 -pipe -O3
-funroll-loops -fno-strict-aliasing' 'LDFLAGS= -L/usr/local/lib
-rpath=/usr/lib:/usr/local/lib' 'CPP=cpp'
CC = cc
CPPFLAGS = -I/usr/local/include/libxml2 -I/usr/local/include
-I/usr/local/include -I/usr/local/include
CFLAGS = -O2 -pipe -O3 -funroll-loops -fno-strict-aliasing -Wall
-Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement
-Wendif-labels -fno-strict-aliasing -fwrapv
CFLAGS_SL = -fPIC -DPIC
LDFLAGS = -L/usr/local/lib -rpath=/usr/lib:/usr/local/lib
-L/usr/local/lib -L/usr/local/lib -L/usr/local/lib -Wl,--as-needed
-Wl,-R'/usr/local/lib'
LDFLAGS_EX =
LDFLAGS_SL =
LIBS = -lpgport -lintl -lxml2 -lssl -lcrypto -lgssapi_krb5 -lz
-lreadline -lcrypt -lm
VERSION = PostgreSQL 9.0.3


[root@radon /usr/local/pgsql]# pg_controldata /usr/local/pgsql/data
pg_control version number:            903
Catalog version number:               201008051
Database system identifier:           5570316617043000695
Database cluster state:               in production
pg_control last modified:             Sat Feb  5 22:58:43 2011
Latest checkpoint location:           0/9008D748
Prior checkpoint location:            0/8FF99F68
Latest checkpoint's REDO location:    0/9005FEB0
Latest checkpoint's TimeLineID:       1
Latest checkpoint's NextXID:          0/16405
Latest checkpoint's NextOID:          57344
Latest checkpoint's NextMultiXactId:  1
Latest checkpoint's NextMultiOffset:  0
Latest checkpoint's oldestXID:        654
Latest checkpoint's oldestXID's DB:   1
Latest checkpoint's oldestActiveXID:  0
Time of latest checkpoint:            Sat Feb  5 22:58:33 2011
Minimum recovery ending location:     0/0
Backup start location:                0/0
Current wal_level setting:            minimal
Current max_connections setting:      40
Current max_prepared_xacts setting:   0
Current max_locks_per_xact setting:   64
Maximum data alignment:               8
Database block size:                  8192
Blocks per segment of large relation: 131072
WAL block size:                       8192
Bytes per WAL segment:                16777216
Maximum length of identifiers:        64
Maximum columns in an index:          32
Maximum size of a TOAST chunk:        1996
Date/time type storage:               64-bit integers
Float4 argument passing:              by value
Float8 argument passing:              by value


--
Adam PAPAI
BSD Support Service
http://www.bsdsupportservice.hu
E-mail: adam.papai@bsdsupportservice.hu
Phone: +36 30 33-55-735 (Hungary)

Re: PostgreSQL 9.0.3 - collate/ctype order by hu_HU.UTF8 is not correct sorting

From
Pavel Stehule
Date:
2011/2/5 Adam PAPAI <adam.papai@bsdsupportservice.hu>:
> Pavel Stehule wrote:
>> Hello
>>
>> You should to initialize database with correct locale. You can't to
>> change locale after database is created.
>>
>> /usr/local/pgsql91/bin/createdb test --locale=cs_CZ.utf-8 -e UTF-8
>> CREATE DATABASE test LC_COLLATE 'cs_CZ.utf-8' LC_CTYPE 'cs_CZ.utf-8';
>> COMMENT ON DATABASE test IS 'UTF-8';
>>
>> test=# show lc_collate ;
>>  lc_collate
>> -------------
>>  cs_CZ.utf-8
>> (1 row)
>>
>> test=# select * from (values('Sís'),('Šiška'),('Syrový'),('Vondra')) x
>> ORDER BY 1;
>>  column1
>> ---------
>>  Sís
>>  Syrový
>>  Šiška
>>  Vondra
>> (4 rows)
>>
>> Regards
>>
>> Pavel Stehule
>>
>
> Something is not correct for me.
>
> I've re-initialised it again with:
>
> --locale=hu_HU.UTF-8 --encoding=UTF-8
>
> which generated everything LC_* to hu_HU.UTF-8.
>
> I've created the DBs with createdb:
>
> createdb -U pgsql "$DBNAME" -E "UTF-8" --locale="hu_HU.UTF-8"
>
> Example:
>
> createdb -U pgsql spiritnet -E UTF-8 --locale=hu_HU.UTF-8
>
> spiritnet=# show lc_collate;
>  lc_collate
> -------------
>  hu_HU.UTF-8
> (1 row)
>
> spiritnet=# show lc_ctype;
>  lc_ctype
> -------------
>  hu_HU.UTF-8
>
>         Name          | Owner | Encoding  |  Collation  |    Ctype
> -----------------------+-------+-----------+-------------+-------------
>  spiritnet             | pgsql | UTF8      | hu_HU.UTF-8 | hu_HU.UTF-8
>
>
> And it STILL NOT working, the order by is absolutly incorrect. All chars
> with accents, are located at the END of the list. Is it a problem only
> with 9.0.3?
>
>
> For me, your test shows:
>
> "Syrový"
> "Sís"
> "Vondra"
> "Šiška"
>
> instead your correct order.
>
> Any ideas?


your system locales is correct? PostgreSQL uses only system libs

Regards

Pavel Stehule
>
> Further information:
>
> [root@radon /usr/local/pgsql]# pg_config
> BINDIR = /usr/local/bin
> DOCDIR = /usr/local/share/doc/postgresql
> HTMLDIR = /usr/local/share/doc/postgresql
> INCLUDEDIR = /usr/local/include
> PKGINCLUDEDIR = /usr/local/include/postgresql
> INCLUDEDIR-SERVER = /usr/local/include/postgresql/server
> LIBDIR = /usr/local/lib
> PKGLIBDIR = /usr/local/lib/postgresql
> LOCALEDIR = /usr/local/share/locale
> MANDIR = /usr/local/man
> SHAREDIR = /usr/local/share/postgresql
> SYSCONFDIR = /usr/local/etc/postgresql
> PGXS = /usr/local/lib/postgresql/pgxs/src/makefiles/pgxs.mk
> CONFIGURE = '--with-libraries=/usr/local/lib'
> '--with-includes=/usr/local/include' '--enable-thread-safety'
> '--with-openssl' '--with-libxml' '--enable-nls' '--with-gssapi'
> '--prefix=/usr/local' '--mandir=/usr/local/man'
> '--infodir=/usr/local/info/' '--build=amd64-portbld-freebsd8.1'
> 'build_alias=amd64-portbld-freebsd8.1' 'CC=cc' 'CFLAGS=-O2 -pipe -O3
> -funroll-loops -fno-strict-aliasing' 'LDFLAGS= -L/usr/local/lib
> -rpath=/usr/lib:/usr/local/lib' 'CPP=cpp'
> CC = cc
> CPPFLAGS = -I/usr/local/include/libxml2 -I/usr/local/include
> -I/usr/local/include -I/usr/local/include
> CFLAGS = -O2 -pipe -O3 -funroll-loops -fno-strict-aliasing -Wall
> -Wmissing-prototypes -Wpointer-arith -Wdeclaration-after-statement
> -Wendif-labels -fno-strict-aliasing -fwrapv
> CFLAGS_SL = -fPIC -DPIC
> LDFLAGS = -L/usr/local/lib -rpath=/usr/lib:/usr/local/lib
> -L/usr/local/lib -L/usr/local/lib -L/usr/local/lib -Wl,--as-needed
> -Wl,-R'/usr/local/lib'
> LDFLAGS_EX =
> LDFLAGS_SL =
> LIBS = -lpgport -lintl -lxml2 -lssl -lcrypto -lgssapi_krb5 -lz
> -lreadline -lcrypt -lm
> VERSION = PostgreSQL 9.0.3
>
>
> [root@radon /usr/local/pgsql]# pg_controldata /usr/local/pgsql/data
> pg_control version number:            903
> Catalog version number:               201008051
> Database system identifier:           5570316617043000695
> Database cluster state:               in production
> pg_control last modified:             Sat Feb  5 22:58:43 2011
> Latest checkpoint location:           0/9008D748
> Prior checkpoint location:            0/8FF99F68
> Latest checkpoint's REDO location:    0/9005FEB0
> Latest checkpoint's TimeLineID:       1
> Latest checkpoint's NextXID:          0/16405
> Latest checkpoint's NextOID:          57344
> Latest checkpoint's NextMultiXactId:  1
> Latest checkpoint's NextMultiOffset:  0
> Latest checkpoint's oldestXID:        654
> Latest checkpoint's oldestXID's DB:   1
> Latest checkpoint's oldestActiveXID:  0
> Time of latest checkpoint:            Sat Feb  5 22:58:33 2011
> Minimum recovery ending location:     0/0
> Backup start location:                0/0
> Current wal_level setting:            minimal
> Current max_connections setting:      40
> Current max_prepared_xacts setting:   0
> Current max_locks_per_xact setting:   64
> Maximum data alignment:               8
> Database block size:                  8192
> Blocks per segment of large relation: 131072
> WAL block size:                       8192
> Bytes per WAL segment:                16777216
> Maximum length of identifiers:        64
> Maximum columns in an index:          32
> Maximum size of a TOAST chunk:        1996
> Date/time type storage:               64-bit integers
> Float4 argument passing:              by value
> Float8 argument passing:              by value
>
>
> --
> Adam PAPAI
> BSD Support Service
> http://www.bsdsupportservice.hu
> E-mail: adam.papai@bsdsupportservice.hu
> Phone: +36 30 33-55-735 (Hungary)
>

Pavel Stehule wrote:
> 2011/2/5 Adam PAPAI <adam.papai@bsdsupportservice.hu>:

> your system locales is correct? PostgreSQL uses only system libs

I've tested it with a fresh 8.4 and 9.0.

It's the same.

My FreeBSD 8.1 supports hu_HU.UTF-8, but I don't know why it's not working.

initdb output:

[root@titanium /usr/home/wooh]# /usr/local/etc/rc.d/postgresql initdb
The files belonging to this database system will be owned by user "pgsql".
This user must also own the server process.

The database cluster will be initialized with locale hu_HU.UTF-8.
The default text search configuration will be set to "hungarian".

creating directory /usr/local/pgsql/data ... ok
creating subdirectories ... ok
selecting default max_connections ... 40
selecting default shared_buffers ... 28MB
creating configuration files ... ok
creating template1 database in /usr/local/pgsql/data/base/1 ... ok
initializing pg_authid ... ok
initializing dependencies ... ok
creating system views ... ok
loading system objects' descriptions ... ok
creating conversions ... ok
creating dictionaries ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
loading PL/pgSQL server-side language ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok

Perhaps the new FreeBSD 8.1 does something wrong. I'm gonna check it on
7.0/7.1/7.2.



--
Adam PAPAI
BSD Support Service
http://www.bsdsupportservice.hu
E-mail: adam.papai@bsdsupportservice.hu
Phone: +36 30 33-55-735 (Hungary)

Adam PAPAI wrote:
> Pavel Stehule wrote:
>> 2011/2/5 Adam PAPAI <adam.papai@bsdsupportservice.hu>:
>
>> your system locales is correct? PostgreSQL uses only system libs
>
> I've tested it with a fresh 8.4 and 9.0.
>
> It's the same.
>
> My FreeBSD 8.1 supports hu_HU.UTF-8, but I don't know why it's not working.


I've tested it with 8.4 and 9.0 with locale=hu_HU.ISO8859-2,
encoding=LATIN2, It's working correctly.

But not with locale=hu_HU.UTF-8, encoding=UTF-8

Is it related to the FreeBSD team or the PostgreSQL team?

--
Adam PAPAI
BSD Support Service
http://www.bsdsupportservice.hu
E-mail: adam.papai@bsdsupportservice.hu
Phone: +36 30 33-55-735 (Hungary)

Re: PostgreSQL 9.0.3 - collate/ctype order by hu_HU.UTF8 is not correct sorting

From
Martijn van Oosterhout
Date:
On Sun, Feb 06, 2011 at 11:02:25AM +0100, Adam PAPAI wrote:
> I've tested it with 8.4 and 9.0 with locale=hu_HU.ISO8859-2,
> encoding=LATIN2, It's working correctly.
>
> But not with locale=hu_HU.UTF-8, encoding=UTF-8
>
> Is it related to the FreeBSD team or the PostgreSQL team?

Last I checked *BSD did not support sorting in UTF-8. I know Apple
added it themselves because they needed it but I don't think it got
backported to *BSD.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> Patriotism is when love of your own people comes first; nationalism,
> when hate for people other than your own comes first.
>                                       - Charles de Gaulle

Attachment

Re: PostgreSQL 9.0.3 - collate/ctype order by hu_HU.UTF8 is not correct sorting

From
Radosław Smogura
Date:
You need to create database with LC_COLLATE="hu_HU.utf8", e.g.

CREATE DATABASE tx2 ENCODING='UTF-8' TEMPLATE=template0
LC_COLLATE='pl_PL.utf8';

Kind regards,
Radosław Smogura
http://www.softperience.eu

Adam PAPAI <adam.papai@bsdsupportservice.hu> Sunday 06 February 2011 11:02:25
> Adam PAPAI wrote:
> > Pavel Stehule wrote:
> >> 2011/2/5 Adam PAPAI <adam.papai@bsdsupportservice.hu>:
> >>
> >> your system locales is correct? PostgreSQL uses only system libs
> >
> > I've tested it with a fresh 8.4 and 9.0.
> >
> > It's the same.
> >
> > My FreeBSD 8.1 supports hu_HU.UTF-8, but I don't know why it's not
> > working.
>
> I've tested it with 8.4 and 9.0 with locale=hu_HU.ISO8859-2,
> encoding=LATIN2, It's working correctly.
>
> But not with locale=hu_HU.UTF-8, encoding=UTF-8
>
> Is it related to the FreeBSD team or the PostgreSQL team?

Radosław Smogura wrote:
> You need to create database with LC_COLLATE="hu_HU.utf8", e.g.
>
> CREATE DATABASE tx2 ENCODING='UTF-8' TEMPLATE=template0
> LC_COLLATE='pl_PL.utf8';
>

Are you running it under FreeBSD?

--
Adam PAPAI
BSD Support Service
http://www.bsdsupportservice.hu
E-mail: adam.papai@bsdsupportservice.hu
Phone: +36 30 33-55-735 (Hungary)

Re: PostgreSQL 9.0.3 - collate/ctype order by hu_HU.UTF8 is not correct sorting

From
Radosław Smogura
Date:
Adam PAPAI <adam.papai@bsdsupportservice.hu> Sunday 06 February 2011 14:13:51
> Radosław Smogura wrote:
> > You need to create database with LC_COLLATE="hu_HU.utf8", e.g.
> >
> > CREATE DATABASE tx2 ENCODING='UTF-8' TEMPLATE=template0
> > LC_COLLATE='pl_PL.utf8';
>
> Are you running it under FreeBSD?

No, Gentoo. But, without creating db with LC_COLLATE I get wrong sroting too.