Re: different sort order in windows and linux version - Mailing list pgsql-general

From Tomi NA
Subject Re: different sort order in windows and linux version
Date
Msg-id d487eb8e0606301029k7a217d41p45e269d23ad200f2@mail.gmail.com
In response to Re: different sort order in windows and linux version  (Martijn van Oosterhout <kleptog@svana.org>)
Responses Re: different sort order in windows and linux version  (Martijn van Oosterhout <kleptog@svana.org>)
Re: different sort order in windows and linux version  (Dragan Matic <mlists@panforma.co.yu>)
List pgsql-general
On 6/30/06, Martijn van Oosterhout <kleptog@svana.org> wrote:
> On Fri, Jun 30, 2006 at 11:56:19AM +0200, Dragan Matic wrote:
> > I have two postgres servers, one on linux (fedora core 5), one on
> > windows, both are version 8.1.4.
> >
> > Both databases are initialized with locale Croatian and win1250 encoding.
> >
> > running pg_controldata on windows returns this
> >
> > LC_COLLATE:          Croatian_Croatia.1250
> > LC_CTYPE:            Croatian_Croatia.1250
> >
> > the same command on linux returns this
> >
> > LC_COLLATE:                    hr_HR
> > LC_CTYPE:                      hr_HR
> >
> > which is the same, I suppose.
>
> Well, apparently not. Postgres makes no attempt to understand
> collations nor try to determine whether they make sense. If you want to
> have the same collation on Windows and Linux, I think you're going to
> have trouble.

Croatian_Croatia and hr_HR are, in fact, the same in that there is no
other collation for the Croatian language. What's more, Dragan ran the
test using characters which are encoded exactly the same in cp1250,
utf8, iso8859-2, hell, probably even us-ascii. The fact remains that
different OSes collate differently, even for the same locale.
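
To make that concrete, here is a minimal sketch (not taken from the
thread) of how one might compare what each OS does with the same list
of strings. It assumes Python's standard locale module; the function
name sort_with_os_locale is hypothetical, and the locale names you pass
in are whatever your platform happens to call the Croatian locale.

```python
import locale


def sort_with_os_locale(words, locale_name):
    """Sort words using the operating system's collation for locale_name.

    Returns (sorted_words, used_locale). If the OS does not provide the
    requested locale, falls back to plain code-point order and reports
    used_locale as False.
    """
    # Remember the current collation setting so it can be restored.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        locale.setlocale(locale.LC_COLLATE, locale_name)
    except locale.Error:
        # Locale not installed on this machine: code-point order instead.
        return sorted(words), False
    try:
        # strxfrm delegates to the C library, so the resulting order is
        # whatever this particular OS defines for the locale.
        return sorted(words, key=locale.strxfrm), True
    finally:
        locale.setlocale(locale.LC_COLLATE, previous)


# Hypothetical usage: run the same call on both machines and compare.
#   sort_with_os_locale(["a b", "a-b", "ab", "Ab"], "hr_HR")
#   sort_with_os_locale(["a b", "a-b", "ab", "Ab"], "Croatian_Croatia.1250")
```

Since the ordering comes straight from the C library, any difference
between the two outputs is down to the platform, not the application.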

In C++, people use things like GTK, wxWidgets and GCL so that they
can think about "C++ code" instead of the platform they're coding on.
In Java, people use things like File.separator instead of "\" or "/"
so that they can think about "Java code".
There are dozens of examples like these, and most of the exceptions
stem from the influence of whoever held the monopoly at the time.
When you code in the RDBMS environment, you want to code in terms of
pgsql or Oracle or MySQL or whatever: you don't want to program for
Oracle on Solaris vs. Oracle on Linux vs. Oracle on Plan9 or...well,
you get the idea.
Not being able to depend on the engine to consistently collate
strings as simple as the ones Dragan listed is closer to a serious bug
(non-deterministic behaviour in otherwise deterministic functions)
than an RFE, and is certainly nowhere near "it's not our problem", as it
is regularly made out to be. The OS(es) simply and obviously
do(es)n't do a good enough job of it.

> In the past there have existed patches to allow postgres to use ICU for
> locale support. It's supposedly not quite as fast, but you will be able
> to get consistent results across platforms.

Personally, I'd be perfectly happy with pgsql if I could choose to
make text operations up to 2-3x slower without the fuss of how it's
going to work on a certain platform, in each pgsql version.
Furthermore, compiling the server myself is not an option for live
usage: on my current project, I'm not even the one installing the
database servers...sending administrators a binary I configured and
compiled (on Windows, in this case!) and no one but me
tested...brrrr...I get the shivers just thinking about it.

If I sound harsh, please excuse me, but I feel like I'm the only one
who thinks these encoding problems (collation, upper/lowercase,
multiple languages in a single database) are serious...nobody seems to
share the sentiment. Ah well...

t.n.a.
