Re: String comparison problem in select - too many results - Mailing list pgsql-general

From Durumdara
Subject Re: String comparison problem in select - too many results
Date
Msg-id CAEcMXhkB5=PLaBbyZWipZ8QR682ASYMo_ZfHbrme0toTBVfvDg@mail.gmail.com
Whole thread Raw
In response to Re: String comparison problem in select - too many results  (Francisco Olarte <folarte@peoplecall.com>)
Responses Re: String comparison problem in select - too many results
List pgsql-general
Dear Francesco!

My "bug" is that I commonly used Windows environment where the default collation is ANSI, based on Windows language which is Hungarian here (Windows1250).
But because of special characters we used UTF8 to store data in database.
I supposed that UTF8.hu_HU is working like local natural order here, and the common ASCII chars are (like '/') in same position.
Python/Delphi/LibreOffice can sort these numbers correctly (based on local ANSI sort).

I supposed that UTF8.hu_HU is using mostly same order which is valid here and it contains all ASCII + ANSI characters we are using here in daily work, and they are in very similar order.
I never thought that it can't handle normal characters in 7 bit range...

For these numbers I can use C collation, it's ok.

Thank you!

dd



2018-01-11 11:11 GMT+01:00 Francisco Olarte <folarte@peoplecall.com>:
On Thu, Jan 11, 2018 at 9:57 AM, Durumdara <durumdara@gmail.com> wrote:
> I tried in in different servers, different databases.
> 1.) Windows local PG:  LC_COLLATE = 'Hungarian_Hungary.1250' - ok.
> 2.) Linux remote PG: LC_CTYPE = 'en_US.UTF-8' - also wrong!!! - UTF
> problem???

Your problem seems to be you consider wrong anything that doesn't
match your expectation.

> 3.) Forcing C collation:  - ok
> 4.) Replace '/' to 'A': - ok

...  More examples zapped, as they do not prove anything.


> The main problem that we have many searches in programme where we suppose
> good evaluation, and we have more sites with different servers (and
> collation).

You must define good evaluation. In your case it seems you consider
good evaluation is lexicographical comparison of ascii char values.
This is called 'C' collation and you have been told to it.

If your "programme" is doing string comparison in server collation and
you need good comparison, defined as  C collation, that is a bug. Fix
it.

I would recommend reading
https://www.postgresql.org/docs/9.6/static/collation.html and related
docs, but someone who so boldly states postgres collations are
good/bad surely knows all about it.


Try to build from this:


with xx(x) as (values ('18/0113'),('18/0212'),('180/2010'))
select x collate "C"  from xx order by 1;

Francisco Olarte.

pgsql-general by date:

Previous
From: Andreas Joseph Krogh
Date:
Subject: ORDER BY custom type
Next
From: Andreas Joseph Krogh
Date:
Subject: Sv: ORDER BY custom type