Thread: Having the issue in while selecting the data and feltering in order by.

Having the issue in while selecting the data and feltering in order by.

From
nikhil raj
Date:
Hi ALL,

I have recently migrated to windows postgres server to linux postgres server.

while i run this query in windows PG server
SELECT *
FROM (VALUES ('a'), ('___b1'), ('_a1'),('a2'),('a3'),('a5'), ('a2')) t (val) order by val

I am getting the out put in this sort order
10.250.200.184 - Remote Desktop Connection 2019-08-08 14.46.46.png

Same when i run this in linux machine i am getting this out in  different sort order on the same query.
10.250.200.184 - Remote Desktop Connection 2019-08-08 14.42.27.png


Attachment

Re: Having the issue in while selecting the data and feltering inorder by.

From
Luca Ferrari
Date:
On Thu, Aug 8, 2019 at 11:20 AM nikhil raj <nikhilraj474@gmail.com> wrote:
> Same when i run this in linux machine i am getting this out in  different sort order on the same query.

A collation problem?
What does this query do?
SELECT *
FROM (VALUES ('a'), ('___b1'), ('_a1'),('a2'),('a3'),('a5'), ('a2')) t
(val) order by val COLLATE "C";



Re: Having the issue in while selecting the data and feltering inorder by.

From
Francisco Olarte
Date:
Nikhil ( or is it Raj? ):

On Thu, Aug 8, 2019 at 11:47 AM nikhil raj <nikhilraj474@gmail.com> wrote:
> Here are the detailes.
> Windows :
Zapped image showing windows locale "English-united-states.1252"
> linux :
Zapped image showing Linux locale "En-us.utf-8"
> both are in same version 11.3
> How can i get the sorting order by like windows how to change the lc_collate  Please can you help me over here  OR
elseis there an other way we get the sort order like window through code 

1st, it is a locale mismatch problem. I do not own windows, so I can
not test it, but IIRC win1252 is a latin-1 variant. I do not know how
their locales sort, but it seems they are doing something like the
C-utf-8 locale.

You could try to recreate the database using the same locale as in
windows. If you are able to do this ( dump, recreate, restore) it may
be the easier way. I do not know if there are windows-like locales for
linux, of how does that locale sort ( I mean, how does it process
things like á or ö or ñ, your example show only the most basic stuff
). You could probably use C locale, in ascii, win1252 or utf-8
depending on your data content to get good results ( locale comes from
the OS, but pg has the charsets ). With the restricted data in your
examples, just switching to C locale will suffice, but I do not know
if you have some more complex examples / sort orders ( someone
knowledgeable in windows may be able to help. It seems like windows
locales are not as sophisticated as Linux one, and IIRC this was true
20 years ago when I used them ).

Other option is to modify either your columns, to include a collate
option in the relevant ones, or the order-by clauses in the relevant
queries, which support the collate clause to. You may want to read
https://www.postgresql.org/docs/11/charset.html and investigate a bit
with some test dbs.

I would test something like creating a database with C locale, charset
win1252, this may be a good starting point. But test some more queries
in windows, i.e., things like this:

$ echo -e 'a\nA\nb\nB1' | LC_ALL=en_US.UTF-8 sort
a
A
b
B1
$ echo -e 'a\nA\nb\nB1' | LC_ALL=C.UTF-8 sort
A
B1
a
b

Because it may matter for you, and I doubt the interactions can be
properly tested without a proper dataset AND a windos pg server.

Francisco Olarte.



Re: Having the issue in while selecting the data and feltering inorder by.

From
Francisco Olarte
Date:
Luca:

On Thu, Aug 8, 2019 at 12:33 PM Luca Ferrari <fluca1978@gmail.com> wrote:
> On Thu, Aug 8, 2019 at 11:20 AM nikhil raj <nikhilraj474@gmail.com> wrote:
> > Same when i run this in linux machine i am getting this out in  different sort order on the same query.
> A collation problem?
> What does this query do?
> SELECT *
> FROM (VALUES ('a'), ('___b1'), ('_a1'),('a2'),('a3'),('a5'), ('a2')) t
> (val) order by val COLLATE "C";

It will probacly do the expected thing for him, but I think another
problem may be what does this query:

SELECT * FROM (VALUES
('0'),('1'),('a'),('A'),('á'),('à'),('b'),('B'),('ñ'),('Ñ')) order by
1;

Does on the Windows server, what he is trying to replicate. Finding a
locale which does not ignore _ as the windows one does is easy, but
replicating all the doodahs may be difficult without knowing how the
windows locale sorts all the chars ( and I'm not getting at 1a vs a1,
or _ vs a vs A).

I suspect C locale may be ok though.

Francisco Olarte.