Thread: Q: How ORDER BY is being done inetrnally?
Greetings, We are trying to fix another problem we have with Turkish locale. Order in which tuples are returned by SELECT .. ORDER BY . statement differs from the one returned by UNIX sort utility or what strcoll() function will suggest. It is specific to a locale so you will not likely be able to reproduce it. I understand that with ORDER BY clause, values are compared in case-insensitive manner as well as it should be the case with the values stored in indices. And I suspect that problem is hidden somewhere there. I tried to find where this conversion and comparison is being done in source code but could not, despite all my efforts. I would very much appreciate if someone would point out what functions are being called while sorting data for return for ORDER BY clause. Thanks in advance, Nicolai Tufar
"Nicolai Tufar" <ntufar@pisem.net> writes: > We are trying to fix another problem we have with Turkish locale. > Order in which tuples are returned by SELECT .. ORDER BY . > statement differs from the one returned by UNIX sort utility or > what strcoll() function will suggest. Text sorting depends on strcoll() and nothing but. See varstr_cmp(). regards, tom lane
> -----Original Message----- > From: Tom Lane [mailto:tgl@sss.pgh.pa.us] > > > Text sorting depends on strcoll() and nothing but. See varstr_cmp(). I see, apparently sort done for "ORDER BY" clause is case-sensitive. But problem is still there. It is about "I"-with-dot and "I"-without-dot in Turkish again. While all UNIX programs put "I"-without-dot before "I"-with-dot, as it should be, PostgreSQL puts it in reverse order. I examine the code for any possible gotchas, but I am confused about what function is being called by what. Especially that all those sort methods and functions are not hard-coded but stored in pg_am* catalogue tables. Could someone please explain -very briefly- what exactly is happening when a sort is performed. A kind of stack trace: which function calls which one would be very appreciated. Best regards, Nicolai Tufar
Okay, I figured out what was the problem. glibc's LC_COLLATE file under /usr/lib/locale/tr_TR is wrong! And it has been wrong for many years now. And nobody noticed it. PostgreSQL is innocent here. I checked it many times over and over again with test programs and different environment settings and it is wrong! I will contact glibc team now. Thanks a lot for help. Regards, Nicolai Tufar