Thread: sorting and spaces in postgresql with en_US locale

sorting and spaces in postgresql with en_US locale

From
Chris Kratz
Date:
We are having a weird problem that we ran into recently.  If I use the
following statements to create a test table and then run the select statement
at the end, we get a very strange sort order.  It appears that to do the
sorting, all the spaces are removed from the strings. It would appear that in
the example below 'ab e' should be before 'abd'.

create table testing_sort(col1 text);
insert into testing_sort values('a');
insert into testing_sort values('ab');
insert into testing_sort values('ab c');
insert into testing_sort values('abd');
insert into testing_sort values('ab e');

select * from testing_sort order by col1;
 col1
------
 a
 ab
 ab c
 abd
 ab e
(5 rows)

pg_controldata reports...
LC_COLLATE:                           en_US
LC_CTYPE:                             en_US

on another box, which has both LC_COLLAGE and LC_CTYPE set to C, the sorting
works as expected...

select * from testing_sort order by col1;
 col1
------
 a
 ab
 ab c
 ab e
 abd

Does anyone know if there is any other way to get the sorting to work as
expected short of doing an dumping, doing an initdb, and reloading?  Or is
there some other setting that is causing the sort to do strange things.

The only work around we have found is to create a sort column and replace all
spaces with 0 and then sort on that column.  Any other suggestions or
workarounds?

Issue was tested on both 7.4.1 and 7.3.4, both running on linux.

Thanks,

-Chris



--
Chris Kratz
Systems Analyst/Programmer
VistaShare LLC
www.vistashare.com

Re: sorting and spaces in postgresql with en_US locale

From
Oliver Elphick
Date:
On Wed, 2004-07-21 at 20:58, Chris Kratz wrote:
> We are having a weird problem that we ran into recently.  If I use the
> following statements to create a test table and then run the select statement
> at the end, we get a very strange sort order.  It appears that to do the
> sorting, all the spaces are removed from the strings. It would appear that in
> the example below 'ab e' should be before 'abd'.

> LC_COLLATE:                           en_US
> LC_CTYPE:                             en_US

That is how the en_US (and en_GB) locales are defined.

Their sorting is dictionary style: spaces and capitalisation are
ignored.  This is a glibc issue, not a PostgreSQL issue.

--
Oliver Elphick                                          olly@lfix.co.uk
Isle of Wight                              http://www.lfix.co.uk/oliver
GPG: 1024D/A54310EA  92C8 39E7 280E 3631 3F0E  1EC0 5664 7A2F A543 10EA
                 ========================================
     "Greater love hath no man than this, that a man lay
      down his life for his friends."      John 15:13