Thread: Sort order confusion
I am suffering some sort order confusion. Given a database, "foo", with a single character(4) column of data left padded with spaces I get: select * from foo order by somechars; somechars ----------- 0 1 10 100 1011 111 1512 2 222 3 333 The output I desire is a basic ASCII sort: somechars ----------- 0 1 2 3 10 100 111 222 333 1011 1512 (This sample set just has spaces and numeric digits but could have other characters - I want the output in ASCII sort order) 1. What is the correct way to do this? 2. How do I verify the locale setting of an existing database cluster (to verify that I really initialized it as "initdb -d --locale=C -D /var/lib/pgsql/data")? 3. Should I have included "--enable-locale" or similar option when I built Postgresql (the build is vanilla 7.4.1 "./configure ; make ; make install")? Cheers, Steve
On Fri, 2004-02-06 at 11:12, Steve Crawford wrote: > I am suffering some sort order confusion. Given a database, "foo", > with a single character(4) column of data left padded with spaces I > get: > > select * from foo order by somechars; > > somechars > ----------- > 0 > 1 > 10 > 100 > 1011 > 111 > 1512 > 2 > 222 > 3 > 333 > > The output I desire is a basic ASCII sort: > > somechars > ----------- > 0 > 1 > 2 > 3 > 10 > 100 > 111 > 222 > 333 > 1011 > 1512 > > (This sample set just has spaces and numeric digits but could have > other characters - I want the output in ASCII sort order) Your original sort is a basic lexigraphic ("alphabetical" by ASCII character set number) sort. What you appear to want is a numeric sort, where the numbers come out in the order of numbers, rather than in their ASCII character set order. If there were just digits that would be fairly easy, but I can't see any way (short of post-processing the list in other software or writing yourself a stored procedure) to do it in SQL. Stephen
Attachment
On Thursday 05 February 2004 4:21 pm, Stephen Robert Norris wrote: > On Fri, 2004-02-06 at 11:12, Steve Crawford wrote: > > I am suffering some sort order confusion. Given a database, > > "foo", with a single character(4) column of data left padded with > > spaces I get: > > > > select * from foo order by somechars; > > > > somechars > > ----------- > > 0 > > 1 > > 10 > > 100 > > 1011 > > 111 > > 1512 > > 2 > > 222 > > 3 > > 333 <snip> > Your original sort is a basic lexigraphic ("alphabetical" by ASCII > character set number) sort. > > What you appear to want is a numeric sort, where the numbers come > out in the order of numbers, rather than in their ASCII character > set order. Not exactly. I _DO_ want it in ASCII character set order which includes spaces (0x20) sorting ahead of digits (0x30 - 0x39). This is not what is happening. The first sort is some SQL sort order that seems to ignore certain characters. Note the different sort order if I pad with 'x' instead of '<space>': somechars ----------- 1011 1512 x100 x111 x222 x333 x444 x555 x666 x777 x888 xx10 xx44 xx55 xxx0 xxx1 xxx2 xxx3 xxx4 xxx5 xxx6 xxx7 xxx8 xxx9 xxxx Naturally if I were dealing with fields guaranteed to have something that would convert to an int I could just order by, say, int4(somechars) but that is not the case. I even tried the to_ascii function but apparently that's the wrong approach: ERROR: encoding conversion from SQL_ASCII to ASCII not supported Cheers, Steve
On Fri, Feb 06, 2004 at 11:21:41AM +1100, Stephen Robert Norris wrote: > On Fri, 2004-02-06 at 11:12, Steve Crawford wrote: > > I am suffering some sort order confusion. Given a database, "foo", > > with a single character(4) column of data left padded with spaces I > > get: > > > > select * from foo order by somechars; Cast it: alvherre=> select * from foo order by bar::text::int; bar ------ 0 1 10 33 100 101 333 503 (8 filas) -- Alvaro Herrera (<alvherre[a]dcc.uchile.cl>) "La naturaleza, tan frágil, tan expuesta a la muerte... y tan viva"
On Thursday 05 February 2004 5:01 pm, Alvaro Herrera wrote: > On Fri, Feb 06, 2004 at 11:21:41AM +1100, Stephen Robert Norris wrote: > > On Fri, 2004-02-06 at 11:12, Steve Crawford wrote: > > > I am suffering some sort order confusion. Given a database, > > > "foo", with a single character(4) column of data left padded > > > with spaces I get: > > > > > > select * from foo order by somechars; > > Cast it: > > alvherre=> select * from foo order by bar::text::int; > bar > ------ > 0 > 1 > 10 > 33 > 100 > 101 > 333 > 503 > (8 filas) Can't. As noted in the original post the column may contain data that won't convert to an int (all spaces, characters, punctuation). I'm seeking generic true ASCII sort order. Cheers, Steve
On Thu, 5 Feb 2004, Steve Crawford wrote: > 2. How do I verify the locale setting of an existing database cluster > (to verify that I really initialized it as "initdb -d --locale=C -D > /var/lib/pgsql/data")? I think it'd be pg_controldata /var/lib/pgsql/data
Steve Crawford <scrawford@pinpointresearch.com> writes: > Not exactly. I _DO_ want it in ASCII character set order which > includes spaces (0x20) sorting ahead of digits (0x30 - 0x39). This is > not what is happening. The first sort is some SQL sort order that > seems to ignore certain characters. Sounds to me like you've got the database in a non-C locale. See past discussions ... regards, tom lane
On Thursday 05 February 2004 6:08 pm, Tom Lane wrote: > Steve Crawford <scrawford@pinpointresearch.com> writes: > > Not exactly. I _DO_ want it in ASCII character set order which > > includes spaces (0x20) sorting ahead of digits (0x30 - 0x39). > > This is not what is happening. The first sort is some SQL sort > > order that seems to ignore certain characters. > > Sounds to me like you've got the database in a non-C locale. See > past discussions ... That was my first inclination (as noted in the full version of my original post - now lost to the thread) but my installation history showed: initdb -d --locale=C -D /var/lib/pgsql/data Nonetheless pg_controldata shows: LC_COLLATE: en_US LC_CTYPE: en_US I now suspect that currently active "real" database was in-fact created not by my several test initializations but by the SuSE startup script which will run an "initdb" if the database has not been initialized and which reads /etc/sysconfig/language to get default locale settings. So... Is there a method of changing the locale of an extant database or do I need to dump ; delete db ; reinit ; restore ? Cheers, Steve