Hi Tom,
I did understand what you said, and I apologize that it came across otherwise.
I'm just looking for the correct workaround.
If initdb was done with a C locale, and thus lc_collate and friends were all C, but the database and client encoding were set to UTF-8, would postgres convert data on the fly from UTF-8 (storage) to ASCII for sorting, or would things just blow up when a >1 byte character hit the mix?
The docs say bad things would happen:
http://www.postgresql.org/docs/8.2/static/multibyte.html
Important: Although you can specify any encoding you want for a database, it is unwise to choose an encoding that is not what is expected by the locale you have selected. The LC_COLLATE and LC_CTYPE settings imply a particular encoding, and locale-dependent operations (such as sorting) are likely to misinterpret data that is in an incompatible encoding.
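To make the question concrete, here is a minimal Python sketch of what plain byte-wise comparison does with multi-byte UTF-8 data. It assumes that C-locale collation amounts to comparing raw bytes (as strcmp/memcmp would); it says nothing about what Postgres actually does internally, and the sample words and the helper name c_locale_sort are made up for illustration.

```python
# Assumption: C-locale collation compares the raw bytes of each string.
# Non-ASCII letters are written as escapes so the source stays unambiguous.
words = ["zebra", "\u00c9mile", "apple", "\u00e9clair", "Zoo"]

def c_locale_sort(strings):
    """Sort strings by their UTF-8 byte sequences, with no conversion to ASCII."""
    return sorted(strings, key=lambda s: s.encode("utf-8"))

byte_order = c_locale_sort(words)

# Python compares str values by Unicode code point, so this is the
# code point ordering of the same words.
codepoint_order = sorted(words)

print(byte_order)
print(byte_order == codepoint_order)
```

Under that assumption nothing blows up on a multi-byte character: UTF-8 is designed so that byte order matches code point order, which means accented letters simply sort after all of ASCII rather than next to their unaccented counterparts.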
Right now, for me, ORDER BY LOWER(ASCII(column)), LOWER(column) (or some variation thereof) works, but is there a better workaround?
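As a rough model of the kind of ordering I am after, the Python sketch below lower-cases each value and then compares the bytes. It only approximates a single-key ORDER BY LOWER(column) under the assumption of byte-wise (C-locale style) comparison; the helper name lower_byte_key and the sample rows "apple" and "Banana" are hypothetical, and Python's str.lower() also folds non-ASCII letters, which LOWER in a C locale may not do.

```python
def lower_byte_key(value):
    """Sort key: lower-case the value, then compare its UTF-8 bytes."""
    return value.lower().encode("utf-8")

rows = ["Somethang", "-SOMETHING ELSE-", "Something else", "apple", "Banana"]

# Plain byte order: every upper-case ASCII letter sorts before any lower-case one.
plain = sorted(rows, key=lambda s: s.encode("utf-8"))

# Case-folded byte order: case is ignored, punctuation still counts.
folded = sorted(rows, key=lower_byte_key)

print(plain)
print(folded)
```

The difference shows up in where "apple" lands relative to "Banana"; the leading "-" keeps its row first in both orderings because punctuation is still compared as an ordinary byte.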
Thanks,
-Cody
Tom Lane wrote:
> Cody Pisto <cpisto@rvweb.com> writes:
>> If this is potentially a problem in postgres somewhere, point me in the
>> general direction and I'm more than willing to fix it myself..
> You seem not to have absorbed what I said. This *is* the correct result
> according to that locale's definition of sorting. You can demonstrate
> that without any use of Postgres:
> [tgl@rh2 ~]$ cat fooey
> Somethang
> -SOMETHING ELSE-
> Something else
> [tgl@rh2 ~]$ LANG=C sort fooey
> -SOMETHING ELSE-
> Somethang
> Something else
> [tgl@rh2 ~]$ LANG=en_US sort fooey
> Somethang
> Something else
> -SOMETHING ELSE-
> [tgl@rh2 ~]$
> If you prefer C sort ordering, run Postgres in C locale. It's as
> simple as that.
> regards, tom lane