Thread: a strange order by behavior

a strange order by behavior

From
Eyal Wilde
Date:
the database collation is: en_US.UTF-8

drop table t1;
create table t1 (recid int ,f1 varchar(20));
insert into t1 values (1,'a');
insert into t1 values (2,' ');
insert into t1 values (3,'aa');
insert into t1 values (4,' a');
select * from t1 order by f1

result:

recid  f1
2      " "        
1      "a"        -- 2 comes before 1 because space is smaller then 'a'. fine.
4      " a"       -- now is see that 1 comes before 4 because space is greater then 'a' !?
3      "aa"       -- now again, 4 comes before 3 because space is smaller the 'a' !?!

can someone explain what's going on?

thanks!

Re: a strange order by behavior

From
Samuel Gendler
Date:


On Tue, Jun 21, 2011 at 3:37 AM, Eyal Wilde <eyal@impactsoft.co.il> wrote:
the database collation is: en_US.UTF-8

drop table t1;
create table t1 (recid int ,f1 varchar(20));
insert into t1 values (1,'a');
insert into t1 values (2,' ');
insert into t1 values (3,'aa');
insert into t1 values (4,' a');
select * from t1 order by f1

result:

recid  f1
2      " "        
1      "a"        -- 2 comes before 1 because space is smaller then 'a'. fine.
4      " a"       -- now is see that 1 comes before 4 because space is greater then 'a' !?
3      "aa"       -- now again, 4 comes before 3 because space is smaller the 'a' !?!


I seem to recall a thread here about it ignoring spaces entirely in that collation (and maybe ignoring capitalization, too?).  I went to go test that assertion by initializing a database with C collation and got some complaint about it being incompatible with my template1 template database.  I initialized a db off of template0 and then got side tracked and you've only just reminded me of it.  I was planning to test whether it is safe to use UTF-8 for encoding but use C collation, and then maybe investigate other collations.

This worked:

createdb  -E UTF-8 --lc-collate=C some_db

so it should be easy enough to play around with it some.  I'm not sure how to get a list of valid collations for any given charset, and it seems like C collation would generate somewhat strange results with non-ascii characters (french accented characters are supposed to alphabetize in some unexpected manner, I believe), so there must be something better - closer to UTF-8 collation but without ignoring whitespace and such.  A quick google search reveals that there is some kind of standard for unicode collation (http://www.unicode.org/reports/tr10/ ) and I have no idea if that is what is represented by the en_US.UTF-8 collation or not.  I've got no real experience with this stuff.

It appears that there are differences regarding collation in recent versions - the beta docs for 9.1 show that you can set collation on individual operations or differently for different columns (http://www.postgresql.org/docs/9.1/static/collation.html ).  I see nothing like that in 8.4 docs.

It definitely looks like we both need to have a read of the localization chapter of the docs for our database version as there is a bunch of stuff in there that I was surprised to read when I just did a quick scan - like using anything but C or posix is much slower and can produce incorrect results in a 'like' query

It looks like the docs prior to 9.1beta have no real reference to collation at all, so it's down to trial and error unless someone in the know speaks up.

--sam


Re: a strange order by behavior

From
Pavel Stehule
Date:
Hello

a equalent of C collate for UTF8 is ucs_basic

Regards

Pavel Stehule

2011/6/22 Samuel Gendler <sgendler@ideasculptor.com>:
>
>
> On Tue, Jun 21, 2011 at 3:37 AM, Eyal Wilde <eyal@impactsoft.co.il> wrote:
>>
>> the database collation is: en_US.UTF-8
>> drop table t1;
>> create table t1 (recid int ,f1 varchar(20));
>> insert into t1 values (1,'a');
>> insert into t1 values (2,' ');
>> insert into t1 values (3,'aa');
>> insert into t1 values (4,' a');
>> select * from t1 order by f1
>> result:
>> recid  f1
>> 2      " "
>> 1      "a"        -- 2 comes before 1 because space is smaller then 'a'.
>> fine.
>> 4      " a"       -- now is see that 1 comes before 4 because space is
>> greater then 'a' !?
>> 3      "aa"       -- now again, 4 comes before 3 because space is smaller
>> the 'a' !?!
>
> I seem to recall a thread here about it ignoring spaces entirely in that
> collation (and maybe ignoring capitalization, too?).  I went to go test that
> assertion by initializing a database with C collation and got some complaint
> about it being incompatible with my template1 template database.  I
> initialized a db off of template0 and then got side tracked and you've only
> just reminded me of it.  I was planning to test whether it is safe to use
> UTF-8 for encoding but use C collation, and then maybe investigate other
> collations.
> This worked:
> createdb  -E UTF-8 --lc-collate=C some_db
> so it should be easy enough to play around with it some.  I'm not sure how
> to get a list of valid collations for any given charset, and it seems like C
> collation would generate somewhat strange results with non-ascii characters
> (french accented characters are supposed to alphabetize in some unexpected
> manner, I believe), so there must be something better - closer to UTF-8
> collation but without ignoring whitespace and such.  A quick google search
> reveals that there is some kind of standard for unicode collation
> (http://www.unicode.org/reports/tr10/ ) and I have no idea if that is what
> is represented by the en_US.UTF-8 collation or not.  I've got no real
> experience with this stuff.
> It appears that there are differences regarding collation in recent versions
> - the beta docs for 9.1 show that you can set collation on individual
> operations or differently for different columns
> (http://www.postgresql.org/docs/9.1/static/collation.html ).  I see nothing
> like that in 8.4 docs.
> It definitely looks like we both need to have a read of the localization
> chapter of the docs for our database version as there is a bunch of stuff in
> there that I was surprised to read when I just did a quick scan - like using
> anything but C or posix is much slower and can produce incorrect results in
> a 'like' query
> It looks like the docs prior to 9.1beta have no real reference to collation
> at all, so it's down to trial and error unless someone in the know speaks
> up.
> --sam
>
>


Re: a strange order by behavior

From
Samuel Gendler
Date:
Interesting.  The original thread to which I was referring has a subject of "Sorting Issue" and the original request showed a list of vehicle model names which were sorting as though there were no spaces. The user had collation set to en_US.UTF-8.  However, my database (on OS X) sorts both his example and the example that started this thread correctly, despite my lc_collate being set to the same value.

Then I just ran the exact same test on a Centos 5 linux host (postgresql 8.4 as well) and I am seeing the 'erroneous' sort order that prompted both this thread and the other.  So you can't even assume the same behaviour for the same collation on different platforms.

On OS X:

# \l
       Name        |   Owner    | Encoding |  Collation  |    Ctype    |   Access privileges   
-------------------+------------+----------+-------------+-------------+-----------------------
 col_test          | u1         | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 
 
# show lc_collate;
 lc_collate  
-------------
 en_US.UTF-8

# select * from t1 order by f1;
 id |        f1         
----+-------------------
  1 | CX Hatchback
  2 | CX Minivan
  2 | CX Plus Minivan
  2 | CX Sedan
  2 | CX Sport Utility
  2 | CXL Minivan
  2 | CXL Premium Sedan
  2 | CXL Sedan
  2 | CXL Sport Utility
  2 | CXL Turbo Sedan
  2 | CXS Sedan

On CentOS 5:

# \l
       Name        |   Owner    | Encoding |  Collation  |    Ctype    |   Access privileges   
-------------------+------------+----------+-------------+-------------+-----------------------
 col_test          | u1         | UTF8     | en_US.UTF-8 | en_US.UTF-8 | 

# show lc_collate
ecorithm-# ;
 lc_collate  
-------------
 en_US.UTF-8

# select * from t1 order by f1;
 id |        f1         
----+-------------------
  1 | CX Hatchback
  2 | CXL Minivan
  2 | CXL Premium Sedan
  2 | CXL Sedan
  2 | CXL Sport Utility
  2 | CXL Turbo Sedan
  2 | CX Minivan
  2 | CX Plus Minivan
  2 | CX Sedan
  2 | CX Sport Utility
  2 | CXS Sedan

Further testing would seem to reveal that OS X is using something resembling C collation order, despite the fact that it says en_US.UTF-8. I say this because it is also case sensitive.  CentOS, on the other hand, is ignoring spaces, but is also case-insensitive.

# select * from t1 order by f1 asc;
 id |        f1         
----+-------------------
  1 | CX Hatchback
  2 | CXL Minivan
  2 | CXL Premium Sedan
  2 | CXL Sedan
  2 | CXL Sport Utility
  2 | CXL Turbo Sedan
  2 | CX Minivan
  2 | CX Plus Minivan
  2 | cx Sedan
  2 | CX Sedan
  2 | CX Sport Utility
  2 | CXS Sedan

Pavel suggested using a collation of ucs_basic, but I get an error when I try that on linux:

$ createdb -U u1 --lc-collate=ucs_basic -E UTF-8 test 
createdb: database creation failed: ERROR:  invalid locale name ucs_basic

I was able to create the db with --lc_collate=C and get case-sensitive sorting that treats spaces 'correctly,' but I have no idea how reliable that is with multibyte characters and it almost certainly doesn't handle accented characters correctly in languages that have a mix of ascii and non-ascii characters, since the non-ascii chars will all sort as greater than the ascii chars, which is probably not how things are alphabetized in those languages.

I happen to have my computer set so I can type cyrillic and it appears to work correctly with non-ascii chars when the entire charset is non-ascii, but it'd be a lot of effort to construct a test that utilizes a mix of 1,2,3, and 4 byte characters, since I'm not familiar with enough alphabets.

test=# select * from t1 order by f1;
 id |        f1         
----+-------------------
  1 | CX Hatchback
  2 | CX Minivan
  2 | CX Plus Minivan
  2 | CX Sedan
  2 | CX Sport Utility
  2 | CXL Minivan
  2 | CXL Premium Sedan
  2 | CXL Sedan
  2 | CXL Sport Utility
  2 | CXL Turbo Sedan
  2 | CXS Sedan
  2 | cx Sedan
  2 | АИВЕТ
  2 | П АИВЕТ
  2 | П РИВЕТ
  2 | П СИВЕТ
  2 | ПРИВЕТ
  2 | РИВЕТ
  2 | СИВЕТ
  2 | привет


Re: a strange order by behavior

From
Pavel Stehule
Date:
Hello Peter


> Pavel suggested using a collation of ucs_basic, but I get an error when I
> try that on linux:
> $ createdb -U u1 --lc-collate=ucs_basic -E UTF-8 test
> createdb: database creation failed: ERROR:  invalid locale name ucs_basic

isn't this a bug in collations?

Regards

Pavel

> I was able to create the db with --lc_collate=C and get case-sensitive
> sorting that treats spaces 'correctly,' but I have no idea how reliable that
> is with multibyte characters and it almost certainly doesn't handle accented


Re: a strange order by behavior

From
Samuel Gendler
Date:


On Wed, Jun 22, 2011 at 2:39 AM, Samuel Gendler <sgendler@ideasculptor.com> wrote:

I was able to create the db with --lc_collate=C and get case-sensitive sorting that treats spaces 'correctly,' but I have no idea how reliable that is with multibyte characters and it almost certainly doesn't handle accented characters correctly in languages that have a mix of ascii and non-ascii characters, since the non-ascii chars will all sort as greater than the ascii chars, which is probably not how things are alphabetized in those languages.


'locale -a' at a command prompt will list all locale's available on a host. I guess there is no universal utf-8 collation, which isn't that surprising, given how utf-8 works, though I had hoped that there might be one that at least attempts to handle all languages that share character codes in some kind of sensible manner.  Perhaps the details of UTF-8 encoding make that effectively impossible.  But it looks as though if you want to handle accented characters, you would need to set collation to something like fr_FR.UTF-8, though it looks like all of the UTF-8 collations ignore spaces and are case-insensitive.  I don't know of a way to determine what the sorting behaviour of an arbitrary collation might be other than experimentation.  You can set LC_ALL to the locale in question and pipe a test file through the sort command to do a quick test:

LC_ALL=en_US.UTF-8 sort < test.txt

I checked variants like .UTF-8, .iso88591, .iso885915@euro and they all ignore spaces in all of the languages I checked.  It appears that the only way to get a space-sensitive sort order is to use C collation.  Same goes for capitalization, I think.  Which is pretty ugly, if you ask me.
It looks like it is theoretically possible to modify a locale's collation behaviour.  On my linux system, all locale's with the space-ignoring, capital-ignoring collation have this entry in the locale definitions in /usr/share/i18n/locales:
LC_COLLATE 
 % Copy the template from ISO/IEC 14651 
copy "iso14651_t1" 
 END LC_COLLATE
Looking at the iso14651_t1 file, I see this:
# Any character not precisely specified will be considered as a special
# character and considered only at the last level.
# <U0000>......<U7FFFFFFF> IGNORE;IGNORE;IGNORE;<U0000>......<U7FFFFFFF>
#
# SYMB.                                N° GLY
#
<U0020> IGNORE;IGNORE;IGNORE;<U0020> # 32 <SP>
<U005F> IGNORE;IGNORE;IGNORE;<U005F> # 33 _
<U0332> IGNORE;IGNORE;IGNORE;<U0332> # 34 <"_>
<U00AF> IGNORE;IGNORE;IGNORE;<U00AF> # 35 - (MACRON)
<U00AD> IGNORE;IGNORE;IGNORE;<U00AD> # 36 <SHY>
<U002D> IGNORE;IGNORE;IGNORE;<U002D> # 37 -
<U002C> IGNORE;IGNORE;IGNORE;<U002C> # 38 ,
<U003B> IGNORE;IGNORE;IGNORE;<U003B> # 39 ;
<U003A> IGNORE;IGNORE;IGNORE;<U003A> # 40 :
 

There doesn't seem to be any obvious discussion of how to structure the LC_COLLATE section of a locale definition, but it is clear that telling it to ignore spaces and punctuation is the problem we are seeing.  That seems like a very strange choice to make as the default for all UTF-8 locales.  I can't find the locale definitions for OS X, which might reveal a more sensible LC_COLLATE or may just be defaulting back to C collation for everything.



Re: a strange order by behavior

From
Samuel Gendler
Date:


On Wed, Jun 22, 2011 at 3:01 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hello Peter


> Pavel suggested using a collation of ucs_basic, but I get an error when I
> try that on linux:
> $ createdb -U u1 --lc-collate=ucs_basic -E UTF-8 test
> createdb: database creation failed: ERROR:  invalid locale name ucs_basic

isn't this a bug in collations?

The more I read about this, the more this would appear to be the case. It looks like the SQL standard has some baseline collations that are required and it isn't at all clear how one would access those in postgres if the host in question doesn't have those locale's defined on the host. UCS_BASIC is a SQL collation, but doesn't appear to have an explicit definition on a 'standard' linux host (CentOS 5, in my case).  There is another SQL collation called 'UNICODE' which is supposed to obey the Unicode Collation Algorithm with the Default Unicode Collation Element Table defined in Unicode10.  It looks like that collation is relatively sensitive to language-specific sort orders, though it isn't a required collation in the sql standard.  I suspect that it is the UNICODE collation which actually would appear to be the most 'sensible' within the context of this discussion - characters in expected order, spaces honoured, case sensitive. I have so little experience with localization that I'm not sure if I'm reading this all correctly, though.

 

Re: a strange order by behavior

From
Samuel Gendler
Date:


On Wed, Jun 22, 2011 at 3:39 AM, Samuel Gendler <sgendler@ideasculptor.com> wrote:


On Wed, Jun 22, 2011 at 3:01 AM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
Hello Peter


> Pavel suggested using a collation of ucs_basic, but I get an error when I
> try that on linux:
> $ createdb -U u1 --lc-collate=ucs_basic -E UTF-8 test
> createdb: database creation failed: ERROR:  invalid locale name ucs_basic

isn't this a bug in collations?

The more I read about this, the more this would appear to be the case. It looks like the SQL standard has some baseline collations that are required and it isn't at all clear how one would access those in postgres if the host in question doesn't have those locale's defined on the host. UCS_BASIC is a SQL collation, but doesn't appear to have an explicit definition on a 'standard' linux host (CentOS 5, in my case).  There is another SQL collation called 'UNICODE' which is supposed to obey the Unicode Collation Algorithm with the Default Unicode Collation Element Table defined in Unicode10.  It looks like that collation is relatively sensitive to language-specific sort orders, though it isn't a required collation in the sql standard.  I suspect that it is the UNICODE collation which actually would appear to be the most 'sensible' within the context of this discussion - characters in expected order, spaces honoured, case sensitive. I have so little experience with localization that I'm not sure if I'm reading this all correctly, though.

 

My final bit of research before going to bed - 

If you are stuck with UTF-8 collation, you can do C-style collation in order by clauses like this:

select * from t1 order by f1 using ~<~
select * from t1 order by f1 using ~>~

If you want it to be able to use an index for that (or you want to use LIKE in your where clause with that column), then you need to have an index which uses C-style collation.  You can create an index on the column using text_pattern_ops or varchar_pattern_ops and then LIKE, <, and > comparisons will use that index and do comparisons like C collation, as will order by clauses with ~>~ and ~<~

create index t1_f1_pattern ON t1 (f1 varchar_pattern_ops);
create index t1_f1 ON t1 (f1);

If you have both of those indexes, postgresql will, apparently, do the correct thing based on which operations you ask of it in your query.  The pattern_ops thing is mentioned in the docs (but on a page about indexes rather than on a page about varchar and text types) and I found the ~>~ operator in an ORDER BY clause in an old mailing list email.  It may be in the docs, I'm not sure.  Is there a documentation page or wiki page consisting of 'stuff that may surprise you in postgresql?'  Such a page would be a handy place to point this kind of thing out to new users.  Most users, even relatively experienced ones, probably haven't read the documentation cover to cover.  I certainly hadn't.

9.1 allows collation to be specified on a per-column basis, but since all of the utf-8 collations ignore punctuation and space characters, you'll still have to do C-style collation to get alphabetical order that honors whitespace, punctuation, and case, but then non-ascii characters will be in the wrong order. So you can either have language sensitive sorting or punctuation- and case- sensitive sorting, but never both - at least not without defining a custom locale. Punting this defect to the OS and saying it has to be fixed there seems like a somewhat weak answer - at least without offering up a source for alternative locale definitions that do sorting according to UNICODE collation.  

I'm actually surprised that european users aren't complaining about this all the time, but maybe european users are used to seeing things ordered in a manner which doesn't honour the 'correct' ordering of accented characters.  Actually, I wonder if the probable explanation of the lack of complaint is the fact that the kinds of fields most apps would tend to do alphabetical sorts on probably don't tend to have lots of punctuation other than spaces, so perhaps the language sensitive sorts are deemed sufficient because most people don't notice the funky behaviour with punctuation and whitespace while case-insensitive sort is probably desired most of the time.

Re: a strange order by behavior

From
Pavel Stehule
Date:
> I'm actually surprised that european users aren't complaining about this all
> the time, but maybe european users are used to seeing things ordered in a
> manner which doesn't honour the 'correct' ordering of accented characters.
>  Actually, I wonder if the probable explanation of the lack of complaint is
> the fact that the kinds of fields most apps would tend to do alphabetical
> sorts on probably don't tend to have lots of punctuation other than spaces,
> so perhaps the language sensitive sorts are deemed sufficient because most
> people don't notice the funky behaviour with punctuation and whitespace
> while case-insensitive sort is probably desired most of the time.

I checked czech UTF8 collation and it is correct

postgres=# select * from x order by a collate ucs_basic;    a
-----------ChromečkaCrhaSemerádSynZárubaŠebíšek
(6 rows)

postgres=# select * from x order by a collate "cs_CZ";    a
-----------CrhaChromečkaSemerádSynŠebíšekZáruba
(6 rows)

Regards

Pavel Stehule


Re: a strange order by behavior

From
Tom Lane
Date:
Samuel Gendler <sgendler@ideasculptor.com> writes:
> Interesting.  The original thread to which I was referring has a subject of
> "Sorting Issue" and the original request showed a list of vehicle model
> names which were sorting as though there were no spaces. The user had
> collation set to en_US.UTF-8.  However, my database (on OS X) sorts both his
> example and the example that started this thread correctly, despite my
> lc_collate being set to the same value.

OS X's support for comparisons in UTF-8 locales is widely understood
to be broken.  In general, as you've found out, there's little
compatibility in sort order across platforms; but OS X is just
completely unlike other modern Unixoid platforms.  I would not take it
as the definition of "correct".

AFAIK, if you're looking for strict character-by-character sorting
rather than those weird "dictionary" rules, C locale (a/k/a POSIX
locale) is your only choice, and you have to put up with the odd sort
order you will get for all non-ASCII characters.  In principle a locale
could be defined that does character-by-character but does something
reasonable with non-ASCII; but none of the major platforms seem to offer
one.

Also, just to be perfectly clear: this is not Postgres' fault, it's
just sorting the way strcoll() says to.  You'll get the same sort
order from the command-line sort(1) program, if you feed it the same
data in the same locale environment.
        regards, tom lane


Re: a strange order by behavior

From
Peter Eisentraut
Date:
On ons, 2011-06-22 at 01:43 -0700, Samuel Gendler wrote:
> I seem to recall a thread here about it ignoring spaces entirely in that
> collation (and maybe ignoring capitalization, too?).

The way it works is that every collating element (letter or other
character or character group that you sort as a unit) is assigned four
weights (primary, secondary, tertiary, and quaternary), and the sorting
then first compares the primary weights, then the secondary weights,
etc.  The primary weight typically indicates the overall sort order,
like A before B, the secondary weight has to do with diacritic marks,
the tertiary with letter case, and the fourth level is only used in
special cases.  So that's why it looks as though the capitalization is
"ignored" unless both the primary and secondary weights are the same.

> This worked:
> 
> createdb  -E UTF-8 --lc-collate=C some_db
> 
> A quick google search
> reveals that there is some kind of standard for unicode collation (
> http://www.unicode.org/reports/tr10/ ) and I have no idea if that is what is
> represented by the en_US.UTF-8 collation or not.

At least the collate category of the en_US.UTF-8 locale on glibc is
unaltered from the ISO 14651 default ordering, which is equivalent to
the Unicode default ordering.  There several other locales for which
that is also the case.  Unfortunately, this is not exposed outside of
the glibc source code.  So you can't just select "give me a neutral
default ordering".




Re: a strange order by behavior

From
Peter Eisentraut
Date:
On ons, 2011-06-22 at 02:39 -0700, Samuel Gendler wrote:
> Pavel suggested using a collation of ucs_basic, but I get an error
> when I
> try that on linux:
> 
> $ createdb -U u1 --lc-collate=ucs_basic -E UTF-8 test
> createdb: database creation failed: ERROR:  invalid locale name ucs_basic 

ucs_basic is a collation name, which is an SQL object.  The argument of
createdb --lc-collate is an operating system locale name.  You can't mix
the two, even though they are similar.




Re: a strange order by behavior

From
Pavel Stehule
Date:
2011/6/22 Peter Eisentraut <peter_e@gmx.net>:
> On ons, 2011-06-22 at 02:39 -0700, Samuel Gendler wrote:
>> Pavel suggested using a collation of ucs_basic, but I get an error
>> when I
>> try that on linux:
>>
>> $ createdb -U u1 --lc-collate=ucs_basic -E UTF-8 test
>> createdb: database creation failed: ERROR:  invalid locale name ucs_basic
>
> ucs_basic is a collation name, which is an SQL object.  The argument of
> createdb --lc-collate is an operating system locale name.  You can't mix
> the two, even though they are similar.
>

ok, what I can to select, when I would to use a C like default order?

Regards

Pavel

>
>


Re: a strange order by behavior

From
Samuel Gendler
Date:


On Wed, Jun 22, 2011 at 8:57 PM, Pavel Stehule <pavel.stehule@gmail.com> wrote:
2011/6/22 Peter Eisentraut <peter_e@gmx.net>:
> On ons, 2011-06-22 at 02:39 -0700, Samuel Gendler wrote:
>> Pavel suggested using a collation of ucs_basic, but I get an error
>> when I
>> try that on linux:
>>
>> $ createdb -U u1 --lc-collate=ucs_basic -E UTF-8 test
>> createdb: database creation failed: ERROR:  invalid locale name ucs_basic
>
> ucs_basic is a collation name, which is an SQL object.  The argument of
> createdb --lc-collate is an operating system locale name.  You can't mix
> the two, even though they are similar.
>

ok, what I can to select, when I would to use a C like default order?


You can specify 'order by f1 using ~<~' or 'order by f1 using ~>~' in your query.  If you want your queries to use an index when ordering or comparing strings via LIKE, then you need to create a separate index that uses varchar_pattern_ops instead of the default varchar_ops.  The db will automatically use the varchar_pattern_ops index when appropriate.  varchar_pattern_ops uses C-like collation.

create index t1_f1_pattern on t1 (f1 varchar_pattern_ops)

 

Re: a strange order by behavior

From
Peter Eisentraut
Date:
On tor, 2011-06-23 at 05:57 +0200, Pavel Stehule wrote:
> 2011/6/22 Peter Eisentraut <peter_e@gmx.net>:
> > On ons, 2011-06-22 at 02:39 -0700, Samuel Gendler wrote:
> >> Pavel suggested using a collation of ucs_basic, but I get an error
> >> when I
> >> try that on linux:
> >>
> >> $ createdb -U u1 --lc-collate=ucs_basic -E UTF-8 test
> >> createdb: database creation failed: ERROR:  invalid locale name ucs_basic
> >
> > ucs_basic is a collation name, which is an SQL object.  The argument of
> > createdb --lc-collate is an operating system locale name.  You can't mix
> > the two, even though they are similar.
> >
> 
> ok, what I can to select, when I would to use a C like default order?

createdb --locale=C --encoding=UTF8




Re: a strange order by behavior

From
Pavel Stehule
Date:
2011/6/23 Peter Eisentraut <peter_e@gmx.net>:
> On tor, 2011-06-23 at 05:57 +0200, Pavel Stehule wrote:
>> 2011/6/22 Peter Eisentraut <peter_e@gmx.net>:
>> > On ons, 2011-06-22 at 02:39 -0700, Samuel Gendler wrote:
>> >> Pavel suggested using a collation of ucs_basic, but I get an error
>> >> when I
>> >> try that on linux:
>> >>
>> >> $ createdb -U u1 --lc-collate=ucs_basic -E UTF-8 test
>> >> createdb: database creation failed: ERROR:  invalid locale name ucs_basic
>> >
>> > ucs_basic is a collation name, which is an SQL object.  The argument of
>> > createdb --lc-collate is an operating system locale name.  You can't mix
>> > the two, even though they are similar.
>> >
>>
>> ok, what I can to select, when I would to use a C like default order?
>
> createdb --locale=C --encoding=UTF8
>

ok, thank you

Pavel

>
>