Thread: Can't specify default collation?

Can't specify default collation?

From: Tom Lane
Date: 2011-03-10 18:12 -0500
This seems a tad unfriendly:

db1=# create table foo (f1 text collate "default");
ERROR:  collation "default" for current database encoding "UTF8" does not exist
LINE 1: create table foo (f1 text collate "default");
                                  ^

Not being able to explicitly specify the default behavior is a no-no
according to most people who have thought about language design for more
than a moment.

The reason it's failing is that "default" is entered into pg_collation
with collencoding = 0 (SQL_ASCII), and the lookup code is designed to
ignore all entries with collencoding different from the current
database's encoding.  So, in fact, the above command *will* work if
you're in a SQL_ASCII database.  Just not elsewhere.

What I'm inclined to do about this is set "default"'s collencoding to
-1, with the semantics of "works for any encoding", and fix the lookup
routines to try -1 if they don't get a match with the database encoding.
Having done that, we could also use -1 for "C" and "POSIX", thus
avoiding having to make a bunch of duplicate entries for them.

BTW, I would like to eventually have "C" and "POSIX" in there all the
time (ie created by pg_collation.h), so that they can be used even in
machines that don't have locale_t support.  I haven't yet gotten around
to reading the parts of the collation patch that might need to change
to support this, so I'm not sure how much work it'd be.  But I'd say
that being able to do COLLATE "C" in an otherwise non-C database would
cover a very large fraction of the user requests I've read about this,
so being able to handle that case even without locale_t support would be
really useful IMO.
        regards, tom lane


Re: Can't specify default collation?

From: Peter Eisentraut
On Thu, 2011-03-10 at 18:12 -0500, Tom Lane wrote:
> What I'm inclined to do about this is set "default"'s collencoding to
> -1, with the semantics of "works for any encoding", and fix the lookup
> routines to try -1 if they don't get a match with the database
> encoding.  Having done that, we could also use -1 for "C" and "POSIX",
> thus avoiding having to make a bunch of duplicate entries for them.

Good idea.

> BTW, I would like to eventually have "C" and "POSIX" in there all the
> time (ie created by pg_collation.h), so that they can be used even in
> machines that don't have locale_t support.  I haven't yet gotten
> around to reading the parts of the collation patch that might need to
> change to support this, so I'm not sure how much work it'd be.  But
> I'd say that being able to do COLLATE "C" in an otherwise non-C
> database would cover a very large fraction of the user requests I've
> read about this, so being able to handle that case even without
> locale_t support would be really useful IMO.

That should actually already work.  The relevant logic is in
varstr_cmp().

But good point.  We should support this out of the box on all platforms.