Re: WIP patch: Collation support - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: WIP patch: Collation support
Msg-id 48C79886.9030504@enterprisedb.com
In response to Re: WIP patch: Collation support  (Martijn van Oosterhout <kleptog@svana.org>)
Responses Re: WIP patch: Collation support  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
Martijn van Oosterhout wrote:
> On Wed, Sep 10, 2008 at 11:29:14AM +0300, Heikki Linnakangas wrote:
>> Radek Strnad wrote:
>>> - because pg_collation and pg_charset are catalogs individual to each
>>> database, if you want to create a database with a collation other than
>>> those specified, create it in template1 and then create the database
>> I have to wonder, is all that really necessary? The feature you're 
>> trying to implement is to support database-level collation at first, and 
>> perhaps column-level collation later. We don't need support for 
>> user-defined collations and charsets for that.
> 
> Since the set of collations isn't exactly denumerable, we need some way
> to allow the user to specify the collation they want. The only
> collation PostgreSQL knows about is the C collation. Anything else is
> user-defined.

Let's just use the name of the OS locale, like we do now. Having a 
pg_collation catalog just moves the problem elsewhere: we'd still need 
something in pg_collation to tie the collation to the OS locale.
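
A minimal sketch of what "use the name of the OS locale" amounts to, written in Python for brevity rather than as backend code: the name is handed straight to the OS, and the OS collation routines do the ordering. The helper name sort_with_os_locale is hypothetical, and the example assumes only that the "C" locale exists, since other locale names depend on what is installed on the machine.

```python
import locale

def sort_with_os_locale(strings, locale_name):
    """Sort strings using the collation rules of the named OS locale."""
    # Calling setlocale() without a new value only queries the current setting.
    previous = locale.setlocale(locale.LC_COLLATE)
    try:
        # Raises locale.Error if the OS does not know this locale name.
        locale.setlocale(locale.LC_COLLATE, locale_name)
        # strxfrm() turns each string into a key that sorts in locale order.
        return sorted(strings, key=locale.strxfrm)
    finally:
        # Restore whatever collation the process was using before.
        locale.setlocale(locale.LC_COLLATE, previous)

# In the "C" locale, strings compare by code point, so "B" sorts before "a".
print(sort_with_os_locale(["b", "a", "B"], "C"))
```

Nothing here needs a catalog of its own: the only thing stored per database (or, later, per column) would be the locale name string.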

>>> Design & functionality changes left:
>>> - move retrieving collation from pg_database to pg_type
>> I don't understand this item. What will you move?
> 
> Long term, the collation is a property of the type, ...

You might want to provide a default collation for a type as well, but 
at the finest granularity you can specify the collation for every (text) 
comparison operator in your query. Of course you don't want to do that 
for every query, which is why we should provide defaults at different 
levels: columns, tables, database. And perhaps types as well, but that's 
not the most interesting case.
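
The fallback through those levels can be sketched as follows. This is only an illustration of the lookup order described above, not proposed backend code: the function resolve_collation, its parameter names, and the locale names in the usage lines are all hypothetical, and the optional per-type default is left out.

```python
def resolve_collation(explicit=None, column=None, table=None, database="C"):
    """Return the most specific collation that has been set.

    Lookup order (an assumption drawn from the levels listed above):
    per-operator override, then column, then table, then database.
    """
    for candidate in (explicit, column, table):
        if candidate is not None:
            return candidate
    # The database default always exists, so it is the final fallback.
    return database

# No override anywhere: the database default applies.
print(resolve_collation(database="en_US"))
# A column default beats the database default.
print(resolve_collation(column="fi_FI", database="en_US"))
# A collation given for one comparison beats everything else.
print(resolve_collation(explicit="sv_SE", column="fi_FI", database="en_US"))
```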

I'm not sure what the SQL spec says about that, but I believe it 
provides syntax and rules for all that.

>> That's a tricky one. One idea is to prohibit choosing a different 
>> collation than the one in the template database, unless we know it's 
>> safe to do so without reindexing.
> 
> But that puts us back where we started: every database having the same
> collation. We're trying to move away from that. Just reindex everything
> and be done with it.

That's easier said than done, unfortunately.

>> Note that we already have the same problem with encodings. If you create 
>> a database with LATIN1 encoding, load it with data, and then use that as 
>> a template for a database with UTF-8 encoding, the text data will be 
>> incorrectly encoded. We should probably fix that too.
> 
> I'd say forbid more than one encoding in a cluster, but that's just my
> opinion :)

Yeah, multiple encodings in one cluster are pretty useless, at least 
without support for different locales in different databases. But we 
might as well keep the feature unless there's a pressing reason to drop it.
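
To make the template encoding problem mentioned above concrete, here is a small Python illustration (not PostgreSQL code) of what happens when bytes stored as LATIN1 are relabeled as UTF-8 without conversion. The variable names are made up for the example.

```python
# The bytes a LATIN1 database would store for the character "é".
stored = "é".encode("latin-1")

# Copying the template just relabels the same bytes as UTF-8.
try:
    stored.decode("utf-8")
    valid_as_utf8 = True
except UnicodeDecodeError:
    # 0xE9 starts a multi-byte UTF-8 sequence that never completes.
    valid_as_utf8 = False

# A correct copy has to transcode: decode as LATIN1, re-encode as UTF-8.
converted = stored.decode("latin-1").encode("utf-8")

print(stored, valid_as_utf8, converted)
```

Plain ASCII data survives the relabeling, which is why the problem only shows up once the template contains non-ASCII text.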

--   Heikki Linnakangas  EnterpriseDB   http://www.enterprisedb.com

