Re: WIP patch: Collation support - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: WIP patch: Collation support
Msg-id 48C7855A.40605@enterprisedb.com
In response to WIP patch: Collation support  ("Radek Strnad" <radek.strnad@gmail.com>)
Responses Re: WIP patch: Collation support  (Martijn van Oosterhout <kleptog@svana.org>)
Re: WIP patch: Collation support  (Zdenek Kotala <Zdenek.Kotala@Sun.COM>)
List pgsql-hackers
Radek Strnad wrote:
> Progress so far:
> - created catalogs pg_collation and pg_charset which are filled with three
> standard collations
> - initdb changes rows called "DEFAULT" in both catalogs during the bki
> bootstrap phase with current system LC_COLLATE and LC_CTYPE or those set by
> command line.
> - new collations can be defined with command
>     CREATE COLLATION <collation name> FOR <character set specification>
>     FROM <existing collation name> [STRCOLFN <fn name>]
>     [ <pad characteristic> ] [ <case sensitive> ]
>     [ LCCOLLATE <lc_collate> ] [ LCCTYPE <lc_ctype> ]
> - because pg_collation and pg_charset are catalogs individual for each
> database, if you want to create a database with collation other than
> specified, create it in template1 and then create database

I have to wonder, is all that really necessary? The feature you're 
trying to implement is to support database-level collation at first, and 
perhaps column-level collation later. We don't need support for 
user-defined collations and charsets for that.

If we leave all that out of the patch for now, we'll have a much slimmer, 
and just as useful patch, implementing database-level collation. We can 
add those catalogs later if we need them, but I don't think there's much 
point in adding all that infrastructure if they just reflect the locales 
installed in the operating system.

> - when connecting to database, it retrieves locales from pg_database and
> sets them

This is the real gist of this patch.
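
To make that step concrete, here is a minimal sketch in Python rather than backend C. The DATABASE_LOCALES table and apply_database_locale() are hypothetical stand-ins for the pg_database lookup and the connection startup code; they are not names from the patch.

```python
import locale

# Hypothetical stand-in for the per-database locale settings that the
# patch stores in pg_database.
DATABASE_LOCALES = {
    "template1": {"lc_collate": "C", "lc_ctype": "C"},
}


def apply_database_locale(dbname):
    """Set the process-wide collation and ctype locales for one database."""
    entry = DATABASE_LOCALES[dbname]
    # setlocale() raises locale.Error if the operating system does not
    # have the requested locale installed.
    locale.setlocale(locale.LC_COLLATE, entry["lc_collate"])
    locale.setlocale(locale.LC_CTYPE, entry["lc_ctype"])
    return entry


apply_database_locale("template1")
```

The point of the sketch is only that the locale is looked up per database at connection time instead of being fixed for the whole cluster.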

> Design & functionality changes left:
> - move retrieving collation from pg_database to pg_type

I don't understand this item. What will you move?

> - get case sensitivity and pad characteristic working

I feel we should leave this to the collation implementation.

> - when creating database with different collation than database cluster, the
> database has to be reindexed. Any idea how to do it? Function
> ReindexDatabase works only when database is opened.

That's a tricky one. One idea is to prohibit choosing a different 
collation than the one in the template database, unless we know it's 
safe to do so without reindexing. The problem is that we don't know 
whether it's safe. A simple but limiting solution would be to require 
that the template database has the same collation as the database that's 
being created, except that template0 can always be used as template. 
template0 is safe, because there are no indexes on text columns there.
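
The simple rule could be sketched like this, in Python for brevity. The function name and its arguments are made up for illustration; the real check would live somewhere in the CREATE DATABASE code.

```python
def template_collation_ok(template_name, template_collation, new_collation):
    """Return True if the new database may be created from this template.

    Rule: the new database must use the template's collation, except that
    template0 may be used as the template for any collation.
    """
    if template_name == "template0":
        # Treated as always safe: no indexes on text columns there.
        return True
    return template_collation == new_collation
```

For example, template_collation_ok("template1", "C", "cs_CZ") is rejected, while the same request against template0 is allowed.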

Note that we already have the same problem with encodings. If you create 
a database with LATIN1 encoding, load it with data, and then use that as 
a template for a database with UTF-8 encoding, the text data will be 
incorrectly encoded. We should probably fix that too.
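
A small Python illustration of why the copied data is wrong: the bytes stay the same, only the encoding label changes. The sample string is arbitrary.

```python
text = "café"

# What a LATIN1 database stores for this string: one byte per character.
latin1_bytes = text.encode("latin-1")

# The same bytes, now claimed to be UTF-8 because the new database was
# created from the template with a different encoding.
try:
    latin1_bytes.decode("utf-8")
    misread = None
except UnicodeDecodeError as exc:
    # 0xE9 starts a multi-byte sequence in UTF-8, so the data is invalid.
    misread = exc
```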

-- 
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com

