Re: Proposal: CREATE CONVERSION - Mailing list pgsql-hackers

From Peter Eisentraut
Subject Re: Proposal: CREATE CONVERSION
Date
Msg-id Pine.LNX.4.44.0207091858540.1247-100000@localhost.localdomain
In response to Re: Proposal: CREATE CONVERSION  (Thomas Lockhart <lockhart@fourpalms.org>)
List pgsql-hackers
Thomas Lockhart writes:

> An aside: I was thinking about this some, from the PoV of using our
> existing type system to handle this (as you might remember, this is an
> inclination I've had for quite a while). I think that most things line
> up fairly well to allow this (and having transaction-enabled features
> may require it), but do notice that the SQL feature of allowing a
> different character set for every column *name* does not map
> particularly well to our underlying structures.

The more I think about it, the more I come to the conclusion that the
SQL framework for "character sets" is both bogus and a red herring.  (And
it begins with figuring out exactly what a character set is, as opposed
to a form-of-use, a.k.a.(?) encoding, but let's ignore that.)

The ability to store each column value in a different encoding sounds
interesting, because it allows you to create tables such as
   product_id | product_name_en | product_name_kr | product_name_jp

but you might as well create a table such as
   product_id | lang | product_name

with product_name in Unicode, and have a more extensible application that
way, too.
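
As an illustration of the second layout, here is a minimal sketch using Python's built-in sqlite3 module as a stand-in for the database. The table name product_names, the helper product_name(), and the sample rows are assumptions made up for this example; they are not from the original message.

```python
import sqlite3

# In-memory database; SQLite stores TEXT as Unicode, which matches the
# "product_name in Unicode" assumption of the second layout.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE product_names ("
    " product_id INTEGER NOT NULL,"
    " lang TEXT NOT NULL,"
    " product_name TEXT NOT NULL,"
    " PRIMARY KEY (product_id, lang))"
)

# One row per (product, language) instead of one column per language.
conn.executemany(
    "INSERT INTO product_names VALUES (?, ?, ?)",
    [(1, "en", "green tea"), (1, "kr", "녹차"), (1, "jp", "緑茶")],
)

# Supporting another language is one more row, not a schema change.
conn.execute("INSERT INTO product_names VALUES (?, ?, ?)", (1, "de", "grüner Tee"))


def product_name(conn, product_id, lang):
    """Return the product name for the given language, or None if missing."""
    row = conn.execute(
        "SELECT product_name FROM product_names"
        " WHERE product_id = ? AND lang = ?",
        (product_id, lang),
    ).fetchone()
    return row[0] if row else None


print(product_name(conn, 1, "kr"))
```

The extensibility argument shows up in the last insert: a fourth language needs no ALTER TABLE, only data.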

I think it's fine to have the encoding fixed for the entire database.  It
sure makes coding easier.  If you want to be international, you use
Unicode.  If not, you can "optimize" your database by using a more
efficient encoding.  In fact, I think we should consider making UTF-8 the
default encoding sometime.
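
To make the "more efficient encoding" trade-off concrete, the following sketch compares the encoded size of a few short strings in UTF-8 and in a language-specific legacy encoding, using Python's standard codecs. The sample strings and the helper name encoded_size() are assumptions chosen for illustration only.

```python
def encoded_size(text, encoding):
    """Number of bytes needed to store text in the given encoding."""
    return len(text.encode(encoding))


# (sample text, language-specific encoding to compare against UTF-8)
samples = [
    ("green tea", "latin_1"),
    ("녹차", "euc_kr"),
    ("緑茶", "euc_jp"),
]

for text, legacy in samples:
    utf8_bytes = encoded_size(text, "utf-8")
    legacy_bytes = encoded_size(text, legacy)
    print(f"{text!r}: utf-8={utf8_bytes} bytes, {legacy}={legacy_bytes} bytes")
```

For plain ASCII text the sizes are identical; the saving from a legacy encoding only appears for scripts whose characters take three bytes in UTF-8 but two in the legacy encoding.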

The real issue is the collation.  But the collation is a small subset of
the whole locale/character set gobbledygook.  Standardized collation rules
in standardized forms exist.  Finding/creating routines to interpret and
apply them should be the focus.  SQL's notion of funneling the choice of
collation rule through the character set is bogus.  It's
impossible to pick a default collation rule for many character sets
without applying bias.
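
A small sketch of why the character set alone cannot determine the collation: the same three words, all representable in one Latin character set, sort differently under a Swedish-style rule and a German dictionary-style rule. Both key functions below are deliberately simplified toy rules written for this example (assumptions, not any standard's actual collation tables), and swedish_key() only handles the letters listed in its alphabet.

```python
# Toy Swedish-style rule: å, ä, ö are separate letters after z.
SWEDISH_ALPHABET = "abcdefghijklmnopqrstuvwxyzåäö"


def swedish_key(word):
    """Sort key: position of each letter in the Swedish alphabet."""
    return [SWEDISH_ALPHABET.index(ch) for ch in word.lower()]


# Toy German dictionary-style rule: umlauts sort like their base letters.
GERMAN_FOLD = str.maketrans({"ä": "a", "ö": "o", "ü": "u", "ß": "ss"})


def german_key(word):
    """Sort key: the word with umlauts folded to base letters."""
    return word.lower().translate(GERMAN_FOLD)


words = ["zebra", "äpple", "apa"]

print(sorted(words, key=swedish_key))  # ä sorts after z
print(sorted(words, key=german_key))   # ä sorts with a
```

Neither order is more correct for the character set itself; choosing one as the default for that character set favors one language's users over the other's.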

-- 
Peter Eisentraut   peter_e@gmx.net


