Re: Fixed length data types issue - Mailing list pgsql-hackers

From Martijn van Oosterhout
Subject Re: Fixed length data types issue
Msg-id 20060907124102.GL10093@svana.org
In response to Re: Fixed length data types issue  (Gregory Stark <stark@enterprisedb.com>)
Responses Re: Fixed length data types issue  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-hackers
On Thu, Sep 07, 2006 at 01:27:01PM +0100, Gregory Stark wrote:
> ... If you look again at the columns in my example you'll
> see none of them are appropriate targets for i18n anyways. They're all codes
> and even numbers.

Which raises the question of why they don't store the numbers in numeric
columns. That would take far less space than any string.
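
A rough sketch of the size argument, in Python purely for illustration.
It compares only the raw payload of a numeric code held as a 4-byte
integer against the same code held as ASCII digits. The function name
payload_sizes and the sample code value are made up for this example,
and the numbers ignore any per-value header or padding a real database
adds on disk.

```python
import struct

def payload_sizes(code):
    """Return (bytes as a 32-bit integer, bytes as ASCII digits)."""
    as_int = struct.pack("<i", code)       # fixed 4-byte signed integer
    as_text = str(code).encode("ascii")    # one byte per decimal digit
    return len(as_int), len(as_text)

# Hypothetical nine-digit code, e.g. some account or part number.
int_size, text_size = payload_sizes(123456789)
print(int_size, text_size)  # 4 9
```

The integer stays at four bytes for every value in its range, while the
string form grows by one byte per digit.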

> In other words if you're actually storing localized text then you almost
> certainly will be using a text or varchar and probably won't even have a
> maximum size. The use case for CHAR(n) is when you have fixed length
> statically defined strings that are always the same length. It doesn't make
> sense to store these in UTF8.

It makes sense to store them as numbers, or perhaps an enum.

> Currently Postgres has a limitation that you can only have one encoding per
> database and one locale per cluster. Personally I'm of the opinion that the
> only correct choice for that is "C" and all localization should be handled in
> the client and with pg_strxfrm. Putting the whole database into non-C locales
> guarantees that the columns that should not be localized will have broken
> semantics and there's no way to work around things in the other direction.

Quite. So if someone would code up SQL COLLATE support and integrate
ICU, everyone would be happy and we could all go home.

BTW, requiring localisation to happen in the client is silly. SQL
provides the ORDER BY clause for strings, and it makes no sense to have
the client re-sort them just because they're not using the C locale. The
point of a database is to make your life easier, right?
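
To make the ordering difference concrete, here is a small Python sketch.
The first function sorts by raw bytes, which is what a "C" locale
comparison amounts to. The second uses case folding as a crude stand-in
for a dictionary-style collation; it is an assumption for illustration
only and not what any real locale or ICU would do. The function names
and the sample list are invented.

```python
def c_locale_order(values):
    """Sort by raw UTF-8 bytes, as a C-locale comparison would."""
    return sorted(values, key=lambda s: s.encode("utf-8"))

def dictionary_like_order(values):
    """Sort case-insensitively, breaking ties by the original string.

    Only a rough approximation of a linguistic collation.
    """
    return sorted(values, key=lambda s: (s.casefold(), s))

names = ["apple", "Zebra", "banana", "Apple"]
print(c_locale_order(names))         # ['Apple', 'Zebra', 'apple', 'banana']
print(dictionary_like_order(names))  # ['Apple', 'apple', 'banana', 'Zebra']
```

With byte ordering every uppercase ASCII letter sorts before every
lowercase one, so a client that wants the second result has to re-sort
whatever ORDER BY returned.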

> Perhaps given the current situation what we should have is a cvarchar and
> cchar data types that are like varchar and char but guaranteed to always be
> interpreted in the c locale with ascii encoding.

I think bytea gives you that, pretty much.

Have a nice day,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
