Re: Fixed length data types issue - Mailing list pgsql-hackers

From Gregory Stark
Subject Re: Fixed length data types issue
Date
Msg-id 873bb3dfqi.fsf@enterprisedb.com
Whole thread Raw
In response to Re: Fixed length data types issue  (Andrew - Supernews <andrew+nonews@supernews.com>)
Responses Re: Fixed length data types issue  (Martijn van Oosterhout <kleptog@svana.org>)
List pgsql-hackers
Andrew - Supernews <andrew+nonews@supernews.com> writes:

> Are you sure? Perhaps you are assuming that a char(1) field can be made
> to be fixed-length; this is not the case (consider utf-8 for example).

Well that could still be fixed length, it would just be a longer fixed length.
(In theory it would have to be 6 bytes long which I suppose would open up the
argument that if you're usually storing 7-bit ascii then a varlena would
usually be shorter.)

In any case I think the intersection of columns for which you care about i18n
and columns that you're storing according to an old-fashioned fixed column
layout is pretty much nil. And not just because it hasn't been updated to
modern standards either. If you look again at the columns in my example you'll
see none of them are appropriate targets for i18n anyways. They're all codes
and even numbers.

In other words if you're actually storing localized text then you almost
certainly will be using a text or varchar and probably won't even have a
maximum size. The use case for CHAR(n) is when you have fixed length
statically defined strings that are always the same length. it doesn't make
sense to store these in UTF8.

Currently Postgres has a limitation that you can only have one encoding per
database and one locale per cluster. Personally I'm of the opinion that the
only correct choice for that is "C" and all localization should be handled in
the client and with pg_strxfrm. Putting the whole database into non-C locales
guarantees that the columns that should not be localized will have broken
semantics and there's no way to work around things in the other direction.

Perhaps given the current situation what we should have is a cvarchar and
cchar data types that are like varchar and char but guaranteed to always be
interpreted in the c locale with ascii encoding.

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


pgsql-hackers by date:

Previous
From: Dave Cramer
Date:
Subject: getting access to gborg, specifically the jdbc CVS files
Next
From: Martijn van Oosterhout
Date:
Subject: Re: Fixed length data types issue