On Thu, Nov 03, 2005 at 01:49:46PM +0000, Simon Riggs wrote:
> In other databases, CHAR(12) and NUMERIC(12) are fixed length datatypes.
> In PostgreSQL, they are dynamically varying datatypes.
Please explain how a CHAR(12) can store 12 UTF-8 characters when each
character may be 1 to 4 bytes, unless the CHAR itself is variable
length...
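To put numbers on that, here is a small Python sketch. The helper utf8_bytes is purely illustrative and is not anything in PostgreSQL. It encodes twelve characters from each UTF-8 width class and counts the bytes:

```python
def utf8_bytes(s: str) -> int:
    """Number of bytes the string occupies when encoded as UTF-8."""
    return len(s.encode("utf-8"))

# Twelve characters each, drawn from different UTF-8 width classes.
samples = {
    "ASCII, 1 byte/char": "a" * 12,
    "Latin-1 accent, 2 bytes/char": "\u00e9" * 12,   # e with acute accent
    "CJK, 3 bytes/char": "\u6f22" * 12,              # a Han ideograph
    "astral plane, 4 bytes/char": "\U0001d11e" * 12, # musical G clef
}

for label, text in samples.items():
    assert len(text) == 12  # always exactly 12 characters
    print(f"{label}: {len(text)} chars -> {utf8_bytes(text)} bytes")
```

The character count is always 12, but the payload ranges from 12 to 48 bytes. A 12-character column therefore has no single fixed byte width.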
> What actually happens is that in many other systems the datatype is the
> same, but additional metadata is provided for that particular attribute.
> So CHAR(12) is a datatype of CHAR with a metadata item called length
> which is set to 12 for that attribute.
We already have this metadata, it's called atttypmod and it's stored in
pg_attribute. That's where the 12 for CHAR(12) is stored BTW.
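As far as I know, for char and varchar the value actually kept in atttypmod is the declared length plus the 4-byte header size (VARHDRSZ). So CHAR(12) shows up as 16, and -1 means no length was given. The following is a minimal sketch of that convention. The function names are made up for illustration:

```python
from typing import Optional

VARHDRSZ = 4  # size of the classic 4-byte varlena length header

def char_typmod(declared_len: Optional[int]) -> int:
    """Encode a declared CHAR(n) length as atttypmod (assumed: n + VARHDRSZ)."""
    return -1 if declared_len is None else declared_len + VARHDRSZ

def declared_length(atttypmod: int) -> Optional[int]:
    """Recover n from atttypmod; a negative value means no length modifier."""
    return None if atttypmod < 0 else atttypmod - VARHDRSZ

print(char_typmod(12), declared_length(16))
```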
> On PostgreSQL, CHAR(12) is a bpchar datatype with all instantiations of
> that datatype having a 4-byte varlena header. In this example, all of
> those instantiations have the varlena header set to 12, essentially
> wasting the 4-byte header.
Nope, the varlena header stores the actual length on disk. If you store
"hello" in a char(12) field it takes only 9 bytes (4 for the header, 5
for the data), which is less than 12.
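The following is a rough model of that size calculation, assuming a plain 4-byte length header in front of the encoded bytes. It deliberately ignores blank-padding, alignment and TOAST compression. varlena_size is a hypothetical helper, not a backend function:

```python
VARHDRSZ = 4  # 4-byte length word at the front of each varlena value

def varlena_size(value: str, encoding: str = "utf-8") -> int:
    """On-disk bytes for a value: length header plus encoded payload.

    Simplified model only: no padding, alignment or compression.
    """
    payload = value.encode(encoding)
    return VARHDRSZ + len(payload)

print(varlena_size("hello"))        # 4-byte header + 5 bytes of data
print(varlena_size("h\u00e9llo"))   # same 5 characters, one of them 2 bytes
```

The header tracks bytes, not characters, which is why the same mechanism copes with both short values and multibyte data.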
Good ideas, but it all hinges on the assumption that CHAR(12) takes a
fixed amount of space, which simply isn't true in a multibyte encoding.
Having a different header for values shorter than 255 bytes has been
discussed before; that's a separate argument, though.
Have a nice day,
--
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/