Tom Lane wrote:
> Because the length specification is in *characters*, which is not by any
> means the same as *bytes*.
>
> We could possibly put enough intelligence into the low-level tuple
> manipulation routines to count characters in whatever encoding we happen
> to be using, but it's a lot faster and more robust to insist on a count
> word for every variable-width field.
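For illustration, here is a minimal sketch (my own, not PostgreSQL's
actual routines) of why counting characters in a multibyte encoding
like UTF-8 means walking the whole string, while the byte length is
immediate:

#include <stdio.h>
#include <string.h>

/* Count UTF-8 characters: continuation bytes (10xxxxxx) never
 * start a character, so count only lead/ASCII bytes. */
static size_t utf8_char_count(const char *s)
{
    size_t n = 0;
    for (; *s; s++)
        if (((unsigned char) *s & 0xC0) != 0x80)
            n++;
    return n;
}

int main(void)
{
    const char *s = "caf\xC3\xA9";          /* "café" in UTF-8 */
    printf("bytes: %zu, chars: %zu\n",      /* bytes: 5, chars: 4 */
           strlen(s), utf8_char_count(s));
    return 0;
}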
I guess what you're saying is that PostgreSQL stores characters in
varying-length encodings. If it stored character data in Unicode (UCS-2)
it would always take up two bytes per character. Have you considered
supporting NCHAR/NVARCHAR, aka NATIONAL character data? Wouldn't UCS-2 be
needed to support multi-locale clusters (as someone was inquiring about
recently)?
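For contrast, a sketch (hypothetical, just to show the arithmetic) of
what a fixed two-byte encoding such as UCS-2 would buy: the i-th
character sits at a constant offset, so no scan is needed to index
into the string.

#include <stdint.h>
#include <stdio.h>

/* With a fixed two-byte-per-character encoding such as UCS-2,
 * character position maps directly to array index: the i-th
 * character always starts at byte offset 2*i. */
static uint16_t ucs2_char_at(const uint16_t *s, size_t i)
{
    return s[i];
}

int main(void)
{
    const uint16_t hello[] = { 'h', 'e', 'l', 'l', 'o', 0 };
    printf("%c\n", (char) ucs2_char_at(hello, 1));   /* prints "e" */
    return 0;
}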
Joe