Progress on char(n) default-value problem - Mailing list pgsql-hackers

From Tom Lane
Subject Progress on char(n) default-value problem
Date
Msg-id 13915.926643418@sss.pgh.pa.us
Whole thread Raw
Responses Re: [HACKERS] Progress on char(n) default-value problem  (Bruce Momjian <maillist@candle.pha.pa.us>)
List pgsql-hackers
It looks like the problem is that the default value is getting inserted
without benefit of conversion, ie, whatever the given text is will get
dropped into the finished tuple without padding/truncation to the
specified char(n) length.

Later, when we try to read out the tuple, the tuple access routines
figure they know how big a char(n) is, so they don't actually look
to see what the varlena count is.  This results in misalignment of
following fields, causing either wrong data readout or a full-bore
crash.

Test case:

CREATE TABLE test (
plt int2 PRIMARY KEY,
state CHAR(5) NOT NULL DEFAULT 'new',
used boolean NOT NULL DEFAULT 'f',
id int4
);

INSERT INTO test (plt, id) VALUES (2, 3);

Examination of the stored tuple shows it contains 32 bytes of data:

0x400d7f30:     0x00    0x02    0x00    0x00    0x00    0x00    0x00    0x07
0x400d7f38:     0x6e    0x65    0x77    0x00    0x00    0x00    0x00    0x03

which deconstructs as follows:

00 02       int2 '2' (bigendian hardware here)
00 00        pad space to align varlena char field to long boundary
00 00 00 07    varlena header, size 7 => 3 bytes of actual data (whoops)
6e 65 77    ASCII 'new'
00        boolean 'f' (no pad needed for bool)
00 00 00 03    int4 '3' (no pad, it's on a long boundary already)

But the tuple readout routines will assume without looking that char(5)
occupies 9 bytes altogether, so they pick up the bool field 2 bytes over
from where it actually was put and pick up the int4 field 4 bytes over
from where it should be (due to alignment); result is garbage.  If there
were another varlena field after the char(n) field, they'd pick up a
wrong field length and probably crash.


So, the question still remains "where and why"?  My guess at this point
is that this is a bad side-effect of the fact that text and char(n) are
considered binary-equivalent.  Probably, whatever bit of code ought to
be coercing the default value into the correct type for the column is
deciding that it doesn't have to do anything because they're already
equivalent types.  I'm not sure where to look for that code (help
anyone?).  But I am sure that it needs to be coercing the value to the
specified number of characters for char(n).

It also strikes me that there should be a check in the low-level
tuple construction routines that what they are handed for a char(n)
field is the right length.  If tuple readout is going to assume that
char(n) is always n bytes of data, good software engineering dictates
that the tuple-writing code ought to enforce that assumption.  At
the very least there should be an Assert() for it.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Some progress on INSERT/SELECT/GROUP BY bugs
Next
From: Tatsuo Ishii
Date:
Subject: Re: [HACKERS] Report on NetBSD/mac port of Postgres 6.4.2