Garbage pad bytes within datums are bad news - Mailing list pgsql-hackers

From Tom Lane
Subject Garbage pad bytes within datums are bad news
Date
Msg-id 7408.1207339044@sss.pgh.pa.us
Responses Re: Garbage pad bytes within datums are bad news  (Teodor Sigaev <teodor@sigaev.ru>)
Re: Garbage pad bytes within datums are bad news  (Gregory Stark <stark@enterprisedb.com>)
List pgsql-hackers
I tracked down the problem reported here:
http://archives.postgresql.org/pgsql-admin/2008-04/msg00038.php
What it boils down to is that equal() doesn't see these two Consts
as equal:
   {CONST
   :consttype 1009
   :consttypmod -1
   :constlen -1
   :constbyval false
   :constisnull false
   :constvalue 48 [ 0 0 0 48 0 0 0 1 0 0 0 0 0 0 0 25 0 0 0 3 0 0 0 1
                    0 0 0 5 49 127 127 127 0 0 0 5 50 127 127 127
                    0 0 0 5 51 127 127 127 ]
   }

   {CONST
   :consttype 1009
   :consttypmod -1
   :constlen -1
   :constbyval false
   :constisnull false
   :constvalue 48 [ 0 0 0 48 0 0 0 1 0 0 0 0 0 0 0 25 0 0 0 3 0 0 0 1
                    0 0 0 5 49 0 0 0 0 0 0 5 50 0 0 0
                    0 0 0 5 51 0 0 0 ]
   }
 

The datums are arrays of text, and the bytes that are different are
garbage pad bytes between array entries.  Since equal() uses simple
bytewise equality (cf. datumIsEqual()), it sees the constants as unequal.
The reason the behavior is a bit erratic is that the array constructor
isn't bothering to initialize these bytes, so you might or might not
get a failure depending on what happened to be there before.
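
The effect can be reproduced outside the backend. The following is only a
sketch: it mimics the three element slots visible in the dumps (a 4-byte
length word of 5, one data byte, three pad bytes) and is not PostgreSQL
code. The names build_elems, bytewise_equal, logical_equal and ELEM_SLOT
are hypothetical, made up for this illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical slot size: 4-byte length word + 1 data byte + 3 pad bytes. */
#define ELEM_SLOT 8

/* Write n one-character elements into buf, setting every pad byte to `pad`. */
static void build_elems(uint8_t *buf, const char *chars, size_t n, uint8_t pad)
{
    assert(buf != NULL && chars != NULL);
    for (size_t i = 0; i < n; i++)
    {
        uint8_t *slot = buf + i * ELEM_SLOT;

        /* Length word 5, written byte by byte in the order shown in the dump. */
        slot[0] = 0;
        slot[1] = 0;
        slot[2] = 0;
        slot[3] = 5;
        slot[4] = (uint8_t) chars[i];       /* the single data byte */
        slot[5] = slot[6] = slot[7] = pad;  /* alignment padding */
    }
}

/* Plain bytewise comparison over the whole buffer, pad bytes included. */
static bool bytewise_equal(const uint8_t *a, const uint8_t *b, size_t len)
{
    return memcmp(a, b, len) == 0;
}

/* Compare only the 5 meaningful bytes of each slot, ignoring the padding. */
static bool logical_equal(const uint8_t *a, const uint8_t *b, size_t n)
{
    for (size_t i = 0; i < n; i++)
    {
        if (memcmp(a + i * ELEM_SLOT, b + i * ELEM_SLOT, 5) != 0)
            return false;
    }
    return true;
}
```

Building "123" once with pad byte 127 and once with pad byte 0 gives two
buffers that logical_equal() accepts and bytewise_equal() rejects, which
is the situation the two Consts above are in.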

Now, in large chunks of the system, a false not-equal result doesn't
cause anything worse than inefficiency, but in the particular case here
you actually get an error :-(.  I'm surprised that we've not seen
something like this reported before, because this has been busted since
forever.

From a semantic point of view it would be nicer if equal() used a
type-specific equality operator to compare Datums, but that idea runs up
against the same problem we saw in connection with HOT comparison of
index-column values: how do you know which equality operator to use,
if a data type has more than one?  Not to mention it'd be slow.
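
To make the ambiguity concrete, here is a rough sketch with an invented
string type that has two plausible notions of equality. Nothing here is
backend code; eq_fn, eq_exact, eq_nocase and values_equal are hypothetical
names used only for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Signature of a type-specific equality operator (hypothetical). */
typedef bool (*eq_fn)(const char *a, const char *b);

/* First notion of equality: exact, byte-for-byte string match. */
static bool eq_exact(const char *a, const char *b)
{
    return strcmp(a, b) == 0;
}

/* Second notion of equality: ASCII case-insensitive match. */
static bool eq_nocase(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0')
    {
        if (tolower((unsigned char) *a) != tolower((unsigned char) *b))
            return false;
        a++;
        b++;
    }
    return *a == *b;    /* equal only if both strings ended together */
}

/*
 * A generic comparison cannot pick an operator by itself; the caller has
 * to say which one is meant, and the answer depends on that choice.
 */
static bool values_equal(const char *a, const char *b, eq_fn eq)
{
    assert(eq != NULL);
    return eq(a, b);
}
```

With these definitions, values_equal("Abc", "abc", eq_exact) is false
while values_equal("Abc", "abc", eq_nocase) is true, so a node comparison
that has no operator in hand has no principled way to choose.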

The alternative seems to be to forbid uninitialized pad bytes within
Datums.  That's not very pleasant to contemplate either, since it'll
forever be vulnerable to sins of omission.
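
As a sketch of what "no uninitialized pad bytes" would mean for a
constructor: allocate the result zero-filled and copy in only the
meaningful bytes, so the padding is deterministic. This is standalone C
using calloc() as a stand-in for a zeroing allocator such as palloc0();
pack_strings_zeroed and ALIGN4 are hypothetical names, and the layout
(native-endian 4-byte length word, data, pad to a 4-byte boundary) is an
assumption for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Round a length up to the next multiple of 4 (hypothetical helper). */
#define ALIGN4(len) (((size_t) (len) + 3u) & ~(size_t) 3u)

/*
 * Pack n strings as [4-byte length word][data bytes][pad to 4 bytes].
 * The buffer comes from calloc(), so every pad byte is zero no matter
 * what the memory held before.  Returns NULL on allocation failure;
 * the caller frees the result.
 */
static uint8_t *pack_strings_zeroed(const char *const *strs, size_t n,
                                    size_t *out_len)
{
    size_t total = 0;
    uint8_t *buf;
    uint8_t *p;

    assert(strs != NULL && out_len != NULL);

    for (size_t i = 0; i < n; i++)
        total += ALIGN4(4 + strlen(strs[i]));

    buf = calloc(total > 0 ? total : 1, 1);
    if (buf == NULL)
        return NULL;

    p = buf;
    for (size_t i = 0; i < n; i++)
    {
        size_t datalen = strlen(strs[i]);
        uint32_t len = (uint32_t) (4 + datalen);

        memcpy(p, &len, 4);             /* length word, includes itself */
        memcpy(p + 4, strs[i], datalen); /* data bytes only, no padding */
        p += ALIGN4(len);               /* skip over the zeroed padding */
    }

    *out_len = total;
    return buf;
}
```

Two buffers built this way from the same input are bytewise identical, so
a memcmp()-style comparison gives the right answer. The weakness is the
one noted above: every constructor has to remember to do this.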

Thoughts?
        regards, tom lane

