Thread: Two aesthetic bugs in the 1-byte packed varlena code

Two aesthetic bugs in the 1-byte packed varlena code

From
Gregory Stark
Date:
Korry spotted two violations of the rule about accessing the first varlena
field directly which I had missed. In both cases I believe the accesses are
actually safe because they follow an int4 column and the data are either
explicitly detoasted or passed to textout which can handle 1-byte headers
fine.

One instance is in flatfiles.c when it accesses the password field which is a
text field. Here I don't see any reasonable way to avoid it and think we
should just document that we're bending the rules.

The other instance is in inv_api.c where it would be quite possible to use
fastgetattr() instead. But the column is always at the same fixed offset and
again it follows an int4 so it'll always be 4-byte aligned and work fine. So
for performance reasons perhaps we should keep this as well?

I've attached three patches. A patch which adds a comment for flatfiles.c. And
two patches for inv_api.c one which just adds a comment and one which converts
it to use fastgetattr().

Again, credit to Korry for spotting these. To do so he commented out all the
varlena fields (excluding int2vector and oidvector) from the struct
definitions and saw what broke.

Why do we even have those fields in the structs if they're unsafe to use?
Perhaps we should #if 0 out all the unsafe fields from all the struct
definitions? That would avoid one of the most common new programmer bugs and
also let us document the few exceptions and why they're allowed.


--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Attachment

Re: Two aesthetic bugs in the 1-byte packed varlena code

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> Why do we even have those fields in the structs if they're unsafe to use?

1. genbki.sh

2. As you note, they're not always unsafe to use.

            regards, tom lane

Re: Two aesthetic bugs in the 1-byte packed varlena code

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> Why do we even have those fields in the structs if they're unsafe to use?
>
> 1. genbki.sh

But genbki.sh wouldn't care if we #if 0 around the unsafe ones would it?

> 2. As you note, they're not always unsafe to use.

Well, it seems like it would be nice to mark which ones are and aren't unsafe
rather than have them all be there waiting to trip people up.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: Two aesthetic bugs in the 1-byte packed varlena code

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> Well, it seems like it would be nice to mark which ones are and aren't unsafe
> rather than have them all be there waiting to trip people up.

We already do mark 'em --- that's what all the VARIABLE LENGTH FIELDS
START HERE comments are about.  I'm not especially worried about
mechanizing it since in practice code that tries to access those fields
will crash pretty obviously and consistently, except for the first such
field which is safe anyway.

            regards, tom lane

Re: Two aesthetic bugs in the 1-byte packed varlena code

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> The other instance is in inv_api.c where it would be quite possible to use
> fastgetattr() instead. But the column is always at the same fixed offset and
> again it follows an int4 so it'll always be 4-byte aligned and work fine. So
> for performance reasons perhaps we should keep this as well?

After looking closer, I think there are worse problems here: the code is
still using VARSIZE/VARDATA etc, which it should not be because the
field could easily be in 1-byte-header form.  And it seems like we
should be checking for NULL, too, because initdb only marks the first
two fields as NOT NULL.  The latter would qualify as a security hole
except that you'd have to be superuser to put a record with a null data
field into pg_largeobject.

I concur that we probably want to avoid adding fastgetattr for
performance reasons, seeing that this table's layout seems unlikely
to change.  But it seems like we might want to trouble with a null
test.  Thoughts?

            regards, tom lane

Re: Two aesthetic bugs in the 1-byte packed varlena code

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> The other instance is in inv_api.c where it would be quite possible to use
>> fastgetattr() instead. But the column is always at the same fixed offset and
>> again it follows an int4 so it'll always be 4-byte aligned and work fine. So
>> for performance reasons perhaps we should keep this as well?
>
> After looking closer, I think there are worse problems here: the code is
> still using VARSIZE/VARDATA etc, which it should not be because the
> field could easily be in 1-byte-header form.

Well that's ok because VARATT_IS_EXTENDED returns true for 1-byte forms so
it'll detoast them first. We could avoid the detoasting but given that it's
expecting the chunks to be compressed anyways the memcpys of the smallest
chunks probably don't matter much either way. I'm assuming it's like toast in
that only the last chunk will be smaller than LOBLKSIZE anyways, right?

> I concur that we probably want to avoid adding fastgetattr for
> performance reasons, seeing that this table's layout seems unlikely
> to change.  But it seems like we might want to trouble with a null
> test.  Thoughts?

There should never even be a null bitmap right? Maybe we should just
elog(ERROR) if we find HeapTupleHasNulls(tuple) to be true at all.

  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com


Re: Two aesthetic bugs in the 1-byte packed varlena code

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> After looking closer, I think there are worse problems here: the code is
>> still using VARSIZE/VARDATA etc, which it should not be because the
>> field could easily be in 1-byte-header form.

> Well that's ok because VARATT_IS_EXTENDED returns true for 1-byte forms so
> it'll detoast them first.

Ah, right.

> We could avoid the detoasting but given that it's
> expecting the chunks to be compressed anyways the memcpys of the smallest
> chunks probably don't matter much either way. I'm assuming it's like toast in
> that only the last chunk will be smaller than LOBLKSIZE anyways, right?

Well, it's like toast except that there can be unwritten "holes" in a LO.
Still, in normal cases you'd expect only the last partial page to be
potentially short enough for 1-B format, and even then only about 1/16th
of the time.  OK, not worth changing then.

> There should never even be a null bitmap right? Maybe we should just
> elog(ERROR) if we find HeapTupleHasNulls(tuple) to be true at all.

That sounds like a good and cheap test.  Will make it so.

            regards, tom lane