Re: [PATCH] Clarify the comments about varlena header encoding - Mailing list pgsql-hackers

From John Naylor
Subject Re: [PATCH] Clarify the comments about varlena header encoding
Date
Msg-id CAFBsxsEriQKtU9fEDA8cf=0FME62EZyUXZ+csgz73mTUoZcSwQ@mail.gmail.com
Whole thread Raw
In response to Re: [PATCH] Clarify the comments about varlena header encoding  (Aleksander Alekseev <aleksander@timescale.com>)
Responses Re: [PATCH] Clarify the comments about varlena header encoding
List pgsql-hackers
On Tue, Sep 6, 2022 at 5:19 PM Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Hi John,
>
> Thanks for the feedback.
>
> > Maybe "00xxxxxx 4-byte length word (aligned)," is more clear about
> > what is aligned. Also, adding all those xxx's obscures the point that
> > we only need to examine one byte to figure out what to do next.
>
> IMO "00xxxxxx 4-byte length word" is still confusing. One can misread
> this as a 00-xx-xx-xx hex value, where the first byte (not two bits)
> is 00h.

The top of the comment literally says

 * Bit layouts for varlena headers on big-endian machines:

...but maybe we can say at the top that we inspect the first byte to
determine what kind of header it is. Or put the now-standard 0b in
front.

> > The patch has:
> >
> > + * In the third case the va_tag field (see varattrib_1b_e) is used to discern
> > + * the specific type and length of the pointer datum. On disk the "xxx" bits
> > + * currently always store sizeof(varatt_external) + 2.
> >
> > ...so not sure where 17 came from.
>
> Right, AFTER applying the patch it's clear that it's actually 18
> bytes.

Okay, I see now that this quote from your first email:

"This is misleading too. The comments above this line say that `struct
varatt_external` is a TOAST pointer. sizeof(varatt_external) = 16,
plus 1 byte equals 17, right? However the documentation [1] claims the
result should be 18:"

...is not your thought, but one of a fictional misled reader. I
actually found this phrase more misleading than the header comments.
:-)

I think the problem is ambiguity about what a "toast pointer" is. This comment:

 * struct varatt_external is a traditional "TOAST pointer", that is, the

has caused people to think a toasted value in the main relation takes
up 16 bytes on disk sizeof(varatt_external) = 16, when it's actually
18. Is the 16 the "toast pointer" or the 18?

> > - * 10000000 1-byte length word, unaligned, TOAST pointer
> > + * 10000000 xxxxxxxx, TOAST pointer (struct varatt_external)
> >
> > This implies that the header is two bytes, which is not accurate. That
> > next byte is a type tag:
> > [...]
> > ...and does not always represent the on-disk length:
>
> Well, the comments don't say what is the header and what is the type
> tag.

Because the comments explain the following macros that read bits in
the *first* byte of a 1- or 4-byte header to determine what kind it
is.

> They merely describe the bit layouts. The patch doesn't seem to
> make things worse in this respect. Do you think we should address this
> too? I suspect that describing the difference between the header and
> the type tag here will create even more confusion.

I said nothing about describing the difference between the header and
type tag. The patch added xxx's for the type tag in a comment about
the header. This is more misleading than what is there now.

-- 
John Naylor
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Handle infinite recursion in logical replication setup
Next
From: John Naylor
Date:
Subject: Re: failing to build preproc.c on solaris with sun studio