Re: [PATCH] Clarify the comments about varlena header encoding - Mailing list pgsql-hackers

From John Naylor
Subject Re: [PATCH] Clarify the comments about varlena header encoding
Date
Msg-id CAFBsxsGkYq_atJ7i157cWG-YE2EPV2Bwq60Vw1LHJ7vjx+gFgg@mail.gmail.com
Whole thread Raw
In response to [PATCH] Clarify the comments about varlena header encoding  (Aleksander Alekseev <aleksander@timescale.com>)
Responses Re: [PATCH] Clarify the comments about varlena header encoding
List pgsql-hackers
On Thu, Aug 18, 2022 at 1:06 AM Aleksander Alekseev
<aleksander@timescale.com> wrote:
>
> Hi hackers,
>
> I noticed that the comments regarding bit layouts for varlena headers
> in postgres.h are somewhat misleading. For instance, when reading:

I agree it's confusing, but I don't think this patch is the right direction.

> ```
> 00xxxxxx 4-byte length word, aligned, uncompressed data (up to 1G)
> ```
>
> ... one can assume this is a 00xxxxxx byte followed by another 4 bytes
> (which is wrong). Also one can read this as "aligned, uncompressed
> data" (which again is wrong).

- * 00xxxxxx 4-byte length word, aligned, uncompressed data (up to 1G)
+ * 00xxxxxx xxxxxxxx xxxxxxxx xxxxxxxx, uncompressed data (up to 1G)

Maybe "00xxxxxx 4-byte length word (aligned)," is more clear about
what is aligned. Also, adding all those xxx's obscures the point that
we only need to examine one byte to figure out what to do next.

> ```
> 10000000 1-byte length word, unaligned, TOAST pointer
> ```
>
> This is misleading too. The comments above this line say that `struct
> varatt_external` is a TOAST pointer. sizeof(varatt_external) = 16,
> plus 1 byte equals 17, right? However the documentation [1] claims the
> result should be 18:

The patch has:

+ * In the third case the va_tag field (see varattrib_1b_e) is used to discern
+ * the specific type and length of the pointer datum. On disk the "xxx" bits
+ * currently always store sizeof(varatt_external) + 2.

...so not sure where 17 came from.

- * 10000000 1-byte length word, unaligned, TOAST pointer
+ * 10000000 xxxxxxxx, TOAST pointer (struct varatt_external)

This implies that the header is two bytes, which is not accurate. That
next byte is a type tag:

/* TOAST pointers are a subset of varattrib_1b with an identifying tag byte */
typedef struct
{
uint8 va_header; /* Always 0x80 or 0x01 */
uint8 va_tag; /* Type of datum */
char va_data[FLEXIBLE_ARRAY_MEMBER]; /* Type-specific data */
} varattrib_1b_e;

...and does not always represent the on-disk length:

/*
 * Type tag for the various sorts of "TOAST pointer" datums.  The peculiar
 * value for VARTAG_ONDISK comes from a requirement for on-disk compatibility
 * with a previous notion that the tag field was the pointer datum's length.
 */
typedef enum vartag_external
{
VARTAG_INDIRECT = 1,
VARTAG_EXPANDED_RO = 2,
VARTAG_EXPANDED_RW = 3,
VARTAG_ONDISK = 18
} vartag_external;

And I don't think the new comments referring to "third case", "first
two cases", etc make it easier to follow.
-- 
John Naylor
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Nikita Glukhov
Date:
Subject: Re: SQL/JSON features for v15
Next
From: Noah Misch
Date:
Subject: Re: static libpq (and other libraries) overwritten on aix