Re: ZStandard (with dictionaries) compression support for TOAST compression - Mailing list pgsql-hackers

From Nikhil Kumar Veldanda
Subject Re: ZStandard (with dictionaries) compression support for TOAST compression
Date
Msg-id CAFAfj_GzTtJX_KXOmYY_a+5X=FOAdAo8y0g5PhfimpO91U84oA@mail.gmail.com
Whole thread Raw
In response to Re: ZStandard (with dictionaries) compression support for TOAST compression  (Michael Paquier <michael@paquier.xyz>)
Responses Re: ZStandard (with dictionaries) compression support for TOAST compression
List pgsql-hackers
Hi Michael, Thanks for the feedback.

On Wed, May 7, 2025 at 12:49 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> I have been reading 0001 and I'm finding that the integration does not
> seem to fit much with the existing varatt_external, making the whole
> result slightly confusing.  A simple thing: the last bit that we can
> use is in varatt_external's va_extinfo, where the patch is using
> VARATT_4BCE_MASK to track that we need to go beyond varatt_external to
> know what kind of compression information we should use.  This is an
> important point, and it is not documented around varatt_external which
> still assumes that the last bit could be used for a compression
> method.  With what you are doing in 0001 (or even 0002), this becomes
> wrong.

This is the current logic used in patch for varatt_external.

When a datum is compressed with an extended algorithm and must live in
external storage, we set the top two bits of
va_extinfo(varatt_external) to 0b11.

To figure out the compression method for an external TOAST datum:

1. Inspect the top two bits of va_extinfo.
2. If they equal 0b11(VARATT_4BCE_MASK), call
toast_get_compression_id, which invokes detoast_external_attr to fetch
the datum in its 4-byte varattrib form (no decompression) and then
reads its compression header to find the compression method.
3. Otherwise, fall back to the existing
VARATT_EXTERNAL_GET_COMPRESS_METHOD path to get the compression
method.

We use this macro VARATT_EXTERNAL_COMPRESS_METHOD_EXTENDED to
determine if the compression method is extended or not.

Across the entire codebase, external TOAST‐pointer compression methods
are only inspected in the following functions:
1. pg_column_compression
2. check_tuple_attribute (verify_heapam pg function)
3. detoast_attr_slice (just to check pglz or not)

Could you please help me understand what’s incorrect about this approach?

> Shouldn't we have a new struct portion in varattrib_4b's union for
> this purpose at least (I don't recall that we rely on varattrib_4b's
> size which would get larger with this extra byte for the new extended
> data with the three bits set for the compression are set in
> va_extinfo, correct me if I'm wrong here).
> --

In patch v21, va_compressed.va_data points to varatt_cmp_extended, so
adding it isn’t strictly necessary. If we do want to fold it into the
varattrib_4b union, we could define it like this:

```
typedef union
{
struct /* Normal varlena (4-byte length) */
{
uint32 va_header;
char va_data[FLEXIBLE_ARRAY_MEMBER];
} va_4byte;
struct /* Compressed-in-line format */
{
uint32 va_header;
uint32 va_tcinfo; /* Original data size (excludes header) and
* compression method; see va_extinfo */
char va_data[FLEXIBLE_ARRAY_MEMBER]; /* Compressed data */
} va_compressed;
struct
{
uint32 va_header;
uint32 va_tcinfo; /* Original data size (excludes header) and
* compression method; see va_extinfo */
uint8 cmp_alg;
char cmp_data[FLEXIBLE_ARRAY_MEMBER];
} varatt_cmp_extended;
} varattrib_4b;
```
we don't depend on varattrib_4b size anywhere.

--
Nikhil Veldanda



pgsql-hackers by date:

Previous
From: David Rowley
Date:
Subject: Improve docs for n_distinct_inherited
Next
From: Matthew Sterrett
Date:
Subject: Re: Proposal for enabling auto-vectorization for checksum calculations