Re: ZStandard (with dictionaries) compression support for TOAST compression - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: ZStandard (with dictionaries) compression support for TOAST compression
Date
Msg-id aBv9EAq_tkOG_2nV@paquier.xyz
Whole thread Raw
In response to Re: ZStandard (with dictionaries) compression support for TOAST compression  (Nikhil Kumar Veldanda <veldanda.nikhilkumar17@gmail.com>)
List pgsql-hackers
On Wed, May 07, 2025 at 04:39:17PM -0700, Nikhil Kumar Veldanda wrote:
> In patch v21, va_compressed.va_data points to varatt_cmp_extended, so
> adding it isn’t strictly necessary. If we do want to fold it into the
> varattrib_4b union, we could define it like this:
>
> ```
> typedef union
> {
>     struct /* Normal varlena (4-byte length) */
>     {
>         uint32 va_header;
>         char va_data[FLEXIBLE_ARRAY_MEMBER];
>     } va_4byte;
>     struct /* Compressed-in-line format */
>     {
>         uint32 va_header;
>         uint32 va_tcinfo; /* Original data size (excludes header) and
>         * compression method; see va_extinfo */
>         char va_data[FLEXIBLE_ARRAY_MEMBER]; /* Compressed data */
>     } va_compressed;
>     struct
>     {
>         uint32 va_header;
>         uint32 va_tcinfo; /* Original data size (excludes header) and
>         * compression method; see va_extinfo */
>         uint8 cmp_alg;
>         char cmp_data[FLEXIBLE_ARRAY_MEMBER];
>     } varatt_cmp_extended;
> } varattrib_4b;
> ```
> we don't depend on varattrib_4b size anywhere.

Yes, I was wondering if this is not the most natural approach in terms
of structure once if we plug an extra byte into the varlena header if
all the bits of va_extinfo for the compression information are used.
Having all the bits may not mean that this necessarily means that the
information would be cmp_data all the time, just that this a natural
option when plugging in a new compression method in the new byte
available.

FWIW, I've tested this exact change yesterday, wondering if we depend
on sizeof(varattrib_4b) after looking at the code and getting the
impression that we don't even for some the in-memory comparisons, and
noted two things:
- check-world was OK.
- a pg_upgrade'd instance with a regression database seems kind of
OK, but I've not done that much in-depth checking on this side so I
have less confidence about that.
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Matthew Sterrett
Date:
Subject: Re: Proposal for enabling auto-vectorization for checksum calculations
Next
From: Bruce Momjian
Date:
Subject: Re: PG 18 release notes draft committed