Re: ZStandard (with dictionaries) compression support for TOAST compression - Mailing list pgsql-hackers
From | Nikhil Kumar Veldanda |
---|---|
Subject | Re: ZStandard (with dictionaries) compression support for TOAST compression |
Date | |
Msg-id | CAFAfj_GzTtJX_KXOmYY_a+5X=FOAdAo8y0g5PhfimpO91U84oA@mail.gmail.com Whole thread Raw |
In response to | Re: ZStandard (with dictionaries) compression support for TOAST compression (Michael Paquier <michael@paquier.xyz>) |
Responses |
Re: ZStandard (with dictionaries) compression support for TOAST compression
|
List | pgsql-hackers |
Hi Michael, Thanks for the feedback. On Wed, May 7, 2025 at 12:49 AM Michael Paquier <michael@paquier.xyz> wrote: > > I have been reading 0001 and I'm finding that the integration does not > seem to fit much with the existing varatt_external, making the whole > result slightly confusing. A simple thing: the last bit that we can > use is in varatt_external's va_extinfo, where the patch is using > VARATT_4BCE_MASK to track that we need to go beyond varatt_external to > know what kind of compression information we should use. This is an > important point, and it is not documented around varatt_external which > still assumes that the last bit could be used for a compression > method. With what you are doing in 0001 (or even 0002), this becomes > wrong. This is the current logic used in patch for varatt_external. When a datum is compressed with an extended algorithm and must live in external storage, we set the top two bits of va_extinfo(varatt_external) to 0b11. To figure out the compression method for an external TOAST datum: 1. Inspect the top two bits of va_extinfo. 2. If they equal 0b11(VARATT_4BCE_MASK), call toast_get_compression_id, which invokes detoast_external_attr to fetch the datum in its 4-byte varattrib form (no decompression) and then reads its compression header to find the compression method. 3. Otherwise, fall back to the existing VARATT_EXTERNAL_GET_COMPRESS_METHOD path to get the compression method. We use this macro VARATT_EXTERNAL_COMPRESS_METHOD_EXTENDED to determine if the compression method is extended or not. Across the entire codebase, external TOAST‐pointer compression methods are only inspected in the following functions: 1. pg_column_compression 2. check_tuple_attribute (verify_heapam pg function) 3. detoast_attr_slice (just to check pglz or not) Could you please help me understand what’s incorrect about this approach? > Shouldn't we have a new struct portion in varattrib_4b's union for > this purpose at least (I don't recall that we rely on varattrib_4b's > size which would get larger with this extra byte for the new extended > data with the three bits set for the compression are set in > va_extinfo, correct me if I'm wrong here). > -- In patch v21, va_compressed.va_data points to varatt_cmp_extended, so adding it isn’t strictly necessary. If we do want to fold it into the varattrib_4b union, we could define it like this: ``` typedef union { struct /* Normal varlena (4-byte length) */ { uint32 va_header; char va_data[FLEXIBLE_ARRAY_MEMBER]; } va_4byte; struct /* Compressed-in-line format */ { uint32 va_header; uint32 va_tcinfo; /* Original data size (excludes header) and * compression method; see va_extinfo */ char va_data[FLEXIBLE_ARRAY_MEMBER]; /* Compressed data */ } va_compressed; struct { uint32 va_header; uint32 va_tcinfo; /* Original data size (excludes header) and * compression method; see va_extinfo */ uint8 cmp_alg; char cmp_data[FLEXIBLE_ARRAY_MEMBER]; } varatt_cmp_extended; } varattrib_4b; ``` we don't depend on varattrib_4b size anywhere. -- Nikhil Veldanda
pgsql-hackers by date: