Re: ZStandard (with dictionaries) compression support for TOAST compression - Mailing list pgsql-hackers

From Nikhil Kumar Veldanda
Subject Re: ZStandard (with dictionaries) compression support for TOAST compression
Date
Msg-id CAFAfj_HPvuzXqbkgQ6F=ocR=JHe9PA7iVazGaCcrtPm-hKZ6pQ@mail.gmail.com
Whole thread Raw
In response to Re: ZStandard (with dictionaries) compression support for TOAST compression  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi Robert

> But I don't quite understand the point of this
> response: it seems like you're just restating what the design does
> without really justifying it. The question here isn't whether a 3-byte
> header can describe a length up to 16MB; I think we all know our
> powers of two well enough to agree on the answer to that question. The
> question is whether it's a good use of 3 bytes, and I don't think it
> is.

My initial decision to include a 3‑byte length field was driven by two goals:
    1. Avoid introducing separate callbacks for each algorithm.
    2. Provide a single, algorithm-agnostic mechanism for handling
metadata length.

After re-evaluating based on your feedback, I agree that the fixed
overhead of a 3-byte length field outweighs its benefit; per-algorithm
callbacks deliver the same functionality while saving three bytes per
datum.

> I did consider the fact that future compression algorithms might want
> to use variable-length headers; but I couldn't see a reason why we
> shouldn't let each of those compression algorithms decide for
> themselves how to encode whatever information they need. If a
> compression algorithm needs a variable-length header, then it just
> needs to make that header self-describing. Worst case scenario, it can
> make the first byte of that variable-length header a length byte, and
> then go from there; but it's probably possible to be even smarter and
> use less than a full byte. Say for example we store a dictionary ID
> that in concept is a 32-bit quantity but we use a variable-length
> integer representation for it. It's easy to see that we shouldn't ever
> need more than 3 bits for that so a full length byte is overkill and,
> in fact, would undermine the value of a variable-length representation
> rather severely. (I suspect it's a bad idea anyway, but it's a worse
> idea if you burn a full byte on a length header.)
>

I agree. Each compression algorithm can decide its own metadata size
overhead. Callbacks can provide this information as well rather than
storing in fixed length bytes(3 bytes). The revised patch introduces a
"toast_cmpid_meta_size(const varatt_cmp_extended *hdr)", which
calculates the metadata size.

> But there's an even larger question here too, which is why we're
> having some kind of discussion about generalized metadata when the
> current project seemingly only requires a 4-byte dictionary OID. If
> you have some other use of this space in mind, I don't think you've
> told us what it is. If you don't, then I'm not sure why we're
> designing around an up-to-16MB variable-length quantity when what we
> have before us is a 4-byte fixed-length quantity.

This project only requires 4 bytes of fixed-size metadata to store the
dictionary ID.

Updated design for extending varattrib_4b compression

1. extensible header

/*
 * varatt_cmp_extended: an optional per‐datum header for extended
compression method.
 * Only used when va_tcinfo's top two bits are "11".
 */
typedef struct varatt_cmp_extended
{
        uint8           cmp_alg;
        char            cmp_meta[FLEXIBLE_ARRAY_MEMBER];        /*
algorithm‐specific metadata */
} varatt_cmp_extended;

2. Algorithm registry and metadata size dispatch

static inline uint32
unsupported_meta_size(const varatt_cmp_extended *hdr)
{
        elog(ERROR, "toast_cmpid_meta_size called for unsupported
compression algorithm");
        return 0;                                       /* unreachable */
}

/* no metadata for plain-ZSTD */
static inline uint32
zstd_nodict_meta_size(const varatt_cmp_extended *hdr)
{
        return 0;
}

static inline uint32
zstd_dict_meta_size(const varatt_cmp_extended *hdr)
{
        return sizeof(Oid);
}

/*
 * TOAST compression methods enumeration.
 *
 * NAME         : algorithm identifier
 * VALUE        : enum value
 * META-SIZE-FN : Calculates algorithm metadata size.
 */
#define TOAST_COMPRESSION_LIST                                  \
        X(PGLZ,                 0, unsupported_meta_size)       \
        X(LZ4,                  1, unsupported_meta_size)       \
        X(ZSTD_NODICT,          2, zstd_nodict_meta_size)       \
        X(ZSTD_DICT,            3, zstd_dict_meta_size)         \
        X(INVALID,              4, unsupported_meta_size)       /* sentinel */

/* Compression algorithm identifiers */
typedef enum ToastCompressionId
{
#define X(name,val,fn) TOAST_##name##_COMPRESSION_ID = (val),
        TOAST_COMPRESSION_LIST
#undef X
} ToastCompressionId;

/* lookup table to check if compression method uses extended format */
static const bool toast_cmpid_extended[] = {
#define X(name,val,fn)                                          \
        /* PGLZ, LZ4 don't use extended format */               \
        [TOAST_##name##_COMPRESSION_ID] =                       \
                        ((val) != TOAST_PGLZ_COMPRESSION_ID &&  \
                        (val) != TOAST_LZ4_COMPRESSION_ID  &&   \
                        (val) != TOAST_INVALID_COMPRESSION_ID),
        TOAST_COMPRESSION_LIST
#undef X
};

#define TOAST_CMPID_EXTENDED(alg) (toast_cmpid_extended[alg])

/*
 * Prototype for a per-datum metadata-size callback:
 *   given a pointer to the extended header, return
 *   how many metadata bytes follow it.
 */
typedef uint32 (*ToastMetaSizeFn) (const varatt_cmp_extended *hdr);

/* Callback table—indexed by ToastCompressionId */
static const ToastMetaSizeFn toast_meta_size_fns[] = {
#define X(name,val,fn) [TOAST_##name##_COMPRESSION_ID] = fn,
        TOAST_COMPRESSION_LIST
#undef X
};

/* Calculates algorithm metadata size */
static inline uint32
toast_cmpid_meta_size(const varatt_cmp_extended *hdr)
{
        Assert(hdr != NULL);
        return toast_meta_size_fns[hdr->cmp_alg] (hdr);
}

Each compression algorithm provides a static callback that returns the
size of its metadata, given a pointer to the varatt_cmp_extended
header. Algorithms with fixed-size metadata return a constant, while
algorithms with variable-length metadata are responsible for defining
and parsing their own internal headers to compute the metadata size.

3. Resulting on-disk layouts for zstd

ZSTD (nodict) — datum on‑disk layout

+----------------------------------+
| va_header (uint32)               |
+----------------------------------+
| va_tcinfo (uint32)               |  ← top two bits = 11 (extended)
+----------------------------------+
| cmp_alg  (uint8)                 |  ← (ZSTD_NODICT)
+----------------------------------+
| compressed bytes …               |  ← ZSTD frame
+----------------------------------+


ZSTD(dict) — datum on‑disk layout

+----------------------------------+
| va_header (uint32)               |
+----------------------------------+
| va_tcinfo (uint32)               |  ← top two bits = 11 (extended)
+----------------------------------+
| cmp_alg  (uint8)                 |  ← (ZSTD_DICT)
+----------------------------------+
| dict_id   (uint32)               |  ← dictionary OID
+----------------------------------+
| compressed bytes …               |  ← ZSTD frame
+----------------------------------+

I hope this updated design addresses your concerns. I would appreciate
any further feedback you may have. Thanks again for your guidance—it's
been very helpful.

v20-0001-varattrib_4b-design-proposal-to-make-it-extended.patch:
varattrib_4b extensibility – adds varatt_cmp_extended, metadata size
dispatch and useful macros; behaviour unchanged.
v20-0002-zstd-nodict-compression.patch: Plain ZSTD (non dict) support.


--
Nikhil Veldanda

Attachment

pgsql-hackers by date:

Previous
From: Etsuro Fujita
Date:
Subject: Re: Options to control remote transactions’ access/deferrable modes in postgres_fdw
Next
From: Wolfgang Walther
Date:
Subject: Re: [PoC] Federated Authn/z with OAUTHBEARER