Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format - Mailing list pgsql-hackers

From Dharin Shah
Subject Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format
Date
Msg-id CAOj6k6dG0M-G0gs_Htray5y_pfvacXSjKYw0uJaaYJvOeqk7xw@mail.gmail.com
Whole thread Raw
In response to Re: Fwd: [PATCH] Add zstd compression for TOAST using extended header format  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
Thanks Michael & Robert,

Agreed — I don’t think it’s realistic or practical to talk about deprecating or replacing pglz (or lz4) given on-disk compatibility requirements.

> Note that I am not on board with simply reusing varatt_external for
> zstd-compressed entries, neither do I think that this is the best move
> ever.  It makes the core patch simpler, but it makes things like
> ToastCompressionId more complicated to think about.  If anything, I'd
> consider a rename of varatt_external as the best way to go with an
> intermediate "translation" structure only used in memory as I am
> proposing on the other thread (something that others seem meh enough
> about but I am not seeing alternate proposals floating around,
> either).  This would make things like detoast_external_attr() less
> confusing, I think, as the latest patch posted on this thread is
> actually proving with its shortcut for toast_fetch_datum as one
> example of something I'd rather not do..

On the design: I understand & share the same concerns that we’d end up with multiple “sources of truth” for external compression method identification (pglz/lz4 via va_extinfo bits, zstd via vartag), and that this pushes method-specific shortcuts into detoast paths.

Would you be OK if I split this into two steps?

1.First, a refactor-only patch introducing a small decoded/in-memory representation of an external TOAST pointer, so detoast/toast code paths don’t have to reason directly about tcinfo vs vartag vs va_extinfo. This would be a cleanup with no on-disk format change and no behavioral change for existing methods. Is this the same “translation structure” approach you mentioned in the other thread? If you can point me to it, I’ll align with that proposal.

2. Then, a follow-up patch adding zstd using VARTAG_ONDISK_ZSTD, implemented on top of that abstraction to keep zstd handling centralized and minimize special-casing in detoast.
If that direction matches what you had in mind, I can first post the proposed translation structure/API for feedback before respinning the zstd patch.

Thanks,
Dharin


On Thu, Dec 25, 2025 at 1:25 AM Michael Paquier <michael@paquier.xyz> wrote:
On Wed, Dec 24, 2025 at 11:50:48AM -0500, Robert Treat wrote:
> Agreed that I can't see pglz being removed any time soon, if ever.
> Thinking through what a conversion process would look like seems
> unwieldy at best, so I think we definitely need it for backwards
> compatibility, plus I think it is useful to have a self-contained
> option. I'd almost suggest we should look at replacing lz4, but I
> don't think that is significantly easier, it just has a smaller, more
> invested, blast radius.

Backward-compatibility requirements make a replacement of LZ4
basically impossible to me, for the same reasons as pglz.  We could
not replace the bit used in the va_extinfo to track if LZ4 compression
is used, either.  One thing that I do wonder is if it would make
things simpler in the long-run if we introduced a new separated vartag
for LZ4-compressed external TOAST pointers as well.  At least we'd
have a leaner design: it means that we have to keep the
varatt_external available on read, but we could update to the new
format when writing entries.  Or perhaps that's not worth the
complication based on the last sentence you are writing.. 

> That said, I do suspect ztsd could quickly
> become a popular recommendation and/or default among users /
> consultants / service providers.

..  Because I strongly suspect that this is going to be true, and that
zstd would just be a better replacement over lz4.  That's a trend that
I see is already going on for wal_compression.

Note that I am not on board with simply reusing varatt_external for
zstd-compressed entries, neither do I think that this is the best move
ever.  It makes the core patch simpler, but it makes things like
ToastCompressionId more complicated to think about.  If anything, I'd
consider a rename of varatt_external as the best way to go with an
intermediate "translation" structure only used in memory as I am
proposing on the other thread (something that others seem meh enough
about but I am not seeing alternate proposals floating around,
either).  This would make things like detoast_external_attr() less
confusing, I think, as the latest patch posted on this thread is
actually proving with its shortcut for toast_fetch_datum as one
example of something I'd rather not do..
--
Michael

pgsql-hackers by date:

Previous
From: Michael Paquier
Date:
Subject: Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring
Next
From: Michael Paquier
Date:
Subject: Re: [Proposal] Expose internal MultiXact member count function for efficient monitoring