Re: PostgreSQL in-transit compression for a client connection - Mailing list pgsql-general

From Dominique Devienne
Subject Re: PostgreSQL in-transit compression for a client connection
Date
Msg-id CAFCRh-9ihsf1WQNJGc0UMQcBW7SnjCeCZN+mkm7SScZOVV2Tpw@mail.gmail.com
Whole thread Raw
In response to Re: PostgreSQL in-transit compression for a client connection  (Magnus Hagander <magnus@hagander.net>)
List pgsql-general
On Fri, Apr 28, 2023 at 9:03 AM Magnus Hagander <magnus@hagander.net> wrote:
On Thu, Apr 27, 2023 at 11:55 AM Laurenz Albe <laurenz.albe@cybertec.at> wrote:
>
> On Thu, 2023-04-27 at 11:44 +0200, Dominique Devienne wrote:
> > as someone who must store ZLIB (from ZIP files)
> > and sometimes LZ4 compressed `bytea` values, I often find it's a shame that I have
> > to decompress them, send them over the wire uncompressed, to have the PostgreSQL
> > backend recompress them when TOAST'ed. That's a waste of CPU and IO bandwidth...
>
> That's not what you were looking for, but why not store the compressed data
> in the database (after SET STORAGE EXTERNAL on the column) and uncompress
> them after you have received them on the client side?

Laurenz is right of course. But then like Magnus is saying, I lose transparent decompression,
on read. But also for server-side processing. Some of those compressed values are actually
XML, and sometimes it can be useful to process (usualyl extract subset) of those server-side.

Unless there are ways to uncompress values explicitly in SQL or PG/PLSQL?
 
That assumes you only have one client. You may want to use the
transparent compression/decompression from some clients and not for
others.

Which brings up something I forgot to mention earlier, where I concrentrated from the write-side,
which is that clients would also ideally need a way to fetch the values still-compressed, when
explicitly requesting it, while others implicitly get the transparent decompression.

BTW, such a mechanism would open the door for libpq doing that itself transparently too, I guess.
That would allow network-transport of the compressed values, and client-side decompression.
Of course, when getting the whole value only, not when getting a subset. And possibly opt-in only.
 
I think it'd be a useful feature to have, but it's not something that
we have today or that I'm aware of being on anybodys radar. So most
likely, for now you're stuck with either what you're doing today, or
as Laurenz suggests handle it completely in the application. You can't
do the mix.

It's a tricky feature, because the client and server have to cooperate and agree on the exact
format used for compressed values, including meta-data (uncompressed size, and checksum,
and compression algo+setting and checksum type; e.g. CRC32 vs XXHASH). That's why I
suspect this won't happen anytime soon or ever. It's just a pie-in-the-sky brainstorming exercise.

pgsql-general by date:

Previous
From: Magnus Hagander
Date:
Subject: Re: PostgreSQL in-transit compression for a client connection
Next
From: Bruce Momjian
Date:
Subject: Re: HOT update: why must ALL indexes should be update?