Re: Zstandard support for toast compression - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Zstandard support for toast compression
Date
Msg-id CA+TgmoYXFyvJzxF9rqA2nC9qZsoWzW+rPmvtj3uF9+1ertoL=A@mail.gmail.com
In response to Re: Zstandard support for toast compression  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Fri, May 20, 2022 at 4:17 PM Stephen Frost <sfrost@snowman.net> wrote:
> A thought I've had before is that it'd be nice to specify a particular
> compression method on a data type basis.  Wasn't the direction that this
> was taken, for reasons, but I wonder about perhaps still having a data
> type compression method and perhaps one of these bits might be "the data
> type's (default?) compression method".  Even so though, having an
> extensible way to add new compression methods would be a good thing.

If we look at pglz vs. LZ4, nobody is arguing that it makes more
sense to use LZ4 for some data types and pglz for others. Indeed, it's
unclear why you would ever use pglz if you had LZ4 as an option. Even
if we imagine a world in which a full spectrum of modern compressors -
Zstandard, bzip2, gzip, and whatever else you want - is available, it's
basically a time/space tradeoff. You will either want a fast compressor
or a good one.
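
To make the time/space tradeoff concrete, here is a rough sketch that
runs a few general-purpose compressors over the same input and reports
output size and elapsed time. It uses compressors from the Python
standard library purely as stand-ins (pglz, LZ4, and Zstandard are not
in the stdlib), and the sample data and the names measure(), compare(),
CANDIDATES, and SAMPLE are made up for illustration. Actual numbers
will vary with the data and the machine.

```python
import bz2
import lzma
import time
import zlib

# Stand-in compressors from the Python standard library (an assumption for
# illustration; pglz, LZ4, and Zstandard are not part of the stdlib).
CANDIDATES = [
    ("zlib level 1", lambda data: zlib.compress(data, 1)),
    ("zlib level 9", lambda data: zlib.compress(data, 9)),
    ("bz2", bz2.compress),
    ("lzma", lzma.compress),
]


def measure(name, compress, data):
    """Compress data once and report output size and wall-clock time."""
    start = time.perf_counter()
    out = compress(data)
    elapsed = time.perf_counter() - start
    return {"name": name, "bytes": len(out), "seconds": elapsed}


def compare(data):
    """Run every candidate compressor over the same input."""
    return [measure(name, fn, data) for name, fn in CANDIDATES]


# Hypothetical sample: repetitive JSON-like text.
SAMPLE = b'{"customer_id": 17, "order_status": "shipped"}\n' * 2000

for row in compare(SAMPLE):
    print(f'{row["name"]:>12}: {row["bytes"]:>6} bytes in {row["seconds"]:.4f} s')
```

Nothing in this comparison depends on the data type of the input, which
is the point: the choice is between speed and ratio.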

The situation in which this sort of thing might make sense is if we
had a compressor that is specifically designed to work well on a
certain data type, and especially if the code for that data type could
perform some operations directly on the compressed representation.
From what I understand, the ideas that people have in this area around
jsonb require that there be a dictionary available. For instance, you
might scan a jsonb column, collect all the keys that occur frequently,
put them in a dictionary, and then use them to compress the column. I
can see that being effective, but the infrastructure to store that
dictionary someplace is infrastructure we have not got.
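
A minimal sketch of that dictionary idea, outside of PostgreSQL
entirely: scan some JSON rows, collect the frequently occurring
top-level keys, and hand them to a compressor as a preset dictionary.
It uses zlib's preset-dictionary support (the zdict argument) only
because it is in the Python standard library; this is not what any
jsonb proposal actually does. The rows and the helper names
build_key_dictionary(), compress_with_dict(), and
decompress_with_dict() are hypothetical.

```python
import json
import zlib
from collections import Counter


def build_key_dictionary(rows, min_count=2):
    """Collect top-level keys seen in at least min_count rows.

    Each row is assumed to be a JSON object encoded as bytes.
    """
    counts = Counter(key for row in rows for key in json.loads(row))
    frequent = [key for key, n in counts.most_common() if n >= min_count]
    # zlib's documentation suggests placing the most common subsequences at
    # the end of the dictionary, so the most frequent key goes last.
    return " ".join(f'"{key}":' for key in reversed(frequent)).encode()


def compress_with_dict(raw, zdict):
    comp = zlib.compressobj(zdict=zdict)
    return comp.compress(raw) + comp.flush()


def decompress_with_dict(blob, zdict):
    # The same dictionary must be supplied again here.
    decomp = zlib.decompressobj(zdict=zdict)
    return decomp.decompress(blob) + decomp.flush()


# Hypothetical column contents: short rows that share the same keys.
ROWS = [
    json.dumps(
        {
            "customer_id": i,
            "order_status": "shipped",
            "shipping_address": f"{i} Main St",
        }
    ).encode()
    for i in range(1, 6)
]

zdict = build_key_dictionary(ROWS)
for row in ROWS:
    blob = compress_with_dict(row, zdict)
    assert decompress_with_dict(blob, zdict) == row
    print(len(row), "->", len(blob), "bytes with dictionary,",
          len(zlib.compress(row)), "bytes without")
```

Note that zdict has to live somewhere for as long as any value
compressed with it exists; the sketch just keeps it in a variable.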

It may be better to try to handle these use cases by building the
compression into the data type representation proper, perhaps
disabling the general-purpose TOAST compression stuff, rather than by
making it part of TOAST compression. We found during the
implementation of LZ4 TOAST compression that it's basically impossible
to keep a compressed datum from "leaking out" into other parts of the
system. We have to assume that any datum we create by TOAST
compression may continue to exist somewhere in the system long after
the table in which it was originally stored is gone. So, while a
dictionary could be used for compression, it would have to be done in
a way where that dictionary wasn't required to decompress, unless
we're prepared to prohibit ever dropping a dictionary, which sounds
like not a lot of fun. If the compression were part of the data type
instead of part of TOAST compression, we would dodge this problem.
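
The dependency is easy to demonstrate with any dictionary-based
compressor. The sketch below uses zlib's preset dictionary from the
Python standard library as a stand-in; ZDICT, DATUM, and the helper
names are invented for illustration and have nothing to do with the
actual TOAST code. A value compressed against a dictionary can only be
recovered by a reader that still has that dictionary.

```python
import zlib

# Hypothetical dictionary and datum, for illustration only.
ZDICT = b'"shipping_address": "order_status": "customer_id":'
DATUM = b'{"customer_id": 17, "order_status": "shipped"}'


def compress_with_dict(raw, zdict):
    comp = zlib.compressobj(zdict=zdict)
    return comp.compress(raw) + comp.flush()


def try_decompress(blob, zdict=None):
    """Return the decompressed bytes, or None if zlib reports an error."""
    try:
        if zdict is None:
            decomp = zlib.decompressobj()
        else:
            decomp = zlib.decompressobj(zdict=zdict)
        return decomp.decompress(blob) + decomp.flush()
    except zlib.error:
        return None


blob = compress_with_dict(DATUM, ZDICT)

# With the dictionary still around, the datum comes back intact.
print(try_decompress(blob, ZDICT) == DATUM)

# Simulate the dictionary having been dropped: the original bytes
# cannot be recovered from the compressed datum alone.
print(try_decompress(blob) == DATUM)
```

A compressed datum that has leaked elsewhere in the system is in the
position of the second call once the dictionary is dropped.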

I think that might be a better way to go.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


