Re: adding 'zstd' as a compression algorithm - Mailing list pgsql-hackers

From Robert Haas
Subject Re: adding 'zstd' as a compression algorithm
Date
Msg-id CA+TgmoYo1unP9y_tDATd7G7SOycxtzbtT0vsc+ZZ1G6pftXRBw@mail.gmail.com
In response to Re: adding 'zstd' as a compression algorithm  (Peter Geoghegan <pg@bowt.ie>)
Responses Re: adding 'zstd' as a compression algorithm  (Peter Geoghegan <pg@bowt.ie>)
List pgsql-hackers
On Tue, Feb 15, 2022 at 2:40 PM Peter Geoghegan <pg@bowt.ie> wrote:
> On Tue, Feb 15, 2022 at 10:20 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > In general, deciding on new compression algorithms can feel a bit like
> > debating the merits of vi vs. emacs, or one political party vs.
> > another.
>
> Really? I don't get that impression myself. (That's not important, though.)

Well, that's a good thing. Maybe we're getting better at keeping
things constructive...

> > What I imagine if this patch is accepted is that we (or our users)
> > will end up using lz4 for places where compression needs to be very
> > lightweight, and zstd for places where it's acceptable or even
> > desirable to spend more CPU cycles in exchange for better compression.
>
> Sounds reasonable to me.

Cool.

> > Likewise, I still download the .tar.gz version of anything
> > that gives me that option, basically because I'm familiar with the
> > format and it's easy for me to just carry on using it -- and in a
> > similar way I expect a lot of people will be happy to continue to
> > compress backups with gzip for many years to come.
>
> Maybe, but it seems more likely that they'll usually do whatever the
> easiest reasonable-seeming thing is. Perhaps we need to distinguish
> between "container format" (e.g., a backup that is produced by a tool
> like pgBackrest, including backup manifest) and compression algorithm.

I'm not sure I completely follow this. There are cases where we use
compression algorithms for internal purposes, and we can change the
defaults without users knowing or caring. For example, we could decide
that LZ4-compressing FPIs in WAL records is a sensible default and
just start doing it, and users need not care. But backups are
different, because when you pg_basebackup -Ft, you get .tar or .tar.gz
or .tar.lz4 files which we don't give you tools to extract. You have
to be able to extract them with OS tools. In cases like that, people
are going to be more likely to pick formats with which they are
familiar even if they are not technically best, and I don't think we
really dare change the defaults. That seems OK, though.
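
To make the "extract them with OS tools" point concrete, here is a
minimal sketch of unpacking a tar-format base backup using only the
Python standard library. The function name extract_base_backup and the
file name base.tar.gz are illustrative assumptions, not anything
PostgreSQL ships. The standard tarfile module handles plain .tar and
.tar.gz; it has no lz4 support, so a .tar.lz4 file would need an
external tool.

```python
import tarfile
from pathlib import Path


def extract_base_backup(archive: Path, dest: Path) -> list[str]:
    """Extract a tar archive (plain or gzip-compressed) into dest.

    Hypothetical helper for illustration; returns the member names
    found in the archive.
    """
    dest.mkdir(parents=True, exist_ok=True)
    # Mode "r:*" lets tarfile detect the compression transparently,
    # so the same call works for base.tar and base.tar.gz.
    with tarfile.open(archive, mode="r:*") as tar:
        names = tar.getnames()
        tar.extractall(dest)
    return names
```

Something like extract_base_backup(Path("base.tar.gz"), Path("restore"))
would then unpack the archive into a restore directory, assuming the
archive comes from a trusted source.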

> Isn't it an incontrovertible fact that LZ4 is superior to pglz in
> every way? LZ4 is pretty much its successor. And so it seems totally
> fine to assume that users will always want to use the clearly better
> option, and that that option will be generally available going
> forward. TOAST compression is applied selectively already, based on
> various obscure implementation details, some of which are quite
> arbitrary.

I pretty much agree with all of that. Nevertheless, there's a lot of
pglz-compressed data that's already on a disk someplace which people
are not necessarily keen to rewrite, and that will probably remain
true for the foreseeable future. If we change the default to something
that's not pglz, and then wait 10 years, we MIGHT be able to get rid
of it without pushback. But I wouldn't be surprised to learn that even
then there are a lot of people who have just pg_upgrade'd the same
database over and over again. And honestly I think that's fine. Yeah,
lz4 is better. But, on some data sets, there's very little real
difference in the compression ratio you get. Until keeping pglz around
starts costing us something tangible, there's no reason to spend any
effort worrying about it.

> I know less about zstd, but I imagine that a similar dynamic exists
> there. Overall, everything that you've said makes sense to me.

Great, thanks for chiming in.

-- 
Robert Haas
EDB: http://www.enterprisedb.com


