Re: adding 'zstd' as a compression algorithm - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: adding 'zstd' as a compression algorithm |
Date | |
Msg-id | CA+TgmoYo1unP9y_tDATd7G7SOycxtzbtT0vsc+ZZ1G6pftXRBw@mail.gmail.com |
In response to | Re: adding 'zstd' as a compression algorithm (Peter Geoghegan <pg@bowt.ie>) |
Responses | Re: adding 'zstd' as a compression algorithm |
List | pgsql-hackers |
On Tue, Feb 15, 2022 at 2:40 PM Peter Geoghegan <pg@bowt.ie> wrote:
> On Tue, Feb 15, 2022 at 10:20 AM Robert Haas <robertmhaas@gmail.com> wrote:
> > In general, deciding on new compression algorithms can feel a bit like
> > debating the merits of vi vs. emacs, or one political party vs.
> > another.
>
> Really? I don't get that impression myself. (That's not important, though.)

Well, that's a good thing. Maybe we're getting better at keeping
things constructive...

> > What I imagine if this patch is accepted is that we (or our users)
> > will end up using lz4 for places where compression needs to be very
> > lightweight, and zstd for places where it's acceptable or even
> > desirable to spend more CPU cycles in exchange for better compression.
>
> Sounds reasonable to me.

Cool.

> > Likewise, I still download the .tar.gz version of anything
> > that gives me that option, basically because I'm familiar with the
> > format and it's easy for me to just carry on using it -- and in a
> > similar way I expect a lot of people will be happy to continue to
> > compress backups with gzip for many years to come.
>
> Maybe, but it seems more likely that they'll usually do whatever the
> easiest reasonable-seeming thing is. Perhaps we need to distinguish
> between "container format" (e.g., a backup that is produced by a tool
> like pgBackrest, including backup manifest) and compression algorithm.

I'm not sure I completely follow this. There are cases where we use
compression algorithms for internal purposes, and we can change the
defaults without users knowing or caring. For example, we could decide
that LZ4-compressing FPIs in WAL records is a sensible default and
just start doing it, and users need not care. But backups are
different, because when you pg_basebackup -Ft, you get .tar or .tar.gz
or .tar.lz4 files which we don't give you tools to extract. You have
to be able to extract them with OS tools. In cases like that, people
are going to be more likely to pick formats with which they are
familiar even if they are not technically best, and I don't think we
really dare change the defaults. That seems OK, though.

> Isn't it an incontrovertible fact that LZ4 is superior to pglz in
> every way? LZ4 is pretty much its successor. And so it seems totally
> fine to assume that users will always want to use the clearly better
> option, and that that option will be generally available going
> forward. TOAST compression is applied selectively already, based on
> various obscure implementation details, some of which are quite
> arbitrary.

I pretty much agree with all of that. Nevertheless, there's a lot of
pglz-compressed data that's already on a disk someplace which people
are not necessarily keen to rewrite, and that will probably remain
true for the foreseeable future. If we change the default to something
that's not pglz, and then wait 10 years, we MIGHT be able to get rid
of it without pushback. But I wouldn't be surprised to learn that even
then there are a lot of people who have just pg_upgrade'd the same
database over and over again. And honestly I think that's fine. Yeah,
lz4 is better. But, on some data sets, there's very little real
difference in the compression ratio you get. Until keeping pglz around
starts costing us something tangible, there's no reason to spend any
effort worrying about it.

> I know less about zstd, but I imagine that a similar dynamic exists
> there. Overall, everything that you've said makes sense to me.

Great, thanks for chiming in.

--
Robert Haas
EDB: http://www.enterprisedb.com