Re: zstd compression for pg_dump - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: zstd compression for pg_dump
Date
Msg-id 20210104025321.GA9712@telsasoft.com
In response to Re: zstd compression for pg_dump  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: zstd compression for pg_dump  (Andrey Borodin <x4mmm@yandex-team.ru>)
Re: zstd compression for pg_dump  (Andreas Karlsson <andreas@proxel.se>)
Re: zstd compression for pg_dump  (David Steele <david@pgmasters.net>)
List pgsql-hackers
On Mon, Dec 21, 2020 at 01:49:24PM -0600, Justin Pryzby wrote:
> a big disadvantage of piping through zstd is that it's not identified as a
> PGDMP file, and, /usr/bin/file on centos7 fails to even identify zstd by its
> magic number..
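
To illustrate the identification problem from the quote above, here is a minimal Python sketch that classifies a file by its leading bytes. The helper name sniff_format is hypothetical; the byte strings are the "PGDMP" header of a custom-format archive, the zstd frame magic (0xFD2FB528, stored little-endian) and the gzip header. It only shows that a piped dump starts with the compressor's magic, so the PGDMP header is hidden until the stream is decompressed.

```python
# Leading bytes of the formats discussed in this thread.
PGDMP_MAGIC = b"PGDMP"             # pg_dump custom-format archive header
ZSTD_MAGIC = b"\x28\xb5\x2f\xfd"   # zstd frame magic 0xFD2FB528, little-endian
GZIP_MAGIC = b"\x1f\x8b"           # gzip member header


def sniff_format(head: bytes) -> str:
    """Classify a file by its first few bytes (hypothetical helper)."""
    if head.startswith(PGDMP_MAGIC):
        return "pg_dump custom archive"
    if head.startswith(ZSTD_MAGIC):
        return "zstd stream (contents unknown until decompressed)"
    if head.startswith(GZIP_MAGIC):
        return "gzip stream (contents unknown until decompressed)"
    return "unknown"
```

A dump written by "pg_dump -Fc -Z0 |zstd" would fall into the second branch, which is why a tool looking only at the first bytes cannot tell it is a PGDMP archive.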

Other reasons are that pg_dump |zstd >output.zst loses the exit status of
pg_dump, and that it's not "transparent" (one needs to type
"zstd -dq |pg_restore").
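
The exit-status point can be sketched in Python: a plain shell pipeline reports only the status of its last command, so a failed producer goes unnoticed unless both statuses are collected (in bash, "set -o pipefail" is the usual workaround). The function run_pipeline and the two stand-in commands below are hypothetical; they imitate a failing pg_dump and a succeeding zstd without needing either program installed.

```python
import subprocess
import sys


def run_pipeline(producer, consumer):
    """Run `producer | consumer` and return (producer_rc, consumer_rc)."""
    p1 = subprocess.Popen(producer, stdout=subprocess.PIPE)
    p2 = subprocess.Popen(consumer, stdin=p1.stdout, stdout=subprocess.DEVNULL)
    p1.stdout.close()  # only the consumer should hold the read end of the pipe
    consumer_rc = p2.wait()
    producer_rc = p1.wait()
    return producer_rc, consumer_rc


# Stand-ins (assumptions for illustration): a producer that fails with
# status 3 and a consumer that drains its input and exits successfully.
failing_producer = [sys.executable, "-c", "import sys; sys.exit(3)"]
draining_consumer = [sys.executable, "-c", "import sys; sys.stdin.buffer.read()"]

producer_rc, consumer_rc = run_pipeline(failing_producer, draining_consumer)
if producer_rc != 0:
    print(f"producer failed with status {producer_rc}; output is not trustworthy")
```

A caller that looks only at consumer_rc would treat this run as a success, which is the situation described for "pg_dump |zstd >output.zst".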

On Mon, Dec 21, 2020 at 08:32:35PM -0600, Justin Pryzby wrote:
> On Mon, Dec 21, 2020 at 03:02:40PM -0500, Tom Lane wrote:
> > Justin Pryzby <pryzby@telsasoft.com> writes:
> > > I found that our largest tables are 40% smaller and 20% faster to pipe
> > > pg_dump -Fc -Z0 |zstd relative to native zlib
> > 
> > The patch might be a tad smaller if you hadn't included a core file in it.
> 
> About 89% smaller.
> 
> This also fixes the extension (.zst)
> And fixes zlib default compression.
> And a bunch of cleanup.

I rebased so the "typedef struct compression" patch is first and zstd on top of
that (say, in case someone wants to bikeshed about which compression algorithm
to support).  And made a central struct with all the compression-specific info
to further isolate the compress-specific changes.
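
As a rough illustration of the "central struct" idea, the Python sketch below keeps everything algorithm-specific in one record so that callers never branch on the algorithm name. This is not the struct from the patch: the field names, the DEFAULTS table and output_name are all hypothetical, and the levels shown are the libraries' own defaults (-1 for zlib, 3 for zstd) rather than values taken from the patch.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Compression:
    """All compression-specific settings in one place (hypothetical layout)."""
    alg: str     # "none", "zlib" or "zstd"
    level: int   # algorithm-specific compression level
    suffix: str  # extension appended to compressed output files


# Assumed defaults for illustration only.
DEFAULTS = {
    "none": Compression("none", 0, ""),
    "zlib": Compression("zlib", -1, ".gz"),  # -1 is zlib's Z_DEFAULT_COMPRESSION
    "zstd": Compression("zstd", 3, ".zst"),  # 3 is zstd's default level
}


def output_name(base: str, alg: str) -> str:
    """Build an output file name without the caller knowing the suffix."""
    return base + DEFAULTS[alg].suffix
```

With this shape, supporting another algorithm means adding one table entry instead of touching every call site.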

And handle compression of "plain" archive format.
And fix compilation for MSVC and make --without-zstd the default.

And fix cfgets() (which I think is actually unused in the code paths for a
compressed FP).

And add a fix for a pre-existing problem: ftello() on unseekable input.
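
The ftello() issue can be shown in a few lines of Python: asking for the current offset only makes sense on seekable input, and a pipe (for example, an archive read from stdin) is not seekable. The helper safe_tell is hypothetical and is not how the patch handles it; it only demonstrates checking seekability before asking for the position. The demo assumes a POSIX-style pipe.

```python
import os
import tempfile


def safe_tell(f):
    """Return the current offset, or None when the stream cannot seek."""
    if not f.seekable():
        return None
    return f.tell()


# A pipe stands in for unseekable input such as stdin fed by another process.
read_fd, write_fd = os.pipe()
with os.fdopen(read_fd, "rb") as pipe_in, os.fdopen(write_fd, "wb"):
    pipe_offset = safe_tell(pipe_in)

# A regular temporary file is seekable, so its offset is available.
with tempfile.TemporaryFile() as regular:
    regular.write(b"abc")
    file_offset = safe_tell(regular)
```

Code that records offsets for later random access has to treat the None case as "position unknown" instead of storing a bogus value.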

I also started a patch to allow compression of "tar" format, but I didn't
include that here yet.

Note, there are currently several "compression" patches in the CF app.  This patch
seems to be independent of the others, but probably shouldn't be totally
uncoordinated (like adding lz4 in one and zstd in another might be poor
execution).

https://commitfest.postgresql.org/31/2897/
 - Faster pglz compression
https://commitfest.postgresql.org/31/2813/
 - custom compression methods for toast
https://commitfest.postgresql.org/31/2773/
 - libpq compression

-- 
Justin
