Re: zstd compression for pg_dump - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: zstd compression for pg_dump
Msg-id 20210104070600.GB9712@telsasoft.com
In response to Re: zstd compression for pg_dump  (Andrey Borodin <x4mmm@yandex-team.ru>)
List pgsql-hackers
On Mon, Jan 04, 2021 at 11:04:57AM +0500, Andrey Borodin wrote:
> > On 4 Jan 2021, at 07:53, Justin Pryzby <pryzby@telsasoft.com> wrote:
> > Note, there are currently several "compression" patches in CF app.  This patch
> > seems to be independent of the others, but probably shouldn't be totally
> > uncoordinated (like adding lz4 in one and zstd in another might be poor
> > execution).
> > 
> > https://commitfest.postgresql.org/31/2897/
> > - Faster pglz compression
> > https://commitfest.postgresql.org/31/2813/
> > - custom compression methods for toast
> > https://commitfest.postgresql.org/31/2773/
> > - libpq compression
> 
> I think that's a downside of our development system: patch authors do not want to create dependencies on other
> patches.

I think in these cases, someone who notices common/overlapping patches should
suggest that the authors review each other's work.  In some cases, I think it's
appropriate to come up with a "shared" preliminary patch(es), which both (all)
patch authors can include as 0001 until it's finalized and merged.  That might
be true for some things like the tableam work, or the two "online checksum"
patches.

> I'd say that both lz4 and zstd should be supported in TOAST, FPIs, libpq, and pg_dump. As to pglz - I think we should
> not proliferate it any further.

pg_basebackup came up as another use on another thread, I think related to
libpq protocol compression.

> Libpq compression encountered some problems with memory consumption which
> required some extra config efforts. Did you measure memory usage for this
> patchset?

RAM use is not significantly different from zlib, except that zstd --long adds
more memory.

$ command time -v pg_dump -d ts -t ... -Fc -Z0 |wc -c
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:28.77
        Maximum resident set size (kbytes): 40504
    1397288924 # no compression: 1400MB

$ command time -v pg_dump -d ts -t ... -Fc |wc -c
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:37.17
        Maximum resident set size (kbytes): 40504
    132932415 # default (zlib) compression: 132 MB

$ command time -v ./pg_dump -d ts -t ... -Fc |wc -c
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:29.28
        Maximum resident set size (kbytes): 40568
    86048139 # zstd: 86MB

$ command time -v ./pg_dump -d ts -t ... -Fc -Z 'alg=zstd opt=zstdlong' |wc -c
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:30.49
        Maximum resident set size (kbytes): 180332
    72202937 # zstd long: 72MB (maximum RSS: 180MB)
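
To make the four runs above easier to compare, here is a minimal sketch that turns the byte counts and the "Maximum resident set size" figures into ratios.  The function name ratio() is made up for illustration; the numbers are copied from the output above.

```c
#include <stdio.h>

/* Hypothetical helper: value as a fraction of a baseline. */
double
ratio(long long value, long long baseline)
{
    return (double) value / (double) baseline;
}

/* Print output size relative to the uncompressed (-Z0) dump, and
 * peak RSS relative to the -Z0 run, using the figures reported above. */
void
print_summary(void)
{
    const long long raw_bytes = 1397288924LL;   /* -Z0 output size */
    const long long raw_rss_kb = 40504LL;       /* -Z0 maximum RSS */

    printf("zlib:      %.1f%% of raw size, %.2fx RSS\n",
           100.0 * ratio(132932415LL, raw_bytes), ratio(40504LL, raw_rss_kb));
    printf("zstd:      %.1f%% of raw size, %.2fx RSS\n",
           100.0 * ratio(86048139LL, raw_bytes), ratio(40568LL, raw_rss_kb));
    printf("zstd long: %.1f%% of raw size, %.2fx RSS\n",
           100.0 * ratio(72202937LL, raw_bytes), ratio(180332LL, raw_rss_kb));
}
```

Calling print_summary() from a main() shows that only the zstd long run moves the RSS figure noticeably, which matches the statement above about --long.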

> [PATCH 04/20] struct compressLibs
> I think this directive would be correct.
> +// #ifdef HAVE_LIBZ?

I'm not sure .. I'm thinking of making the COMPR_ALG_* always defined, and then
fail later if an operation is unsupported.  There's an excessive number of
#ifdefs already, so the early commits are intended to minimize as far as
possible what's needed for each additional compression
algorithm (lib/method/whatever it's called).  I haven't tested much with
pg_restore of files with unsupported compression libs.
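
A rough sketch of the "always defined, fail later" idea, not code from the patch: the enum members are declared unconditionally, and the #ifdefs are confined to a single function that reports whether the build can use an algorithm.  The enum member names, the function names, and the HAVE_LIBZSTD macro are assumptions made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical enum: every algorithm is always declared, regardless of
 * which libraries the build was configured with. */
typedef enum
{
    COMPR_ALG_NONE,
    COMPR_ALG_LIBZ,
    COMPR_ALG_ZSTD
} CompressionAlgorithm;

/* The only place that needs #ifdefs: does this build support alg?
 * HAVE_LIBZSTD is an assumed macro name. */
bool
compr_alg_supported(CompressionAlgorithm alg)
{
    switch (alg)
    {
        case COMPR_ALG_NONE:
            return true;
        case COMPR_ALG_LIBZ:
#ifdef HAVE_LIBZ
            return true;
#else
            return false;
#endif
        case COMPR_ALG_ZSTD:
#ifdef HAVE_LIBZSTD
            return true;
#else
            return false;
#endif
    }
    return false;               /* unknown value */
}

/* Called at the point of use: returns NULL if the algorithm is usable,
 * otherwise an error message for the caller to report. */
const char *
compr_alg_check(CompressionAlgorithm alg)
{
    if (!compr_alg_supported(alg))
        return "compression algorithm not supported by this build";
    return NULL;
}
```

With this shape, code that parses options or reads an archive header can refer to COMPR_ALG_ZSTD in any build, and the error only surfaces when something actually tries to compress or decompress with it.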

> [PATCH 06/20] pg_dump: zstd compression
> I'd propose to build with Zstd by default. It seems other patches do it this way. Though there are possible
> downsides.

Yes...but the cfbot turns red if the patch requires zstd, so it defaults to
off until it's included in the build environments (but for now, the main patch
isn't being tested).

Thanks for looking.

-- 
Justin


