Re: zstd compression for pg_dump - Mailing list pgsql-hackers
From | Justin Pryzby |
---|---|
Subject | Re: zstd compression for pg_dump |
Date | |
Msg-id | 20210104070600.GB9712@telsasoft.com |
In response to | Re: zstd compression for pg_dump (Andrey Borodin <x4mmm@yandex-team.ru>) |
List | pgsql-hackers |
On Mon, Jan 04, 2021 at 11:04:57AM +0500, Andrey Borodin wrote:
>
> On Jan 4, 2021, at 07:53, Justin Pryzby <pryzby@telsasoft.com> wrote:
> > Note, there are currently several "compression" patches in the CF app.  This patch
> > seems to be independent of the others, but probably shouldn't be totally
> > uncoordinated (like adding lz4 in one and zstd in another might be poor
> > execution).
> >
> > https://commitfest.postgresql.org/31/2897/
> >  - Faster pglz compression
> > https://commitfest.postgresql.org/31/2813/
> >  - custom compression methods for toast
> > https://commitfest.postgresql.org/31/2773/
> >  - libpq compression
>
> I think that's a downside of our development system: patch authors do not
> want to create dependencies on other patches.

I think in these cases, someone who notices common/overlapping patches should
suggest that the authors review each other's work.  In some cases, I think it's
appropriate to come up with a "shared" preliminary patch(es), which both (all)
patch authors can include as 0001 until it's finalized and merged.  That might
be true for some things like the tableam work, or the two "online checksum"
patches.

> I'd say that both lz4 and zstd should be supported in TOAST, FPIs, libpq,
> and pg_dump. As to pglz - I think we should not proliferate it any further.

pg_basebackup came up as another use on another thread, I think related to
libpq protocol compression.

> Libpq compression encountered some problems with memory consumption which
> required some extra config efforts. Did you measure memory usage for this
> patchset?

RAM use is not significantly different from zlib, except that zstd --long adds
more memory.

$ command time -v pg_dump -d ts -t ... -Fc -Z0 |wc -c
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:28.77
        Maximum resident set size (kbytes): 40504
        1397288924 # no compression: 1400MB

$ command time -v pg_dump -d ts -t ... -Fc |wc -c
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:37.17
        Maximum resident set size (kbytes): 40504
        132932415 # default (zlib) compression: 132MB

$ command time -v ./pg_dump -d ts -t ... -Fc |wc -c
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:29.28
        Maximum resident set size (kbytes): 40568
        86048139 # zstd: 86MB

$ command time -v ./pg_dump -d ts -t ... -Fc -Z 'alg=zstd opt=zstdlong' |wc -c
        Elapsed (wall clock) time (h:mm:ss or m:ss): 0:30.49
        Maximum resident set size (kbytes): 180332
        72202937 # zstd long: 72MB output, 180MB RSS

> [PATCH 04/20] struct compressLibs
> I think this directive would be correct.
> +// #ifdef HAVE_LIBZ?

I'm not sure.  I'm thinking of making the COMPR_ALG_* always defined, and then
failing later if an operation is unsupported.  There's an excessive number of
#ifdefs already, so the early commits are intended to minimize as far as
possible what's needed for each additional compression algorithm
(lib/method/whatever it's called).  I haven't tested much with pg_restore of
files with unsupported compression libs.

> [PATCH 06/20] pg_dump: zstd compression
> I'd propose to build with Zstd by default. It seems other patches do it this
> way. Though there are possible downsides.

Yes...but the cfbot turns red if the patch requires zstd, so it defaults to off
until it's included in the build environments (but for now, the main patch
isn't being tested).

Thanks for looking.

--
Justin
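
For readers comparing the four pg_dump runs in this message, the sizes, times and peak memory can be reduced to ratios against the uncompressed run. The following is only a minimal Python sketch: the numbers are copied from the output quoted above, while the names RUNS and summarize are illustrative and not part of the patch.

```python
# Figures copied from the four pg_dump runs quoted in this message.
# Each entry: (label, elapsed seconds, max RSS in kB, output bytes).
# RUNS and summarize() are hypothetical names used only for this illustration.
RUNS = [
    ("none (-Z0)", 28.77, 40504, 1397288924),
    ("zlib (default)", 37.17, 40504, 132932415),
    ("zstd", 29.28, 40568, 86048139),
    ("zstd long", 30.49, 180332, 72202937),
]


def summarize(runs):
    """Return one dict per run with ratios relative to the first (uncompressed) run."""
    _, base_secs, base_rss, base_bytes = runs[0]
    rows = []
    for label, secs, rss_kb, out_bytes in runs:
        rows.append({
            "label": label,
            # How many times smaller the output is than the uncompressed dump.
            "compression_ratio": base_bytes / out_bytes,
            # Wall-clock time relative to the uncompressed dump.
            "time_ratio": secs / base_secs,
            # Peak resident set size relative to the uncompressed dump.
            "rss_ratio": rss_kb / base_rss,
        })
    return rows


if __name__ == "__main__":
    for row in summarize(RUNS):
        print(f"{row['label']:<15} size {row['compression_ratio']:6.2f}x smaller, "
              f"time {row['time_ratio']:.2f}x, RSS {row['rss_ratio']:.2f}x")
```

On these figures, zstd shrinks the dump by roughly 16x versus about 10.5x for zlib at nearly the same peak RSS, while the long mode reaches about 19x at roughly 4.5x the memory.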