Re: Add LZ4 compression in pg_dump - Mailing list pgsql-hackers
From | gkokolatos@pm.me |
---|---|
Subject | Re: Add LZ4 compression in pg_dump |
Date | |
Msg-id | UsTnh6pNlF-Kichjme-H98KcAIycxLqV_BNBh4OvGLinGadL2KlsRlW2LHzB8qDbABTrLJ6CSTtENifm7mXOinRUXftSwZ3_rWj7vsPYVAc=@pm.me Whole thread Raw |
In response to | Re: Add LZ4 compression in pg_dump (Justin Pryzby <pryzby@telsasoft.com>) |
List | pgsql-hackers |
------- Original Message ------- On Sunday, June 26th, 2022 at 5:55 PM, Justin Pryzby <pryzby@telsasoft.com> wrote: > > > Hi, > > Will you be able to send a rebased patch for the next CF ? Thank you for taking an interest in the PR. The plan is indeed to sent a new version. > If you update for the review comments I sent in March, I'll plan to do another > round of review. Thank you. > > On Sat, Mar 26, 2022 at 11:21:56AM -0500, Justin Pryzby wrote: > > > LZ4F_HEADER_SIZE_MAX isn't defined in old LZ4. > > > > I ran into that on an ubuntu LTS, so I don't think it's so old that it > > shouldn't be handled more gracefully. LZ4 should either have an explicit > > version check, or else shouldn't depend on that feature (or should define a > > safe fallback version if the library header doesn't define it). > > > > https://packages.ubuntu.com/liblz4-1 > > > > 0003: typo: of legacy => or legacy > > > > There are a large number of ifdefs being added here - it'd be nice to minimize > > that. basebackup was organized to use separate files, which is one way. > > > > $ git grep -c 'ifdef .*LZ4' src/bin/pg_dump/compress_io.c > > src/bin/pg_dump/compress_io.c:19 > > > > In last year's CF entry, I had made a union within CompressorState. LZ4 > > doesn't need z_streamp (and ztsd will need ZSTD_outBuffer, ZSTD_inBuffer, > > ZSTD_CStream). > > > > 0002: I wonder if you're able to re-use any of the basebackup parsing stuff > > from commit ffd53659c. You're passing both the compression method and level. > > I think there should be a structure which includes both. In the future, that > > can also handle additional options. I hope to re-use these same things for > > wal_compression=method:level. > > > > You renamed this: > > > > |- COMPR_ALG_LIBZ > > |-} CompressionAlgorithm; > > |+ COMPRESSION_GZIP, > > |+} CompressionMethod; > > > > ..But I don't think that's an improvement. If you were to change it, it should > > say something like PGDUMP_COMPRESS_ZLIB, since there are other compression > > structs and typedefs. zlib is not idential to gzip, which uses a different > > header, so in WriteDataToArchive(), LIBZ is correct, and GZIP is incorrect. > > > > The cf* changes in pg_backup_archiver could be split out into a separate > > commit. It's strictly a code simplification - not just preparation for more > > compression algorithms. The commit message should "See also: > > bf9aa490db24b2334b3595ee33653bf2fe39208c". > > > > The changes in 0002 for cfopen_write seem insufficient: > > |+ if (compressionMethod == COMPRESSION_NONE) > > |+ fp = cfopen(path, mode, compressionMethod, 0); > > | else > > | { > > | #ifdef HAVE_LIBZ > > | char *fname; > > | > > | fname = psprintf("%s.gz", path); > > |- fp = cfopen(fname, mode, compression); > > |+ fp = cfopen(fname, mode, compressionMethod, compressionLevel); > > | free_keep_errno(fname); > > | #else > > > > The only difference between the LIBZ and uncompressed case is the file > > extension, and it'll be the only difference with LZ4 too. So I suggest to > > first handle the file extension, and the rest of the code path is not > > conditional on the compression method. I don't think cfopen_write even needs > > HAVE_LIBZ - can't you handle that in cfopen_internal() ? > > > > This patch rejects -Z0, which ought to be accepted: > > ./src/bin/pg_dump/pg_dump -h /tmp regression -Fc -Z0 |wc > > pg_dump: error: can only specify -Z/--compress [LEVEL] when method is set > > > > Your 0003 patch shouldn't reference LZ4: > > +#ifndef HAVE_LIBLZ4 > > + if (*compressionMethod == COMPRESSION_LZ4) > > + supports_compression = false; > > +#endif > > > > The 0004 patch renames zlibOutSize to outsize - I think the patch series should > > be constructed such as to minimize the size of the method-specific patches. I > > say this anticipating also adding support for zstd. The preliminary patches > > should have all the boring stuff. It would help for reviewing to keep the > > patches split up, or to enumerate all the boring things that are being renamed > > (like change OutputContext to cfp, rename zlibOutSize, ...). > > > > 0004: The include should use <lz4.h> and not "lz4.h" > > > > freebsd/cfbot is failing. > > > > I suggested off-list to add an 0099 patch to change LZ4 to the default, to > > exercise it more on CI. > > > On Sat, Mar 26, 2022 at 01:33:36PM -0500, Justin Pryzby wrote: > > > On Sat, Mar 26, 2022 at 11:21:56AM -0500, Justin Pryzby wrote: > > > > > You're passing both the compression method and level. I think there should > > > be a structure which includes both. In the future, that can also handle > > > additional options. > > > > I'm not sure if there's anything worth saving, but I did that last year with > > 0003-Support-multiple-compression-algs-levels-opts.patch > > I sent a rebased copy off-list. > > https://www.postgresql.org/message-id/flat/20210104025321.GA9712@telsasoft.com#ca1b9f9d3552c87fa874731cad9d8391 > > > > | fatal("not built with LZ4 support"); > > | fatal("not built with lz4 support"); > > > > Please use consistent capitalization of "lz4" - then the compiler can optimize > > away duplicate strings. > > > > > 0004: The include should use <lz4.h> and not "lz4.h" > > > > Also, use USE_LZ4 rather than HAVE_LIBLZ4, per 75eae0908. > >
pgsql-hackers by date: