Re: Add LZ4 compression in pg_dump - Mailing list pgsql-hackers
From | Justin Pryzby |
---|---|
Subject | Re: Add LZ4 compression in pg_dump |
Date | |
Msg-id | 20220626155532.GA9311@telsasoft.com Whole thread Raw |
In response to | Re: Add LZ4 compression in pg_dump (Justin Pryzby <pryzby@telsasoft.com>) |
Responses |
Re: Add LZ4 compression in pg_dump
Re: Add LZ4 compression in pg_dump |
List | pgsql-hackers |
Hi, Will you be able to send a rebased patch for the next CF ? If you update for the review comments I sent in March, I'll plan to do another round of review. On Sat, Mar 26, 2022 at 11:21:56AM -0500, Justin Pryzby wrote: > LZ4F_HEADER_SIZE_MAX isn't defined in old LZ4. > > I ran into that on an ubuntu LTS, so I don't think it's so old that it > shouldn't be handled more gracefully. LZ4 should either have an explicit > version check, or else shouldn't depend on that feature (or should define a > safe fallback version if the library header doesn't define it). > > https://packages.ubuntu.com/liblz4-1 > > 0003: typo: of legacy => or legacy > > There are a large number of ifdefs being added here - it'd be nice to minimize > that. basebackup was organized to use separate files, which is one way. > > $ git grep -c 'ifdef .*LZ4' src/bin/pg_dump/compress_io.c > src/bin/pg_dump/compress_io.c:19 > > In last year's CF entry, I had made a union within CompressorState. LZ4 > doesn't need z_streamp (and ztsd will need ZSTD_outBuffer, ZSTD_inBuffer, > ZSTD_CStream). > > 0002: I wonder if you're able to re-use any of the basebackup parsing stuff > from commit ffd53659c. You're passing both the compression method *and* level. > I think there should be a structure which includes both. In the future, that > can also handle additional options. I hope to re-use these same things for > wal_compression=method:level. > > You renamed this: > > |- COMPR_ALG_LIBZ > |-} CompressionAlgorithm; > |+ COMPRESSION_GZIP, > |+} CompressionMethod; > > ..But I don't think that's an improvement. If you were to change it, it should > say something like PGDUMP_COMPRESS_ZLIB, since there are other compression > structs and typedefs. zlib is not idential to gzip, which uses a different > header, so in WriteDataToArchive(), LIBZ is correct, and GZIP is incorrect. > > The cf* changes in pg_backup_archiver could be split out into a separate > commit. It's strictly a code simplification - not just preparation for more > compression algorithms. The commit message should "See also: > bf9aa490db24b2334b3595ee33653bf2fe39208c". > > The changes in 0002 for cfopen_write seem insufficient: > |+ if (compressionMethod == COMPRESSION_NONE) > |+ fp = cfopen(path, mode, compressionMethod, 0); > | else > | { > | #ifdef HAVE_LIBZ > | char *fname; > | > | fname = psprintf("%s.gz", path); > |- fp = cfopen(fname, mode, compression); > |+ fp = cfopen(fname, mode, compressionMethod, compressionLevel); > | free_keep_errno(fname); > | #else > > The only difference between the LIBZ and uncompressed case is the file > extension, and it'll be the only difference with LZ4 too. So I suggest to > first handle the file extension, and the rest of the code path is not > conditional on the compression method. I don't think cfopen_write even needs > HAVE_LIBZ - can't you handle that in cfopen_internal() ? > > This patch rejects -Z0, which ought to be accepted: > ./src/bin/pg_dump/pg_dump -h /tmp regression -Fc -Z0 |wc > pg_dump: error: can only specify -Z/--compress [LEVEL] when method is set > > Your 0003 patch shouldn't reference LZ4: > +#ifndef HAVE_LIBLZ4 > + if (*compressionMethod == COMPRESSION_LZ4) > + supports_compression = false; > +#endif > > The 0004 patch renames zlibOutSize to outsize - I think the patch series should > be constructed such as to minimize the size of the method-specific patches. I > say this anticipating also adding support for zstd. The preliminary patches > should have all the boring stuff. It would help for reviewing to keep the > patches split up, or to enumerate all the boring things that are being renamed > (like change OutputContext to cfp, rename zlibOutSize, ...). > > 0004: The include should use <lz4.h> and not "lz4.h" > > freebsd/cfbot is failing. > > I suggested off-list to add an 0099 patch to change LZ4 to the default, to > exercise it more on CI. On Sat, Mar 26, 2022 at 01:33:36PM -0500, Justin Pryzby wrote: > On Sat, Mar 26, 2022 at 11:21:56AM -0500, Justin Pryzby wrote: > > You're passing both the compression method *and* level. I think there should > > be a structure which includes both. In the future, that can also handle > > additional options. > > I'm not sure if there's anything worth saving, but I did that last year with > 0003-Support-multiple-compression-algs-levels-opts.patch > I sent a rebased copy off-list. > https://www.postgresql.org/message-id/flat/20210104025321.GA9712@telsasoft.com#ca1b9f9d3552c87fa874731cad9d8391 > > | fatal("not built with LZ4 support"); > | fatal("not built with lz4 support"); > > Please use consistent capitalization of "lz4" - then the compiler can optimize > away duplicate strings. > > > 0004: The include should use <lz4.h> and not "lz4.h" > > Also, use USE_LZ4 rather than HAVE_LIBLZ4, per 75eae0908.
pgsql-hackers by date: