Re: Add LZ4 compression in pg_dump - Mailing list pgsql-hackers
From | Justin Pryzby |
---|---|
Subject | Re: Add LZ4 compression in pg_dump |
Date | |
Msg-id | 20220326162156.GI28503@telsasoft.com Whole thread Raw |
In response to | Add LZ4 compression in pg_dump (Georgios <gkokolatos@protonmail.com>) |
Responses |
Re: Add LZ4 compression in pg_dump
Re: Add LZ4 compression in pg_dump Re: Add LZ4 compression in pg_dump Re: Add LZ4 compression in pg_dump |
List | pgsql-hackers |
LZ4F_HEADER_SIZE_MAX isn't defined in old LZ4. I ran into that on an ubuntu LTS, so I don't think it's so old that it shouldn't be handled more gracefully. LZ4 should either have an explicit version check, or else shouldn't depend on that feature (or should define a safe fallback version if the library header doesn't define it). https://packages.ubuntu.com/liblz4-1 0003: typo: of legacy => or legacy There are a large number of ifdefs being added here - it'd be nice to minimize that. basebackup was organized to use separate files, which is one way. $ git grep -c 'ifdef .*LZ4' src/bin/pg_dump/compress_io.c src/bin/pg_dump/compress_io.c:19 In last year's CF entry, I had made a union within CompressorState. LZ4 doesn't need z_streamp (and ztsd will need ZSTD_outBuffer, ZSTD_inBuffer, ZSTD_CStream). 0002: I wonder if you're able to re-use any of the basebackup parsing stuff from commit ffd53659c. You're passing both the compression method *and* level. I think there should be a structure which includes both. In the future, that can also handle additional options. I hope to re-use these same things for wal_compression=method:level. You renamed this: |- COMPR_ALG_LIBZ |-} CompressionAlgorithm; |+ COMPRESSION_GZIP, |+} CompressionMethod; ..But I don't think that's an improvement. If you were to change it, it should say something like PGDUMP_COMPRESS_ZLIB, since there are other compression structs and typedefs. zlib is not idential to gzip, which uses a different header, so in WriteDataToArchive(), LIBZ is correct, and GZIP is incorrect. The cf* changes in pg_backup_archiver could be split out into a separate commit. It's strictly a code simplification - not just preparation for more compression algorithms. The commit message should "See also: bf9aa490db24b2334b3595ee33653bf2fe39208c". The changes in 0002 for cfopen_write seem insufficient: |+ if (compressionMethod == COMPRESSION_NONE) |+ fp = cfopen(path, mode, compressionMethod, 0); | else | { | #ifdef HAVE_LIBZ | char *fname; | | fname = psprintf("%s.gz", path); |- fp = cfopen(fname, mode, compression); |+ fp = cfopen(fname, mode, compressionMethod, compressionLevel); | free_keep_errno(fname); | #else The only difference between the LIBZ and uncompressed case is the file extension, and it'll be the only difference with LZ4 too. So I suggest to first handle the file extension, and the rest of the code path is not conditional on the compression method. I don't think cfopen_write even needs HAVE_LIBZ - can't you handle that in cfopen_internal() ? This patch rejects -Z0, which ought to be accepted: ./src/bin/pg_dump/pg_dump -h /tmp regression -Fc -Z0 |wc pg_dump: error: can only specify -Z/--compress [LEVEL] when method is set Your 0003 patch shouldn't reference LZ4: +#ifndef HAVE_LIBLZ4 + if (*compressionMethod == COMPRESSION_LZ4) + supports_compression = false; +#endif The 0004 patch renames zlibOutSize to outsize - I think the patch series should be constructed such as to minimize the size of the method-specific patches. I say this anticipating also adding support for zstd. The preliminary patches should have all the boring stuff. It would help for reviewing to keep the patches split up, or to enumerate all the boring things that are being renamed (like change OutputContext to cfp, rename zlibOutSize, ...). 0004: The include should use <lz4.h> and not "lz4.h" freebsd/cfbot is failing. I suggested off-list to add an 0099 patch to change LZ4 to the default, to exercise it more on CI. -- Justin
pgsql-hackers by date: