Re: Add LZ4 compression in pg_dump - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: Add LZ4 compression in pg_dump
Date
Msg-id 20220626155532.GA9311@telsasoft.com
Whole thread Raw
In response to Re: Add LZ4 compression in pg_dump  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: Add LZ4 compression in pg_dump
Re: Add LZ4 compression in pg_dump
List pgsql-hackers
Hi,

Will you be able to send a rebased patch for the next CF ?

If you update for the review comments I sent in March, I'll plan to do another
round of review.

On Sat, Mar 26, 2022 at 11:21:56AM -0500, Justin Pryzby wrote:
> LZ4F_HEADER_SIZE_MAX isn't defined in old LZ4.
> 
> I ran into that on an ubuntu LTS, so I don't think it's so old that it
> shouldn't be handled more gracefully.  LZ4 should either have an explicit
> version check, or else shouldn't depend on that feature (or should define a
> safe fallback version if the library header doesn't define it).
> 
> https://packages.ubuntu.com/liblz4-1
> 
> 0003: typo: of legacy => or legacy
> 
> There are a large number of ifdefs being added here - it'd be nice to minimize
> that.  basebackup was organized to use separate files, which is one way.
> 
> $ git grep -c 'ifdef .*LZ4' src/bin/pg_dump/compress_io.c
> src/bin/pg_dump/compress_io.c:19
> 
> In last year's CF entry, I had made a union within CompressorState.  LZ4
> doesn't need z_streamp (and ztsd will need ZSTD_outBuffer, ZSTD_inBuffer,
> ZSTD_CStream).
> 
> 0002: I wonder if you're able to re-use any of the basebackup parsing stuff
> from commit ffd53659c.  You're passing both the compression method *and* level.
> I think there should be a structure which includes both.  In the future, that
> can also handle additional options.  I hope to re-use these same things for
> wal_compression=method:level.
> 
> You renamed this:
> 
> |-       COMPR_ALG_LIBZ
> |-} CompressionAlgorithm;
> |+       COMPRESSION_GZIP,
> |+} CompressionMethod;
> 
> ..But I don't think that's an improvement.  If you were to change it, it should
> say something like PGDUMP_COMPRESS_ZLIB, since there are other compression
> structs and typedefs.  zlib is not idential to gzip, which uses a different
> header, so in WriteDataToArchive(), LIBZ is correct, and GZIP is incorrect.
> 
> The cf* changes in pg_backup_archiver could be split out into a separate
> commit.  It's strictly a code simplification - not just preparation for more
> compression algorithms.  The commit message should "See also:
> bf9aa490db24b2334b3595ee33653bf2fe39208c".
> 
> The changes in 0002 for cfopen_write seem insufficient:
> |+       if (compressionMethod == COMPRESSION_NONE)
> |+               fp = cfopen(path, mode, compressionMethod, 0);
> |        else
> |        {
> | #ifdef HAVE_LIBZ
> |                char       *fname;
> | 
> |                fname = psprintf("%s.gz", path);
> |-               fp = cfopen(fname, mode, compression);
> |+               fp = cfopen(fname, mode, compressionMethod, compressionLevel);
> |                free_keep_errno(fname);
> | #else
> 
> The only difference between the LIBZ and uncompressed case is the file
> extension, and it'll be the only difference with LZ4 too.  So I suggest to
> first handle the file extension, and the rest of the code path is not
> conditional on the compression method.  I don't think cfopen_write even needs
> HAVE_LIBZ - can't you handle that in cfopen_internal() ?
> 
> This patch rejects -Z0, which ought to be accepted:
> ./src/bin/pg_dump/pg_dump -h /tmp regression -Fc -Z0 |wc
> pg_dump: error: can only specify -Z/--compress [LEVEL] when method is set
> 
> Your 0003 patch shouldn't reference LZ4:
> +#ifndef HAVE_LIBLZ4
> +       if (*compressionMethod == COMPRESSION_LZ4)
> +               supports_compression = false;
> +#endif
> 
> The 0004 patch renames zlibOutSize to outsize - I think the patch series should
> be constructed such as to minimize the size of the method-specific patches.  I
> say this anticipating also adding support for zstd.  The preliminary patches
> should have all the boring stuff.  It would help for reviewing to keep the 
> patches split up, or to enumerate all the boring things that are being renamed
> (like change OutputContext to cfp, rename zlibOutSize, ...).
> 
> 0004: The include should use <lz4.h> and not "lz4.h"
> 
> freebsd/cfbot is failing.
> 
> I suggested off-list to add an 0099 patch to change LZ4 to the default, to
> exercise it more on CI.

On Sat, Mar 26, 2022 at 01:33:36PM -0500, Justin Pryzby wrote:
> On Sat, Mar 26, 2022 at 11:21:56AM -0500, Justin Pryzby wrote:
> > You're passing both the compression method *and* level.  I think there should
> > be a structure which includes both.  In the future, that can also handle
> > additional options.
> 
> I'm not sure if there's anything worth saving, but I did that last year with
> 0003-Support-multiple-compression-algs-levels-opts.patch
> I sent a rebased copy off-list.
> https://www.postgresql.org/message-id/flat/20210104025321.GA9712@telsasoft.com#ca1b9f9d3552c87fa874731cad9d8391
> 
> |    fatal("not built with LZ4 support");
> |    fatal("not built with lz4 support");
> 
> Please use consistent capitalization of "lz4" - then the compiler can optimize
> away duplicate strings.
> 
> > 0004: The include should use <lz4.h> and not "lz4.h"
> 
> Also, use USE_LZ4 rather than HAVE_LIBLZ4, per 75eae0908.



pgsql-hackers by date:

Previous
From: Erik Rijkers
Date:
Subject: JSON/SQL: jsonpath: incomprehensible error message
Next
From: "David G. Johnston"
Date:
Subject: Re: doc: Clarify Savepoint Behavior