Re: Add LZ4 compression in pg_dump - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: Add LZ4 compression in pg_dump
Date
Msg-id 20220326162156.GI28503@telsasoft.com
Whole thread Raw
In response to Add LZ4 compression in pg_dump  (Georgios <gkokolatos@protonmail.com>)
Responses Re: Add LZ4 compression in pg_dump
Re: Add LZ4 compression in pg_dump
Re: Add LZ4 compression in pg_dump
Re: Add LZ4 compression in pg_dump
List pgsql-hackers
LZ4F_HEADER_SIZE_MAX isn't defined in old LZ4.

I ran into that on an ubuntu LTS, so I don't think it's so old that it
shouldn't be handled more gracefully.  LZ4 should either have an explicit
version check, or else shouldn't depend on that feature (or should define a
safe fallback version if the library header doesn't define it).

https://packages.ubuntu.com/liblz4-1

0003: typo: of legacy => or legacy

There are a large number of ifdefs being added here - it'd be nice to minimize
that.  basebackup was organized to use separate files, which is one way.

$ git grep -c 'ifdef .*LZ4' src/bin/pg_dump/compress_io.c
src/bin/pg_dump/compress_io.c:19

In last year's CF entry, I had made a union within CompressorState.  LZ4
doesn't need z_streamp (and ztsd will need ZSTD_outBuffer, ZSTD_inBuffer,
ZSTD_CStream).

0002: I wonder if you're able to re-use any of the basebackup parsing stuff
from commit ffd53659c.  You're passing both the compression method *and* level.
I think there should be a structure which includes both.  In the future, that
can also handle additional options.  I hope to re-use these same things for
wal_compression=method:level.

You renamed this:

|-       COMPR_ALG_LIBZ
|-} CompressionAlgorithm;
|+       COMPRESSION_GZIP,
|+} CompressionMethod;

..But I don't think that's an improvement.  If you were to change it, it should
say something like PGDUMP_COMPRESS_ZLIB, since there are other compression
structs and typedefs.  zlib is not idential to gzip, which uses a different
header, so in WriteDataToArchive(), LIBZ is correct, and GZIP is incorrect.

The cf* changes in pg_backup_archiver could be split out into a separate
commit.  It's strictly a code simplification - not just preparation for more
compression algorithms.  The commit message should "See also:
bf9aa490db24b2334b3595ee33653bf2fe39208c".

The changes in 0002 for cfopen_write seem insufficient:
|+       if (compressionMethod == COMPRESSION_NONE)
|+               fp = cfopen(path, mode, compressionMethod, 0);
|        else
|        {
| #ifdef HAVE_LIBZ
|                char       *fname;
| 
|                fname = psprintf("%s.gz", path);
|-               fp = cfopen(fname, mode, compression);
|+               fp = cfopen(fname, mode, compressionMethod, compressionLevel);
|                free_keep_errno(fname);
| #else

The only difference between the LIBZ and uncompressed case is the file
extension, and it'll be the only difference with LZ4 too.  So I suggest to
first handle the file extension, and the rest of the code path is not
conditional on the compression method.  I don't think cfopen_write even needs
HAVE_LIBZ - can't you handle that in cfopen_internal() ?

This patch rejects -Z0, which ought to be accepted:
./src/bin/pg_dump/pg_dump -h /tmp regression -Fc -Z0 |wc
pg_dump: error: can only specify -Z/--compress [LEVEL] when method is set

Your 0003 patch shouldn't reference LZ4:
+#ifndef HAVE_LIBLZ4
+       if (*compressionMethod == COMPRESSION_LZ4)
+               supports_compression = false;
+#endif

The 0004 patch renames zlibOutSize to outsize - I think the patch series should
be constructed such as to minimize the size of the method-specific patches.  I
say this anticipating also adding support for zstd.  The preliminary patches
should have all the boring stuff.  It would help for reviewing to keep the 
patches split up, or to enumerate all the boring things that are being renamed
(like change OutputContext to cfp, rename zlibOutSize, ...).

0004: The include should use <lz4.h> and not "lz4.h"

freebsd/cfbot is failing.

I suggested off-list to add an 0099 patch to change LZ4 to the default, to
exercise it more on CI.

-- 
Justin



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Pointer subtraction with a null pointer
Next
From: Andres Freund
Date:
Subject: Re: Pointer subtraction with a null pointer