Re: Add LZ4 compression in pg_dump - Mailing list pgsql-hackers
From | Justin Pryzby |
---|---|
Subject | Re: Add LZ4 compression in pg_dump |
Date | |
Msg-id | 20220705151328.GQ13040@telsasoft.com Whole thread Raw |
In response to | Re: Add LZ4 compression in pg_dump (gkokolatos@pm.me) |
List | pgsql-hackers |
This is a review of 0001. On Tue, Jul 05, 2022 at 01:22:47PM +0000, gkokolatos@pm.me wrote: > Simply this patchset had started to divert > heavily already based on comments from Mr. Paquier who had already requested for > the APIs to be refactored to use function pointers. This is happening in 0002 of > the patchset. I said something about reducing ifdefs, but I'm having trouble finding what Michael said about this ? > > On Sat, Mar 26, 2022 at 11:21:56AM -0500, Justin Pryzby wrote: > > > > > LZ4F_HEADER_SIZE_MAX isn't defined in old LZ4. > > > > > > I ran into that on an ubuntu LTS, so I don't think it's so old that it > > > shouldn't be handled more gracefully. LZ4 should either have an explicit > > > version check, or else shouldn't depend on that feature (or should define a > > > safe fallback version if the library header doesn't define it). > > > https://packages.ubuntu.com/liblz4-1 The constant still seems to be used without defining a fallback or a minimum version. > > > 0003: typo: of legacy => or legacy This is still there > > > You renamed this: > > > > > > |- COMPR_ALG_LIBZ > > > |-} CompressionAlgorithm; > > > |+ COMPRESSION_GZIP, > > > |+} CompressionMethod; > > > > > > ..But I don't think that's an improvement. If you were to change it, it should > > > say something like PGDUMP_COMPRESS_ZLIB, since there are other compression > > > structs and typedefs. zlib is not idential to gzip, which uses a different > > > header, so in WriteDataToArchive(), LIBZ is correct, and GZIP is incorrect. This comment still applies - zlib's gz* functions are "gzip" but the others are "zlib". https://zlib.net/manual.html That affects both the 0001 and 0002 patches. Actually, I think that "gzip" should not be the name of the user-facing option, since (except for "plain" format) it isn't using gzip. +Robert, since this suggests amending parse_compress_algorithm(). Maybe "zlib" should be parsed the same way as "gzip" - I don't think we ever expose both to a user, but in some cases (basebackup and pg_dump -Fp -Z1) the output is "gzip" and in some cases NO it's zlib (pg_dump -Fc -Z1). > > > The cf* changes in pg_backup_archiver could be split out into a separate > > > commit. It's strictly a code simplification - not just preparation for more > > > compression algorithms. The commit message should "See also: > > > bf9aa490db24b2334b3595ee33653bf2fe39208c". I still think this could be an early, 0000 patch. > > > freebsd/cfbot is failing. This is still failing for bsd, windows and compiler warnings. Windows also has compiler warnings. http://cfbot.cputube.org/georgios-kokolatos.html Please see: src/tools/ci/README, which you can use to run check-world on 4 OS by pushing a branch to github. > > > I suggested off-list to add an 0099 patch to change LZ4 to the default, to > > > exercise it more on CI. What about this ? I think the patch needs to pass CI on all 4 OS with default=zlib and default=lz4. > > On Sat, Mar 26, 2022 at 01:33:36PM -0500, Justin Pryzby wrote: > @@ -254,7 +251,12 @@ CreateArchive(const char *FileSpec, const ArchiveFormat fmt, > Archive * > OpenArchive(const char *FileSpec, const ArchiveFormat fmt) > { > - ArchiveHandle *AH = _allocAH(FileSpec, fmt, 0, true, archModeRead, setupRestoreWorker); > + ArchiveHandle *AH; > + pg_compress_specification compress_spec; Should this be initialized to {0} ? > @@ -969,6 +969,8 @@ NewRestoreOptions(void) > opts->format = archUnknown; > opts->cparams.promptPassword = TRI_DEFAULT; > opts->dumpSections = DUMP_UNSECTIONED; > + opts->compress_spec.algorithm = PG_COMPRESSION_NONE; > + opts->compress_spec.level = INT_MIN; Why INT_MIN ? > @@ -1115,23 +1117,28 @@ PrintTOCSummary(Archive *AHX) > ArchiveHandle *AH = (ArchiveHandle *) AHX; > RestoreOptions *ropt = AH->public.ropt; > TocEntry *te; > + pg_compress_specification out_compress_spec; Should have {0} ? I suggest to write it like my 2020 patch for this, which says: no_compression = {0}; > /* Open stdout with no compression for AH output handle */ > - AH->gzOut = 0; > - AH->OF = stdout; > + out_compress_spec.algorithm = PG_COMPRESSION_NONE; > + AH->OF = cfdopen(dup(fileno(stdout)), PG_BINARY_A, out_compress_spec); Ideally this should check the success of dup(). > @@ -3776,21 +3746,25 @@ ReadHead(ArchiveHandle *AH) > + if (AH->compress_spec.level != INT_MIN) Why is it testing the level and not the algorithm ? > --- a/src/bin/pg_dump/pg_backup_custom.c > +++ b/src/bin/pg_dump/pg_backup_custom.c > @@ -298,7 +298,7 @@ _StartData(ArchiveHandle *AH, TocEntry *te) > _WriteByte(AH, BLK_DATA); /* Block type */ > WriteInt(AH, te->dumpId); /* For sanity check */ > > - ctx->cs = AllocateCompressor(AH->compression, _CustomWriteFunc); > + ctx->cs = AllocateCompressor(AH->compress_spec, _CustomWriteFunc); Is it necessary to rename the data structure ? If not, this file can remain unchanged. > --- a/src/bin/pg_dump/pg_backup_directory.c > +++ b/src/bin/pg_dump/pg_backup_directory.c > @@ -573,6 +574,7 @@ _CloseArchive(ArchiveHandle *AH) > if (AH->mode == archModeWrite) > { > cfp *tocFH; > + pg_compress_specification compress_spec; Should use {0} ? > @@ -639,12 +642,14 @@ static void > _StartBlobs(ArchiveHandle *AH, TocEntry *te) > { > lclContext *ctx = (lclContext *) AH->formatData; > + pg_compress_specification compress_spec; Same > + /* > + * Custom and directory formats are compressed by default (zlib), others > + * not > + */ > + if (user_compression_defined == false) Should be: !user_compression_defined Your 0001+0002 patches (without 0003) fail to compile: pg_backup_directory.c: In function ‘_ReadByte’: pg_backup_directory.c:519:12: error: ‘CompressFileHandle’ {aka ‘struct CompressFileHandle’} has no member named ‘_IO_getc’ 519 | return CFH->getc(CFH); | ^~ pg_backup_directory.c:520:1: warning: control reaches end of non-void function [-Wreturn-type] 520 | } -- Justin
pgsql-hackers by date: