Re: Add LZ4 compression in pg_dump - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Add LZ4 compression in pg_dump
Date
Msg-id Y8SoUuqbNoMBnFJp@paquier.xyz
Whole thread Raw
In response to Re: Add LZ4 compression in pg_dump  (Justin Pryzby <pryzby@telsasoft.com>)
Responses Re: Add LZ4 compression in pg_dump
List pgsql-hackers
On Sat, Jan 14, 2023 at 03:43:09PM -0600, Justin Pryzby wrote:
> On Sun, Jan 08, 2023 at 01:45:25PM -0600, Justin Pryzby wrote:
>> pg_compress_specification is being passed by value, but I think it
>> should be passed as a pointer, as is done everywhere else.
>
> ISTM that was an issue with 5e73a6048, affecting a few public and
> private functions.  I wrote a pre-preparatory patch which changes to
> pass by reference.

The functions changed by 0001 are cfopen[_write](),
AllocateCompressor() and ReadDataFromArchive().  Why is it a good idea
to change these interfaces which basically exist to handle inputs?  Is
there some benefit in changing compression_spec within the internals
of these routines before going back one layer down to their callers?
Changing the compression_spec on-the-fly in these internal paths could
be risky, actually, no?

> And addressed a handful of other issues I reported as separate fixup
> commits.  And changed to use LZ4 by default for CI.

Are your slight changes shaped as of 0003-f.patch, 0005-f.patch and
0007-f.patch on top of the original patches sent by Georgios?

> I also rebased my 2 year old patch to support zstd in pg_dump.  I hope
> it can finally added for v16.  I'll send it for the next CF if these
> patches progress.

Good idea to see if what you have done for zstd fits with what's
presented here.

>> pg_compress_algorithm is being writen directly into the pg_dump header.

Do you mean that this is what happens once the patch series 0001~0008
sent upthread is applied on HEAD?

>> Currently, I think that's not an externally-visible value (it could be
>> renumbered, theoretically even in a minor release).  Maybe there should
>> be a "private" enum for encoding the pg_dump header, similar to
>> WAL_COMPRESSION_LZ4 vs BKPIMAGE_COMPRESS_LZ4 ?  Or else a comment there
>> should warn that the values are encoded in pg_dump, and must never be
>> changed.
>
> Michael, WDYT ?

Changing the order of the members in an enum would cause an ABI
breakage, so that would not happen, and we tend to be very careful
about that.  Appending new members would be fine, though.  FWIW, I'd
rather avoid adding more enums that would just be exact maps to
pg_compress_algorithm.

-   /*
-    * For now the compression type is implied by the level.  This will need
-    * to change once support for more compression algorithms is added,
-    * requiring a format bump.
-    */
-   WriteInt(AH, AH->compression_spec.level);
+   AH->WriteBytePtr(AH, AH->compression_spec.algorithm);

I may be missing something here, but it seems to me that you ought to
store as well the level in the dump header, or it would not be
possible to report in the dump's description what was used?  Hence,
K_VERS_1_15 should imply that we have both the method compression and
the compression level.
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Kyotaro Horiguchi
Date:
Subject: Re: Exit walsender before confirming remote flush in logical replication
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Exit walsender before confirming remote flush in logical replication