Re: Add LZ4 compression in pg_dump - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Add LZ4 compression in pg_dump
Date
Msg-id YkPwmndklVKG4PE+@paquier.xyz
Whole thread Raw
In response to Re: Add LZ4 compression in pg_dump  (gkokolatos@pm.me)
List pgsql-hackers
On Tue, Mar 29, 2022 at 09:46:27AM +0000, gkokolatos@pm.me wrote:
> On Tuesday, March 29th, 2022 at 9:27 AM, Michael Paquier <michael@paquier.xyz> wrote:
>> On Sat, Mar 26, 2022 at 01:14:41AM -0500, Justin Pryzby wrote:
>> Wow. This stuff is old enough to vote (c3e18804), dead since its
>> introduction. There is indeed an argument for removing that, it is
>> not good to keep around that that has never been stressed and/or
>> used. Upon review, the cleanup done looks correct, as we have never
>> been able to generate .dat.gz files in for a dump in the tar format.
>
> Correct. My driving force behind it was to ease up the cleanup/refactoring
> work that follows, by eliminating the callers of the GZ*() macros.

Makes sense to me.

>> + command_fails_like(
>>
>> + [ 'pg_dump', '--compress', '1', '--format', 'tar' ],
>> This addition depending on HAVE_LIBZ is a good thing as a reminder of
>> any work that could be done in 0002. Now that's waiting for 20 years
>> so I would not hold my breath on this support. I think that this
>> could be just applied first, with 0002 on top of it, as a first
>> improvement.
>
> Excellent, thank you.

I have applied the test for --compress and --format=tar, separating it
from the rest.

While moving on with 0002, I have noticed the following in
_StartBlob():
    if (AH->compression != 0)
        sfx = ".gz";
    else
        sfx = "";

Shouldn't this bit also be simplified, adding a fatal() like the other
code paths, for safety?

>> + compress_cmd => [
>> + $ENV{'GZIP_PROGRAM'},
>> + '-f',
>> [...]
>> + $ENV{'GZIP_PROGRAM'},
>> + '-k', '-d',
>> -f and -d are available everywhere I looked at, but is -k/--keep a
>> portable choice with a gzip command? I don't see this option in
>> OpenBSD, for one. So this test is going to cause problems on those
>> buildfarm machines, at least. Couldn't this part be replaced by a
>> simple --test to check that what has been compressed is in correct
>> shape? We know that this works, based on our recent experiences with
>> the other tests.
>
> I would argue that the simple '--test' will not do in this case, as the
> TAP tests do need a file named <test>.sql to compare the contents with.
> This file is generated either directly by pg_dump itself, or by running
> pg_restore on pg_dump's output. In the case of compression pg_dump will
> generate a <test>.sql.<compression program suffix> which can not be
> used in the comparison tests. So the intention of this block, is not to
> simply test for validity, but to also decompress pg_dump's output for it
> to be able to be used.

Ahh, I see, thanks.  I would add a comment about that in the area of
compression_gzip_plain_format.

+   my $supports_compression = check_pg_config("#define HAVE_LIBZ 1");

This part could be moved within the if block a couple of lines down.

+   my $compress_program = $ENV{GZIP_PROGRAM};

It seems to me that it is enough to rely on {compress_cmd}, hence
there should be no need for $compress_program, no?

It seems to me that we should have a description for compress_cmd at
the top of 002_pg_dump.pl (close to "Definition of the pg_dump runs to
make").  There is an order dependency with restore_cmd.

> I updated the patch to simply remove the '-k' flag.

Okay.
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Identify missing publications from publisher while create/alter subscription.
Next
From: Michael Paquier
Date:
Subject: Re: Extend compatibility of PostgreSQL::Test::Cluster