Re: Teach pg_receivewal to use lz4 compression - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Teach pg_receivewal to use lz4 compression
Date
Msg-id YOe5LouYuxXOrL0H@paquier.xyz
Whole thread Raw
In response to Re: Teach pg_receivewal to use lz4 compression  (gkokolatos@pm.me)
Responses Re: Teach pg_receivewal to use lz4 compression  (gkokolatos@pm.me)
Re: Teach pg_receivewal to use lz4 compression  (gkokolatos@pm.me)
List pgsql-hackers
On Thu, Jul 08, 2021 at 02:18:40PM +0000, gkokolatos@pm.me wrote:
> please find v2 of the patch which tries to address the commends
> received so far.

Thanks!

> Michael Paquier wrote:
>> +   system_or_bail('lz4', '-t', $lz4_wals[0]);
>> I think that you should just drop this part of the test.  The only
>> part of LZ4 that we require to be present when Postgres is built with
>> --with-lz4 is its library liblz4.  Commands associated to it may not
>> be around, causing this test to fail.  The test checking that one .lz4
>> file has been created is good to have.  It may be worth adding a test
>> with a .lz4.partial segment generated and --endpos pointing to a LSN
>> that does not finish the segment that gets switched.
>
> I humbly disagree with the need for the test. It is rather easily possible
> to generate a file that can not be decoded, thus becoming useless. Having the
> test will provide some guarantee for the fact. In the current patch, there
> is code to find out if the program lz4 is available in the system. If it is
> not available, then that specific test is skipped. The rest remains as it
> were. Also `system_or_bail` is not used anymore in favour of the `system_log`
> so that the test counted and the execution of tests continues upon failure.

Check.  I can see what you are using in the new patch.  We could live
with that.

>> It seems to me that you are missing some logic in
>> FindStreamingStart() to handle LZ4-compressed segments, in relation
>> with IsCompressXLogFileName() and IsPartialCompressXLogFileName().
>
> Very correct. The logic is now added. Given the lz4 api, one has to fill
> in the uncompressed size during creation time. Then one can read the
> headers and verify the expectations.

Yeah, I read that as well with lz4 --list and the kind.  That's weird
compared to how ZLIB gives an easy access to it.  We may want to do an
effort and tell about more lz4 --content-size/--list, telling that we
add the size in the segment compressed because we have to and LZ4 does
not do it by default?

>> Should we have more tests for ZLIB, while on it?  That seems like a
>> good addition as long as we can skip the tests conditionally when
>> that's not supported.
>
> Agreed. Please allow me to provide a distinct patch for this code.

Thanks.  Looking forward to seeing it.  That may be better on a
separate thread, even if I digressed in this thread :)

>> I think we can somehow use "acceleration" parameter of lz4 compression
>> to map on compression level, It is not direct mapping but
>> can't we create some internal mapping instead of completely ignoring
>> this option for lz4, or we can provide another option for lz4?
>
> We can, though I am not in favour of doing so. There is seemingly
> little benefit for added complexity.

Agreed.

>> What I think is important for the user when it comes to this
>> option is the consistency of its naming across all the tools
>> supporting it.  pg_dump and pg_basebackup, where we could plug LZ4,
>> already use most of the short options you could use for pg_receivewal,
>> having only a long one gives a bit more flexibility.
>
> Done.

     * http://www.zlib.org/rfc-gzip.html.
+    * For lz4 compressed segments
     */
This comment is incomplete.

+#define IsLZ4CompressXLogFileName(fname)    \
+   (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4") && \
+    strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN &&     \
+    strcmp((fname) + XLOG_FNAME_LEN, ".lz4") == 0)
+#define IsLZ4PartialCompressXLogFileName(fname)    \
+   (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4.partial") && \
+    strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN &&     \
+    strcmp((fname) + XLOG_FNAME_LEN, ".lz4.partial") == 0)
This is getting complicated.  Would it be better to change this stuff
and switch to a routine that checks if a segment has a valid name, is
partial, and the type of compression that applied to it?  It seems to
me that we should group iszlibcompress and islz4compress together with
the options available through compression_method.

+   if (compresslevel != 0)
+   {
+       if (compression_method == COMPRESSION_NONE)
+       {
+           compression_method = COMPRESSION_ZLIB;
+       }
+       if (compression_method != COMPRESSION_ZLIB)
+       {
+           pg_log_error("cannot use --compress when "
+                        "--compression-method is not gzip");
+           fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+                   progname);
+           exit(1);
+       }
+   }
For compatibility where --compress enforces the use of zlib that would
work, but this needs a comment explaining the goal of this block.  I
am wondering if it would be better to break the will and just complain
when specifying --compress without --compression-method though.  That
would cause compatibility issues, but this would make folks aware of
the presence of LZ4, which does not sound bad to me either as ZLIB is
slower than LZ4 on all fronts.

+   else if (compression_method == COMPRESSION_ZLIB)
+   {
+       pg_log_error("cannot use --compression-method gzip when "
+                    "--compression is 0");
+       fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+               progname);
+       exit(1);
+   }
Hmm.  It would be more natural to enforce a default compression level
in this case?  The user is asking for a compression with zlib here.

+   my $lz4 = $ENV{LZ4};
[...]
+   # Verify that the stored file is readable if program lz4 is available
+   skip "program lz4 is not found in your system", 1
+     if (!defined $lz4 || $lz4 eq '');
Okay, this is acceptable.  Didn't know the existing trick with TAR
either.

+       /*
+        * XXX: this is crap... lz4preferences now does show the uncompressed
+        * size via lz4 --list <filename> but the compression goes down the
+        * window... also it is not very helpfull to have it at the startm, does
+        * it?
+        */
What do you mean here by "the compression goes out the window"?
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Bharath Rupireddy
Date:
Subject: Re: Inaccurate error message when set fdw batch_size to 0
Next
From: Japin Li
Date:
Subject: Re: Why ALTER SUBSCRIPTION ... SET (slot_name='none') requires subscription disabled?