Re: Teach pg_receivewal to use lz4 compression - Mailing list pgsql-hackers

From gkokolatos@pm.me
Subject Re: Teach pg_receivewal to use lz4 compression
Date
Msg-id m9tNOULZ1-ZZwdCVNSfWe1HkASTFeobtXkMeewSY--McJMAe1lIx4nmVSiUh8xU1Ano7lwwdCJTLdjPWxqoRVNep325VH8sLCvjHkH9RvOM=@pm.me
Whole thread Raw
In response to Re: Teach pg_receivewal to use lz4 compression  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Friday, July 9th, 2021 at 04:49, Michael Paquier <michael@paquier.xyz> wrote:

> On Thu, Jul 08, 2021 at 02:18:40PM +0000, gkokolatos@pm.me wrote:
>
> > please find v2 of the patch which tries to address the commends
> >
> > received so far.
>
> Thanks!
>
> > Michael Paquier wrote:
> >
> > > -   system_or_bail('lz4', '-t', $lz4_wals[0]);
> > >
> > >     I think that you should just drop this part of the test. The only
> > >
> > >     part of LZ4 that we require to be present when Postgres is built with
> > >
> > >     --with-lz4 is its library liblz4. Commands associated to it may not
> > >
> > >     be around, causing this test to fail. The test checking that one .lz4
> > >
> > >     file has been created is good to have. It may be worth adding a test
> > >
> > >     with a .lz4.partial segment generated and --endpos pointing to a LSN
> > >
> > >     that does not finish the segment that gets switched.
> >
> > I humbly disagree with the need for the test. It is rather easily possible
> >
> > to generate a file that can not be decoded, thus becoming useless. Having the
> >
> > test will provide some guarantee for the fact. In the current patch, there
> >
> > is code to find out if the program lz4 is available in the system. If it is
> >
> > not available, then that specific test is skipped. The rest remains as it
> >
> > were. Also `system_or_bail` is not used anymore in favour of the `system_log`
> >
> > so that the test counted and the execution of tests continues upon failure.
>
> Check. I can see what you are using in the new patch. We could live
>
> with that.

Great. Thank you.

>
> > > It seems to me that you are missing some logic in
> > >
> > > FindStreamingStart() to handle LZ4-compressed segments, in relation
> > >
> > > with IsCompressXLogFileName() and IsPartialCompressXLogFileName().
> >
> > Very correct. The logic is now added. Given the lz4 api, one has to fill
> >
> > in the uncompressed size during creation time. Then one can read the
> >
> > headers and verify the expectations.
>
> Yeah, I read that as well with lz4 --list and the kind. That's weird
>
> compared to how ZLIB gives an easy access to it. We may want to do an
>
> effort and tell about more lz4 --content-size/--list, telling that we
>
> add the size in the segment compressed because we have to and LZ4 does
>
> not do it by default?

I am afraid I do not follow. In the patch we do add the uncompressed size
and then, the uncompressed size is checked against a known value WalSegSz.
What the compressed size will be checked against?

>
> > > Should we have more tests for ZLIB, while on it? That seems like a
> > >
> > > good addition as long as we can skip the tests conditionally when
> > >
> > > that's not supported.
> >
> > Agreed. Please allow me to provide a distinct patch for this code.
>
> Thanks. Looking forward to seeing it. That may be better on a
>
> separate thread, even if I digressed in this thread :)

Thank you for the comments. I will sent in a separate thread.

>
> > > I think we can somehow use "acceleration" parameter of lz4 compression
> > >
> > > to map on compression level, It is not direct mapping but
> > >
> > > can't we create some internal mapping instead of completely ignoring
> > >
> > > this option for lz4, or we can provide another option for lz4?
> >
> > We can, though I am not in favour of doing so. There is seemingly
> >
> > little benefit for added complexity.
>
> Agreed.
>
> > > What I think is important for the user when it comes to this
> > >
> > > option is the consistency of its naming across all the tools
> > >
> > > supporting it. pg_dump and pg_basebackup, where we could plug LZ4,
> > >
> > > already use most of the short options you could use for pg_receivewal,
> > >
> > > having only a long one gives a bit more flexibility.
> >
> > Done.
>
> * http://www.zlib.org/rfc-gzip.html.
>
> -   -   For lz4 compressed segments
>
>         */
>
>         This comment is incomplete.

It is. I will fix.

>
>         +#define IsLZ4CompressXLogFileName(fname) \
> -   (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4") && \
> -   strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN && \
> -   strcmp((fname) + XLOG_FNAME_LEN, ".lz4") == 0)
>
>     +#define IsLZ4PartialCompressXLogFileName(fname) \
> -   (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4.partial") && \
> -   strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN && \
> -   strcmp((fname) + XLOG_FNAME_LEN, ".lz4.partial") == 0)
>
>     This is getting complicated. Would it be better to change this stuff
>
>     and switch to a routine that checks if a segment has a valid name, is
>
>     partial, and the type of compression that applied to it? It seems to
>
>     me that we should group iszlibcompress and islz4compress together with
>
>     the options available through compression_method.

I agree with you. I will refactor.


> -   if (compresslevel != 0)
> -   {
> -         if (compression_method == COMPRESSION_NONE)
>
>
> -         {
>
>
> -             compression_method = COMPRESSION_ZLIB;
>
>
> -         }
>
>
> -         if (compression_method != COMPRESSION_ZLIB)
>
>
> -         {
>
>
> -             pg_log_error("cannot use --compress when "
>
>
> -                          "--compression-method is not gzip");
>
>
> -             fprintf(stderr, _("Try \\"%s --help\\" for more information.\\n"),
>
>
> -                     progname);
>
>
> -             exit(1);
>
>
> -         }
>
>
> -   }
>
>     For compatibility where --compress enforces the use of zlib that would
>
>     work, but this needs a comment explaining the goal of this block. I
>
>     am wondering if it would be better to break the will and just complain
>
>     when specifying --compress without --compression-method though. That
>
>     would cause compatibility issues, but this would make folks aware of
>
>     the presence of LZ4, which does not sound bad to me either as ZLIB is
>
>     slower than LZ4 on all fronts.


I would vote to break the compatibility if that is an option. I chose the
less invasive approach thinking that breaking the compatibility would not
be an option.

Unless others object, I will include --compress as a complimentary option
to --compression-method in updated version of the patch.


> -   else if (compression_method == COMPRESSION_ZLIB)
> -   {
> -         pg_log_error("cannot use --compression-method gzip when "
>
>
> -                      "--compression is 0");
>
>
> -         fprintf(stderr, _("Try \\"%s --help\\" for more information.\\n"),
>
>
> -                 progname);
>
>
> -         exit(1);
>
>
> -   }
>
>     Hmm. It would be more natural to enforce a default compression level
>
>     in this case? The user is asking for a compression with zlib here.


You are correct, in the current patch passing --compression-method=gzip alone
is equivalent to passing --compression=0 in the current master version. This
behaviour may be confusing for the user. What should the default compression
be then? I am inclined to say '5' as a compromise between speed and compression
ration.


> -   my $lz4 = $ENV{LZ4};
>
>     [...]
> -   Verify that the stored file is readable if program lz4 is available
>     ===================================================================
>
> -   skip "program lz4 is not found in your system", 1
> -       if (!defined $lz4 || $lz4 eq '');
>
>
>
> Okay, this is acceptable. Didn't know the existing trick with TAR
>
> either.

Thank you.

>
> -         /*
>
>
> -          * XXX: this is crap... lz4preferences now does show the uncompressed
>
>
> -          * size via lz4 --list <filename> but the compression goes down the
>
>
> -          * window... also it is not very helpfull to have it at the startm, does
>
>
> -          * it?
>
>
> -          */
>
>
>
> What do you mean here by "the compression goes out the window"?

Please consider me adequately embarrassed. This was a personal comment while I was
working on the code. It is not correct and it should have never seen the public
light.


Cheers,
//Georgios


> ---------------------------------------------------------------
>
> Michael



pgsql-hackers by date:

Previous
From: Peter Smith
Date:
Subject: psql tab auto-complete for CREATE PUBLICATION
Next
From: Kyotaro Horiguchi
Date:
Subject: Re: Incorrect usage of strtol, atoi for non-numeric junk inputs