Thread: Teach pg_receivewal to use lz4 compression

Teach pg_receivewal to use lz4 compression

From

gkokolatos@pm.me

Date:

29 June 2021, 17:45:17

Hi,

The program pg_receivewal can use gzip compression to store the received WAL.
This patch teaches it to be able to use lz4 compression if the binary is build
using the -llz4 flag.

Previously, the user had to use the option --compress with a value between [0-9]
to denote that gzip compression was requested. This specific behaviour is
maintained. A newly introduced option --compress-program=lz4 can be used to ask
for the logs to be compressed using lz4 instead. In that case, no compression
values can be selected as it does not seem too useful.

Under the hood there is nothing exceptional to be noted. Tar based archives have
not yet been taught to use lz4 compression. Those are used by pg_basebackup. If
is is felt useful, then it is easy to be added in a new patch.

Cheers,
//Georgios

Attachment

v1-0001-Teach-pg_receivewal-to-use-lz4-compression.patch

Re: Teach pg_receivewal to use lz4 compression

From

Michael Paquier

Date:

30 June 2021, 06:38:03

On Tue, Jun 29, 2021 at 02:45:17PM +0000, gkokolatos@pm.me wrote:
> The program pg_receivewal can use gzip compression to store the received WAL.
> This patch teaches it to be able to use lz4 compression if the binary is build
> using the -llz4 flag.

Nice.

> Previously, the user had to use the option --compress with a value between [0-9]
> to denote that gzip compression was requested. This specific behaviour is
> maintained. A newly introduced option --compress-program=lz4 can be used to ask
> for the logs to be compressed using lz4 instead. In that case, no compression
> values can be selected as it does not seem too useful.

Yes, I am not convinced either that we should care about making the
acceleration customizable.

> Under the hood there is nothing exceptional to be noted. Tar based archives have
> not yet been taught to use lz4 compression. Those are used by pg_basebackup. If
> is is felt useful, then it is easy to be added in a new patch.

Documentation is missing from the patch.

+   LZ4F_compressionContext_t ctx;
+   size_t      outbufCapacity;
+   void       *outbuf;
It may be cleaner to refer to lz4 in the name of those variables?

+       ctx_out = LZ4F_createCompressionContext(&ctx, LZ4F_VERSION);
+       outbufCapacity = LZ4F_compressBound(LZ4_IN_SIZE, NULL /* default preferences */);
Interesting.  So this cannot be done at compilation time because of
the auto-flush mode looking at the LZ4 code.  That looks about right.

getopt_long() is forgotting the new option 'I'.

+   system_or_bail('lz4', '-t', $lz4_wals[0]);
I think that you should just drop this part of the test.  The only
part of LZ4 that we require to be present when Postgres is built with
--with-lz4 is its library liblz4.  Commands associated to it may not
be around, causing this test to fail.  The test checking that one .lz4
file has been created is good to have.  It may be worth adding a test
with a .lz4.partial segment generated and --endpos pointing to a LSN
that does not finish the segment that gets switched.

It seems to me that you are missing some logic in
FindStreamingStart() to handle LZ4-compressed segments, in relation
with IsCompressXLogFileName() and IsPartialCompressXLogFileName().

+   pg_log_error("invalid compress-program \"%s\"", optarg);
"compress-program" sounds weird.  Shouldn't that just say "invalid
compression method" or similar?

+   printf(_("  -Z, --compress=0-9     compress logs with given
compression level (available only with compress-program=zlib)\n"));
This line is too long.

Should we have more tests for ZLIB, while on it?  That seems like a
good addition as long as we can skip the tests conditionally when
that's not supported.
--
Michael

Attachment

signature.asc

Re: Teach pg_receivewal to use lz4 compression

From

Dilip Kumar

Date:

30 June 2021, 09:33:55

On Tue, Jun 29, 2021 at 8:15 PM <gkokolatos@pm.me> wrote:
>
> Hi,
>
> The program pg_receivewal can use gzip compression to store the received WAL.
> This patch teaches it to be able to use lz4 compression if the binary is build
> using the -llz4 flag.

+1 for the idea

Some comments/suggestions on the patch

1.
@@ -90,7 +91,8 @@ usage(void)
  printf(_("      --synchronous      flush write-ahead log immediately
after writing\n"));
  printf(_("  -v, --verbose          output verbose messages\n"));
  printf(_("  -V, --version          output version information, then exit\n"));
- printf(_("  -Z, --compress=0-9     compress logs with given
compression level\n"));
+ printf(_("  -I, --compress-program use this program for compression\n"));

Wouldn't it be better to call it compression method instead of
compression program?

2.
+ printf(_("  -Z, --compress=0-9     compress logs with given
compression level (available only with compress-program=zlib)\n"));

I think we can somehow use "acceleration" parameter of lz4 compression
to map on compression level, It is not direct mapping but
can't we create some internal mapping instead of completely ignoring
this option for lz4, or we can provide another option for lz4?

3. Should we also support LZ4 compression using dictionary?

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: Teach pg_receivewal to use lz4 compression

From

Magnus Hagander

Date:

01 July 2021, 13:28:08

On Wed, Jun 30, 2021 at 8:34 AM Dilip Kumar <dilipbalaut@gmail.com> wrote:
>
> On Tue, Jun 29, 2021 at 8:15 PM <gkokolatos@pm.me> wrote:
> >
> > Hi,
> >
> > The program pg_receivewal can use gzip compression to store the received WAL.
> > This patch teaches it to be able to use lz4 compression if the binary is build
> > using the -llz4 flag.
>
> +1 for the idea
>
> Some comments/suggestions on the patch
>
> 1.
> @@ -90,7 +91,8 @@ usage(void)
>   printf(_("      --synchronous      flush write-ahead log immediately
> after writing\n"));
>   printf(_("  -v, --verbose          output verbose messages\n"));
>   printf(_("  -V, --version          output version information, then exit\n"));
> - printf(_("  -Z, --compress=0-9     compress logs with given
> compression level\n"));
> + printf(_("  -I, --compress-program use this program for compression\n"));
>
> Wouldn't it be better to call it compression method instead of
> compression program?

I came here to say exactly that, just had to think up what I thought
was the better name first. Either method or algorithm, but method
seems like the much simpler choice and therefore better in this case.

Should is also then not be --compression-method, rather than --compress-method?

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Re: Teach pg_receivewal to use lz4 compression

From

gkokolatos@pm.me

Date:

01 July 2021, 16:39:29


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Thursday, July 1st, 2021 at 12:28, Magnus Hagander <magnus@hagander.net> wrote:

> On Wed, Jun 30, 2021 at 8:34 AM Dilip Kumar dilipbalaut@gmail.com wrote:
>
> > On Tue, Jun 29, 2021 at 8:15 PM gkokolatos@pm.me wrote:
> >
> > > Hi,
> > >
> > > The program pg_receivewal can use gzip compression to store the received WAL.
> > >
> > > This patch teaches it to be able to use lz4 compression if the binary is build
> > >
> > > using the -llz4 flag.
> >
> > +1 for the idea
> >
> > Some comments/suggestions on the patch
> >
> > @@ -90,7 +91,8 @@ usage(void)
> >
> > printf((" --synchronous flush write-ahead log immediately
> >
> > after writing\n"));
> >
> > printf((" -v, --verbose output verbose messages\n"));
> >
> > printf(_(" -V, --version output version information, then exit\n"));
> >
> > -   printf(_(" -Z, --compress=0-9 compress logs with given
> >
> >     compression level\n"));
> >
> > -   printf(_(" -I, --compress-program use this program for compression\n"));
> >
> > Wouldn't it be better to call it compression method instead of
> >
> > compression program?
>
> I came here to say exactly that, just had to think up what I thought
>
> was the better name first. Either method or algorithm, but method
>
> seems like the much simpler choice and therefore better in this case.
>
> Should is also then not be --compression-method, rather than --compress-method?

Not a problem. To be very transparent, I first looked what was already out there.
For example `tar` is using
    -I, --use-compress-program=PROG
yet the 'use-' bit would push the alignment of the --help output, so I removed it.

To me, as a non native English speaker, `--compression-method` does sound better.
I can just re-align the rest of the help output.

Updated patch is on the making.

>
>
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Magnus Hagander
>
> Me: https://www.hagander.net/
>
> Work: https://www.redpill-linpro.com/

Re: Teach pg_receivewal to use lz4 compression

From

Magnus Hagander

Date:

01 July 2021, 16:58:24

On Thu, Jul 1, 2021 at 3:39 PM <gkokolatos@pm.me> wrote:
>
>
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>
> On Thursday, July 1st, 2021 at 12:28, Magnus Hagander <magnus@hagander.net> wrote:
>
> > On Wed, Jun 30, 2021 at 8:34 AM Dilip Kumar dilipbalaut@gmail.com wrote:
> >
> > > On Tue, Jun 29, 2021 at 8:15 PM gkokolatos@pm.me wrote:
> > >
> > > > Hi,
> > > >
> > > > The program pg_receivewal can use gzip compression to store the received WAL.
> > > >
> > > > This patch teaches it to be able to use lz4 compression if the binary is build
> > > >
> > > > using the -llz4 flag.
> > >
> > > +1 for the idea
> > >
> > > Some comments/suggestions on the patch
> > >
> > > @@ -90,7 +91,8 @@ usage(void)
> > >
> > > printf((" --synchronous flush write-ahead log immediately
> > >
> > > after writing\n"));
> > >
> > > printf((" -v, --verbose output verbose messages\n"));
> > >
> > > printf(_(" -V, --version output version information, then exit\n"));
> > >
> > > -   printf(_(" -Z, --compress=0-9 compress logs with given
> > >
> > >     compression level\n"));
> > >
> > > -   printf(_(" -I, --compress-program use this program for compression\n"));
> > >
> > > Wouldn't it be better to call it compression method instead of
> > >
> > > compression program?
> >
> > I came here to say exactly that, just had to think up what I thought
> >
> > was the better name first. Either method or algorithm, but method
> >
> > seems like the much simpler choice and therefore better in this case.
> >
> > Should is also then not be --compression-method, rather than --compress-method?
>
> Not a problem. To be very transparent, I first looked what was already out there.
> For example `tar` is using
>     -I, --use-compress-program=PROG
> yet the 'use-' bit would push the alignment of the --help output, so I removed it.

I think the difference there is that tar actually calls an external
program to do the work... And we are using the built-in library,
right?

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Re: Teach pg_receivewal to use lz4 compression

From

gkokolatos@pm.me

Date:

01 July 2021, 17:10:17


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Thursday, July 1st, 2021 at 15:58, Magnus Hagander <magnus@hagander.net> wrote:

> On Thu, Jul 1, 2021 at 3:39 PM gkokolatos@pm.me wrote:
>
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> >
> > On Thursday, July 1st, 2021 at 12:28, Magnus Hagander magnus@hagander.net wrote:
> >
> > > On Wed, Jun 30, 2021 at 8:34 AM Dilip Kumar dilipbalaut@gmail.com wrote:
> > >
> > > > On Tue, Jun 29, 2021 at 8:15 PM gkokolatos@pm.me wrote:
> > > >
> > > > > Hi,
> > > > >
> > > > > The program pg_receivewal can use gzip compression to store the received WAL.
> > > > >
> > > > > This patch teaches it to be able to use lz4 compression if the binary is build
> > > > >
> > > > > using the -llz4 flag.
> > > >
> > > > +1 for the idea
> > > >
> > > > Some comments/suggestions on the patch
> > > >
> > > > @@ -90,7 +91,8 @@ usage(void)
> > > >
> > > > printf((" --synchronous flush write-ahead log immediately
> > > >
> > > > after writing\n"));
> > > >
> > > > printf((" -v, --verbose output verbose messages\n"));
> > > >
> > > > printf(_(" -V, --version output version information, then exit\n"));
> > > >
> > > > -   printf(_(" -Z, --compress=0-9 compress logs with given
> > > >
> > > >     compression level\n"));
> > > >
> > > > -   printf(_(" -I, --compress-program use this program for compression\n"));
> > > >
> > > >
> > > > Wouldn't it be better to call it compression method instead of
> > > >
> > > > compression program?
> > >
> > > I came here to say exactly that, just had to think up what I thought
> > >
> > > was the better name first. Either method or algorithm, but method
> > >
> > > seems like the much simpler choice and therefore better in this case.
> > >
> > > Should is also then not be --compression-method, rather than --compress-method?
> >
> > Not a problem. To be very transparent, I first looked what was already out there.
> >
> > For example `tar` is using
> >
> > -I, --use-compress-program=PROG
> >
> > yet the 'use-' bit would push the alignment of the --help output, so I removed it.
>
> I think the difference there is that tar actually calls an external
>
> program to do the work... And we are using the built-in library,
>
> right?

You are very correct :) I am not objecting the change at all. Just let you know
how I chose that. You know, naming is dead easy and all...

On a more serious note, what about the `-I` short flag? Should we keep it or
is there a better one to be used?

Micheal suggested on the same thread to move my entry in the help output so that
the output remains ordered. I would like the options for the compression method and
the already existing compression level to next to each other if possible. Then it
should be either 'X' or 'Y'.

Thoughts?



>
>
------------------------------------------------------------------------------------------------------------------------------------------------
>
> Magnus Hagander
>
> Me: https://www.hagander.net/
>
> Work: https://www.redpill-linpro.com/

Re: Teach pg_receivewal to use lz4 compression

From

Michael Paquier

Date:

02 July 2021, 04:10:16

On Thu, Jul 01, 2021 at 02:10:17PM +0000, gkokolatos@pm.me wrote:
> Micheal suggested on the same thread to move my entry in the help output so that
> the output remains ordered. I would like the options for the compression method and
> the already existing compression level to next to each other if possible. Then it
> should be either 'X' or 'Y'.

Hmm.  Grouping these together makes sense for the user.  One choice
that we have here is to drop the short option, and just use a long
one.  What I think is important for the user when it comes to this
option is the consistency of its naming across all the tools
supporting it.  pg_dump and pg_basebackup, where we could plug LZ4,
already use most of the short options you could use for pg_receivewal,
having only a long one gives a bit more flexibility.
--
Michael

Attachment

signature.asc

Re: Teach pg_receivewal to use lz4 compression

From

gkokolatos@pm.me

Date:

02 July 2021, 10:35:00


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Friday, July 2nd, 2021 at 03:10, Michael Paquier <michael@paquier.xyz> wrote:

> On Thu, Jul 01, 2021 at 02:10:17PM +0000, gkokolatos@pm.me wrote:
>
> > Micheal suggested on the same thread to move my entry in the help output so that
> >
> > the output remains ordered. I would like the options for the compression method and
> >
> > the already existing compression level to next to each other if possible. Then it
> >
> > should be either 'X' or 'Y'.
>
> Hmm. Grouping these together makes sense for the user. One choice
>
> that we have here is to drop the short option, and just use a long
>
> one. What I think is important for the user when it comes to this
>
> option is the consistency of its naming across all the tools
>
> supporting it. pg_dump and pg_basebackup, where we could plug LZ4,
>
> already use most of the short options you could use for pg_receivewal,
>
> having only a long one gives a bit more flexibility.


Good point. I am going with that one.


>
--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Michael

Re: Teach pg_receivewal to use lz4 compression

From

gkokolatos@pm.me

Date:

08 July 2021, 17:18:40

Hi,

please find v2 of the patch which tries to address the commends received so far.

Thank you all for your comments.

Michael Paquier wrote:

> Documentation is missing from the patch.
>
It has now been added.

> +   LZ4F_compressionContext_t ctx;
> +   size_t      outbufCapacity;
> +   void       *outbuf;
> It may be cleaner to refer to lz4 in the name of those variables?

Agreed and done

> +       ctx_out = LZ4F_createCompressionContext(&ctx, LZ4F_VERSION);
> +       outbufCapacity = LZ4F_compressBound(LZ4_IN_SIZE, NULL /* default preferences */);
> Interesting.  So this cannot be done at compilation time because of
> the auto-flush mode looking at the LZ4 code.  That looks about right.

This is also my understanding.

> +   system_or_bail('lz4', '-t', $lz4_wals[0]);
> I think that you should just drop this part of the test.  The only
> part of LZ4 that we require to be present when Postgres is built with
> --with-lz4 is its library liblz4.  Commands associated to it may not
> be around, causing this test to fail.  The test checking that one .lz4
> file has been created is good to have.  It may be worth adding a test
> with a .lz4.partial segment generated and --endpos pointing to a LSN
> that does not finish the segment that gets switched.

I humbly disagree with the need for the test. It is rather easily possible
to generate a file that can not be decoded, thus becoming useless. Having the
test will provide some guarantee for the fact. In the current patch, there
is code to find out if the program lz4 is available in the system. If it is
not available, then that specific test is skipped. The rest remains as it
were. Also `system_or_bail` is not used anymore in favour of the `system_log`
so that the test counted and the execution of tests continues upon failure.

> It seems to me that you are missing some logic in
> FindStreamingStart() to handle LZ4-compressed segments, in relation
> with IsCompressXLogFileName() and IsPartialCompressXLogFileName().

Very correct. The logic is now added. Given the lz4 api, one has to fill
in the uncompressed size during creation time. Then one can read the
headers and verify the expectations.

> Should we have more tests for ZLIB, while on it?  That seems like a
> good addition as long as we can skip the tests conditionally when
> that's not supported.

Agreed. Please allow me to provide a distinct patch for this code.

Dilip Kumar wrote:

> Wouldn't it be better to call it compression method instead of
> compression program?

Agreed. This is inline with the suggestions of other reviewers.
Find the change in the attached patch.

> I think we can somehow use "acceleration" parameter of lz4 compression
> to map on compression level, It is not direct mapping but
> can't we create some internal mapping instead of completely ignoring
> this option for lz4, or we can provide another option for lz4?

We can, though I am not in favour of doing so. There is seemingly
little benefit for added complexity.

> Should we also support LZ4 compression using dictionary?

I would we should not do that. If my understanding is correct,
decompression would require the dictionary to be passed along.
The algorithm seems to be very competitive to the current compression
as is.

Magnus Hagander wrote:

> I came here to say exactly that, just had to think up what I thought
> was the better name first. Either method or algorithm, but method
> seems like the much simpler choice and therefore better in this case.
>
> Should is also then not be --compression-method, rather than --compress-method?

Agreed and changed throughout.

Michael Paquier wrote:

> What I think is important for the user when it comes to this
> option is the consistency of its naming across all the tools
> supporting it.  pg_dump and pg_basebackup, where we could plug LZ4,
> already use most of the short options you could use for pg_receivewal,
> having only a long one gives a bit more flexibility.

Done.

Cheers,
//Georgios

Attachment

v2-0001-Teach-pg_receivewal-to-use-lz4-compression.patch

Re: Teach pg_receivewal to use lz4 compression

From

Michael Paquier

Date:

09 July 2021, 05:49:18

On Thu, Jul 08, 2021 at 02:18:40PM +0000, gkokolatos@pm.me wrote:
> please find v2 of the patch which tries to address the commends
> received so far.

Thanks!

> Michael Paquier wrote:
>> +   system_or_bail('lz4', '-t', $lz4_wals[0]);
>> I think that you should just drop this part of the test.  The only
>> part of LZ4 that we require to be present when Postgres is built with
>> --with-lz4 is its library liblz4.  Commands associated to it may not
>> be around, causing this test to fail.  The test checking that one .lz4
>> file has been created is good to have.  It may be worth adding a test
>> with a .lz4.partial segment generated and --endpos pointing to a LSN
>> that does not finish the segment that gets switched.
>
> I humbly disagree with the need for the test. It is rather easily possible
> to generate a file that can not be decoded, thus becoming useless. Having the
> test will provide some guarantee for the fact. In the current patch, there
> is code to find out if the program lz4 is available in the system. If it is
> not available, then that specific test is skipped. The rest remains as it
> were. Also `system_or_bail` is not used anymore in favour of the `system_log`
> so that the test counted and the execution of tests continues upon failure.

Check.  I can see what you are using in the new patch.  We could live
with that.

>> It seems to me that you are missing some logic in
>> FindStreamingStart() to handle LZ4-compressed segments, in relation
>> with IsCompressXLogFileName() and IsPartialCompressXLogFileName().
>
> Very correct. The logic is now added. Given the lz4 api, one has to fill
> in the uncompressed size during creation time. Then one can read the
> headers and verify the expectations.

Yeah, I read that as well with lz4 --list and the kind.  That's weird
compared to how ZLIB gives an easy access to it.  We may want to do an
effort and tell about more lz4 --content-size/--list, telling that we
add the size in the segment compressed because we have to and LZ4 does
not do it by default?

>> Should we have more tests for ZLIB, while on it?  That seems like a
>> good addition as long as we can skip the tests conditionally when
>> that's not supported.
>
> Agreed. Please allow me to provide a distinct patch for this code.

Thanks.  Looking forward to seeing it.  That may be better on a
separate thread, even if I digressed in this thread :)

>> I think we can somehow use "acceleration" parameter of lz4 compression
>> to map on compression level, It is not direct mapping but
>> can't we create some internal mapping instead of completely ignoring
>> this option for lz4, or we can provide another option for lz4?
>
> We can, though I am not in favour of doing so. There is seemingly
> little benefit for added complexity.

Agreed.

>> What I think is important for the user when it comes to this
>> option is the consistency of its naming across all the tools
>> supporting it.  pg_dump and pg_basebackup, where we could plug LZ4,
>> already use most of the short options you could use for pg_receivewal,
>> having only a long one gives a bit more flexibility.
>
> Done.

     * http://www.zlib.org/rfc-gzip.html.
+    * For lz4 compressed segments
     */
This comment is incomplete.

+#define IsLZ4CompressXLogFileName(fname)    \
+   (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4") && \
+    strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN &&     \
+    strcmp((fname) + XLOG_FNAME_LEN, ".lz4") == 0)
+#define IsLZ4PartialCompressXLogFileName(fname)    \
+   (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4.partial") && \
+    strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN &&     \
+    strcmp((fname) + XLOG_FNAME_LEN, ".lz4.partial") == 0)
This is getting complicated.  Would it be better to change this stuff
and switch to a routine that checks if a segment has a valid name, is
partial, and the type of compression that applied to it?  It seems to
me that we should group iszlibcompress and islz4compress together with
the options available through compression_method.

+   if (compresslevel != 0)
+   {
+       if (compression_method == COMPRESSION_NONE)
+       {
+           compression_method = COMPRESSION_ZLIB;
+       }
+       if (compression_method != COMPRESSION_ZLIB)
+       {
+           pg_log_error("cannot use --compress when "
+                        "--compression-method is not gzip");
+           fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+                   progname);
+           exit(1);
+       }
+   }
For compatibility where --compress enforces the use of zlib that would
work, but this needs a comment explaining the goal of this block.  I
am wondering if it would be better to break the will and just complain
when specifying --compress without --compression-method though.  That
would cause compatibility issues, but this would make folks aware of
the presence of LZ4, which does not sound bad to me either as ZLIB is
slower than LZ4 on all fronts.

+   else if (compression_method == COMPRESSION_ZLIB)
+   {
+       pg_log_error("cannot use --compression-method gzip when "
+                    "--compression is 0");
+       fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
+               progname);
+       exit(1);
+   }
Hmm.  It would be more natural to enforce a default compression level
in this case?  The user is asking for a compression with zlib here.

+   my $lz4 = $ENV{LZ4};
[...]
+   # Verify that the stored file is readable if program lz4 is available
+   skip "program lz4 is not found in your system", 1
+     if (!defined $lz4 || $lz4 eq '');
Okay, this is acceptable.  Didn't know the existing trick with TAR
either.

+       /*
+        * XXX: this is crap... lz4preferences now does show the uncompressed
+        * size via lz4 --list <filename> but the compression goes down the
+        * window... also it is not very helpfull to have it at the startm, does
+        * it?
+        */
What do you mean here by "the compression goes out the window"?
--
Michael

Attachment

signature.asc

Re: Teach pg_receivewal to use lz4 compression

From

gkokolatos@pm.me

Date:

09 July 2021, 10:47:47


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Friday, July 9th, 2021 at 04:49, Michael Paquier <michael@paquier.xyz> wrote:

> On Thu, Jul 08, 2021 at 02:18:40PM +0000, gkokolatos@pm.me wrote:
>
> > please find v2 of the patch which tries to address the commends
> >
> > received so far.
>
> Thanks!
>
> > Michael Paquier wrote:
> >
> > > -   system_or_bail('lz4', '-t', $lz4_wals[0]);
> > >
> > >     I think that you should just drop this part of the test. The only
> > >
> > >     part of LZ4 that we require to be present when Postgres is built with
> > >
> > >     --with-lz4 is its library liblz4. Commands associated to it may not
> > >
> > >     be around, causing this test to fail. The test checking that one .lz4
> > >
> > >     file has been created is good to have. It may be worth adding a test
> > >
> > >     with a .lz4.partial segment generated and --endpos pointing to a LSN
> > >
> > >     that does not finish the segment that gets switched.
> >
> > I humbly disagree with the need for the test. It is rather easily possible
> >
> > to generate a file that can not be decoded, thus becoming useless. Having the
> >
> > test will provide some guarantee for the fact. In the current patch, there
> >
> > is code to find out if the program lz4 is available in the system. If it is
> >
> > not available, then that specific test is skipped. The rest remains as it
> >
> > were. Also `system_or_bail` is not used anymore in favour of the `system_log`
> >
> > so that the test counted and the execution of tests continues upon failure.
>
> Check. I can see what you are using in the new patch. We could live
>
> with that.

Great. Thank you.

>
> > > It seems to me that you are missing some logic in
> > >
> > > FindStreamingStart() to handle LZ4-compressed segments, in relation
> > >
> > > with IsCompressXLogFileName() and IsPartialCompressXLogFileName().
> >
> > Very correct. The logic is now added. Given the lz4 api, one has to fill
> >
> > in the uncompressed size during creation time. Then one can read the
> >
> > headers and verify the expectations.
>
> Yeah, I read that as well with lz4 --list and the kind. That's weird
>
> compared to how ZLIB gives an easy access to it. We may want to do an
>
> effort and tell about more lz4 --content-size/--list, telling that we
>
> add the size in the segment compressed because we have to and LZ4 does
>
> not do it by default?

I am afraid I do not follow. In the patch we do add the uncompressed size
and then, the uncompressed size is checked against a known value WalSegSz.
What the compressed size will be checked against?

>
> > > Should we have more tests for ZLIB, while on it? That seems like a
> > >
> > > good addition as long as we can skip the tests conditionally when
> > >
> > > that's not supported.
> >
> > Agreed. Please allow me to provide a distinct patch for this code.
>
> Thanks. Looking forward to seeing it. That may be better on a
>
> separate thread, even if I digressed in this thread :)

Thank you for the comments. I will sent in a separate thread.

>
> > > I think we can somehow use "acceleration" parameter of lz4 compression
> > >
> > > to map on compression level, It is not direct mapping but
> > >
> > > can't we create some internal mapping instead of completely ignoring
> > >
> > > this option for lz4, or we can provide another option for lz4?
> >
> > We can, though I am not in favour of doing so. There is seemingly
> >
> > little benefit for added complexity.
>
> Agreed.
>
> > > What I think is important for the user when it comes to this
> > >
> > > option is the consistency of its naming across all the tools
> > >
> > > supporting it. pg_dump and pg_basebackup, where we could plug LZ4,
> > >
> > > already use most of the short options you could use for pg_receivewal,
> > >
> > > having only a long one gives a bit more flexibility.
> >
> > Done.
>
> * http://www.zlib.org/rfc-gzip.html.
>
> -   -   For lz4 compressed segments
>
>         */
>
>         This comment is incomplete.

It is. I will fix.

>
>         +#define IsLZ4CompressXLogFileName(fname) \
> -   (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4") && \
> -   strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN && \
> -   strcmp((fname) + XLOG_FNAME_LEN, ".lz4") == 0)
>
>     +#define IsLZ4PartialCompressXLogFileName(fname) \
> -   (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4.partial") && \
> -   strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN && \
> -   strcmp((fname) + XLOG_FNAME_LEN, ".lz4.partial") == 0)
>
>     This is getting complicated. Would it be better to change this stuff
>
>     and switch to a routine that checks if a segment has a valid name, is
>
>     partial, and the type of compression that applied to it? It seems to
>
>     me that we should group iszlibcompress and islz4compress together with
>
>     the options available through compression_method.

I agree with you. I will refactor.


> -   if (compresslevel != 0)
> -   {
> -         if (compression_method == COMPRESSION_NONE)
>
>
> -         {
>
>
> -             compression_method = COMPRESSION_ZLIB;
>
>
> -         }
>
>
> -         if (compression_method != COMPRESSION_ZLIB)
>
>
> -         {
>
>
> -             pg_log_error("cannot use --compress when "
>
>
> -                          "--compression-method is not gzip");
>
>
> -             fprintf(stderr, _("Try \\"%s --help\\" for more information.\\n"),
>
>
> -                     progname);
>
>
> -             exit(1);
>
>
> -         }
>
>
> -   }
>
>     For compatibility where --compress enforces the use of zlib that would
>
>     work, but this needs a comment explaining the goal of this block. I
>
>     am wondering if it would be better to break the will and just complain
>
>     when specifying --compress without --compression-method though. That
>
>     would cause compatibility issues, but this would make folks aware of
>
>     the presence of LZ4, which does not sound bad to me either as ZLIB is
>
>     slower than LZ4 on all fronts.


I would vote to break the compatibility if that is an option. I chose the
less invasive approach thinking that breaking the compatibility would not
be an option.

Unless others object, I will include --compress as a complimentary option
to --compression-method in updated version of the patch.


> -   else if (compression_method == COMPRESSION_ZLIB)
> -   {
> -         pg_log_error("cannot use --compression-method gzip when "
>
>
> -                      "--compression is 0");
>
>
> -         fprintf(stderr, _("Try \\"%s --help\\" for more information.\\n"),
>
>
> -                 progname);
>
>
> -         exit(1);
>
>
> -   }
>
>     Hmm. It would be more natural to enforce a default compression level
>
>     in this case? The user is asking for a compression with zlib here.


You are correct, in the current patch passing --compression-method=gzip alone
is equivalent to passing --compression=0 in the current master version. This
behaviour may be confusing for the user. What should the default compression
be then? I am inclined to say '5' as a compromise between speed and compression
ration.


> -   my $lz4 = $ENV{LZ4};
>
>     [...]
> -   Verify that the stored file is readable if program lz4 is available
>     ===================================================================
>
> -   skip "program lz4 is not found in your system", 1
> -       if (!defined $lz4 || $lz4 eq '');
>
>
>
> Okay, this is acceptable. Didn't know the existing trick with TAR
>
> either.

Thank you.

>
> -         /*
>
>
> -          * XXX: this is crap... lz4preferences now does show the uncompressed
>
>
> -          * size via lz4 --list <filename> but the compression goes down the
>
>
> -          * window... also it is not very helpfull to have it at the startm, does
>
>
> -          * it?
>
>
> -          */
>
>
>
> What do you mean here by "the compression goes out the window"?

Please consider me adequately embarrassed. This was a personal comment while I was
working on the code. It is not correct and it should have never seen the public
light.


Cheers,
//Georgios


> ---------------------------------------------------------------
>
> Michael

Re: Teach pg_receivewal to use lz4 compression

From

Dilip Kumar

Date:

12 July 2021, 08:40:24

On Thu, Jul 8, 2021 at 7:48 PM <gkokolatos@pm.me> wrote:
>
> Dilip Kumar wrote:
>
> > Wouldn't it be better to call it compression method instead of
> > compression program?
>
> Agreed. This is inline with the suggestions of other reviewers.
> Find the change in the attached patch.

Thanks, I will have a look.

> > I think we can somehow use "acceleration" parameter of lz4 compression
> > to map on compression level, It is not direct mapping but
> > can't we create some internal mapping instead of completely ignoring
> > this option for lz4, or we can provide another option for lz4?
>
> We can, though I am not in favour of doing so. There is seemingly
> little benefit for added complexity.

I am really not sure what complexity you are talking about, do you
mean since with pglz we were already providing the compression level
so let it be as it is but with the new compression method you don't
see much benefit of providing compression level or speed?

> > Should we also support LZ4 compression using dictionary?
>
> I would we should not do that. If my understanding is correct,
> decompression would require the dictionary to be passed along.
> The algorithm seems to be very competitive to the current compression
> as is.

I agree, we might not go for a dictionary because we would need to
dictionary to decompress as well.  So that will add an extra
complexity for user.

-- 
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: Teach pg_receivewal to use lz4 compression

From

Michael Paquier

Date:

12 July 2021, 08:56:03

On Mon, Jul 12, 2021 at 11:10:24AM +0530, Dilip Kumar wrote:
> On Thu, Jul 8, 2021 at 7:48 PM <gkokolatos@pm.me> wrote:
>> We can, though I am not in favour of doing so. There is seemingly
>> little benefit for added complexity.
>
> I am really not sure what complexity you are talking about, do you
> mean since with pglz we were already providing the compression level
> so let it be as it is but with the new compression method you don't
> see much benefit of providing compression level or speed?

You mean s/pglz/zlib/ here perhaps?  I am not sure what Georgios has
in mind, but my opinion stands on the latter: there is little benefit
in making lz4 faster than the default and reduce compression, as the
default is already a rather low CPU user.
--
Michael

Attachment

signature.asc

Re: Teach pg_receivewal to use lz4 compression

From

gkokolatos@pm.me

Date:

12 July 2021, 12:33:44

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Monday, July 12th, 2021 at 07:56, Michael Paquier <michael@paquier.xyz> wrote:

> On Mon, Jul 12, 2021 at 11:10:24AM +0530, Dilip Kumar wrote:
>
> > On Thu, Jul 8, 2021 at 7:48 PM gkokolatos@pm.me wrote:
> >
> > > We can, though I am not in favour of doing so. There is seemingly
> > >
> > > little benefit for added complexity.
> >
> > I am really not sure what complexity you are talking about, do you
> >
> > mean since with pglz we were already providing the compression level
> >
> > so let it be as it is but with the new compression method you don't
> >
> > see much benefit of providing compression level or speed?
>
> You mean s/pglz/zlib/ here perhaps? I am not sure what Georgios has
>
> in mind, but my opinion stands on the latter: there is little benefit
>
> in making lz4 faster than the default and reduce compression, as the
>
> default is already a rather low CPU user.

Thank you all for your comments. I am sitting on the same side as Micheal
on this one. The complexity is not huge, yet there will need to be code to
pass this option to the lz4 api and various test cases to verify for
correctness and integrity. The burden of maintenance of this code vs the
benefit of the option, tilt the scale towards not including the option.

Of course, I will happily provide whatever the community finds beneficial.

Cheers,
//Georgios

>
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>
> Michael

Re: Teach pg_receivewal to use lz4 compression

From

Magnus Hagander

Date:

12 July 2021, 12:45:29

On Mon, Jul 12, 2021 at 11:33 AM <gkokolatos@pm.me> wrote:
>
>
>
> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>
> On Monday, July 12th, 2021 at 07:56, Michael Paquier <michael@paquier.xyz> wrote:
>
> > On Mon, Jul 12, 2021 at 11:10:24AM +0530, Dilip Kumar wrote:
> >
> > > On Thu, Jul 8, 2021 at 7:48 PM gkokolatos@pm.me wrote:
> > >
> > > > We can, though I am not in favour of doing so. There is seemingly
> > > >
> > > > little benefit for added complexity.
> > >
> > > I am really not sure what complexity you are talking about, do you
> > >
> > > mean since with pglz we were already providing the compression level
> > >
> > > so let it be as it is but with the new compression method you don't
> > >
> > > see much benefit of providing compression level or speed?
> >
> > You mean s/pglz/zlib/ here perhaps? I am not sure what Georgios has
> >
> > in mind, but my opinion stands on the latter: there is little benefit
> >
> > in making lz4 faster than the default and reduce compression, as the
> >
> > default is already a rather low CPU user.
>
> Thank you all for your comments. I am sitting on the same side as Micheal
> on this one. The complexity is not huge, yet there will need to be code to
> pass this option to the lz4 api and various test cases to verify for
> correctness and integrity. The burden of maintenance of this code vs the
> benefit of the option, tilt the scale towards not including the option.

+1 for skipping that one, at least for now, and sticking to
default-only for lz4.

--
 Magnus Hagander
 Me: https://www.hagander.net/
 Work: https://www.redpill-linpro.com/

Re: Teach pg_receivewal to use lz4 compression

From

Dilip Kumar

Date:

12 July 2021, 13:27:50

On Mon, Jul 12, 2021 at 3:15 PM Magnus Hagander <magnus@hagander.net> wrote:
>
> On Mon, Jul 12, 2021 at 11:33 AM <gkokolatos@pm.me> wrote:
> >
> >
> >
> > ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
> >
> > On Monday, July 12th, 2021 at 07:56, Michael Paquier <michael@paquier.xyz> wrote:
> >
> > > On Mon, Jul 12, 2021 at 11:10:24AM +0530, Dilip Kumar wrote:
> > >
> > > > On Thu, Jul 8, 2021 at 7:48 PM gkokolatos@pm.me wrote:
> > > >
> > > > > We can, though I am not in favour of doing so. There is seemingly
> > > > >
> > > > > little benefit for added complexity.
> > > >
> > > > I am really not sure what complexity you are talking about, do you
> > > >
> > > > mean since with pglz we were already providing the compression level
> > > >
> > > > so let it be as it is but with the new compression method you don't
> > > >
> > > > see much benefit of providing compression level or speed?
> > >
> > > You mean s/pglz/zlib/ here perhaps? I am not sure what Georgios has
> > >
> > > in mind, but my opinion stands on the latter: there is little benefit
> > >
> > > in making lz4 faster than the default and reduce compression, as the
> > >
> > > default is already a rather low CPU user.
> >
> > Thank you all for your comments. I am sitting on the same side as Micheal
> > on this one. The complexity is not huge, yet there will need to be code to
> > pass this option to the lz4 api and various test cases to verify for
> > correctness and integrity. The burden of maintenance of this code vs the
> > benefit of the option, tilt the scale towards not including the option.
>
> +1 for skipping that one, at least for now, and sticking to
> default-only for lz4.

Okay, fine with me as well.

--
Regards,
Dilip Kumar
EnterpriseDB: http://www.enterprisedb.com

Re: Teach pg_receivewal to use lz4 compression

From

gkokolatos@pm.me

Date:

10 September 2021, 11:21:51


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Friday, July 9th, 2021 at 04:49, Michael Paquier <michael@paquier.xyz> wrote:

Hi,

please find v3 of the patch attached, rebased to the current head.

> > Michael Paquier wrote:
> >
>
> * http://www.zlib.org/rfc-gzip.html.
>
> -   -   For lz4 compressed segments
>         */
>         This comment is incomplete.

Fixed.

>         +#define IsLZ4CompressXLogFileName(fname) \
> -   (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4") && \
> -   strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN && \
> -   strcmp((fname) + XLOG_FNAME_LEN, ".lz4") == 0)
>
>     +#define IsLZ4PartialCompressXLogFileName(fname) \
> -   (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4.partial") && \
> -   strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN && \
> -   strcmp((fname) + XLOG_FNAME_LEN, ".lz4.partial") == 0)
>
>     This is getting complicated. Would it be better to change this stuff
>     and switch to a routine that checks if a segment has a valid name, is
>     partial, and the type of compression that applied to it? It seems to
>     me that we should group iszlibcompress and islz4compress together with
>     the options available through compression_method.

Agreed and done.


> -   if (compresslevel != 0)
> -   {
> -         if (compression_method == COMPRESSION_NONE)
> -         {
> -             compression_method = COMPRESSION_ZLIB;
> -         }
> -         if (compression_method != COMPRESSION_ZLIB)
> -         {
> -             pg_log_error("cannot use --compress when "
> -                          "--compression-method is not gzip");
> -             fprintf(stderr, _("Try \\"%s --help\\" for more information.\\n"),
> -                     progname);
> -             exit(1);
> -         }
> -   }
>
>     For compatibility where --compress enforces the use of zlib that would
>     work, but this needs a comment explaining the goal of this block. I
>     am wondering if it would be better to break the will and just complain
>     when specifying --compress without --compression-method though. That
>     would cause compatibility issues, but this would make folks aware of
>     the presence of LZ4, which does not sound bad to me either as ZLIB is
>     slower than LZ4 on all fronts.

Fair point. In v3 of the patch --compress requires --compression-method. Passing
0 as value errors out.

> -   else if (compression_method == COMPRESSION_ZLIB)
> -   {
> -         pg_log_error("cannot use --compression-method gzip when "
> -                      "--compression is 0");
> -         fprintf(stderr, _("Try \\"%s --help\\" for more information.\\n"),
> -                 progname);
> -         exit(1);
> -   }
>
>     Hmm. It would be more natural to enforce a default compression level
>     in this case? The user is asking for a compression with zlib here.

Agreed. A default value of 5, which is in the middle point of options, has been
defined and used.

In addition, the tests have been adjusted to mimic the newly added gzip tests.


Cheers,
//Georgios


> ---------------------------------------------------------------
>
> Michael

Attachment

v3-0001-Teach-pg_receivewal-to-use-lz4-compression.patch

Re: Teach pg_receivewal to use lz4 compression

From

Jian Guo

Date:

11 September 2021, 08:02:42

@@ -250,14 +302,18 @@ FindStreamingStart(uint32 *tli)
         /*
          * Check that the segment has the right size, if it's supposed to be
          * completed.  For non-compressed segments just check the on-disk size
-         * and see if it matches a completed segment. For compressed segments,
-         * look at the last 4 bytes of the compressed file, which is where the
-         * uncompressed size is located for gz files with a size lower than
-         * 4GB, and then compare it to the size of a completed segment. The 4
-         * last bytes correspond to the ISIZE member according to
+         * and see if it matches a completed segment. For zlib compressed
+         * segments, look at the last 4 bytes of the compressed file, which is
+         * where the uncompressed size is located for gz files with a size lower
+         * than 4GB, and then compare it to the size of a completed segment.
+         * The 4 last bytes correspond to the ISIZE member according to
          * http://www.zlib.org/rfc-gzip.html.
+         *
+         * For lz4 compressed segments read the header using the exposed API and
+         * compare the uncompressed file size, stored in
+         * LZ4F_frameInfo_t{.contentSize}, to that of a completed segment.
          */
-        if (!ispartial && !iscompress)
+        if (!ispartial && wal_compression_method == COMPRESSION_NONE)
         {
             struct stat statbuf;
             char        fullpath[MAXPGPATH * 2];
@@ -276,7 +332,7 @@ FindStreamingStart(uint32 *tli)
                 continue;
             }
         }
-        else if (!ispartial && iscompress)
+        else if (!ispartial && wal_compression_method == COMPRESSION_ZLIB)
         {
             int            fd;
             char        buf[4];
@@ -322,6 +378,70 @@ FindStreamingStart(uint32 *tli)
                 continue;
             }
         }
+        else if (!ispartial && compression_method == COMPRESSION_LZ4)
+        {
+#ifdef HAVE_LIBLZ4
+            int            fd;
+            int            r;
+            size_t        consumed_len = LZ4F_HEADER_SIZE_MAX;
+            char        buf[LZ4F_HEADER_SIZE_MAX];
+            char        fullpath[MAXPGPATH * 2];
+            LZ4F_frameInfo_t frame_info = { 0 };
+            LZ4F_decompressionContext_t ctx = NULL;
+
+            snprintf(fullpath, sizeof(fullpath), "%s/%s", basedir, dirent->d_name);
+
+            fd = open(fullpath, O_RDONLY | PG_BINARY, 0);

Should close the fd before exit or abort.

Re: Teach pg_receivewal to use lz4 compression

From

gkokolatos@pm.me

Date:

13 September 2021, 11:35:43

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Saturday, September 11th, 2021 at 07:02, Jian Guo <gjian@vmware.com> wrote:

Hi,

thank you for looking at the patch.

> -             LZ4F_decompressionContext_t ctx = NULL;
> -             snprintf(fullpath, sizeof(fullpath), "%s/%s", basedir, dirent->d_name);
> -             fd = open(fullpath, O_RDONLY | PG_BINARY, 0);
>
> Should close the fd before exit or abort.

You are correct. Please find version 4 attached.

Cheers,
//Georgios

Attachment

v4-0001-Teach-pg_receivewal-to-use-lz4-compression.patch

Re: Teach pg_receivewal to use lz4 compression

From

Michael Paquier

Date:

15 September 2021, 09:46:50

On Fri, Sep 10, 2021 at 08:21:51AM +0000, gkokolatos@pm.me wrote:
> Agreed. A default value of 5, which is in the middle point of options, has been
> defined and used.
>
> In addition, the tests have been adjusted to mimic the newly added gzip tests.

Looking at lz4frame.h, there is LZ4F_flush() that allows to compress
immediately any data buffered in the frame context but not compressed
yet.  It seems to me that dir_sync() should be extended to support
LZ4.

 export GZIP_PROGRAM=$(GZIP)
+export LZ4
[...]
+PGAC_PATH_PROGS(LZ4, lz4)
+
 PGAC_PATH_BISON
The part of the test assigning LZ4 is fine, but I'd rather switch to a
logic à-la-gzip, where we just save "lz4" in Makefile.global.in,
saving cycles in ./configure.

+static bool
+is_xlogfilename(const char *filename, bool *ispartial,
+               WalCompressionMethod *wal_compression_method)
I like the set of simplifications you have done here to detection if a
segment is partial and which compression method applies to it.

+   if (compression_method != COMPRESSION_ZLIB && compresslevel != 0)
+   {
+       pg_log_error("can only use --compress together with "
+                    "--compression-method=gzip");
+#ifndef HAVE_LIBLZ4
+       pg_log_error("this build does not support compression via gzip");
+#endif

s/HAVE_LIBLZ4/HAVE_LIBZ/.

+$primary->command_fails(
+   [
+     'pg_receivewal', '-D', $stream_dir, '--compression-method', 'lz4',
+     '--compress', '1'
+   ],
+   'failure if --compression-method=lz4 specified with --compress');

This would fail when the code is not built with LZ4 with a non-zero
error code but with an error that is not what we expect.  I think that
you should use $primary->command_fails_like() instead.  That's quite
new, as of de1d4fe.  The matching error pattern will need to change
depending on if we build the code with LZ4 or not.  A simpler method
is to use --compression-method=none, to bypass the first round of
errors and make that build-independent, but that feels incomplete if
you want to tie that to LZ4.

+               pg_log_warning("compressed segment file \"%s\" has incorrect header size %lu, skipping",
+                              dirent->d_name, consumed_len);
+               LZ4F_freeDecompressionContext(ctx);
I agree that skipping all those cases when calculating the streaming
start point is more consistent.

+               if (r < 0)
+                   pg_log_error("could not read compressed file \"%s\": %m",
+                                fullpath);
+               else
+                   pg_log_error("could not read compressed file \"%s\": read %d of %lu",
+                                fullpath, r, sizeof(buf));
Let's same in translation effort here by just using "could not read",
etc. by removing the term "compressed".

+       pg_log_error("can only use --compress together with "
+                    "--compression-method=gzip");
Better to keep these in a single line to ease grepping.  We don't care
if error strings are longer than the 72-80 character limit.

+/* Size of lz4 input chunk for .lz4 */
+#define LZ4_IN_SIZE  4096
Why this choice?  Does it need to use LZ4_COMPRESSBOUND?

-           if (dir_data->compression > 0)
+           if (dir_data->compression_method == COMPRESSION_ZLIB)
                gzclose(gzfp);
            else
Hm.  The addition of the header in dir_open_for_write() uses
LZ4F_compressBegin.  Shouldn't we use LZ4F_compressEnd() if
fsync_fname() or fsync_parent_path() fail on top of closing the fd?
That would be more consistent IMO to do so.  The patch does that in
dir_close().  You should do that additionally if there is a failure
when writing the header.

+                   pg_log_error("invalid compression-method \"%s\"", optarg);
+                   exit(1);
This could be "invalid value \"%s\" for option %s", see
option_parse_int() in fe_utils/option_utils.c.

After running the TAP tests, the LZ4 section is failing as follows:
pg_receivewal: stopped log streaming at 0/4001950 (timeline 1)
pg_receivewal: not renaming "000000010000000000000004.partial", segment is not complete
pg_receivewal: error: could not close file "000000010000000000000004": Undefined error: 0
ok 26 - streaming some WAL using --compression-method=lz4
The third log line I am quoting here looks unexpected to me.  Saying
that, the tests integrate nicely with the existing code.

As mentioned upthread, LZ4-compressed files don't store the file size
by default.  I think that we should document that better in the code
and the documentation, in two ways at least:
- Add some comments mentioning lz4 --content-size, with at least one
in FindStreamingStart().
- Add a new paragraph in the documentation of --compression-method.

The name of the compression method is "LZ4" with upper-case
characters.  Some comments in the code and the tests, as well as the
docs, are not careful about that.
--
Michael

Attachment

signature.asc

Re: Teach pg_receivewal to use lz4 compression

From

gkokolatos@pm.me

Date:

16 September 2021, 18:17:15

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Wednesday, September 15th, 2021 at 08:46, Michael Paquier michael@paquier.xyz wrote:

Hi,

thank you for the review.

> On Fri, Sep 10, 2021 at 08:21:51AM +0000, gkokolatos@pm.me wrote:
>
> > Agreed. A default value of 5, which is in the middle point of options, has been
> > defined and used.
> > In addition, the tests have been adjusted to mimic the newly added gzip tests.
>
> Looking at lz4frame.h, there is LZ4F_flush() that allows to compress
> immediately any data buffered in the frame context but not compressed
> yet. It seems to me that dir_sync() should be extended to support
> LZ4.

Agreed. LZ4F_flush() calls have been added where appropriate.

>
> export GZIP_PROGRAM=$(GZIP)
> +export LZ4
> [...]
> +PGAC_PATH_PROGS(LZ4, lz4)
> -
> PGAC_PATH_BISON
>
> The part of the test assigning LZ4 is fine, but I'd rather switch to a
> logic à-la-gzip, where we just save "lz4" in Makefile.global.in,
> saving cycles in ./configure.

Reluctantly agreed.

>
> +static bool
> +is_xlogfilename(const char *filename, bool *ispartial,
> -                   WalCompressionMethod *wal_compression_method)
>
>
> I like the set of simplifications you have done here to detection if a
> segment is partial and which compression method applies to it.

Thank you very much.

>
> +   if (compression_method != COMPRESSION_ZLIB && compresslevel != 0)
> +   {
> +           pg_log_error("can only use --compress together with "
> +                        "--compression-method=gzip");
> +#ifndef HAVE_LIBLZ4
> +           pg_log_error("this build does not support compression via gzip");
> +#endif
>
> s/HAVE_LIBLZ4/HAVE_LIBZ/.
>

Fixed.

> +$primary->command_fails(
> +   [
> +         'pg_receivewal', '-D', $stream_dir, '--compression-method', 'lz4',
> +         '--compress', '1'
> +   ],
> +   'failure if --compression-method=lz4 specified with --compress');
> This would fail when the code is not built with LZ4 with a non-zero
> error code but with an error that is not what we expect. I think that
> you should use $primary->command_fails_like() instead. That's quite
> new, as of de1d4fe. The matching error pattern will need to change
> depending on if we build the code with LZ4 or not. A simpler method
> is to use --compression-method=none, to bypass the first round of
> errors and make that build-independent, but that feels incomplete if
> you want to tie that to LZ4.

Fixed. Now a regex has been added to address both builds.

>
> +                   pg_log_warning("compressed segment file \\\\"%s\\\\" has incorrect header size %lu, skipping",
> +                                  dirent->d_name, consumed_len);
> +                   LZ4F_freeDecompressionContext(ctx);
>
> I agree that skipping all those cases when calculating the streaming
> start point is more consistent.

Thanks.

>
> +                   if (r < 0)
> +                       pg_log_error("could not read compressed file \\\\"%s\\\\": %m",
> +                                    fullpath);
> +                   else
> +                       pg_log_error("could not read compressed file \\\\"%s\\\\": read %d of %lu",
> +                                    fullpath, r, sizeof(buf));
>
> Let's same in translation effort here by just using "could not read",
> etc. by removing the term "compressed".

The string is also present in the gzip compressed case, i.e. prior to this patch.
The translation effort will not be reduced by changing this string only.

> +           pg_log_error("can only use --compress together with "
> +                        "--compression-method=gzip");
>
> Better to keep these in a single line to ease grepping. We don't care
> if error strings are longer than the 72-80 character limit.

Fixed.

> +/* Size of lz4 input chunk for .lz4 */
> +#define LZ4_IN_SIZE 4096
>
> Why this choice? Does it need to use LZ4_COMPRESSBOUND?

This value is used in order to calculate the bound, before any buffer is
received. Then when we receive buffer, we consume them in LZ4_IN_SIZE chunks.
Note the call to LZ4F_compressBound() in dir_open_for_write().

+       ctx_out = LZ4F_createCompressionContext(&ctx, LZ4F_VERSION);
+       lz4bufsize = LZ4F_compressBound(LZ4_IN_SIZE, &lz4preferences);


> -              if (dir_data->compression > 0)
> +              if (dir_data->compression_method == COMPRESSION_ZLIB)
>                     gzclose(gzfp);
>                 else
>
> Hm. The addition of the header in dir_open_for_write() uses
> LZ4F_compressBegin. Shouldn't we use LZ4F_compressEnd() if
> fsync_fname() or fsync_parent_path() fail on top of closing the fd?
> That would be more consistent IMO to do so. The patch does that in
> dir_close(). You should do that additionally if there is a failure
> when writing the header.

Fixed. LZ4_flush() have been added where appropriate.

>
> +                       pg_log_error("invalid compression-method \"%s\", optarg);
> +                       exit(1);
>
> This could be "invalid value \"%s\" for option %s", see
> option_parse_int() in fe_utils/option_utils.c.

Fixed.

>
> After running the TAP tests, the LZ4 section is failing as follows:
> pg_receivewal: stopped log streaming at 0/4001950 (timeline 1)
> pg_receivewal: not renaming "000000010000000000000004.partial", segment is not complete
> pg_receivewal: error: could not close file "000000010000000000000004": Undefined error: 0
> ok 26 - streaming some WAL using --compression-method=lz4
> The third log line I am quoting here looks unexpected to me. Saying
> that, the tests integrate nicely with the existing code.

Strange that you got an undefined error. I managed to _almost_ reproduce
with the log line looking like:

  pg_receivewal: error: could not close file "000000010000000000000004": Success

This was due to a call to LZ4F_compressEnd() on a partial file. In v5 of
the patch, LZ4F_compressEnd() is called when the WalCloseMethod is CLOSE_NORMAL
otherwise LZ4F_flush is used. This seems to remove the log line and a
more consistent behaviour overall.

In passing, close_walfile() has been taught to consider compression in
the filename, via get_file_name().

> As mentioned upthread, LZ4-compressed files don't store the file size
> by default. I think that we should document that better in the code
> and the documentation, in two ways at least:
>
> -   Add some comments mentioning lz4 --content-size, with at least one
>     in FindStreamingStart().
> -   Add a new paragraph in the documentation of --compression-method.

Apologies, I didn't understood what you meant upstream. Now I do.

How about:

By default, LZ4-compressed files don't store the uncompressed file size.
However, the program pg_receivewal, does store that information. As a
consequence, the file does not need to be decompressed if the external
program is used, e.g. lz4 -t --content-size <file>, will report the
uncompressed file size.


>     The name of the compression method is "LZ4" with upper-case
>     characters. Some comments in the code and the tests, as well as the
>     docs, are not careful about that.

Hopefully fixed.

Cheers,
//Georgios

>     --
>     Michael
>

Attachment

v5-0001-Teach-pg_receivewal-to-use-LZ4-compression.patch

Re: Teach pg_receivewal to use lz4 compression

From

Michael Paquier

Date:

17 September 2021, 10:39:41

On Thu, Sep 16, 2021 at 03:17:15PM +0000, gkokolatos@pm.me wrote:
> Hopefully fixed.

Thanks for the new version.  I have put my hands on the patch, and
began reviewing its internals with LZ4.  I am not done with it yet,
and I have noticed some places that could be improved (error handling,
some uses of LZ4F_flush() that should be replaced LZ4F_compressEnd(),
and more tweaks).  I'll send an updated version once I complete my
review, but that looks rather solid overall.

The changes done in close_walfile()@receivelog.c are useful taken
independently, so I have applied these separately.
--
Michael

Attachment

signature.asc

Re: Teach pg_receivewal to use lz4 compression

From

gkokolatos@pm.me

Date:

17 September 2021, 11:12:42


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Friday, September 17th, 2021 at 09:39, Michael Paquier <michael@paquier.xyz> wrote:

> On Thu, Sep 16, 2021 at 03:17:15PM +0000, gkokolatos@pm.me wrote:
>
> > Hopefully fixed.
>
> Thanks for the new version. I have put my hands on the patch, and
> began reviewing its internals with LZ4. I am not done with it yet,
> and I have noticed some places that could be improved (error handling,
> some uses of LZ4F_flush() that should be replaced LZ4F_compressEnd(),
> and more tweaks). I'll send an updated version once I complete my
> review, but that looks rather solid overall.

Thanks! Looking forward to seeing it!

> The changes done in close_walfile()@receivelog.c are useful taken
> independently, so I have applied these separately.

Yeah, I was considering it to split them over to a separate commit,
then decided against it. Will do so in the future.

Cheers,
//Georgios

> --------------------------------------------------------------------
> Michael

Re: Teach pg_receivewal to use lz4 compression

From

Michael Paquier

Date:

18 September 2021, 09:18:16

On Fri, Sep 17, 2021 at 08:12:42AM +0000, gkokolatos@pm.me wrote:
> Yeah, I was considering it to split them over to a separate commit,
> then decided against it. Will do so in the future.

I have been digging into the issue I saw in the TAP tests when closing
a segment, and found the problem.  The way you manipulate
frameInfo.contentSize by just setting it to WalSegSz when *opening*
a segment causes problems on LZ4F_compressEnd(), making the code
throw a ERROR_frameSize_wrong.  In lz4frame.c, the end of
LZ4F_compressEnd() triggers this check and the error:
    if (cctxPtr->prefs.frameInfo.contentSize) {
        if (cctxPtr->prefs.frameInfo.contentSize != cctxPtr->totalInSize)
            return err0r(LZ4F_ERROR_frameSize_wrong);
    }

We don't really care about contentSize as long as a segment is not
completed.  Rather than filling contentSize all the time we write
something, we'd better update frameInfo once the segment is
completed and closed.  That would also take take of the error as this
is not checked if contentSize is 0.  It seems to me that we should
fill in the information when doing a CLOSE_NORMAL.

-       if (stream->walmethod->compression() == 0 &&
+       if (stream->walmethod->compression() == COMPRESSION_NONE &&
                stream->walmethod->existsfile(fn))
This one was a more serious issue, as the compression() callback would
return an integer for the compression level but v5 compared it to a
WalCompressionMethod.  In order to take care of this issue, mainly for
pg_basebackup, I think that we have to update the compression()
callback to compression_method(), and it is cleaner to save the
compression method as well as the compression level for the tar data.

I am attaching a new patch, on which I have done many tweaks and
adjustments while reviewing it.  The attached patch fixes the second
issue, and I have done nothing about the first issue yet, but that
should be simple enough to address as this needs an update of the
frame info when closing a completed segment.  Could you look at it?

Thanks,
--
Michael

Attachment

Re: Teach pg_receivewal to use lz4 compression

From

gkokolatos@pm.me

Date:

29 October 2021, 12:45:41


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Saturday, September 18th, 2021 at 8:18 AM, Michael Paquier <michael@paquier.xyz> wrote:
> On Fri, Sep 17, 2021 at 08:12:42AM +0000, gkokolatos@pm.me wrote:
>
> I have been digging into the issue I saw in the TAP tests when closing
> a segment, and found the problem.  The way you manipulate
> frameInfo.contentSize by just setting it to WalSegSz when *opening*
> a segment causes problems on LZ4F_compressEnd(), making the code
> throw a ERROR_frameSize_wrong.  In lz4frame.c, the end of
> LZ4F_compressEnd() triggers this check and the error:
>     if (cctxPtr->prefs.frameInfo.contentSize) {
>         if (cctxPtr->prefs.frameInfo.contentSize != cctxPtr->totalInSize)
>             return err0r(LZ4F_ERROR_frameSize_wrong);
>     }
>
> We don't really care about contentSize as long as a segment is not
> completed.  Rather than filling contentSize all the time we write
> something, we'd better update frameInfo once the segment is
> completed and closed.  That would also take take of the error as this
> is not checked if contentSize is 0.  It seems to me that we should
> fill in the information when doing a CLOSE_NORMAL.

Thank you for the comment. I think that the opposite should be done. At the time
that the file is closed, the header is already written to disk. We have no way
to know that is not. If we need to go back to refill the information, we will
have to ask for the API to produce a new header. There is little guarantee that
the header size will be the same and as a consequence we will have to shift
the actual data around.

In the attached, the header is rewritten only when closing an incomplete
segment. For all intents and purposes that segment is not usable. However there
might be custom scripts that might want to attempt to parse even an otherwise
unusable file.

A different and easier approach would be to simply prepare the LZ4 context for
future actions and simply ignore the file.

>
> -       if (stream->walmethod->compression() == 0 &&
> +       if (stream->walmethod->compression() == COMPRESSION_NONE &&
>                 stream->walmethod->existsfile(fn))
> This one was a more serious issue, as the compression() callback would
> return an integer for the compression level but v5 compared it to a
> WalCompressionMethod.  In order to take care of this issue, mainly for
> pg_basebackup, I think that we have to update the compression()
> callback to compression_method(), and it is cleaner to save the
> compression method as well as the compression level for the tar data.
>

Agreed.

> I am attaching a new patch, on which I have done many tweaks and
> adjustments while reviewing it.  The attached patch fixes the second
> issue, and I have done nothing about the first issue yet, but that
> should be simple enough to address as this needs an update of the
> frame info when closing a completed segment.  Could you look at it?
>

Thank you. Find v7 attached, rebased to the current head.

Cheers,
//Georgios

> Thanks,
> --
> Michae

Attachment

v7-0001-Teach-pg_receivewal-to-use-LZ4-compression.patch

Re: Teach pg_receivewal to use lz4 compression

From

Michael Paquier

Date:

29 October 2021, 14:38:33

On Fri, Oct 29, 2021 at 09:45:41AM +0000, gkokolatos@pm.me wrote:
> On Saturday, September 18th, 2021 at 8:18 AM, Michael Paquier <michael@paquier.xyz> wrote:
>> We don't really care about contentSize as long as a segment is not
>> completed.  Rather than filling contentSize all the time we write
>> something, we'd better update frameInfo once the segment is
>> completed and closed.  That would also take take of the error as this
>> is not checked if contentSize is 0.  It seems to me that we should
>> fill in the information when doing a CLOSE_NORMAL.
>
> Thank you for the comment. I think that the opposite should be done. At the time
> that the file is closed, the header is already written to disk. We have no way
> to know that is not. If we need to go back to refill the information, we will
> have to ask for the API to produce a new header. There is little guarantee that
> the header size will be the same and as a consequence we will have to shift
> the actual data around.

Why would the header size change between the moment the segment is
begun and it is finished?  We could store it in memory and write it
again when the segment is closed instead, even if it means to fseek()
back to the beginning of the file once the segment is completed.
Storing WalSegSz from the moment a segment is opened makes the code
weaker to SIGINTs and the kind, so this does not fix the problem I
mentioned previously :/

> In the attached, the header is rewritten only when closing an incomplete
> segment. For all intents and purposes that segment is not usable. However there
> might be custom scripts that might want to attempt to parse even an otherwise
> unusable file.
>
> A different and easier approach would be to simply prepare the LZ4 context for
> future actions and simply ignore the file.

I am not sure what you mean by "ignore" here.  Do you mean to store 0
in contentSize when opening the segment and rewriting again the header
once the segment is completed?
--
Michael

Attachment

signature.asc

Re: Teach pg_receivewal to use lz4 compression

From

Michael Paquier

Date:

01 November 2021, 11:09:08

On Fri, Oct 29, 2021 at 08:38:33PM +0900, Michael Paquier wrote:
> Why would the header size change between the moment the segment is
> begun and it is finished?  We could store it in memory and write it
> again when the segment is closed instead, even if it means to fseek()
> back to the beginning of the file once the segment is completed.
> Storing WalSegSz from the moment a segment is opened makes the code
> weaker to SIGINTs and the kind, so this does not fix the problem I
> mentioned previously :/

I got to think more on this one, and another argument against storing
an incorrect contentSize while the segment is not completed would
break the case of partial segments with --synchronous, where we should
still be able to recover as much data flushed as possible.  Like zlib,
where one has to complete the partial segment with zeros after
decompression until the WAL segment size is reached, we should be able
to support that with LZ4.  (I have saved some customer data in the
past thanks to this property, btw.)

It is proves to be too fancy to rewrite the header with a correct
contentSize once the segment is completed, another way would be to
enforce a decompression of each segment in-memory.  The advantage of
this method is that we would be a maximum portable.  For example, if
one begins to use pg_receivewal on an archive directory where we used
an archive_command, we would be able to grab the starting LSN.  That's
more costly of course, but the LZ4 protocol does not make that easy
either with its chunk protocol.  By the way, you are right that we
should worry about the variability in size of the header as we only
have the guarantee that it can be within a give window.  I missed
that and lz4frame.h mentions that around LZ4F_headerSize :/

It would be good to test with many segments, but could we think about
just relying on LZ4F_decompress() with a frame and compute the
decompressed size by ourselves?  At least that will never break, and
that would work in all the cases aimed by pg_receivewal.
--
Michael

Attachment

signature.asc

Re: Teach pg_receivewal to use lz4 compression

From

gkokolatos@pm.me

Date:

01 November 2021, 11:39:59

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Monday, November 1st, 2021 at 9:09 AM, Michael Paquier <michael@paquier.xyz> wrote:

> On Fri, Oct 29, 2021 at 08:38:33PM +0900, Michael Paquier wrote:
>
> It would be good to test with many segments, but could we think about
> just relying on LZ4F_decompress() with a frame and compute the
> decompressed size by ourselves? At least that will never break, and
> that would work in all the cases aimed by pg_receivewal.

Agreed.

I have already started on v8 of the patch with that technique. I should
be able to update the thread soon.

>
> Michael

Re: Teach pg_receivewal to use lz4 compression

From

Michael Paquier

Date:

02 November 2021, 01:27:50

On Mon, Nov 01, 2021 at 08:39:59AM +0000, gkokolatos@pm.me wrote:
> Agreed.
>
> I have already started on v8 of the patch with that technique. I should
> be able to update the thread soon.

Nice, thanks!
--
Michael

Attachment

signature.asc

Re: Teach pg_receivewal to use lz4 compression

From

Michael Paquier

Date:

02 November 2021, 11:51:45

On Tue, Nov 02, 2021 at 07:27:50AM +0900, Michael Paquier wrote:
> On Mon, Nov 01, 2021 at 08:39:59AM +0000, gkokolatos@pm.me wrote:
> > Agreed.
> >
> > I have already started on v8 of the patch with that technique. I should
> > be able to update the thread soon.
>
> Nice, thanks!

By the way, I was reading the last version of the patch today, and
it seems to me that we could make the commit history if we split the
patch into two parts:
- One that introduces the new option --compression-method and
is_xlogfilename(), while reworking --compress (including documentation
changes).
- One to have LZ4 support.

v7 has been using "gzip" for the option name, but I think that it
would be more consistent to use "zlib" instead.
--
Michael

Attachment

signature.asc

Re: Teach pg_receivewal to use lz4 compression

From

gkokolatos@pm.me

Date:

02 November 2021, 14:36:49


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Tuesday, November 2nd, 2021 at 9:51 AM, Michael Paquier <michael@paquier.xyz> wrote:

> On Tue, Nov 02, 2021 at 07:27:50AM +0900, Michael Paquier wrote:
> > On Mon, Nov 01, 2021 at 08:39:59AM +0000, gkokolatos@pm.me wrote:
> > > Agreed.
> > >
> > > I have already started on v8 of the patch with that technique. I should
> > > be able to update the thread soon.
> >
> > Nice, thanks!
>

A pleasure. Please find it in the attached v8-0002 patch.

> By the way, I was reading the last version of the patch today, and
> it seems to me that we could make the commit history if we split the
> patch into two parts:
> - One that introduces the new option --compression-method and
> is_xlogfilename(), while reworking --compress (including documentation
> changes).
> - One to have LZ4 support.

Agreed.

>
> v7 has been using "gzip" for the option name, but I think that it
> would be more consistent to use "zlib" instead.

Done.

Cheers,
//Georgios

> --
> Michael

Attachment

Re: Teach pg_receivewal to use lz4 compression

From

Magnus Hagander

Date:

02 November 2021, 15:17:05

On Tue, Nov 2, 2021 at 9:51 AM Michael Paquier <michael@paquier.xyz> wrote:

On Tue, Nov 02, 2021 at 07:27:50AM +0900, Michael Paquier wrote:
> On Mon, Nov 01, 2021 at 08:39:59AM +0000, gkokolatos@pm.me wrote:
> > Agreed.
> >
> > I have already started on v8 of the patch with that technique. I should
> > be able to update the thread soon.
>
> Nice, thanks!

By the way, I was reading the last version of the patch today, and
it seems to me that we could make the commit history if we split the
patch into two parts:
- One that introduces the new option --compression-method and
is_xlogfilename(), while reworking --compress (including documentation
changes).
- One to have LZ4 support.

v7 has been using "gzip" for the option name, but I think that it
would be more consistent to use "zlib" instead.

Um, why?

That we are using zlib to provide the compression is an implementation detail. Whereas AFAIK "gzip" refers to both the program and the format. And we specifically use the gzxxx() functions in zlib, in order to produce gzip format.

I think for the end user, it is strictly better to name it "gzip", and given that the target of this option is the end user we should do so. (It'd be different it we were talking about a build-time parameter to configure).

Magnus Hagander
Me: https://www.hagander.net/
Work: https://www.redpill-linpro.com/

Re: Teach pg_receivewal to use lz4 compression

From

Robert Haas

Date:

02 November 2021, 19:31:47

On Tue, Nov 2, 2021 at 8:17 AM Magnus Hagander <magnus@hagander.net> wrote:
> Um, why?
>
> That we are using zlib to provide the compression is an implementation detail. Whereas AFAIK "gzip" refers to both
theprogram and the format. And we specifically use the gzxxx() functions in zlib, in order to produce gzip format.
 
>
> I think for the end user, it is strictly better to name it "gzip", and given that the target of this option is the
enduser we should do so. (It'd be different it we were talking about a build-time parameter to configure).
 

I agree. Also, I think there's actually a file format called "zlib"
which is slightly different from the "gzip" format, and you have to be
careful not to generate the wrong one.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Teach pg_receivewal to use lz4 compression

From

Michael Paquier

Date:

03 November 2021, 02:23:46

On Tue, Nov 02, 2021 at 12:31:47PM -0400, Robert Haas wrote:
> On Tue, Nov 2, 2021 at 8:17 AM Magnus Hagander <magnus@hagander.net> wrote:
>> I think for the end user, it is strictly better to name it "gzip",
>> and given that the target of this option is the end user we should
>> do so. (It'd be different it we were talking about a build-time
>> parameter to configure).
>
> I agree. Also, I think there's actually a file format called "zlib"
> which is slightly different from the "gzip" format, and you have to be
> careful not to generate the wrong one.

Okay, fine by me.  It would be better to be also consistent in
WalCompressionMethods once we switch to this option value, then.
--
Michael

Attachment

signature.asc

Re: Teach pg_receivewal to use lz4 compression

From

gkokolatos@pm.me

Date:

03 November 2021, 11:05:53


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Wednesday, November 3rd, 2021 at 12:23 AM, Michael Paquier <michael@paquier.xyz> wrote:

> On Tue, Nov 02, 2021 at 12:31:47PM -0400, Robert Haas wrote:
>> On Tue, Nov 2, 2021 at 8:17 AM Magnus Hagander magnus@hagander.net wrote:
>>> I think for the end user, it is strictly better to name it "gzip",
>>> and given that the target of this option is the end user we should
>>> do so. (It'd be different it we were talking about a build-time
>>> parameter to configure).
>>
>> I agree. Also, I think there's actually a file format called "zlib"
>> which is slightly different from the "gzip" format, and you have to be
>> careful not to generate the wrong one.
>
> Okay, fine by me.  It would be better to be also consistent in
> WalCompressionMethods once we switch to this option value, then.

I will revert to gzip for version 9. Should be out shortly.

Cheers,
//Georgios
>
> Michael

Re: Teach pg_receivewal to use lz4 compression

From

gkokolatos@pm.me

Date:

03 November 2021, 12:11:24


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Wednesday, November 3rd, 2021 at 9:05 AM, <gkokolatos@pm.me> wrote:

> ‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐
>
> On Wednesday, November 3rd, 2021 at 12:23 AM, Michael Paquier michael@paquier.xyz wrote:
> > On Tue, Nov 02, 2021 at 12:31:47PM -0400, Robert Haas wrote:
> > > On Tue, Nov 2, 2021 at 8:17 AM Magnus Hagander magnus@hagander.net wrote:
> > >
> > > > I think for the end user, it is strictly better to name it "gzip",
> > > > and given that the target of this option is the end user we should
> > > > do so. (It'd be different it we were talking about a build-time
> > > > parameter to configure).
> > >
> > > I agree. Also, I think there's actually a file format called "zlib"
> > > which is slightly different from the "gzip" format, and you have to be
> > > careful not to generate the wrong one.
> >
> > Okay, fine by me. It would be better to be also consistent in
> > WalCompressionMethods once we switch to this option value, then.
>
> I will revert to gzip for version 9. Should be out shortly.

Please find v9 attached.

Cheers,
//Georgios
>
> > Michael

Attachment

Re: Teach pg_receivewal to use lz4 compression

From

Michael Paquier

Date:

04 November 2021, 10:31:48

On Wed, Nov 03, 2021 at 09:11:24AM +0000, gkokolatos@pm.me wrote:
> Please find v9 attached.

Thanks.  I have looked at 0001 today, and applied it after fixing a
couple of issues.  From memory:
- zlib.h was missing from pg_receivewal.c, issue that I noticed after
removing the redefinition of Z_DEFAULT_COMPRESSION because there was
no need for it (did a run with a --without-zlib as well).
- Simplified a bit the error handling for incorrect option
combinations, using a switch/case while on it.
- Renamed the existing variable "compression" in walmethods.c to
compression_level, to reduce any confusion with the introduction of
compression_method.  One thing I have noticed is about the tar method,
where we rely on the compression level to decide if compression should
be used or not.  There should be some simplifications possible there
but there is a huge take in receivelog.c where we use COMPRESSION_NONE
to track down that we still want to zero a new segment when using tar
method.
- Use of 'I' as short option name, err...  After applying the first
batch..

Based on the work of 0001, there were some conflicts with 0002.  I
have solved them while reviewing it, and adapted the code to what got
already applied.

+       header_size = LZ4F_compressBegin(ctx, lz4buf, lz4bufsize, NULL);
+       if (LZ4F_isError(header_size))
+       {
+           pg_free(lz4buf);
+           close(fd);
+           return NULL;
+       }
In dir_open_for_write(), I guess that this one is missing one
LZ4F_freeCompressionContext().

+           status = LZ4F_freeDecompressionContext(ctx);
+           if (LZ4F_isError(status))
+           {
+               pg_log_error("could not free LZ4 decompression context: %s",
+                            LZ4F_getErrorName(status));
+               exit(1);
+           }
+
+           if (uncompressed_size != WalSegSz)
+           {
+               pg_log_warning("compressed segment file \"%s\" has
incorrect uncompressed size %ld, skipping",
+                              dirent->d_name, uncompressed_size);
+               (void) LZ4F_freeDecompressionContext(ctx);
+               continue;
+           }
When the uncompressed size does not match out expected size, the
second LZ4F_freeDecompressionContext() looks unnecessary to me because
we have already one a couple of lines above.

+       ctx_out = LZ4F_createCompressionContext(&ctx, LZ4F_VERSION);
+       lz4bufsize = LZ4F_compressBound(LZ4_IN_SIZE, NULL);
+       if (LZ4F_isError(ctx_out))
+       {
+           close(fd);
+           return NULL;
+       }
LZ4F_compressBound() can be after the check on ctx_out, here.

+           while (1)
+           {
+               char       *readp;
+               char       *readend;
Simply looping when decompressing a segment to check its size looks
rather unsafe to me.  We should leave the loop once uncompressed_size
is strictly more than WalSegSz.

The amount of TAP tests looks fine, and that's consistent with what we
do for zlib^D^D^D^Dgzip.  Now, when testing manually pg_receivewal
with various combinations of gzip, lz4 and none, I can see the
following failure in the code that calculates the streaming start
point:
pg_receivewal: error: could not decompress file
"wals//000000010000000000000006.lz4": ERROR_frameType_unknown

In the LZ4 code, this points to lib/lz4frame.c, where we read an
incorrect header (see the part that does not match LZ4F_MAGICNUMBER).
The segments written by pg_receivewal are clean.  Please note that
this shows up as well when manually compressing some segments with a
simple lz4 command, to simulate for example the case where a user
compressed some segments by himself/herself before running
pg_receivewal.

So, tour code does LZ4F_createDecompressionContext() followed by a
loop on read() and LZ4F_decompress() that relies on an input and an
output buffer of a fixed 4kB size (we could use 64kB at least here
actually?).  So this set of loops looks rather correct to me.

Now, this part is weird:
+               while (readp < readend)
+               {
+                   size_t      read_size = 1;
+                   size_t      out_size = 1;

I would have expected read_size to be (readend - readp) to match with
the remaining data in the read buffer that we still need to read.
Shouldn't out_size also be LZ4_CHUNK_SZ to match with the size of the
output buffer where all the contents are read?  By setting it to 1, I
think that this is doing more loops into LZ4F_decompress() than really
necessary.  It would not hurt either to memset(0) those buffers before
they are used, IMO.  I am not completely sure either, but should we
use the number of bytes returned by LZ4F_decompress() as a hint for
the follow-up loops?

Attached is an updated patch, which includes fixes for most of the
issues I am mentioning above.  Please note that I have not changed
FindStreamingStart(), so this part is the same as v9.

Thanks,
--
Michael

Attachment

Re: Teach pg_receivewal to use lz4 compression

From

Michael Paquier

Date:

04 November 2021, 11:21:53

On Thu, Nov 04, 2021 at 04:31:48PM +0900, Michael Paquier wrote:
> I would have expected read_size to be (readend - readp) to match with
> the remaining data in the read buffer that we still need to read.
> Shouldn't out_size also be LZ4_CHUNK_SZ to match with the size of the
> output buffer where all the contents are read?  By setting it to 1, I
> think that this is doing more loops into LZ4F_decompress() than really
> necessary.  It would not hurt either to memset(0) those buffers before
> they are used, IMO.  I am not completely sure either, but should we
> use the number of bytes returned by LZ4F_decompress() as a hint for
> the follow-up loops?
>
> +#ifdef HAVE_LIBLZ4
> +    while (readp < readend)
> +    {
> +        size_t        read_size = 1;
> +        size_t        out_size = 1;
> +
> +        status = LZ4F_decompress(ctx, outbuf, &out_size,
> +                                 readbuf, &read_size, NULL);

And...  It happens that the error from v9 is here, where we need to
read the amount of remaining data from "readp", and not "readbuf" :)

It is already late here, I'll continue on this stuff tomorrow, but
this looks rather committable overall.
--
Michael

Attachment

Re: Teach pg_receivewal to use lz4 compression

From

gkokolatos@pm.me

Date:

04 November 2021, 12:53:32


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Thursday, November 4th, 2021 at 9:21 AM, Michael Paquier <michael@paquier.xyz> wrote:

> On Thu, Nov 04, 2021 at 04:31:48PM +0900, Michael Paquier wrote:
> Thanks.  I have looked at 0001 today, and applied it after fixing a
> couple of issues.

Great! Thank you very much.

> From memory:
> - zlib.h was missing from pg_receivewal.c, issue that I noticed after
> removing the redefinition of Z_DEFAULT_COMPRESSION because there was
> no need for it (did a run with a --without-zlib as well).

Yeah, I simply wanted to avoid adding a header. Either way works really.

> - Simplified a bit the error handling for incorrect option
> combinations, using a switch/case while on it.

Much cleaner done this way.

> - Renamed the existing variable "compression" in walmethods.c to
> compression_level, to reduce any confusion with the introduction of
> compression_method.  One thing I have noticed is about the tar method,
> where we rely on the compression level to decide if compression should
> be used or not.  There should be some simplifications possible there
> but there is a huge take in receivelog.c where we use COMPRESSION_NONE
> to track down that we still want to zero a new segment when using tar
> method.

Agreed.

> - Use of 'I' as short option name, err...  After applying the first
> batch..

I left that in just to have the two compression related options next to each
other when switching. I assumed it might help with readability for the next
developer looking at it.

Removing it, is cleaner for the option definifion though, thanks.

>
> Based on the work of 0001, there were some conflicts with 0002.  I
> have solved them while reviewing it, and adapted the code to what got
> already applied.

Thank you very much.

>
> +       header_size = LZ4F_compressBegin(ctx, lz4buf, lz4bufsize, NULL);
> +       if (LZ4F_isError(header_size))
> +       {
> +           pg_free(lz4buf);
> +           close(fd);
> +           return NULL;
> +       }
> In dir_open_for_write(), I guess that this one is missing one
> LZ4F_freeCompressionContext().

Agreed.

>
> +           status = LZ4F_freeDecompressionContext(ctx);
> +           if (LZ4F_isError(status))
> +           {
> +               pg_log_error("could not free LZ4 decompression context: %s",
> +                            LZ4F_getErrorName(status));
> +               exit(1);
> +           }
> +
> +           if (uncompressed_size != WalSegSz)
> +           {
> +               pg_log_warning("compressed segment file \"%s\" has
> incorrect uncompressed size %ld, skipping",
> +                              dirent->d_name, uncompressed_size);
> +               (void) LZ4F_freeDecompressionContext(ctx);
> +               continue;
> +           }
> When the uncompressed size does not match out expected size, the
> second LZ4F_freeDecompressionContext() looks unnecessary to me because
> we have already one a couple of lines above.

Agreed.

>
> +       ctx_out = LZ4F_createCompressionContext(&ctx, LZ4F_VERSION);
> +       lz4bufsize = LZ4F_compressBound(LZ4_IN_SIZE, NULL);
> +       if (LZ4F_isError(ctx_out))
> +       {
> +           close(fd);
> +           return NULL;
> +       }
> LZ4F_compressBound() can be after the check on ctx_out, here.
>
> +           while (1)
> +           {
> +               char       *readp;
> +               char       *readend;
> Simply looping when decompressing a segment to check its size looks
> rather unsafe to me.  We should leave the loop once uncompressed_size
> is strictly more than WalSegSz.

The loop exits when done reading or when it failed to read:

+               r = read(fd, readbuf, sizeof(readbuf));
+               if (r < 0)
+               {
+                   pg_log_error("could not read file \"%s\": %m", fullpath);
+                   exit(1);
+               }
+
+               /* Done reading */
+               if (r == 0)
+                   break;

Although I do agree that it can exit before that, if the uncompressed size is
greater than WalSegSz.

>
> The amount of TAP tests looks fine, and that's consistent with what we
> do for zlib^D^D^D^Dgzip.  Now, when testing manually pg_receivewal
> with various combinations of gzip, lz4 and none, I can see the
> following failure in the code that calculates the streaming start
> point:
> pg_receivewal: error: could not decompress file
> "wals//000000010000000000000006.lz4": ERROR_frameType_unknown
>

Hmmm.... I will look into that.

> In the LZ4 code, this points to lib/lz4frame.c, where we read an
> incorrect header (see the part that does not match LZ4F_MAGICNUMBER).
> The segments written by pg_receivewal are clean.  Please note that
> this shows up as well when manually compressing some segments with a
> simple lz4 command, to simulate for example the case where a user
> compressed some segments by himself/herself before running
> pg_receivewal.
>

Rights, thank you for investigating. I will look further.

> So, tour code does LZ4F_createDecompressionContext() followed by a
> loop on read() and LZ4F_decompress() that relies on an input and an
> output buffer of a fixed 4kB size (we could use 64kB at least here
> actually?).  So this set of loops looks rather correct to me.
>

For what is worth, in a stand alone program I wrote while investigating, I did
not notice any noteworthy performance gain, when decompressing files of original
size similar to common WalSegSz values, using 4kB, 8kB, 16kB and 32kB buffers.
I did not try 64kB though. This was by no means exhaustive performance testing,
though good enough to propose a value. I chose 4kB because it is small enough to
have in the stack. I thought anything bigger should be heap alloced and that
would add a bit more distraction in the code with the pg_free() calls.

I will re-write to use 64kB in the heap.

> Now, this part is weird:
> +               while (readp < readend)
> +               {
> +                   size_t      read_size = 1;
> +                   size_t      out_size = 1;
>
> I would have expected read_size to be (readend - readp) to match with
> the remaining data in the read buffer that we still need to read.
> Shouldn't out_size also be LZ4_CHUNK_SZ to match with the size of the
> output buffer where all the contents are read?  By setting it to 1, I
> think that this is doing more loops into LZ4F_decompress() than really
> necessary.

You are very correct. An oversight when moving code over from my program and
renaming variables. Consider me embarrassed.

> It would not hurt either to memset(0) those buffers before
> they are used, IMO.

It does not hurt, yet I do not think that is necessary because one buffer is
throw away, i.e. the program writes to it but we never read it, and the other is
overwritten during the read call.

> I am not completely sure either, but should we
> use the number of bytes returned by LZ4F_decompress() as a hint for
> the follow-up loops?

It is possible, though in my humble opinion it adds some code and has no
measurable effect in the code.

> > +        status = LZ4F_decompress(ctx, outbuf, &out_size,
> > +                                 readbuf, &read_size, NULL);

> And...  It happens that the error from v9 is here, where we need to
> read the amount of remaining data from "readp", and not "readbuf" :)

Agreed.

I really suck at renaming all the things... I am really embarrassed.

>
> Attached is an updated patch, which includes fixes for most of the
> issues I am mentioning above.  Please note that I have not changed
> FindStreamingStart(), so this part is the same as v9.

Thanks!

Cheers,
//Georgios

>
> Thanks,
> --
> Michael

Re: Teach pg_receivewal to use lz4 compression

From

gkokolatos@pm.me

Date:

04 November 2021, 20:02:28


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Thursday, November 4th, 2021 at 9:21 AM, Michael Paquier <michael@paquier.xyz> wrote:
> > +#ifdef HAVE_LIBLZ4
> > +    while (readp < readend)
> > +    {
> > +        size_t        read_size = 1;
> > +        size_t        out_size = 1;
> > +
> > +        status = LZ4F_decompress(ctx, outbuf, &out_size,
> > +                                 readbuf, &read_size, NULL);
>
> And...  It happens that the error from v9 is here, where we need to
> read the amount of remaining data from "readp", and not "readbuf" :)
>
> It is already late here, I'll continue on this stuff tomorrow, but
> this looks rather committable overall.

Thank you for v11 of the patch. Please find attached v12 which addresses a few
minor points.

Added an Oxford comma since the list now contains three or more terms:
-        <option>--with-lz4</option>) and <literal>none</literal>.
+        <option>--with-lz4</option>), and <literal>none</literal>.

Removed an extra condinional check while switching over compression_method.
Instead of:
        +       case COMPRESSION_LZ4:
        +#ifdef HAVE_LIBLZ4
        +           if (compresslevel != 0)
        +           {
        +               pg_log_error("cannot use --compress with
        --compression-method=%s",
        +                            "lz4");
        +               fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
        +                       progname);
        +               exit(1);
        +           }
        +#else
        +           if (compression_method == COMPRESSION_LZ4)
        +           {
        +               pg_log_error("this build does not support compression with %s",
        +                            "LZ4");
        +               exit(1);
        +           }
        +           break;
        +#endif

I opted for:
        +       case COMPRESSION_LZ4:
        +#ifdef HAVE_LIBLZ4
        +           if (compresslevel != 0)
        +           {
        +               pg_log_error("cannot use --compress with
        --compression-method=%s",
        +                            "lz4");
        +               fprintf(stderr, _("Try \"%s --help\" for more information.\n"),
        +                       progname);
        +               exit(1);
        +           }
        +#else
        +           pg_log_error("this build does not support compression with %s",
        +                        "LZ4");
        +           exit(1);
        + #endif

There was an error while trying to find the streaming start. The code read:
+ else if (!ispartial && compression_method == COMPRESSION_LZ4)

where it should be instead:
+ else if (!ispartial && wal_compression_method == COMPRESSION_LZ4)

because compression_method is the global option exposed to the whereas
wal_compression_method is the local variable used to figure out what kind of
file the function is currently working with. This error was existing at least in
v9-0002 of $subject.

The variables readbuf and outbuf, used in the decompression of LZ4 files, are
now heap allocated.

Last, while the following is correct:
+           /*
+            * Once we have read enough data to cover one segment, we are
+            * done, there is no need to do more.
+            */
+           while (uncompressed_size <= WalSegSz)

I felt that converting it a do {} while () loop instead, will help with
readability:
+           do
+           {
<snip>
+           /*
+            * No need to continue reading the file when the uncompressed_size
+            * exceeds WalSegSz, even if there are still data left to read.
+            * However, if uncompressed_size is equal to WalSegSz, it should
+            * verify that there is no more data to read.
+            */
+           } while (r > 0 && uncompressed_size <= WalSegSz);

of course the check:
+               /* Done reading the file */
+               if (r == 0)
+                   break;
midway the loop is no longer needed and thus removed.

I would like to have a bit more test coverage in the case for FindStreamingStart().
Specifically for the case that a lz4-compressed segment larger than WalSegSz exists.
The attached patch does not contain such test case. I am not very certain on how to
create such a test case reliably as it would be mostly based on a warning emitted
during the parsing of such a file.

Cheers,
//Georgios

> --
> Michael

Attachment

v12-0001-Teach-pg_receivewal-to-use-LZ4-compression.patch

Re: Teach pg_receivewal to use lz4 compression

From

Michael Paquier

Date:

05 November 2021, 05:47:20

On Thu, Nov 04, 2021 at 05:02:28PM +0000, gkokolatos@pm.me wrote:
> Removed an extra condinional check while switching over compression_method.

Indeed.  My rebase was a bit sloppy here.

> because compression_method is the global option exposed to the whereas
> wal_compression_method is the local variable used to figure out what kind of
> file the function is currently working with. This error was existing at least in
> v9-0002 of $subject.

Right.

> I felt that converting it a do {} while () loop instead, will help with
> readability:
> +           do
> +           {
> <snip>
> +           /*
> +            * No need to continue reading the file when the uncompressed_size
> +            * exceeds WalSegSz, even if there are still data left to read.
> +            * However, if uncompressed_size is equal to WalSegSz, it should
> +            * verify that there is no more data to read.
> +            */
> +           } while (r > 0 && uncompressed_size <= WalSegSz);

No objections from me to do that.  This makes the code a bit easier to
follow, indeed.

> I would like to have a bit more test coverage in the case for FindStreamingStart().
> Specifically for the case that a lz4-compressed segment larger than WalSegSz exists.

The same could be said for gzip.  I am not sure that this is worth the
extra I/O and pg_receivewal commands, though.

I have spent an extra couple of hours staring at the code, and the
whole looked fine, so applied.  While on it, I have tested the new TAP
tests with all the possible combinations of --without-zlib and
--with-lz4.
--
Michael

Attachment

signature.asc

Re: Teach pg_receivewal to use lz4 compression

From

gkokolatos@pm.me

Date:

05 November 2021, 10:50:52


‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Friday, November 5th, 2021 at 3:47 AM, Michael Paquier <michael@paquier.xyz> wrote:

>
> I have spent an extra couple of hours staring at the code, and the
> whole looked fine, so applied. While on it, I have tested the new TAP
> tests with all the possible combinations of --without-zlib and
> --with-lz4.

Great news. Thank you very much.

Cheers,
//Georgios

> --
> Michael

Re: Teach pg_receivewal to use lz4 compression

From

Jeevan Ladhe

Date:

18 November 2021, 17:24:37

In dir_open_for_write() I observe that we are writing the header
and then calling LZ4F_compressEnd() in case there is an error
while writing the buffer to the file, and the output buffer of
LZ4F_compressEnd() is not written anywhere. Why should this be
necessary? To flush the contents of the internal buffer? But, then we
are calling LZ4F_freeCompressionContext() immediately after the
LZ4F_compressEnd() call. I might be missing something, will be
happy to get more insights.

Regards,

Jeevan Ladhe

On Fri, Nov 5, 2021 at 1:21 PM <gkokolatos@pm.me> wrote:

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Friday, November 5th, 2021 at 3:47 AM, Michael Paquier <michael@paquier.xyz> wrote:

>
> I have spent an extra couple of hours staring at the code, and the
> whole looked fine, so applied. While on it, I have tested the new TAP
> tests with all the possible combinations of --without-zlib and
> --with-lz4.

Great news. Thank you very much.

Cheers,
//Georgios

> --
> Michael

Re: Teach pg_receivewal to use lz4 compression

From

Michael Paquier

Date:

19 November 2021, 05:07:40

On Thu, Nov 18, 2021 at 07:54:37PM +0530, Jeevan Ladhe wrote:
> In dir_open_for_write() I observe that we are writing the header
> and then calling LZ4F_compressEnd() in case there is an error
> while writing the buffer to the file, and the output buffer of
> LZ4F_compressEnd() is not written anywhere. Why should this be
> necessary? To flush the contents of the internal buffer? But, then we
> are calling LZ4F_freeCompressionContext() immediately after the
> LZ4F_compressEnd() call. I might be missing something, will be
> happy to get more insights.

My concern here was symmetry, where IMO it makes sense to have a
compressEnd call each time there is a successful compressBegin call
done for the LZ4 state data, as there is no way to know if in the
future LZ4 won't change some of its internals to do more than just an
internal flush.
--
Michael

Attachment

signature.asc

Re: Teach pg_receivewal to use lz4 compression

From

gkokolatos@pm.me

Date:

19 November 2021, 13:07:50

‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐

On Friday, November 19th, 2021 at 3:07 AM, Michael Paquier <michael@paquier.xyz> wrote:

> On Thu, Nov 18, 2021 at 07:54:37PM +0530, Jeevan Ladhe wrote:
>
> > In dir_open_for_write() I observe that we are writing the header
> > and then calling LZ4F_compressEnd() in case there is an error
> > while writing the buffer to the file, and the output buffer of
> > LZ4F_compressEnd() is not written anywhere. Why should this be
> > necessary? To flush the contents of the internal buffer? But, then we
> > are calling LZ4F_freeCompressionContext() immediately after the
> > LZ4F_compressEnd() call. I might be missing something, will be
> > happy to get more insights.
>
> My concern here was symmetry, where IMO it makes sense to have a
> compressEnd call each time there is a successful compressBegin call
> done for the LZ4 state data, as there is no way to know if in the
> future LZ4 won't change some of its internals to do more than just an
> internal flush.

Agreed.

Although the library does provide an interface for simply flushing contents, it
also assumes that each initializing call will have a finilizing call. If my
memory serves me right, earlier versions of the patch, did not have this
summetry, but that got ammended.

Cheers,
//Georgios

> ---
> Michael

Re: Teach pg_receivewal to use lz4 compression

From

Jeevan Ladhe

Date:

22 November 2021, 08:45:55

On Fri, Nov 19, 2021 at 7:37 AM Michael Paquier <michael@paquier.xyz> wrote:

On Thu, Nov 18, 2021 at 07:54:37PM +0530, Jeevan Ladhe wrote:
> In dir_open_for_write() I observe that we are writing the header
> and then calling LZ4F_compressEnd() in case there is an error
> while writing the buffer to the file, and the output buffer of
> LZ4F_compressEnd() is not written anywhere. Why should this be
> necessary? To flush the contents of the internal buffer? But, then we
> are calling LZ4F_freeCompressionContext() immediately after the
> LZ4F_compressEnd() call. I might be missing something, will be
> happy to get more insights.

My concern here was symmetry, where IMO it makes sense to have a
compressEnd call each time there is a successful compressBegin call
done for the LZ4 state data, as there is no way to know if in the
future LZ4 won't change some of its internals to do more than just an
internal flush.

Fair enough. But, still I have a doubt in mind what benefit would that

really bring to us here, because we are immediately also freeing the

lz4buf without using it anywhere.

Regards,

Jeevan

Re: Teach pg_receivewal to use lz4 compression

From

Robert Haas

Date:

22 November 2021, 17:02:47

On Mon, Nov 22, 2021 at 12:46 AM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Fair enough. But, still I have a doubt in mind what benefit would that
> really bring to us here, because we are immediately also freeing the
> lz4buf without using it anywhere.

Yeah, I'm also doubtful about that. If we're freeng the compression
context, we shouldn't need to guarantee that it's in any particular
state before doing so. Why would any critical cleanup be part of
LZ4F_compressEnd() rather than LZ4F_freeCompressionContext()? The
point of LZ4F_compressEnd() is to make sure all of the output bytes
get written, and it would be stupid to force people to write the
output bytes even when they've decided that they no longer care about
them due to some error.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Teach pg_receivewal to use lz4 compression

From

Michael Paquier

Date:

24 November 2021, 08:25:52

On Mon, Nov 22, 2021 at 09:02:47AM -0500, Robert Haas wrote:
> On Mon, Nov 22, 2021 at 12:46 AM Jeevan Ladhe
> <jeevan.ladhe@enterprisedb.com> wrote:
>> Fair enough. But, still I have a doubt in mind what benefit would that
>> really bring to us here, because we are immediately also freeing the
>> lz4buf without using it anywhere.
>
> Yeah, I'm also doubtful about that. If we're freeng the compression
> context, we shouldn't need to guarantee that it's in any particular
> state before doing so. Why would any critical cleanup be part of
> LZ4F_compressEnd() rather than LZ4F_freeCompressionContext()? The
> point of LZ4F_compressEnd() is to make sure all of the output bytes
> get written, and it would be stupid to force people to write the
> output bytes even when they've decided that they no longer care about
> them due to some error.

Hmm.  I have double-checked all that, and I agree that we could just
skip LZ4F_compressEnd() in this error code path.  From what I can see
in the upstream code, what we have now is not broken either, but the
compressEnd() call does some work that's not needed here.
--
Michael

Attachment

signature.asc

Re: Teach pg_receivewal to use lz4 compression

From

Jeevan Ladhe

Date:

24 November 2021, 14:54:16

On Wed, Nov 24, 2021 at 10:55 AM Michael Paquier <michael@paquier.xyz> wrote:

On Mon, Nov 22, 2021 at 09:02:47AM -0500, Robert Haas wrote:
> On Mon, Nov 22, 2021 at 12:46 AM Jeevan Ladhe
> <jeevan.ladhe@enterprisedb.com> wrote:
>> Fair enough. But, still I have a doubt in mind what benefit would that
>> really bring to us here, because we are immediately also freeing the
>> lz4buf without using it anywhere.
>
> Yeah, I'm also doubtful about that. If we're freeng the compression
> context, we shouldn't need to guarantee that it's in any particular
> state before doing so. Why would any critical cleanup be part of
> LZ4F_compressEnd() rather than LZ4F_freeCompressionContext()? The
> point of LZ4F_compressEnd() is to make sure all of the output bytes
> get written, and it would be stupid to force people to write the
> output bytes even when they've decided that they no longer care about
> them due to some error.

Hmm. I have double-checked all that, and I agree that we could just
skip LZ4F_compressEnd() in this error code path. From what I can see
in the upstream code, what we have now is not broken either, but the
compressEnd() call does some work that's not needed here.

Yes I agree that we are not broken, but as you said we are doing some

an extra bit of work here.

Regards,

Jeevan Ladhe

Re: Teach pg_receivewal to use lz4 compression

From

Robert Haas

Date:

11 February 2022, 18:07:49

On Thu, Nov 4, 2021 at 10:47 PM Michael Paquier <michael@paquier.xyz> wrote:
> Indeed.  My rebase was a bit sloppy here.

Hi!

Over in http://postgr.es/m/CA+TgmoYUDEJga2qV_XbAZ=pGEBaOsgFmzZ6Ac4_sRwOm_+UeHA@mail.gmail.com
I was noticing that CreateWalTarMethod doesn't support LZ4
compression. It would be nice if it did. I thought maybe the patch on
this thread would fix that, but I think maybe it doesn't, because it
looks like that's touching the WalDirectoryMethod part of that file,
rather than the WalTarMethod part. Is that correct? And, on a related
note, Michael, do you plan to get something committed here?

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Teach pg_receivewal to use lz4 compression

From

Michael Paquier

Date:

12 February 2022, 06:52:40

On Fri, Feb 11, 2022 at 10:07:49AM -0500, Robert Haas wrote:
> Over in http://postgr.es/m/CA+TgmoYUDEJga2qV_XbAZ=pGEBaOsgFmzZ6Ac4_sRwOm_+UeHA@mail.gmail.com
> I was noticing that CreateWalTarMethod doesn't support LZ4
> compression. It would be nice if it did. I thought maybe the patch on
> this thread would fix that, but I think maybe it doesn't, because it
> looks like that's touching the WalDirectoryMethod part of that file,
> rather than the WalTarMethod part. Is that correct?

Correct.  pg_receivewal only cares about the directory method, so this
thread was limited to this part.  Yes, it would be nice to extend
fully the tar method of walmethods.c to support LZ4, but I was not
sure what needed to be done, and I am still not sure based on what has
just been done as of 751b8d23.

> And, on a related note, Michael, do you plan to get something
> committed here?

Apart from f79962d, babbbb5 and 50e1441, I don't think that there was
something left to do for this thread.  Perhaps I am missing something?
--
Michael

Attachment

signature.asc

Re: Teach pg_receivewal to use lz4 compression

From

Robert Haas

Date:

14 February 2022, 23:07:46

On Fri, Feb 11, 2022 at 10:52 PM Michael Paquier <michael@paquier.xyz> wrote:
> > And, on a related note, Michael, do you plan to get something
> > committed here?
>
> Apart from f79962d, babbbb5 and 50e1441, I don't think that there was
> something left to do for this thread.  Perhaps I am missing something?

Oh, my mistake. I didn't realize you'd already committed it.

-- 
Robert Haas
EDB: http://www.enterprisedb.com

Re: Teach pg_receivewal to use lz4 compression

From

Justin Pryzby

Date:

17 March 2022, 14:12:20

On Sat, Feb 12, 2022 at 12:52:40PM +0900, Michael Paquier wrote:
> On Fri, Feb 11, 2022 at 10:07:49AM -0500, Robert Haas wrote:
> > Over in http://postgr.es/m/CA+TgmoYUDEJga2qV_XbAZ=pGEBaOsgFmzZ6Ac4_sRwOm_+UeHA@mail.gmail.com
> > I was noticing that CreateWalTarMethod doesn't support LZ4
> > compression. It would be nice if it did. I thought maybe the patch on
> > this thread would fix that, but I think maybe it doesn't, because it
> > looks like that's touching the WalDirectoryMethod part of that file,
> > rather than the WalTarMethod part. Is that correct?
> 
> Correct.  pg_receivewal only cares about the directory method, so this
> thread was limited to this part.  Yes, it would be nice to extend
> fully the tar method of walmethods.c to support LZ4, but I was not
> sure what needed to be done, and I am still not sure based on what has
> just been done as of 751b8d23.
> 
> > And, on a related note, Michael, do you plan to get something
> > committed here? 
> 
> Apart from f79962d, babbbb5 and 50e1441, I don't think that there was
> something left to do for this thread.  Perhaps I am missing something?

I think this should use <lz4frame.h>

+#include "lz4frame.h"

commit babbbb595d2322da095a1e6703171b3f1f2815cb
Author: Michael Paquier <michael@paquier.xyz>
Date:   Fri Nov 5 11:33:25 2021 +0900

    Add support for LZ4 compression in pg_receivewal

-- 
Justin

Re: Teach pg_receivewal to use lz4 compression

From

Michael Paquier

Date:

18 March 2022, 04:38:54

On Thu, Mar 17, 2022 at 06:12:20AM -0500, Justin Pryzby wrote:
> I think this should use <lz4frame.h>
>
> +#include "lz4frame.h"
>
> commit babbbb595d2322da095a1e6703171b3f1f2815cb
> Author: Michael Paquier <michael@paquier.xyz>
> Date:   Fri Nov 5 11:33:25 2021 +0900
>
>     Add support for LZ4 compression in pg_receivewal

Yes, you are right.  A second thing is that should be declared before
the PG headers.
--
Michael

Attachment

signature.asc