Re: Teach pg_receivewal to use lz4 compression - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Teach pg_receivewal to use lz4 compression
Date
Msg-id YUGW2mwx0eNnffoR@paquier.xyz
Whole thread Raw
In response to Re: Teach pg_receivewal to use lz4 compression  (gkokolatos@pm.me)
Responses Re: Teach pg_receivewal to use lz4 compression
List pgsql-hackers
On Fri, Sep 10, 2021 at 08:21:51AM +0000, gkokolatos@pm.me wrote:
> Agreed. A default value of 5, which is in the middle point of options, has been
> defined and used.
>
> In addition, the tests have been adjusted to mimic the newly added gzip tests.

Looking at lz4frame.h, there is LZ4F_flush() that allows to compress
immediately any data buffered in the frame context but not compressed
yet.  It seems to me that dir_sync() should be extended to support
LZ4.

 export GZIP_PROGRAM=$(GZIP)
+export LZ4
[...]
+PGAC_PATH_PROGS(LZ4, lz4)
+
 PGAC_PATH_BISON
The part of the test assigning LZ4 is fine, but I'd rather switch to a
logic à-la-gzip, where we just save "lz4" in Makefile.global.in,
saving cycles in ./configure.

+static bool
+is_xlogfilename(const char *filename, bool *ispartial,
+               WalCompressionMethod *wal_compression_method)
I like the set of simplifications you have done here to detection if a
segment is partial and which compression method applies to it.

+   if (compression_method != COMPRESSION_ZLIB && compresslevel != 0)
+   {
+       pg_log_error("can only use --compress together with "
+                    "--compression-method=gzip");
+#ifndef HAVE_LIBLZ4
+       pg_log_error("this build does not support compression via gzip");
+#endif

s/HAVE_LIBLZ4/HAVE_LIBZ/.

+$primary->command_fails(
+   [
+     'pg_receivewal', '-D', $stream_dir, '--compression-method', 'lz4',
+     '--compress', '1'
+   ],
+   'failure if --compression-method=lz4 specified with --compress');

This would fail when the code is not built with LZ4 with a non-zero
error code but with an error that is not what we expect.  I think that
you should use $primary->command_fails_like() instead.  That's quite
new, as of de1d4fe.  The matching error pattern will need to change
depending on if we build the code with LZ4 or not.  A simpler method
is to use --compression-method=none, to bypass the first round of
errors and make that build-independent, but that feels incomplete if
you want to tie that to LZ4.

+               pg_log_warning("compressed segment file \"%s\" has incorrect header size %lu, skipping",
+                              dirent->d_name, consumed_len);
+               LZ4F_freeDecompressionContext(ctx);
I agree that skipping all those cases when calculating the streaming
start point is more consistent.

+               if (r < 0)
+                   pg_log_error("could not read compressed file \"%s\": %m",
+                                fullpath);
+               else
+                   pg_log_error("could not read compressed file \"%s\": read %d of %lu",
+                                fullpath, r, sizeof(buf));
Let's same in translation effort here by just using "could not read",
etc. by removing the term "compressed".

+       pg_log_error("can only use --compress together with "
+                    "--compression-method=gzip");
Better to keep these in a single line to ease grepping.  We don't care
if error strings are longer than the 72-80 character limit.

+/* Size of lz4 input chunk for .lz4 */
+#define LZ4_IN_SIZE  4096
Why this choice?  Does it need to use LZ4_COMPRESSBOUND?

-           if (dir_data->compression > 0)
+           if (dir_data->compression_method == COMPRESSION_ZLIB)
                gzclose(gzfp);
            else
Hm.  The addition of the header in dir_open_for_write() uses
LZ4F_compressBegin.  Shouldn't we use LZ4F_compressEnd() if
fsync_fname() or fsync_parent_path() fail on top of closing the fd?
That would be more consistent IMO to do so.  The patch does that in
dir_close().  You should do that additionally if there is a failure
when writing the header.

+                   pg_log_error("invalid compression-method \"%s\"", optarg);
+                   exit(1);
This could be "invalid value \"%s\" for option %s", see
option_parse_int() in fe_utils/option_utils.c.

After running the TAP tests, the LZ4 section is failing as follows:
pg_receivewal: stopped log streaming at 0/4001950 (timeline 1)
pg_receivewal: not renaming "000000010000000000000004.partial", segment is not complete
pg_receivewal: error: could not close file "000000010000000000000004": Undefined error: 0
ok 26 - streaming some WAL using --compression-method=lz4
The third log line I am quoting here looks unexpected to me.  Saying
that, the tests integrate nicely with the existing code.

As mentioned upthread, LZ4-compressed files don't store the file size
by default.  I think that we should document that better in the code
and the documentation, in two ways at least:
- Add some comments mentioning lz4 --content-size, with at least one
in FindStreamingStart().
- Add a new paragraph in the documentation of --compression-method.

The name of the compression method is "LZ4" with upper-case
characters.  Some comments in the code and the tests, as well as the
docs, are not careful about that.
--
Michael

Attachment

pgsql-hackers by date:

Previous
From: Andrey Lepikhov
Date:
Subject: Re: Increase value of OUTER_VAR
Next
From: Amit Kapila
Date:
Subject: Re: Added schema level support for publication.