Re: Teach pg_receivewal to use lz4 compression - Mailing list pgsql-hackers
From | Michael Paquier |
---|---|
Subject | Re: Teach pg_receivewal to use lz4 compression |
Date | |
Msg-id | YUGW2mwx0eNnffoR@paquier.xyz Whole thread Raw |
In response to | Re: Teach pg_receivewal to use lz4 compression (gkokolatos@pm.me) |
Responses |
Re: Teach pg_receivewal to use lz4 compression
|
List | pgsql-hackers |
On Fri, Sep 10, 2021 at 08:21:51AM +0000, gkokolatos@pm.me wrote: > Agreed. A default value of 5, which is in the middle point of options, has been > defined and used. > > In addition, the tests have been adjusted to mimic the newly added gzip tests. Looking at lz4frame.h, there is LZ4F_flush() that allows to compress immediately any data buffered in the frame context but not compressed yet. It seems to me that dir_sync() should be extended to support LZ4. export GZIP_PROGRAM=$(GZIP) +export LZ4 [...] +PGAC_PATH_PROGS(LZ4, lz4) + PGAC_PATH_BISON The part of the test assigning LZ4 is fine, but I'd rather switch to a logic à-la-gzip, where we just save "lz4" in Makefile.global.in, saving cycles in ./configure. +static bool +is_xlogfilename(const char *filename, bool *ispartial, + WalCompressionMethod *wal_compression_method) I like the set of simplifications you have done here to detection if a segment is partial and which compression method applies to it. + if (compression_method != COMPRESSION_ZLIB && compresslevel != 0) + { + pg_log_error("can only use --compress together with " + "--compression-method=gzip"); +#ifndef HAVE_LIBLZ4 + pg_log_error("this build does not support compression via gzip"); +#endif s/HAVE_LIBLZ4/HAVE_LIBZ/. +$primary->command_fails( + [ + 'pg_receivewal', '-D', $stream_dir, '--compression-method', 'lz4', + '--compress', '1' + ], + 'failure if --compression-method=lz4 specified with --compress'); This would fail when the code is not built with LZ4 with a non-zero error code but with an error that is not what we expect. I think that you should use $primary->command_fails_like() instead. That's quite new, as of de1d4fe. The matching error pattern will need to change depending on if we build the code with LZ4 or not. A simpler method is to use --compression-method=none, to bypass the first round of errors and make that build-independent, but that feels incomplete if you want to tie that to LZ4. + pg_log_warning("compressed segment file \"%s\" has incorrect header size %lu, skipping", + dirent->d_name, consumed_len); + LZ4F_freeDecompressionContext(ctx); I agree that skipping all those cases when calculating the streaming start point is more consistent. + if (r < 0) + pg_log_error("could not read compressed file \"%s\": %m", + fullpath); + else + pg_log_error("could not read compressed file \"%s\": read %d of %lu", + fullpath, r, sizeof(buf)); Let's same in translation effort here by just using "could not read", etc. by removing the term "compressed". + pg_log_error("can only use --compress together with " + "--compression-method=gzip"); Better to keep these in a single line to ease grepping. We don't care if error strings are longer than the 72-80 character limit. +/* Size of lz4 input chunk for .lz4 */ +#define LZ4_IN_SIZE 4096 Why this choice? Does it need to use LZ4_COMPRESSBOUND? - if (dir_data->compression > 0) + if (dir_data->compression_method == COMPRESSION_ZLIB) gzclose(gzfp); else Hm. The addition of the header in dir_open_for_write() uses LZ4F_compressBegin. Shouldn't we use LZ4F_compressEnd() if fsync_fname() or fsync_parent_path() fail on top of closing the fd? That would be more consistent IMO to do so. The patch does that in dir_close(). You should do that additionally if there is a failure when writing the header. + pg_log_error("invalid compression-method \"%s\"", optarg); + exit(1); This could be "invalid value \"%s\" for option %s", see option_parse_int() in fe_utils/option_utils.c. After running the TAP tests, the LZ4 section is failing as follows: pg_receivewal: stopped log streaming at 0/4001950 (timeline 1) pg_receivewal: not renaming "000000010000000000000004.partial", segment is not complete pg_receivewal: error: could not close file "000000010000000000000004": Undefined error: 0 ok 26 - streaming some WAL using --compression-method=lz4 The third log line I am quoting here looks unexpected to me. Saying that, the tests integrate nicely with the existing code. As mentioned upthread, LZ4-compressed files don't store the file size by default. I think that we should document that better in the code and the documentation, in two ways at least: - Add some comments mentioning lz4 --content-size, with at least one in FindStreamingStart(). - Add a new paragraph in the documentation of --compression-method. The name of the compression method is "LZ4" with upper-case characters. Some comments in the code and the tests, as well as the docs, are not careful about that. -- Michael
Attachment
pgsql-hackers by date: