Re: Teach pg_receivewal to use lz4 compression - Mailing list pgsql-hackers
From | gkokolatos@pm.me |
---|---|
Subject | Re: Teach pg_receivewal to use lz4 compression |
Date | |
Msg-id | m9tNOULZ1-ZZwdCVNSfWe1HkASTFeobtXkMeewSY--McJMAe1lIx4nmVSiUh8xU1Ano7lwwdCJTLdjPWxqoRVNep325VH8sLCvjHkH9RvOM=@pm.me Whole thread Raw |
In response to | Re: Teach pg_receivewal to use lz4 compression (Michael Paquier <michael@paquier.xyz>) |
List | pgsql-hackers |
‐‐‐‐‐‐‐ Original Message ‐‐‐‐‐‐‐ On Friday, July 9th, 2021 at 04:49, Michael Paquier <michael@paquier.xyz> wrote: > On Thu, Jul 08, 2021 at 02:18:40PM +0000, gkokolatos@pm.me wrote: > > > please find v2 of the patch which tries to address the commends > > > > received so far. > > Thanks! > > > Michael Paquier wrote: > > > > > - system_or_bail('lz4', '-t', $lz4_wals[0]); > > > > > > I think that you should just drop this part of the test. The only > > > > > > part of LZ4 that we require to be present when Postgres is built with > > > > > > --with-lz4 is its library liblz4. Commands associated to it may not > > > > > > be around, causing this test to fail. The test checking that one .lz4 > > > > > > file has been created is good to have. It may be worth adding a test > > > > > > with a .lz4.partial segment generated and --endpos pointing to a LSN > > > > > > that does not finish the segment that gets switched. > > > > I humbly disagree with the need for the test. It is rather easily possible > > > > to generate a file that can not be decoded, thus becoming useless. Having the > > > > test will provide some guarantee for the fact. In the current patch, there > > > > is code to find out if the program lz4 is available in the system. If it is > > > > not available, then that specific test is skipped. The rest remains as it > > > > were. Also `system_or_bail` is not used anymore in favour of the `system_log` > > > > so that the test counted and the execution of tests continues upon failure. > > Check. I can see what you are using in the new patch. We could live > > with that. Great. Thank you. > > > > It seems to me that you are missing some logic in > > > > > > FindStreamingStart() to handle LZ4-compressed segments, in relation > > > > > > with IsCompressXLogFileName() and IsPartialCompressXLogFileName(). > > > > Very correct. The logic is now added. Given the lz4 api, one has to fill > > > > in the uncompressed size during creation time. Then one can read the > > > > headers and verify the expectations. > > Yeah, I read that as well with lz4 --list and the kind. That's weird > > compared to how ZLIB gives an easy access to it. We may want to do an > > effort and tell about more lz4 --content-size/--list, telling that we > > add the size in the segment compressed because we have to and LZ4 does > > not do it by default? I am afraid I do not follow. In the patch we do add the uncompressed size and then, the uncompressed size is checked against a known value WalSegSz. What the compressed size will be checked against? > > > > Should we have more tests for ZLIB, while on it? That seems like a > > > > > > good addition as long as we can skip the tests conditionally when > > > > > > that's not supported. > > > > Agreed. Please allow me to provide a distinct patch for this code. > > Thanks. Looking forward to seeing it. That may be better on a > > separate thread, even if I digressed in this thread :) Thank you for the comments. I will sent in a separate thread. > > > > I think we can somehow use "acceleration" parameter of lz4 compression > > > > > > to map on compression level, It is not direct mapping but > > > > > > can't we create some internal mapping instead of completely ignoring > > > > > > this option for lz4, or we can provide another option for lz4? > > > > We can, though I am not in favour of doing so. There is seemingly > > > > little benefit for added complexity. > > Agreed. > > > > What I think is important for the user when it comes to this > > > > > > option is the consistency of its naming across all the tools > > > > > > supporting it. pg_dump and pg_basebackup, where we could plug LZ4, > > > > > > already use most of the short options you could use for pg_receivewal, > > > > > > having only a long one gives a bit more flexibility. > > > > Done. > > * http://www.zlib.org/rfc-gzip.html. > > - - For lz4 compressed segments > > */ > > This comment is incomplete. It is. I will fix. > > +#define IsLZ4CompressXLogFileName(fname) \ > - (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4") && \ > - strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN && \ > - strcmp((fname) + XLOG_FNAME_LEN, ".lz4") == 0) > > +#define IsLZ4PartialCompressXLogFileName(fname) \ > - (strlen(fname) == XLOG_FNAME_LEN + strlen(".lz4.partial") && \ > - strspn(fname, "0123456789ABCDEF") == XLOG_FNAME_LEN && \ > - strcmp((fname) + XLOG_FNAME_LEN, ".lz4.partial") == 0) > > This is getting complicated. Would it be better to change this stuff > > and switch to a routine that checks if a segment has a valid name, is > > partial, and the type of compression that applied to it? It seems to > > me that we should group iszlibcompress and islz4compress together with > > the options available through compression_method. I agree with you. I will refactor. > - if (compresslevel != 0) > - { > - if (compression_method == COMPRESSION_NONE) > > > - { > > > - compression_method = COMPRESSION_ZLIB; > > > - } > > > - if (compression_method != COMPRESSION_ZLIB) > > > - { > > > - pg_log_error("cannot use --compress when " > > > - "--compression-method is not gzip"); > > > - fprintf(stderr, _("Try \\"%s --help\\" for more information.\\n"), > > > - progname); > > > - exit(1); > > > - } > > > - } > > For compatibility where --compress enforces the use of zlib that would > > work, but this needs a comment explaining the goal of this block. I > > am wondering if it would be better to break the will and just complain > > when specifying --compress without --compression-method though. That > > would cause compatibility issues, but this would make folks aware of > > the presence of LZ4, which does not sound bad to me either as ZLIB is > > slower than LZ4 on all fronts. I would vote to break the compatibility if that is an option. I chose the less invasive approach thinking that breaking the compatibility would not be an option. Unless others object, I will include --compress as a complimentary option to --compression-method in updated version of the patch. > - else if (compression_method == COMPRESSION_ZLIB) > - { > - pg_log_error("cannot use --compression-method gzip when " > > > - "--compression is 0"); > > > - fprintf(stderr, _("Try \\"%s --help\\" for more information.\\n"), > > > - progname); > > > - exit(1); > > > - } > > Hmm. It would be more natural to enforce a default compression level > > in this case? The user is asking for a compression with zlib here. You are correct, in the current patch passing --compression-method=gzip alone is equivalent to passing --compression=0 in the current master version. This behaviour may be confusing for the user. What should the default compression be then? I am inclined to say '5' as a compromise between speed and compression ration. > - my $lz4 = $ENV{LZ4}; > > [...] > - Verify that the stored file is readable if program lz4 is available > =================================================================== > > - skip "program lz4 is not found in your system", 1 > - if (!defined $lz4 || $lz4 eq ''); > > > > Okay, this is acceptable. Didn't know the existing trick with TAR > > either. Thank you. > > - /* > > > - * XXX: this is crap... lz4preferences now does show the uncompressed > > > - * size via lz4 --list <filename> but the compression goes down the > > > - * window... also it is not very helpfull to have it at the startm, does > > > - * it? > > > - */ > > > > What do you mean here by "the compression goes out the window"? Please consider me adequately embarrassed. This was a personal comment while I was working on the code. It is not correct and it should have never seen the public light. Cheers, //Georgios > --------------------------------------------------------------- > > Michael
pgsql-hackers by date: