Re: Teach pg_receivewal to use lz4 compression - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Teach pg_receivewal to use lz4 compression
Date
Msg-id YX+gpFH2s9MG5X3D@paquier.xyz
In response to Re: Teach pg_receivewal to use lz4 compression  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Teach pg_receivewal to use lz4 compression
List pgsql-hackers
On Fri, Oct 29, 2021 at 08:38:33PM +0900, Michael Paquier wrote:
> Why would the header size change between the moment the segment is
> begun and it is finished?  We could store it in memory and write it
> again when the segment is closed instead, even if it means having to
> fseek() back to the beginning of the file once the segment is completed.
> Storing WalSegSz from the moment a segment is opened makes the code
> more fragile against SIGINTs and the like, so this does not fix the
> problem I mentioned previously :/

I got to think more on this one, and another argument against storing
an incorrect contentSize while the segment is not completed is that it
would break the case of partial segments with --synchronous, where we
should still be able to recover as much flushed data as possible.  As
with zlib, where one has to complete the partial segment with zeros
after decompression until the WAL segment size is reached, we should be
able to support that with LZ4.  (I have saved some customer data in the
past thanks to this property, btw.)
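For illustration, the zero-padding step could look like the C sketch
below.  pad_partial_segment() is a hypothetical helper, not code from
pg_receivewal; it assumes the decompressed bytes of the partial segment
have already been written to fp (positioned at its end) and that
wal_seg_size holds the WAL segment size (16MB by default).

```c
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: fp holds the decompressed contents of a partial
 * WAL segment, of which `written` bytes have been written so far.
 * Append zeros until the file reaches wal_seg_size bytes.
 * Returns 0 on success, -1 on a write error or if the decompressed data
 * is already larger than a segment.
 */
static int
pad_partial_segment(FILE *fp, size_t written, size_t wal_seg_size)
{
    char        zeros[8192];

    if (written > wal_seg_size)
        return -1;

    memset(zeros, 0, sizeof(zeros));
    while (written < wal_seg_size)
    {
        size_t      chunk = wal_seg_size - written;

        if (chunk > sizeof(zeros))
            chunk = sizeof(zeros);
        if (fwrite(zeros, 1, chunk, fp) != chunk)
            return -1;
        written += chunk;
    }
    return (fflush(fp) == 0) ? 0 : -1;
}
```

The padding is written in fixed-size chunks so that a 16MB segment does
not need a 16MB zero buffer.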

If it proves to be too fancy to rewrite the header with a correct
contentSize once the segment is completed, another way would be to
enforce an in-memory decompression of each segment.  The advantage of
this method is that it would be maximally portable.  For example, if
one begins to use pg_receivewal on an archive directory previously
filled by an archive_command, we would still be able to grab the
starting LSN.  That's more costly of course, but the LZ4 frame format
does not make that easy either with its chunked layout.  By the way,
you are right that we should worry about the variability in size of
the header, as we only have the guarantee that it falls within a given
window.  I missed that, and lz4frame.h mentions it around
LZ4F_headerSize :/

It would be good to test with many segments, but could we think about
just relying on LZ4F_decompress() with a frame and computing the
decompressed size ourselves?  At least that will never break, and it
would work in all the cases targeted by pg_receivewal.
--
Michael
