Re: refactoring basebackup.c - Mailing list pgsql-hackers

From Jeevan Ladhe
Subject Re: refactoring basebackup.c
Date
Msg-id CAOgcT0OarU7csuf6Obgmg96DdjpGKUSETx5d6ss3eJx7VD+bdA@mail.gmail.com
Whole thread Raw
In response to Re: refactoring basebackup.c  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: refactoring basebackup.c
List pgsql-hackers
Thanks, Robert for reviewing the patch.


On Tue, Oct 12, 2021 at 11:09 PM Robert Haas <robertmhaas@gmail.com> wrote:

This is the only place where CHUNK_SIZE gets used, and I don't think I
see any point to it. I think the 5th argument to LZ4F_compressUpdate
could just be avail_in. And as soon as you do that then I think
bbsink_lz4_archive_contents() no longer needs to be a loop.

Agree. Removed the CHUNK_SIZE and the loop.
 

+ /* First of all write the frame header to destination buffer. */
+ headerSize = LZ4F_compressBegin(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer,
+ mysink->base.bbs_next->bbs_buffer_length,
+ &mysink->prefs);

+ compressedSize = LZ4F_compressEnd(mysink->ctx,
+ mysink->base.bbs_next->bbs_buffer + mysink->bytes_written,
+ mysink->base.bbs_next->bbs_buffer_length - mysink->bytes_written,
+ NULL);

I think there's some issue with these two chunks of code. What happens
if one of these functions wants to write more data than will fit in
the output buffer? It seems like either there needs to be some code
someplace that ensures adequate space in the output buffer at the time
of these calls, or else there needs to be a retry loop that writes as
much of the data as possible, flushes the output buffer, and then
loops to generate more output data. But there's clearly no retry loop
here, and I don't see any code that guarantees that the output buffer
has to be large enough (and in the case of LZ4F_compressEnd, have
enough remaining space) either. In other words, all the same concerns
that apply to LZ4F_compressUpdate() also apply here ... but in
LZ4F_compressUpdate() you seem to BOTH have a retry loop and ALSO code
to make sure that the buffer is certain to be large enough (which is
more than you need, you only need one of those) and here you seem to
have NEITHER of those things (which is not enough, you need one or the
other).

Fair enough. I have made the change in the bbsink_lz4_begin_backup() to
make sure we reserve enough extra bytes for the header and the footer those
are written by LZ4F_compressBegin() and LZ4F_compressEnd() respectively.
The LZ4F_compressBound() when passed the input size as "0", would give
the upper bound for output buffer needed by the LZ4F_compressEnd().

How about instead using memset() to zero the whole thing and then
omitting the zero initializations? That seems like it would be less
fragile, if the upstream structure definition ever changes.

Made this change.

Please review the patch, and let me know your comments.

Regards,
Jeevan Ladhe
Attachment

pgsql-hackers by date:

Previous
From: Isaac Morland
Date:
Subject: Re: should we allow users with a predefined role to access pg_backend_memory_contexts view and pg_log_backend_memory_contexts function?
Next
From: Andres Freund
Date:
Subject: Re: [RFC] building postgres with meson