Re: refactoring basebackup.c - Mailing list pgsql-hackers

From Robert Haas
Subject Re: refactoring basebackup.c
Date
Msg-id CA+TgmoZEzVs5s-Y-mCMb=pABTdaney=EgipH575z4b3Q=+oW+A@mail.gmail.com
Whole thread Raw
In response to Re: refactoring basebackup.c  (Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>)
Responses Re: refactoring basebackup.c  (Jeevan Ladhe <jeevan.ladhe@enterprisedb.com>)
List pgsql-hackers
On Thu, Oct 14, 2021 at 1:21 PM Jeevan Ladhe
<jeevan.ladhe@enterprisedb.com> wrote:
> Agree. Removed the CHUNK_SIZE and the loop.

Try harder. :-)

The loop is gone, but CHUNK_SIZE itself seems to have evaded the executioner.

> Fair enough. I have made the change in the bbsink_lz4_begin_backup() to
> make sure we reserve enough extra bytes for the header and the footer those
> are written by LZ4F_compressBegin() and LZ4F_compressEnd() respectively.
> The LZ4F_compressBound() when passed the input size as "0", would give
> the upper bound for output buffer needed by the LZ4F_compressEnd().

I think this is not the best way to accomplish the goal. Adding
LZ4F_compressBound(0) to next_buf_len makes the buffer substantially
bigger for something that's only going to happen once. We are assuming
in any case, I think, that LZ4F_compressBound(0) <=
LZ4F_compressBound(mysink->base.bbs_buffer_length), so all you need to
do is have bbsink_end_archive() empty the buffer, if necessary, before
calling LZ4F_compressEnd(). With just that change, you can set
next_buf_len = LZ4F_HEADER_SIZE_MAX + mysink->output_buffer_bound --
but that's also more than you need. You can instead do next_buf_len =
Min(LZ4F_HEADER_SIZE_MAX, mysink->output_buffer_bound). Now, you're
probably thinking that won't work, because bbsink_lz4_begin_archive()
could fill up the buffer partway, and then the first call to
bbsink_lz4_archive_contents() could overrun it. But that problem can
be solved by reversing the order of operations in
bbsink_lz4_archive_contents(): before you call LZ4F_compressUpdate(),
test whether you need to empty the buffer first, and if so, do it.

That's actually less confusing than the way you've got it, because as
you have it written, we don't really know why we're emptying the
buffer -- is it to prepare for the next call to LZ4F_compressUpdate(),
or is it to prepare for the call to LZ4F_compressEnd()? How do we know
now how much space the next person writing into the buffer is going to
need? It seems better if bbsink_lz4_archive_contents() empties the
buffer before calling LZ4F_compressUpdate() if that call might not
have enough space, and likewise bbsink_lz4_end_archive() empties the
buffer before calling LZ4F_compressEnd() if that's needed. That way,
each callback makes the space *it* needs, not the space the *next*
caller needs. (bbsink_lz4_end_archive() still needs to ALSO empty the
buffer after LZ4F_compressEnd(), so we don't orphan any data.)

On another note, if the call to LZ4F_freeCompressionContext() is
required in bbsink_lz4_end_archive(), then I think this code is going
to just leak the memory used by the compression context if an error
occurs before this code is reached. That kind of sucks. The way to fix
it, I suppose, is a TRY/CATCH block, but I don't think that can be
something internal to basebackup_lz4.c: I think the bbsink stuff would
need to provide some kind of infrastructure for basebackup_lz4.c to
use. It would be a lot better if we could instead get LZ4 to allocate
memory using palloc(), but a quick Google search suggests that you
can't accomplish that without recompiling liblz4, and that's not
workable since we don't want to require a liblz4 built specifically
for PostgreSQL. Do you see any other solution?

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: "David G. Johnston"
Date:
Subject: Re: [PATCH] Proposal for HIDDEN/INVISIBLE column
Next
From: Jeff Davis
Date:
Subject: Re: should we allow users with a predefined role to access pg_backend_memory_contexts view and pg_log_backend_memory_contexts function?