On Fri, Sep 12, 2025 at 11:58 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> Here are some review comments on v3-0004:
>
Thanks for the review. My replies are below.
> There doesn't seem to be any reason for
> astreamer_waldump_content_new() to take an astreamer *next argument.
> If you look at astreamer.h, you'll see that some astreamer_BLAH_new()
> functions take such an argument, and others don't. The ones that do
> forward their input to another astreamer; the ones that don't, like
> astreamer_plain_writer_new(), send it somewhere else. AFAICT, this
> astreamer is never going to send its output to another astreamer, so
> there's no reason for this argument.
>
Done.
> I'm also a little confused by the choice of the name
> astreamer_waldump_content_new(). I would have thought this would be
> something like astreamer_waldump_new() or astreamer_xlogreader_new().
> The word "content" doesn't seem to me to be adding much here, and it
> invites confusion with the "content" callback.
>
Done -- renamed to astreamer_waldump_new().
> I think you can merge setup_astreamer() into
> init_tar_archive_reader(). The only other caller is
> verify_tar_archive(), but that does exactly the same additional steps
> as init_tar_archive_reader(), as far as I can see.
>
Done.
> The return statement for astreamer_wal_read is really odd:
>
> + return (count - nbytes) ? (count - nbytes) : -1;
>
Agreed, that's a bit odd. This seems to be leftover code from the experimental
patch. The astreamer_wal_read() function should behave like WALRead():
it should either successfully read all the requested bytes or throw an
error. Corrected in the attached version.
>
> I would suggest changing the name of the variable from "readBuff" to
> "readBuf". There are no existing uses of readBuff in the code base.
>
The existing WALDumpReadPage() function has a "readBuff" argument, and
I've used it that way for consistency.
> I think this comment also needs improvement:
>
> + /*
> + * Ignore existing data if the required target page
> has not yet been
> + * read.
> + */
> + if (recptr >= endPtr)
> + {
> + len = 0;
> +
> + /* Reset the buffer */
> + resetStringInfo(astreamer_buf);
> + }
>
> This comment is problematic for a few reasons. First, we're not
> ignoring the existing data: we're throwing it out. Second, the comment
> doesn't say why we're doing what we're doing, only that we're doing
> it. Here's my guess at the actual explanation -- please correct me if
> I'm wrong: "pg_waldump never reads the same WAL bytes more than once,
> so if we're now being asked for data beyond the end of what we've
> already read, that means none of the data we currently have in the
> buffer will ever be consulted again. So, we can discard the existing
> buffer contents and start over." By the way, if this explanation is
> correct, it might be nice to add an assertion someplace that verifies
> it, like asserting that we're always reading from an LSN greater than
> or equal to (or exactly equal to?) the LSN immediately following the
> last data we read.
>
Updated the comment. The similar assertion exists right before
copying to the readBuff.
>
> Another thing that isn't so nice right now is that
> verify_tar_archive() has to open and close the archive only for
> init_tar_archive_reader() to be called to reopen it again just moments
> later. It would be nicer to open the file just once and then keep it
> open. Here again, I wonder if the separation of duties could be a bit
> cleaner.
>
Prefer to keep those separate, assuming that reopening the file won't
cause any significant harm. Let me know if you think otherwise.
Attached the updated version, kindly have a look.
Regards,
Amul