Re: pg_waldump: support decoding of WAL inside tarfile - Mailing list pgsql-hackers

From Amul Sul
Subject Re: pg_waldump: support decoding of WAL inside tarfile
Date
Msg-id CAAJ_b94Uh+b41LQG45bZFK+i62EVvv972LiGWWWuR64=-64rTQ@mail.gmail.com
Whole thread Raw
In response to Re: pg_waldump: support decoding of WAL inside tarfile  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On Fri, Sep 12, 2025 at 11:58 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> Here are some review comments on v3-0004:
>

Thanks for the review. My replies are below.

> There doesn't seem to be any reason for
> astreamer_waldump_content_new() to take an astreamer *next argument.
> If you look at astreamer.h, you'll see that some astreamer_BLAH_new()
> functions take such an argument, and others don't. The ones that do
> forward their input to another astreamer; the ones that don't, like
> astreamer_plain_writer_new(), send it somewhere else. AFAICT, this
> astreamer is never going to send its output to another astreamer, so
> there's no reason for this argument.
>

Done.

> I'm also a little confused by the choice of the name
> astreamer_waldump_content_new(). I would have thought this would be
> something like astreamer_waldump_new() or astreamer_xlogreader_new().
> The word "content" doesn't seem to me to be adding much here, and it
> invites confusion with the "content" callback.
>

Done -- renamed to astreamer_waldump_new().

> I think you can merge setup_astreamer() into
> init_tar_archive_reader(). The only other caller is
> verify_tar_archive(), but that does exactly the same additional steps
> as init_tar_archive_reader(), as far as I can see.
>

Done.

> The return statement for astreamer_wal_read is really odd:
>
> +       return (count - nbytes) ? (count - nbytes) : -1;
>

Agreed, that's a bit odd. This seems to be leftover code from the experimental
patch. The astreamer_wal_read() function should behave like WALRead():
it should either successfully read all the requested bytes or throw an
error. Corrected in the attached version.

>
> I would suggest changing the name of the variable from "readBuff" to
> "readBuf". There are no existing uses of readBuff in the code base.
>

The existing WALDumpReadPage() function has a "readBuff" argument, and
I've used it that way for consistency.

> I think this comment also needs improvement:
>
> +               /*
> +                * Ignore existing data if the required target page
> has not yet been
> +                * read.
> +                */
> +               if (recptr >= endPtr)
> +               {
> +                       len = 0;
> +
> +                       /* Reset the buffer */
> +                       resetStringInfo(astreamer_buf);
> +               }
>
> This comment is problematic for a few reasons. First, we're not
> ignoring the existing data: we're throwing it out. Second, the comment
> doesn't say why we're doing what we're doing, only that we're doing
> it. Here's my guess at the actual explanation -- please correct me if
> I'm wrong: "pg_waldump never reads the same WAL bytes more than once,
> so if we're now being asked for data beyond the end of what we've
> already read, that means none of the data we currently have in the
> buffer will ever be consulted again. So, we can discard the existing
> buffer contents and start over." By the way, if this explanation is
> correct, it might be nice to add an assertion someplace that verifies
> it, like asserting that we're always reading from an LSN greater than
> or equal to (or exactly equal to?) the LSN immediately following the
> last data we read.
>

Updated the comment. The similar assertion exists right before
copying to the readBuff.

>
> Another thing that isn't so nice right now is that
> verify_tar_archive() has to open and close the archive only for
> init_tar_archive_reader() to be called to reopen it again just moments
> later. It would be nicer to open the file just once and then keep it
> open. Here again, I wonder if the separation of duties could be a bit
> cleaner.
>

Prefer to keep those separate, assuming that reopening the file won't
cause any significant harm. Let me know if you think otherwise.

Attached the updated version, kindly have a look.

Regards,
Amul

Attachment

pgsql-hackers by date:

Previous
From: Shubham Khanna
Date:
Subject: Re: Add support for specifying tables in pg_createsubscriber.
Next
From: Chao Li
Date:
Subject: Re: Optimize LISTEN/NOTIFY