Re: pg_waldump: support decoding of WAL inside tarfile - Mailing list pgsql-hackers

From Thomas Munro
Subject Re: pg_waldump: support decoding of WAL inside tarfile
Date
Msg-id CA+hUKG+Pqz5=YQG_=8ho0YsTfn2HWOsJQWqS4j0q8QQWweJP9w@mail.gmail.com
In response to Re: pg_waldump: support decoding of WAL inside tarfile  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
On Mon, Mar 30, 2026 at 11:23 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Thomas Munro <thomas.munro@gmail.com> writes:
> > Anyway, given the defaults, GNU tar + ZFS/BTRFS users must be pretty
> > unlikely to hit this in the wild, and the symptom is a confusing error
> > in a maintenance tool, not corruption, so I don't think this is a big
> > deal.  I might still try teaching the astreamer code to understand PAX
> > 1.0 when it sees it in the next cycle though, for the benefit of
> > FreeBSD users.
>
> I agree that this isn't too critical if the effects are confined to
> pg_waldump.  I believe that pg_basebackup and pg_verifybackup also use
> astreamer_tar.c, but it's not clear to me if they'd ever be asked to
> parse files made by tar(1) and not by our own sparseness-ignorant
> tar-writing code.  If they can be, that'd be a higher-priority reason
> to fill in this gap.

I pushed the workaround for the test.

Yeah, I can't see any reason why pg_verifybackup --wal-path=foo.tar
wouldn't suffer from the same problem in the wild.  Again, it's not the
end of the world, because it'll just fail and you'll probably figure
out why eventually.  So perhaps we should just improve our detection of
archives that we can't handle?  Straw man algorithm:

If you can't find $NAME in the archive, then check whether
PaxHeaders/$NAME exists, and if so, fail with 'unsupported TAR format
for WAL file "%s" in archive "%s"' instead.  That'd probably work well
enough in practice, because astreamer_tar.c treats PAX extended header
pseudo-files as regular files (they're not, they have type 'x'), so
they show up as ordinary members, and both GNU and BSD tar happen to
use that naming for them.
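
To make the straw man concrete, here is a minimal sketch of the lookup
order it describes.  It is not code from astreamer_tar.c or pg_waldump:
the enum, the function names and the flat array of member names are all
hypothetical, and it assumes the member names have already been
collected and are compared exactly, with no directory prefix.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical result codes; nothing like this exists in astreamer_tar.c. */
typedef enum
{
	WAL_MEMBER_FOUND,
	WAL_MEMBER_MISSING,
	WAL_MEMBER_UNSUPPORTED_FORMAT
} WalMemberStatus;

/* Return true if "name" matches one of the archive member names exactly. */
static bool
member_exists(const char *const *members, size_t nmembers, const char *name)
{
	for (size_t i = 0; i < nmembers; i++)
	{
		if (strcmp(members[i], name) == 0)
			return true;
	}
	return false;
}

/*
 * Straw man: look for the WAL file itself first.  Only if that fails,
 * look for a "PaxHeaders/<name>" pseudo-file, which hints that the
 * archive uses a format the reader does not understand.
 */
static WalMemberStatus
classify_wal_lookup(const char *const *members, size_t nmembers,
					const char *name)
{
	char		pax_name[1024];
	int			len;

	assert(members != NULL && name != NULL);

	if (member_exists(members, nmembers, name))
		return WAL_MEMBER_FOUND;

	/* Assumed naming convention; POSIX does not require it. */
	len = snprintf(pax_name, sizeof(pax_name), "PaxHeaders/%s", name);
	if (len > 0 && (size_t) len < sizeof(pax_name) &&
		member_exists(members, nmembers, pax_name))
		return WAL_MEMBER_UNSUPPORTED_FORMAT;

	return WAL_MEMBER_MISSING;
}
```

A caller would map WAL_MEMBER_UNSUPPORTED_FORMAT to the new
"unsupported TAR format" message and WAL_MEMBER_MISSING to the existing
"could not find WAL" error.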

POSIX doesn't require that naming, so it would in theory be more
correct to teach astreamer_tar.c to recognise PAX extended headers and
fish out enough information and link it to the following archive
member, but a simple test to improve error messaging seems like the
right level of effort here.
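
For comparison, the "more correct" route might start out roughly like
the sketch below.  Again, this is not astreamer_tar.c code, and the
function names are made up.  The only things it relies on are that the
typeflag byte sits at offset 156 of the 512-byte header block, with 'x'
marking a per-file extended header and 'g' a global one, and that the
extended header body is a series of "<len> <key>=<value>\n" records in
which <len> counts the whole record.  Which keys would actually need to
be fished out for sparse members is left open here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define TAR_OFFSET_TYPEFLAG 156	/* typeflag position in a ustar header */

/* Return true if a 512-byte header block describes a pax extended header. */
static bool
tar_header_is_pax_extended(const unsigned char *header)
{
	assert(header != NULL);
	return header[TAR_OFFSET_TYPEFLAG] == 'x' ||
		header[TAR_OFFSET_TYPEFLAG] == 'g';
}

/*
 * Search a pax extended header body for "key" and copy its value into
 * "out".  Returns false if the key is absent, the body is malformed, or
 * the value does not fit in "out".
 */
static bool
pax_find_value(const char *body, size_t body_len, const char *key,
			   char *out, size_t out_size)
{
	size_t		key_len = strlen(key);
	size_t		pos = 0;

	while (pos < body_len)
	{
		size_t		rec_len = 0;
		size_t		i = pos;
		size_t		end;

		/* Parse the decimal record length. */
		while (i < body_len && body[i] >= '0' && body[i] <= '9')
		{
			rec_len = rec_len * 10 + (size_t) (body[i] - '0');
			i++;
		}
		if (i == pos || i >= body_len || body[i] != ' ' ||
			rec_len == 0 || rec_len > body_len - pos)
			return false;
		i++;					/* skip the space after the length */

		/* The record covers [pos, end) and must end with a newline. */
		end = pos + rec_len;
		if (end <= i || body[end - 1] != '\n')
			return false;

		/* Does "key=" start at i, inside this record? */
		if (end - 1 - i > key_len &&
			memcmp(body + i, key, key_len) == 0 &&
			body[i + key_len] == '=')
		{
			size_t		val_len = end - 1 - (i + key_len + 1);

			if (val_len >= out_size)
				return false;
			memcpy(out, body + i + key_len + 1, val_len);
			out[val_len] = '\0';
			return true;
		}
		pos = end;
	}
	return false;
}
```

The reader would then have to remember what it found and apply it to
the archive member that follows, which is where the extra effort lies.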

Here's a test patch that shows the problem on any system with GNU tar
or BSD tar and a file system that supports sparse files.  The test
currently succeeds because it looks for "error: could not find WAL",
but the idea would be to change it to look for a new error message
like the one suggested above.  My motivation was to make this
reproducible on any system, in case that's helpful for Amul and Andrew
if they're interested in trying to improve this edge case in time for
the release.  Otherwise I'll come back to it, but probably not in
time...

Attachment