Re: pg_waldump: support decoding of WAL inside tarfile - Mailing list pgsql-hackers

From Tom Lane
Subject Re: pg_waldump: support decoding of WAL inside tarfile
Date
Msg-id 3686764.1775175097@sss.pgh.pa.us
In response to Re: pg_waldump: support decoding of WAL inside tarfile  (Thomas Munro <thomas.munro@gmail.com>)
Responses Re: pg_waldump: support decoding of WAL inside tarfile
Re: pg_waldump: support decoding of WAL inside tarfile
List pgsql-hackers
Thomas Munro <thomas.munro@gmail.com> writes:
> On Fri, Apr 3, 2026 at 11:50 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>>> How about using --format=ustar, instead of that sparse control stuff?

>> I did it that way for GNU tar, but did not research whether bsdtar
>> will take that option.  Feel free to hack on ebba64c08 some more.

> This seems to work for both:

> $ tar --format=ustar -c /dev/null  > /dev/null
> tar: Removing leading '/' from member names
> $ gtar --format=ustar -c /dev/null  > /dev/null
> gtar: Removing leading `/' from member names

Cool.  LGTM.
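For a reader on the consuming side, the point of --format=ustar is that each header block can be checked against the POSIX ustar layout. The sketch below assumes the standard 512-byte header offsets: typeflag at byte 156, magic at 257, and version at 263. It distinguishes POSIX ustar (magic "ustar\0", version "00") from the old GNU layout (bytes "ustar  \0"), and it flags pax extended or global headers (typeflag 'x' or 'g'). The names classify_tar_header() and is_pax_header() are hypothetical and do not come from pg_waldump.

```c
#include <assert.h>
#include <string.h>

/* Byte offsets within a 512-byte tar header block (POSIX ustar layout). */
#define TAR_TYPEFLAG_OFF	156
#define TAR_MAGIC_OFF		257
#define TAR_VERSION_OFF		263

typedef enum
{
	TAR_FMT_POSIX_USTAR,		/* magic "ustar\0" + version "00" */
	TAR_FMT_OLD_GNU,			/* magic+version bytes "ustar  \0" */
	TAR_FMT_OTHER				/* v7 or unrecognized */
} TarHeaderFormat;

/* Hypothetical helper: classify one header block by its magic/version. */
static TarHeaderFormat
classify_tar_header(const char *block)
{
	assert(block != NULL);
	if (memcmp(block + TAR_MAGIC_OFF, "ustar\0", 6) == 0 &&
		memcmp(block + TAR_VERSION_OFF, "00", 2) == 0)
		return TAR_FMT_POSIX_USTAR;
	if (memcmp(block + TAR_MAGIC_OFF, "ustar  \0", 8) == 0)
		return TAR_FMT_OLD_GNU;
	return TAR_FMT_OTHER;
}

/* Typeflags 'x' (per-file) and 'g' (global) mark pax extended headers. */
static int
is_pax_header(const char *block)
{
	char		t = block[TAR_TYPEFLAG_OFF];

	return t == 'x' || t == 'g';
}
```

A strict reader could reject any block where is_pax_header() returns true, or where the format is not one it understands, instead of misreading the member that follows.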

> I think a Windows system could be using either.  BSD tar comes
> pre-installed by Microsoft and people often install GNU tools.  So I
> think we should use File::Spec->devnull() instead of /dev/null, and
> Andrew showed that working.

Agreed.
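The change discussed here lives in Perl test code, where File::Spec->devnull() returns "nul" on Windows and "/dev/null" elsewhere. As an illustration only, the C analogue below picks the null device at compile time and builds the same probe command shown earlier in the thread. null_device() and build_ustar_probe() are hypothetical names, and the sketch assumes _WIN32 is the only platform that needs special handling.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical analogue of Perl's File::Spec->devnull(). */
static const char *
null_device(void)
{
#ifdef _WIN32
	return "nul";
#else
	return "/dev/null";
#endif
}

/*
 * Build "<tar> --format=ustar -c <devnull> > <devnull>", the probe used in
 * the thread to check that a tar program accepts --format=ustar.
 * Returns 0 on success, -1 if the buffer is too small.
 */
static int
build_ustar_probe(char *buf, size_t len, const char *tar_program)
{
	int			n;

	assert(tar_program != NULL && strlen(tar_program) > 0);
	n = snprintf(buf, len, "%s --format=ustar -c %s > %s",
				 tar_program, null_device(), null_device());
	return (n < 0 || (size_t) n >= len) ? -1 : 0;
}
```

The same command string works with either bsdtar ("tar") or GNU tar ("gtar"), which matches the observation that a Windows machine might have either one installed.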

> Longer term I think we need to tolerate but ignore pax headers.  If I
> understand the spirit of this long evolution, pax archives are
> intended to be acceptable to pre-pax implementations, which implies
> that they can't really change the meaning of the bits of the file
> contents.

I don't buy that.  For example, POSIX specifies these allowed
fields in an extended header:

    linkpath
        The pathname of a link being created to another file, of any
        type, previously archived. This record shall override the
        linkname field in the following ustar header block(s).

    path
        The pathname of the following file(s). This record shall
        override the name and prefix fields in the following header
        block(s).

    size
        The size of the file in octets, expressed as a decimal number
        using digits from the ISO/IEC 646:1991 standard. This record
        shall override the size field in the following header
        block(s).
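To see why these overrides matter, consider what a reader must do with an 'x' header body. POSIX encodes each record as "LEN KEYWORD=VALUE\n", where LEN counts the entire record, including its own digits and the newline. A reader that skips these records would use the ustar name and size fields, which path and size are allowed to override. The parser below is a minimal sketch under that assumption. PaxOverrides and parse_pax_records() are hypothetical, and the fixed 256-byte buffers are a simplification that the standard does not impose.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <string.h>

/* Values from a pax extended header that override the next ustar header. */
typedef struct
{
	char		path[256];		/* overrides name + prefix; "" if absent */
	char		linkpath[256];	/* overrides linkname; "" if absent */
	long long	size;			/* overrides size; -1 if absent */
} PaxOverrides;

/* Copy a non-terminated value into a fixed buffer; -1 if it won't fit. */
static int
copy_value(char *dst, size_t dstsize, const char *val, size_t vallen)
{
	if (vallen >= dstsize)
		return -1;
	memcpy(dst, val, vallen);
	dst[vallen] = '\0';
	return 0;
}

static int
keyword_is(const char *kw, size_t kwlen, const char *name)
{
	return kwlen == strlen(name) && memcmp(kw, name, kwlen) == 0;
}

/* Parse a pax header body; unknown keywords are skipped. 0 ok, -1 bad. */
static int
parse_pax_records(const char *buf, size_t len, PaxOverrides *out)
{
	size_t		pos = 0;

	assert(buf != NULL && out != NULL);
	out->path[0] = '\0';
	out->linkpath[0] = '\0';
	out->size = -1;

	while (pos < len)
	{
		const char *rec = buf + pos;
		size_t		avail = len - pos;
		size_t		reclen = 0;
		size_t		i = 0;
		const char *kw, *eq, *val;
		size_t		kwlen, vallen;

		/* Decimal record length, then exactly one space. */
		while (i < avail && rec[i] >= '0' && rec[i] <= '9')
		{
			reclen = reclen * 10 + (size_t) (rec[i] - '0');
			if (reclen > avail)
				return -1;
			i++;
		}
		if (i == 0 || i >= avail || rec[i] != ' ' ||
			reclen < i + 2 || rec[reclen - 1] != '\n')
			return -1;

		kw = rec + i + 1;
		eq = memchr(kw, '=', (size_t) (rec + reclen - 1 - kw));
		if (eq == NULL)
			return -1;
		kwlen = (size_t) (eq - kw);
		val = eq + 1;
		vallen = (size_t) (rec + reclen - 1 - val);

		if (keyword_is(kw, kwlen, "path"))
		{
			if (copy_value(out->path, sizeof(out->path), val, vallen) < 0)
				return -1;
		}
		else if (keyword_is(kw, kwlen, "linkpath"))
		{
			if (copy_value(out->linkpath, sizeof(out->linkpath), val, vallen) < 0)
				return -1;
		}
		else if (keyword_is(kw, kwlen, "size"))
		{
			long long	sz = 0;

			if (vallen == 0)
				return -1;
			for (size_t j = 0; j < vallen; j++)
			{
				if (val[j] < '0' || val[j] > '9' || sz > (LLONG_MAX - 9) / 10)
					return -1;
				sz = sz * 10 + (val[j] - '0');
			}
			out->size = sz;
		}
		pos += reclen;
	}
	return 0;
}
```

A caller would then read overrides.size bytes of member data when size >= 0, and use the octal ustar size field only otherwise. Ignoring the record would make the reader consume the wrong number of bytes.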

GNU tar seems to try hard to ensure that a non-pax-aware tar can
extract *something* from a tar file, but it's not guaranteed that the
something contains the right data or is located at the right pathname.
It looks like the goal is to allow post-processing to pick up the
pieces.

In any case, this is all completely moot if we don't write code to
de-sparse a sparse entry: we will not be able to validate WAL data
if the WAL file is missing some pages.  So I see little point in
having code that tolerates pax headers if it doesn't also do that.
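The detection half of this argument is small. Old GNU format marks a sparse member with typeflag 'S', and GNU tar's pax-based sparse formats announce themselves through keywords beginning with "GNU.sparse." in the preceding extended header. A reader that does not de-sparse can therefore refuse such members instead of validating a segment with missing pages. The sketch below assumes the caller has already collected the pax keywords into a NULL-terminated array. Both function names are hypothetical.

```c
#include <assert.h>
#include <string.h>

#define TAR_TYPEFLAG_OFF	156

/*
 * Hypothetical check: does this member use a sparse representation?
 * pax_keywords is a NULL-terminated array of keywords from the preceding
 * pax extended header, or NULL if there was none.
 */
static int
tar_member_is_sparse(const char *header_block, const char *const *pax_keywords)
{
	assert(header_block != NULL);
	if (header_block[TAR_TYPEFLAG_OFF] == 'S')	/* old GNU sparse member */
		return 1;
	for (; pax_keywords != NULL && *pax_keywords != NULL; pax_keywords++)
	{
		if (strncmp(*pax_keywords, "GNU.sparse.", strlen("GNU.sparse.")) == 0)
			return 1;
	}
	return 0;
}

/* Return an error message if the member cannot be validated, else NULL. */
static const char *
check_wal_member(const char *header_block, const char *const *pax_keywords)
{
	if (tar_member_is_sparse(header_block, pax_keywords))
		return "sparse tar member: WAL pages may be missing, cannot validate";
	return NULL;
}
```

Rejecting the member is only the fallback. Actually reconstructing the holes would require interpreting the sparse map, which is the extra work described above.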

            regards, tom lane


