Re: pg_waldump: support decoding of WAL inside tarfile - Mailing list pgsql-hackers

From Tom Lane
Subject Re: pg_waldump: support decoding of WAL inside tarfile
Msg-id 2609460.1774153487@sss.pgh.pa.us
In response to Re: pg_waldump: support decoding of WAL inside tarfile  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: pg_waldump: support decoding of WAL inside tarfile
List pgsql-hackers
I wrote:
> Unsurprisingly, applying this change to unmodified master results
> in the pg_waldump and pg_verifybackup tests falling over.  More
> surprisingly, they still fall over after applying your fix to the
> decompressors, so there's some other source of garbage trailing
> data.  I haven't figured out what.

In the learn-something-new-every-day dept.: good ol' GNU tar itself
does that.  By default, it zero-pads its output to a multiple of 10kB
after it's written the required terminator.  Moreover, this behavior
is actually specified by POSIX:

  -x format
    Specify the output archive format. The pax utility shall support
    the following formats:
    ...
    ustar
      The tar interchange format; see the EXTENDED DESCRIPTION
      section. The default blocksize for this format for character
      special archive files shall be 10240. Implementations shall
      support all blocksize values less than or equal to 32256 that
      are multiples of 512.

So, astreamer_tar_parser_content's idea that it should disallow more
than 1024 bytes of trailer is completely wrong, which we would have
figured out long ago if the code attempting to enforce that weren't
completely broken.
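
To put numbers on that, here is a minimal sketch of the arithmetic in Python.  It assumes the simplest possible layout (one 512-byte header per member, member data padded to 512 bytes, a two-block zero terminator, then zero padding up to the record size) and ignores pax and long-name extension headers.  The function names are made up for illustration; nothing here is taken from the astreamer code.

```python
BLOCK = 512           # tar block size in bytes
RECORD = 20 * BLOCK   # 10240: the default record size quoted above


def round_up(n: int, multiple: int) -> int:
    """Round n up to the next multiple of `multiple`."""
    return (n + multiple - 1) // multiple * multiple


def trailer_len(member_sizes, record: int = RECORD) -> int:
    """Zero bytes following the last member's data (hypothetical helper).

    Assumes one plain 512-byte header per member and no extension headers.
    """
    # Each member contributes a header block plus its data padded to 512.
    body = sum(BLOCK + round_up(size, BLOCK) for size in member_sizes)
    # The writer adds the two-block terminator, then pads to a full record.
    total = round_up(body + 2 * BLOCK, record)
    return total - body
```

For a single 100-byte member, trailer_len([100]) comes out to 9216 bytes of zeros, well past a 1024-byte limit, while trailer_len([100], record=512) gives the bare 1024-byte terminator.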

You could argue that this means the tar files our existing utilities
create aren't POSIX-compliant.  I think it's all right though: we
can just say that we write these files with blocksize 1024 not
blocksize 10240, and tar-file readers are required to accept that
per the above spec text.

However, this discourages me from editorializing on the file trailer
emitted by whatever wrote the tar file we are reading.  I think
emitting it as-is is the most appropriate thing.  So we should just
get rid of astreamer_tar_parser_content's nonfunctional error check
and not change its behavior otherwise.
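
As an illustration of "emit it as-is", here is a rough Python sketch of a reader that walks the member headers and treats everything from the first all-zero block onward as an opaque trailer, without judging its length.  This is not the astreamer implementation, just the idea.  split_trailer and make_demo_archive are hypothetical names, and the parser assumes plain octal size fields (no base-256 sizes for very large members).

```python
import io
import tarfile

BLOCK = 512  # tar block size in bytes


def make_demo_archive() -> bytes:
    """Build a tiny in-memory archive with Python's tarfile (demo only)."""
    buf = io.BytesIO()
    with tarfile.open(fileobj=buf, mode="w") as tar:
        info = tarfile.TarInfo("demo.txt")
        info.size = 100
        tar.addfile(info, io.BytesIO(b"x" * 100))
    return buf.getvalue()


def split_trailer(archive: bytes):
    """Return (members, trailer); the trailer is passed through untouched."""
    pos = 0
    while pos + BLOCK <= len(archive):
        header = archive[pos:pos + BLOCK]
        if header == bytes(BLOCK):
            break  # first all-zero block: the trailer starts here
        # The size field is 12 bytes of octal text at offset 124.
        size_text = header[124:136].rstrip(b"\0 ").decode("ascii")
        size = int(size_text or "0", 8)
        # Skip the header plus the data, which is padded to a full block.
        pos += BLOCK + (size + BLOCK - 1) // BLOCK * BLOCK
    return archive[:pos], archive[pos:]
```

Python's tarfile module also pads to a 10240-byte record, so make_demo_archive() yields a 10240-byte archive, and split_trailer() hands back 1024 bytes of member data plus a 9216-byte zero trailer that a caller could forward unchanged.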

            regards, tom lane


