Re: pg_waldump: support decoding of WAL inside tarfile - Mailing list pgsql-hackers

From Tom Lane
Subject Re: pg_waldump: support decoding of WAL inside tarfile
Date
Msg-id 2555285.1774131847@sss.pgh.pa.us
Whole thread Raw
In response to Re: pg_waldump: support decoding of WAL inside tarfile  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: pg_waldump: support decoding of WAL inside tarfile
List pgsql-hackers
I wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>> I also noticed a possible bug in astreamer, where the decompressor 
>> finalize functions send bbs_buffer.maxlen bytes to the next streamer 
>> when flushing remaining data at end-of-stream. This seems wrong because 
>> the buffer may only be partially filled with valid decompressed data. 
>> Possible patch for that attached. (But I don't think it's related to 
>> these failures).

> Ugh.  That's surely very broken, but how did we not notice?
> Do all the consumers know enough to ignore garbage trailing data?

I poked into this question and found that all of our existing uses
of the decompression astreamers are in front of astreamer_tar.c's
astreamer_tar_parser_ops.  Since a tar file has a trailer (two
zero blocks), it is not surprising that garbage after the trailer
will be ignored.  It might appear that astreamer_tar_parser_content
itself will complain about such extra data:

            case ASTREAMER_ARCHIVE_TRAILER:

                /*
                 * We've seen an end-of-archive indicator, so anything more is
                 * buffered and sent as part of the archive trailer. But we
                 * don't expect more than 2 blocks.
                 */
                astreamer_buffer_bytes(streamer, &data, &len, len);
                if (len > 2 * TAR_BLOCK_SIZE)
                    pg_fatal("tar file trailer exceeds 2 blocks");
                return;

But this code is itself buggy!  The astreamer_buffer_bytes call
is guaranteed to reduce "len" to zero, therefore that pg_fatal
call can never fire, no matter how much garbage we just stuffed
into the astreamer's buffer.

We might want to do

-               if (len > 2 * TAR_BLOCK_SIZE)
+               if (streamer->bbs_buffer.len > 2 * TAR_BLOCK_SIZE)

Unsurprisingly, applying this change to unmodified master results
in the pg_waldump and pg_verifybackup tests falling over.  More
surprisingly, they still fall over after applying your fix to the
decompressors, so there's some other source of garbage trailing
data.  I haven't figured out what.

However, I'm a bit loath to make that change even after fixing the
decompressors, because this discovery suggests that we may currently
be building tar files that contain garbage after the trailer.
And it seems to me that such tar files are perfectly legal anyway.
The POSIX spec for "ustar" format (found under pax(1)) says

    At the end of the archive file there shall be two 512-octet
    logical records filled with binary zeros, interpreted as an
    end-of-archive indicator.

    The logical records may be grouped for physical I/O operations, as
    described under the -bblocksize and -x ustar options. Each group
    of logical records may be written with a single operation
    equivalent to the write() function. On magnetic tape, the result
    of this write shall be a single tape physical block. The last
    physical block shall always be the full size, so logical records
    after the two zero logical records may contain undefined data.

That last sentence seems to say that you can put arbitrary garbage
after the trailer without invalidating the file.  Sure, it's a
holdover from magtape days, but that's how the spec is written.

Therefore, I think we might be best off to just remove this
nonfunctional error check, rather than "fix" it.  Another
idea which might be a bit better is to drop any data after
the standard trailer on the floor, instead of passing it on
as the code now does.

Thoughts?

            regards, tom lane



pgsql-hackers by date:

Previous
From: Jianghua Yang
Date:
Subject: Re: basebackup: add missing deflateEnd() in gzip compression sink
Next
From: Peter Geoghegan
Date:
Subject: Re: index prefetching