Re: pg_verifybackup: TAR format backup verification - Mailing list pgsql-hackers

From Tom Lane
Subject Re: pg_verifybackup: TAR format backup verification
Date
Msg-id 1466068.1727709896@sss.pgh.pa.us
Whole thread Raw
In response to Re: pg_verifybackup: TAR format backup verification  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: pg_verifybackup: TAR format backup verification
List pgsql-hackers
Robert Haas <robertmhaas@gmail.com> writes:
> On Sat, Sep 28, 2024 at 6:59 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Now, manifest_file.size is in fact a size_t, so %zu is the correct
>> format spec for it.  But astreamer_verify.checksum_bytes is declared
>> uint64.  This leads me to question whether size_t was the correct
>> choice for manifest_file.size.

> It seems that manifest_file.size is size_t because parse_manifest.h is
> using size_t for json_manifest_per_file_callback. What's happening is
> that json_manifest_finalize_file() is parsing the file size
> information out of the manifest. It uses strtoul to do that and
> assigns the result to a size_t. I don't think I had any principled
> reason for making that decision; size_t is, I think, the size of an
> object in memory, and this is not that.

Correct, size_t is defined to measure in-memory object sizes.  It's
the argument type of malloc(), the result type of sizeof(), etc.
It does not restrict the size of disk files.

> This is just a string in a
> JSON file that represents an integer which will hopefully turn out to
> be the size of the file on disk. I guess I don't know what type I
> should be using here. Most things in PostgreSQL use a type like uint32
> or uint64, but technically what we're going to be comparing against in
> the end is probably an off_t produced by stat(), but the return value
> of strtoul() or strtoull() is unsigned long or unsigned long long,
> which is not any of those things. If you have a suggestion here, I'm
> all ears.

I don't know if it's realistic to expect that this code might be used
to process JSON blobs exceeding 4GB.  But if it is, I'd be inclined to
use uint64 and strtoull for these purposes, if only to avoid
cross-platform hazards with varying sizeof(long) and sizeof(size_t).

Um, wait ... we do have strtou64(), so you should use that.

>> Aside from that, I'm unimpressed with expending a five-line comment
>> at line 376 to justify casting control_file_bytes to int,

> I don't know what you mean by this.

I mean that we have a widely-used, better solution.  If you copied
this from someplace else, the someplace else could stand to be
improved too.

            regards, tom lane



pgsql-hackers by date:

Previous
From: jian he
Date:
Subject: Re: Set query_id for query contained in utility statement
Next
From: Tom Lane
Date:
Subject: Re: pg_verifybackup: TAR format backup verification