Re: pg_verifybackup: TAR format backup verification - Mailing list pgsql-hackers

From Amul Sul
Subject Re: pg_verifybackup: TAR format backup verification
Date
Msg-id CAAJ_b95rXAsyuetre9Wc-W6uHW-=ivVj56F8-w4gfrTGsnRRSA@mail.gmail.com
Whole thread Raw
In response to Re: pg_verifybackup: TAR format backup verification  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: pg_verifybackup: TAR format backup verification
List pgsql-hackers
On Wed, Aug 7, 2024 at 9:12 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> [ I committed 0001, then noticed I had a type in the subject line of
> the commit message. Argh. ]
>
> On Wed, Aug 7, 2024 at 9:41 AM Amul Sul <sulamul@gmail.com> wrote:
> > With the patch, I am concerned that we won't be able to give an
> > accurate progress report as before. We add all the file sizes in the
> > backup manifest to the total_size without checking if they exist on
> > disk. Therefore, sometimes the reported progress completion might not
> > show 100% when we encounter files where m->bad or m->match == false at
> > a later stage. However, I think this should be acceptable since there
> > will be an error for the respective missing or bad file, and it can be
> > understood that verification is complete even if the progress isn't
> > 100% in that case. Thoughts/Comments?
>
> When somebody says that something is a refactoring commit, my
> assumption is that there should be no behavior change. If the behavior
> is changing, it's not purely a refactoring, and it shouldn't be
> labelled as a refactoring (or at least there should be a prominent
> disclaimer identifying whatever behavior has changed, if a small
> change was deemed acceptable and unavoidable).
>

I agree; I'll be more careful next time.

> I am very reluctant to accept a functional regression of the type that
> you describe here (or the type that I postulated might occur, even if
> I was wrong and it doesn't). The point here is that we're trying to
> reuse the code, and I support that goal, because code reuse is good.
> But it's not such a good thing that we should do it if it has negative
> consequences. We should either figure out some other way of
> refactoring it that doesn't have those negative side-effects, or we
> should leave the existing code alone and have separate code for the
> new stuff we want to do.
>
> I do realize that the type of side effect you describe here is quite
> minor. I could live with it if it were unavoidable. But I really don't
> see why we can't avoid it.
>

The main issue I have is computing the total_size of valid files that
will be checksummed and that exist in both the manifests and the
backup, in the case of a tar backup. This cannot be done in the same
way as with a plain backup.

Another consideration is that, in the case of a tar backup, since
we're reading all tar files from the backup rather than individual
files to be checksummed, should we consider total_size as the sum of
all valid tar files in the backup, regardless of checksum
verification? If so, we would need to perform an initial pass to
calculate the total_size in the directory, similar to what
verify_backup_directory() does., but doing so I am a bit uncomfortable
and unsure if we should do that pass.

An alternative idea is to provide progress reports per file instead of
for the entire backup directory. We could report the size of each file
and keep track of done_size as we read each tar header and content.
For example, the report could be:

109032/109032 kB (100%) base.tar file verified
123444/123444 kB (100%) 16245.tar file verified
23478/23478 kB (100%) 16246.tar file verified
.....
<total_done_size>/<total_size>  (NNN%) verified.

I think this type of reporting can be implemented with minimal
changes, abd for plain backups, we can keep the reporting as it is.
Thoughts?

Regards,
Amul



pgsql-hackers by date:

Previous
From: Melanie Plageman
Date:
Subject: Re: Add LSN <-> time conversion functionality
Next
From: Robert Haas
Date:
Subject: Re: Add LSN <-> time conversion functionality