Re: pg_verifybackup: TAR format backup verification - Mailing list pgsql-hackers

From Amul Sul
Subject Re: pg_verifybackup: TAR format backup verification
Msg-id CAAJ_b95mcGjkfAf1qduOR97CokW8-_i-dWLm3v6x1w2-OW9M+A@mail.gmail.com
In response to Re: pg_verifybackup: TAR format backup verification  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: pg_verifybackup: TAR format backup verification
List pgsql-hackers
On Wed, Aug 7, 2024 at 11:28 PM Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Wed, Aug 7, 2024 at 1:05 PM Amul Sul <sulamul@gmail.com> wrote:
> > The main issue I have is computing the total_size of valid files that
> > will be checksummed and that exist in both the manifests and the
> > backup, in the case of a tar backup. This cannot be done in the same
> > way as with a plain backup.
>
> I think you should compute and sum the sizes of the tar files
> themselves. Suppose you readdir(), make a list of files that look
> relevant, and stat() each one. total_size is the sum of the file
> sizes. Then you work your way through the list of files and read each
> one. done_size is the total size of all files you've read completely
> plus the number of bytes you've read from the current file so far.
>

I tried this in the attached version and made a few additional changes
based on Sravan's off-list comments regarding function names and
descriptions.
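
For illustration, here is a minimal C sketch of the progress accounting
suggested above. It stat()s each relevant archive to build total_size,
then advances done_size chunk by chunk while reading. As a result,
done_size always equals the bytes of fully read files plus the bytes
read so far from the current one. The names progress_state,
looks_relevant(), compute_total_size() and read_with_progress() are
hypothetical. Treating any name containing ".tar" as relevant is also an
assumption. This is not the code in the attached patch.

```c
#define _POSIX_C_SOURCE 200809L

#include <dirent.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>

/* Hypothetical progress state; names are illustrative only. */
typedef struct
{
	unsigned long long total_size;	/* sum of archive sizes from stat() */
	unsigned long long done_size;	/* bytes consumed so far */
} progress_state;

/* Assumption: a file is a relevant archive if its name contains ".tar". */
static int
looks_relevant(const char *name)
{
	return strstr(name, ".tar") != NULL;
}

/*
 * readdir() the backup directory, stat() each relevant regular file and
 * add its size to total_size. Returns the number of files counted, or
 * -1 if the directory cannot be opened.
 */
static int
compute_total_size(const char *dir, progress_state *ps)
{
	DIR		   *d = opendir(dir);
	struct dirent *de;
	int			count = 0;

	if (d == NULL)
		return -1;
	ps->total_size = 0;
	ps->done_size = 0;
	while ((de = readdir(d)) != NULL)
	{
		char		path[4096];
		struct stat st;

		if (!looks_relevant(de->d_name))
			continue;
		snprintf(path, sizeof(path), "%s/%s", dir, de->d_name);
		if (stat(path, &st) != 0 || !S_ISREG(st.st_mode))
			continue;
		ps->total_size += (unsigned long long) st.st_size;
		count++;
	}
	closedir(d);
	return count;
}

/* Read one archive in chunks, advancing done_size as bytes are consumed. */
static int
read_with_progress(const char *path, progress_state *ps)
{
	FILE	   *f = fopen(path, "rb");
	char		buf[8192];
	size_t		n;

	if (f == NULL)
		return -1;
	while ((n = fread(buf, 1, sizeof(buf), f)) > 0)
	{
		/* a real verifier would parse tar members and checksum here */
		ps->done_size += n;
	}
	fclose(f);
	return 0;
}
```

Measuring the archives on disk avoids having to match manifest entries
against tar members before any reading starts. For compressed archives,
the figure tracks compressed bytes consumed rather than member sizes.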

Verification now happens in two passes. The first pass only verifies
the file names, determines their compression types, and returns a list
of valid tar files whose contents need to be verified in the second
pass. The second pass is invoked at the end of
verify_backup_directory(), after all files in that directory have been
scanned. I named the pass 1 and pass 2 functions
verify_tar_file_name() and verify_tar_file_contents(), respectively.
The rest of the code flow is similar to the previous version.
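
A pass-1 style name check might look roughly like the sketch below. It
assumes archives are named "base" or a tablespace OID, followed by
.tar, .tar.gz, .tar.lz4 or .tar.zst. The tar_compression enum and
classify_tar_name() are hypothetical stand-ins, not the patch's
verify_tar_file_name(). The sketch rejects pg_wal.tar; how the patch
treats that file is not described here.

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical stand-in for the compression algorithm enum. */
typedef enum
{
	TAR_COMPRESSION_NONE,
	TAR_COMPRESSION_GZIP,
	TAR_COMPRESSION_LZ4,
	TAR_COMPRESSION_ZSTD
} tar_compression;

/* Suffixes assumed to identify each archive type. */
static const struct
{
	const char *suffix;
	tar_compression algo;
}			tar_suffixes[] = {
	{".tar", TAR_COMPRESSION_NONE},
	{".tar.gz", TAR_COMPRESSION_GZIP},
	{".tar.lz4", TAR_COMPRESSION_LZ4},
	{".tar.zst", TAR_COMPRESSION_ZSTD},
};

/*
 * Accept "base" or an all-digit stem (a tablespace OID) followed by a
 * known suffix. On success, set *algo and return true.
 */
static bool
classify_tar_name(const char *name, tar_compression *algo)
{
	size_t		nlen = strlen(name);

	for (size_t i = 0; i < sizeof(tar_suffixes) / sizeof(tar_suffixes[0]); i++)
	{
		size_t		slen = strlen(tar_suffixes[i].suffix);
		size_t		stem_len;
		bool		stem_ok = true;

		/* The name must be strictly longer than the suffix and end with it. */
		if (nlen <= slen ||
			strcmp(name + nlen - slen, tar_suffixes[i].suffix) != 0)
			continue;

		stem_len = nlen - slen;
		if (!(stem_len == 4 && strncmp(name, "base", 4) == 0))
		{
			/* Otherwise require every stem character to be a digit. */
			for (size_t j = 0; j < stem_len; j++)
				if (!isdigit((unsigned char) name[j]))
					stem_ok = false;
		}
		if (!stem_ok)
			return false;	/* suffixes are mutually exclusive, so stop */
		*algo = tar_suffixes[i].algo;
		return true;
	}
	return false;
}
```

Files that pass this check would be collected into the list handed to
the second pass. Anything else can be reported or skipped without
opening it.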

In the attached patch set, I dropped the previous 0009 patch,
abandoning the changes that touched the progress reporting code for
plain backups. The new 0009 patch adds the missing APIs to
simple_list.c for destroying a SimplePtrList. The remaining patches
keep their previous numbers.
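
As a rough idea of what a destroy routine for SimplePtrList involves,
the sketch below frees every cell and resets the list. The struct
layout is assumed to mirror src/include/fe_utils/simple_list.h and is
repeated locally so the example compiles on its own. The helpers
sketch_ptr_list_append() and sketch_ptr_list_destroy(), and the
optional free_ptr callback, are hypothetical and may not match the API
added in the 0009 patch.

```c
#include <stdlib.h>

/* Local copy of the (assumed) SimplePtrList layout from simple_list.h. */
typedef struct SimplePtrListCell
{
	struct SimplePtrListCell *next;
	void	   *ptr;
} SimplePtrListCell;

typedef struct SimplePtrList
{
	SimplePtrListCell *head;
	SimplePtrListCell *tail;
} SimplePtrList;

/* Append in the style of simple_ptr_list_append(); malloc replaces pg_malloc. */
static void
sketch_ptr_list_append(SimplePtrList *list, void *ptr)
{
	SimplePtrListCell *cell = malloc(sizeof(SimplePtrListCell));

	if (cell == NULL)
		abort();
	cell->next = NULL;
	cell->ptr = ptr;
	if (list->tail)
		list->tail->next = cell;
	else
		list->head = cell;
	list->tail = cell;
}

/*
 * Hypothetical destroy: free every cell and, if free_ptr is not NULL,
 * each payload too. Leaves the list empty and reusable.
 */
static void
sketch_ptr_list_destroy(SimplePtrList *list, void (*free_ptr) (void *))
{
	SimplePtrListCell *cell = list->head;

	while (cell != NULL)
	{
		SimplePtrListCell *next = cell->next;

		if (free_ptr != NULL)
			free_ptr(cell->ptr);
		free(cell);
		cell = next;
	}
	list->head = NULL;
	list->tail = NULL;
}
```

Taking a callback lets callers that own heap-allocated payloads free
them in the same walk. Passing NULL leaves the payloads untouched for
callers that still hold them elsewhere.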

Regards,
Amul
