Re: pg_verifybackup: TAR format backup verification - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: pg_verifybackup: TAR format backup verification |
Date | |
Msg-id | CA+TgmoZTJzUyzDkdo6K7TTM2ePP0EMXTwxGhJN1Ymx3qNHvevQ@mail.gmail.com Whole thread Raw |
In response to | Re: pg_verifybackup: TAR format backup verification (Amul Sul <sulamul@gmail.com>) |
List | pgsql-hackers |
On Mon, Aug 12, 2024 at 5:13 AM Amul Sul <sulamul@gmail.com> wrote: > I tried this in the attached version and made a few additional changes > based on Sravan's off-list comments regarding function names and > descriptions. > > Now, verification happens in two passes. The first pass simply > verifies the file names, determines their compression types, and > returns a list of valid tar files whose contents need to be verified > in the second pass. The second pass is called at the end of > verify_backup_directory() after all files in that directory have been > scanned. I named the functions for pass 1 and pass 2 as > verify_tar_file_name() and verify_tar_file_contents(), respectively. > The rest of the code flow is similar as in the previous version. > > In the attached patch set, I abandoned the changes, touching the > progress reporting code of plain backups by dropping the previous 0009 > patch. The new 0009 patch adds missing APIs to simple_list.c to > destroy SimplePtrList. The rest of the patch numbers remain unchanged. I think you've entangled the code paths here for plain-format backup and tar-format backup in a way that is not very nice. I suggest refactoring things so that verify_backup_directory() is only called for plain-format backups, and you have some completely separate function (perhaps verify_tar_backup) that is called for tar-format backups. I don't think verify_backup_file() should be shared between tar-format and plain-format backups either. Let that be just for plain-format backups, and have separate logic for tar-format backups. Right now you've got "if" statements in various places trying to get all the cases correct, but I think you've missed some (and there's also the issue of updating all the comments). For instance, verify_backup_file() recurses into subdirectories, but that behavior is inappropriate for a tar format backup, where subdirectories should instead be treated like stray files: complain that they exist. pg_verify_backup() does this: /* If it's a directory, just recurse. */ if (S_ISDIR(sb.st_mode)) { verify_backup_directory(context, relpath, fullpath); return; } /* If it's not a directory, it should be a plain file. */ if (!S_ISREG(sb.st_mode)) { report_backup_error(context, "\"%s\" is not a file or directory", relpath); return; } For a plain format backup, this whole thing should be just: /* In a tar format backup, we expect only plain files. */ if (!S_ISREG(sb.st_mode)) { report_backup_error(context, "\"%s\" is not a plain file", relpath); return; } Also, immediately above, you do simple_string_list_append(&context->ignore_list, relpath), but that is pointless in the tar-file case, and arguably wrong, if -i is going to ignore both pathnames in the base directory and also pathnames inside the tar files, because we could add something to the ignore list here -- accomplishing nothing useful -- and then that ignore-list entry could cause us to disregard a stray file with the same name present inside one of the tar files -- which is silly. Note that the actual point of this logic is to make sure that if we can't stat() a certain directory, we don't go off and issue a million complaints about all the files in that directory being missing. But this doesn't accomplish that goal for a tar-format backup. For a tar-format backup, you'd want to figure out which files in the manifest we don't expect to see based on this file being inaccessible, and then arrange to suppress future complaints about all of those files. But you can't implement that here, because you haven't parsed the file name yet. That happens later, in verify_tar_file_name(). You could add a whole bunch more if statements here and try to work around these issues, but it seems pretty obviously a dead end to me. Almost the entire function is going to end up being inside of an if-statement. Essentially the only thing in verify_backup_file() that should actually be the same in the plain and tar-format cases is that you should call stat() either way and check whether it throws an error. But that is not enough justification for trying to share the whole function. I find the logic in verify_tar_file_name() to be somewhat tortured as well. The strstr() calls will match those strings anywhere in the filename, not just at the end. But also, even if you fixed that, why work backward from the end of the filename toward the beginning? It would seem like it would more sense to (1) first check if the string starts with "base" and set suffix equal to pathname+4, (2) if not, strtol(pathname, &suffix, 10) and complain if we didn't eat at least one character or got 0 or something too big to be an OID, (3) check whether suffix is .tar, .tar.gz, etc. In verify_member_checksum(), you set mystreamer->verify_checksum = false. That would be correct if there could only ever be one call to verify_member_checksum() per member file, but there is no such rule. There can be (and, I think, normally will be) more than one ASTREAMER_MEMBER_CONTENTS chunk. I'm a little confused as to how this code passes any kind of testing. Also in verify_member_checksum(), the mystreamer->received_bytes < m->size seems strange. I don't think this is the right way to do something when you reach the end of an archive member. The right way to do that is to do it when the ASTREAMER_MEMBER_TRAILER chunk shows up. In verify_member_control_data(), you use astreamer_buffer_untIl(). But that's the same buffer that is being used by verify_member_checksum(), so I don't really understand how you expect this to work. If this code path were ever taken, verify_member_checksum() would see the same data more than once. The call to pg_log_debug() in this function also seems quite random. In a plain-format backup, we'd actually be doing something different for pg_controldata vs. other files, namely reading it during the initial directory scan. But here we're reading the file in exactly the same sense as we're reading every other file, neither more nor less, so why mention this file and not all of the others? And why at this exact spot in the code? I suspect that the report_fatal_error("%s: could not read control file: read %d of %zu", ...) call is unreachable. I agree that you need to handle the case where the control file inside the tar file is not the expected length, and in fact I think you should probably write a TAP test for that exact scenario to make sure it works. I bet this doesn't. Even if it did, the error message makes no sense in context. In the plain-format backup, this error would come from code reading the actual bytes off the disk -- i.e. the complaint about not being able to read the control file would come from the read() system call. Here it doesn't. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: