Re: pg_verifybackup: TAR format backup verification - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: pg_verifybackup: TAR format backup verification |
Date | |
Msg-id | CA+Tgmobmi7qiu8HDQECwuy-JrAH0zcYbzGnhnKjurqv44DHtxg@mail.gmail.com Whole thread Raw |
In response to | Re: pg_verifybackup: TAR format backup verification (Amul Sul <sulamul@gmail.com>) |
Responses |
Re: pg_verifybackup: TAR format backup verification
Re: pg_verifybackup: TAR format backup verification |
List | pgsql-hackers |
On Wed, Jul 31, 2024 at 9:28 AM Amul Sul <sulamul@gmail.com> wrote: > Fixed -- I did that because it was part of a separate group in pg_basebackup. Well, that's because pg_basebackup builds multiple executables, and these files needed to be linked with some but not others. It looks like when Andres added meson support, instead of linking each object file into the binaries that need it, he had it just build a static library and link every executable to that. That puts the linker in charge of sorting out which binaries need which files, instead of having the makefile do it. In any case, this consideration doesn't apply when we're putting the object files into a library, so there was no need to preserve the separate makefile variable. I think this looks good now. > Fixed -- frontend_common_code now includes lz4 as well. Cool. 0003 overall looks good to me now, unless Andres wants to object. > Noted. I might give it a try another day, unless someone else beats > me, perhaps in a separate thread. Probably not too important, since nobody has complained. > Done -- added a new patch as 0004, and the subsequent patch numbers > have been incremented accordingly. I think I would have made this pass context->show_progress to progress_report() instead of the whole verifier_context, but that's an arguable stylistic choice, so I'll defer to you if you prefer it the way you have it. Other than that, this LGTM. However, what is now 0005 does something rather evil. The commit message claims that it's just rearranging code, and that's almost entirely true, except that it also changes manifest_file's pathname member to be char * instead of const char *. I do not think that is a good idea, and I definitely do not think you should do it in a patch that purports to just be doing code movement, and I even more definitely think that you should not do it without even mentioning that you did it, and why you did it. > Fixed -- I did the NULL check in the earlier 0007 patch, but it should > have been done in this patch. This is now 0006. struct stat's st_size is of type off_t -- or maybe ssize_t on some platforms? - not type size_t. I suggest making the filesize argument use int64 as we do in some other places. size_t is, I believe, defined to be the right width to hold the size of an object in memory, not the size of a file on disk, so it isn't really relevant here. Other than that, my only comment on this patch is that I think I would find it more natural to write the check in verify_backup_file() in a different order: I'd put context->manifest->version != 1 && m != NULL && m->matched && !m->bad && strcmp() because (a) that way the most expensive test is last and (b) it feels weird to think about whether we have the right pathname if we don't even have a valid manifest entry. But this is minor and just a stylistic preference, so it's also OK as you have it if you prefer. > I agree, changing the order of errors could create confusion. > Previously, a file size mismatch was a clear and appropriate error > that was reported before the checksum failure error. In my opinion, this patch (currently 0007) creates a rather confusing situation that I can't easily reason about. Post-patch, verify_content_checksum() is a mix of two different things: it ends up containing all of the logic that needs to be performed on every chunk of bytes read from the file plus some but not all of the end-of-file error-checks from verify_file_checksum(). That's really weird. I'm not very convinced that the test for whether we've reached the end of the file is 100% correct, but even if it is, the stuff before that point is stuff that is supposed to happen many times and the stuff after that is only supposed to happen once, and I don't see any good reason to smush those two categories of things into a single function. Plus, changing the order in which those end-of-file checks happen doesn't seem like the right idea either: the current ordering is good the way it is. Maybe you want to think of refactoring to create TWO new functions, one to do the per-hunk work and a second to do the end-of-file "is the checksum OK?" stuff, or maybe you can just open code it, but I'm not willing to commit this the way it is. Regarding 0008, I don't really see a reason why the m != NULL shouldn't also move inside should_verify_control_data(). Yeah, the caller added in 0010 might not need the check, but it won't really cost anything. Also, it seems to me that the logic in 0010 is actually wrong. If m == NULL, we'll keep the values of verifyChecksums and verifyControlData from the previous iteration, whereas what we should do is make them both false. How about removing the if m == NULL guard here and making both should_verify_checksum() and should_verify_control_data() test m != NULL internally? Then it all works out nicely, I think. Or alternatively you need an else clause that resets both values to false when m == NULL. > Okay, I added the verify_checksums() and verify_controldata() > functions to the astreamer_verify.c file. I also updated related > variables that were clashing with these function names: > verify_checksums has been renamed to verifyChecksums, and verify_sysid > has been renamed to verifyControlData. Maybe think of doing something with the ASTREAMER_MEMBER_HEADER case also. Out of time for today, will look again soon. I think the first few of these are probably pretty much ready for commit already, and with a little more adjustment they'll probably be ready up through about 0006. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: