Re: backup manifests - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: backup manifests |
Date | |
Msg-id | CA+TgmoZn6K4KiBvcAfYOxUwv-t9oot0ZULdPbOG5k8prpEVVZA@mail.gmail.com Whole thread Raw |
In response to | Re: backup manifests (Suraj Kharage <suraj.kharage@enterprisedb.com>) |
Responses |
Re: backup manifests
|
List | pgsql-hackers |
On Tue, Dec 10, 2019 at 6:40 AM Suraj Kharage <suraj.kharage@enterprisedb.com> wrote: > Please find attached patch for backup validator implementation (0004 patch). This patch is based > on Rushabh's latest patch for backup manifest. > > There are some functions required at client side as well, so I have moved those functions > and some data structure at common place so that they can be accessible for both. (0003 patch). > > My colleague Rajkumar Raghuwanshi has prepared the WIP patch (0005) for tap test cases which > is also attached. As of now, test cases related to the tablespace and tar backup format are missing, > will continue work on same and submit the complete patch. > > With this mail, I have attached the complete patch stack for backup manifest and backup > validate implementation. > > Please let me know your thoughts on the same. Well, for the second time on this thread, please don't take a bunch of somebody else's code and post it in a patch that doesn't attribute that person as one of the authors. For the second time on this thread, the person is me, but don't borrow *anyone's* code without proper attribution. It's really important! On a related note, it's a very good idea to use git format-patch and git rebase -i to maintain patch stacks like this. Rushabh seems to have done that, but the files you're posting look like raw 'git diff' output. Notice that this gives him a way to include authorship information and a tentative commit message in each patch, but you don't have any of that. Also on a related note, part of the process of adapting existing code to a new purpose is adapting the comments. You haven't done that: + * Search a result-set hash table for a row matching a given filename. ... + * Insert a row into a result-set hash table, provided no such row is already ... + * Most of the values + * that we're hashing are short integers formatted as text, so there + * shouldn't be much room for pathological input. I think that what we should actually do here is try to use simplehash. Right now, it won't work for frontend code, but I posted some patches to try to address that issue: https://www.postgresql.org/message-id/CA%2BTgmob8oyh02NrZW%3DxCScB%2B5GyJ-jVowE3%2BTWTUmPF%3DFsGWTA%40mail.gmail.com That would have a few advantages. One, we wouldn't need to know the number of elements in advance, because simplehash can grow dynamically. Two, we could use the iteration interfaces to walk the hash table. Your solution to that is pgrhash_seq_search, but that's actually not well-designed, because it's not a generic iterator function but something that knows specifically about the 'touch' flag. I incidentally suggest renaming 'touch' to 'matched;' 'touch' is not bad, but I think 'matched' will be a little more recognizable. Please run pgindent. If needed, first add locally defined types to typedefs.list, so that things indent properly. It's not a crazy idea to try to share some data structures and code between the frontend and the backend here, but I think src/common/backup.c and src/include/common/backup.h is a far too generic name given what the code is actually doing. It's mostly about checksums, not backup, and I think it should be named accordingly. I suggest removing "manifestinfo" and renaming the rest to just talk about checksums rather than manifests. That would make it logical to reuse this for any other future code that needs a configurable checksum type. Also, how about adding a function like: extern bool parse_checksum_algorithm(char *name, ChecksumAlgorithm *algo); ...which would return true and set *algo if name is recognized, and return false otherwise. That code could be used on both the client and server sides of this patch, and by any future patches that want to return this scaffolding. The file header for backup.h has the wrong filename (string.h). The header format looks somewhat atypical compared to what we normally do, too. It's arguable, but I tend to think that it would be better to hex-encode the CRC rather than printing it as an integer. Maybe hex_encode() is another thing that could be moved into the new src/common file. As I said before about Rushabh's patch set, it's very confusing that we have so many patches here stacked up. Like, you have 0002 moving stuff, and then 0003 moving it again. That's super-confusing. Please try to structure the patch set so as to make it as easy to review as possible. Regarding the test case patch, error checks are important! Don't do things like this: +open my $modify_file_sha256, '>>', "$tempdir/backup_verify/postgresql.conf"; +print $modify_file_sha256 "port = 5555\n"; +close $modify_file_sha256; If the open fails, then it and the print and the close are going to silently do nothing. That's bad. I don't know exactly what the customary error-checking is for things like this in TAP tests, but I hope it's not like this, because this has a pretty fair chance of looking like it's testing something that it isn't. Let's figure out what the best practice in this area is and adhere to it. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company
pgsql-hackers by date: