refactoring basebackup.c - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | refactoring basebackup.c |
Date | |
Msg-id | CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com Whole thread Raw |
Responses |
Re: refactoring basebackup.c
Re: refactoring basebackup.c Re: refactoring basebackup.c |
List | pgsql-hackers |
Hi, I'd like to propose a fairly major refactoring of the server's basebackup.c. The current code isn't horrific or anything, but the base backup mechanism has grown quite a few features over the years and all of the code knows about all of the features. This is going to make it progressively more difficult to add additional features, and I have a few in mind that I'd like to add, as discussed below and also on several other recent threads.[1][2] The attached patch set shows what I have in mind. It needs more work, but I believe that there's enough here for someone to review the overall direction, and even some of the specifics, and hopefully give me some useful feedback. This patch set is built around the idea of creating two new abstractions, a base backup sink -- or bbsink -- and a base backup archiver -- or bbarchiver. Each of these works like a foreign data wrapper or custom scan or TupleTableSlot. That is, there's a table of function pointers that act like method callbacks. Every implementation can allocate a struct of sufficient size for its own bookkeeping data, and the first member of the struct is always the same, and basically holds the data that all implementations must store, including a pointer to the table of function pointers. If we were using C++, bbarchiver and bbsink would be abstract base classes. They represent closely-related concepts, so much so that I initially thought we could get by with just one new abstraction layer. I found on experimentation that this did not work well, so I split it up into two and that worked a lot better. The distinction is this: a bbsink is something to which you can send a bunch of archives -- currently, each would be a tarfile -- and also a backup manifest. A bbarchiver is something to which you send every file in the data directory individually, or at least the ones that are getting backed up, plus any that are being injected into the backup (e.g. the backup_label). Commonly, a bbsink will do something with the data and then forward it to a subsequent bbsink, or a bbarchiver will do something with the data and then forward it to a subsequent bbarchiver or bbsink. For example, there's a bbarchiver_tar object which, like any bbarchiver, sees all the files and their contents as input. The output is a tarfile, which gets send to a bbsink. As things stand in the patch set now, the tar archives are ultimately sent to the "libpq" bbsink, which sends them to the client. In the future, we could have other bbarchivers. For example, we could add "pax", "zip", or "cpio" bbarchiver which produces archives of that format, and any given backup could choose which one to use. Or, we could have a bbarchiver that runs each individual file through a compression algorithm and then forwards the resulting data to a subsequent bbarchiver. That would make it easy to produce a tarfile of individually compressed files, which is one possible way of creating a seekable achive.[3] Likewise, we could have other bbsinks. For example, we could have a "localdisk" bbsink that cause the server to write the backup somewhere in the local filesystem instead of streaming it out over libpq. Or, we could have an "s3" bbsink that writes the archives to S3. We could also have bbsinks that compresses the input archives using some compressor (e.g. lz4, zstd, bzip2, ...) and forward the resulting compressed archives to the next bbsink in the chain. I'm not trying to pass judgement on whether any of these particular things are things we want to do, nor am I saying that this patch set solves all the problems with doing them. However, I believe it will make such things a whole lot easier to implement, because all of the knowledge about whatever new functionality is being added is centralized in one place, rather than being spread across the entirety of basebackup.c. As an example of this, look at how 0010 changes basebackup.c and basebackup_tar.c: afterwards, basebackup.c no longer knows anything that is tar-specific, whereas right now it knows about tar-specific things in many places. Here's an overview of this patch set: 0001-0003 are cleanup patches that I have posted for review on separate threads.[4][5] They are included here to make it easy to apply this whole series if someone wishes to do so. 0004 is a minor refactoring that reduces by 1 the number of functions in basebackup.c that know about the specifics of tarfiles. It is just a preparatory patch and probably not very interesting. 0005 invents the bbsink abstraction. 0006 creates basebackup_libpq.c and moves all code that knows about the details of sending archives via libpq there. The functionality is exposed for use by basebackup.c as a new type of bbsink, bbsink_libpq. 0007 creates basebackup_throttle.c and moves all code that knows about throttling backups there. The functionality is exposed for use by basebackup.c as a new type of bbsink, bbsink_throttle. This means that the throttling logic could be reused to throttle output to any final destination. Essentially, this is a bbsink that just passes everything it gets through to the next bbsink, but with a rate limit. If throttling's not enabled, no bbsink_throttle object is created, so all of the throttling code is completely out of the execution pipeline. 0008 creates basebackup_progress.c and moves all code that knows about progress reporting there. The functionality is exposed for use by basebackup.c as a new type of bbsink, bbsink_progress. Since the abstraction doesn't fit perfectly in this case, some extra functions are added to work around the problem. This is not entirely elegant, but I don't think it's still an improvement over what we have now, and I don't have a better idea. 0009 invents the bbarchiver abstraction. 0010 invents two new bbarchivers, a tar bbarchiver and a tarsize bbarchiver, and refactors basebackup.c to make use of them. The tar bbarchiver puts the files it sees into tar archives and forwards the resulting archives to a bbsink. The tarsize bbarchiver is used to support the PROGRESS option to the BASE_BACKUP command. It just estimates the size of the backup by summing up the file sizes without reading them. This approach is good for a couple of reasons. First, without something like this, it's impossible to keep basebackup.c from knowing something about the tar format, because the PROGRESS option doesn't just figure out how big the files to be backed up are: it figures out how big it thinks the archives will be, and that involves tar-specific considerations. This area needs more work, as the whole idea of measuring progress by estimating the archive size is going to break down as soon as server-side compression is in the picture. Second, this makes the code path that we use for figuring out the backup size details much more similar to the path we use for performing the actual backup. For instance, with this patch, we include the exact same files in the calculation that we will include in the backup, and in the same order, something that's not true today. The basebackup_tar.c file added by this patch is sadly lacking in comments, which I will add in a future version of the patch set. I think, though, that it will not be too unclear what's going on here. 0011 invents another new kind of bbarchiver. This bbarchiver just eavesdrops on the stream of files to facilitate backup manifest construction, and then forwards everything through to a subsequent bbarchiver. Like bbsink_throttle, it can be entirely omitted if not used. This patch is a bit clunky at the moment and needs some polish, but it is another demonstration of how these abstractions can be used to simplify basebackup.c, so that basebackup.c only has to worry about determining what should be backed up and not have to worry much about all the specific things that need to be done as part of that. Although this patch set adds quite a bit of code on net, it makes basebackup.c considerably smaller and simpler, removing more than 400 lines of code from that file, about 20% of the current total. There are some gratifying changes vs. the status quo. For example, in master, we have this: sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces, bool sendtblspclinks, backup_manifest_info *manifest, const char *spcoid) Notably, the sizeonly flag makes the function not do what the name of the function suggests that it does. Also, we've got to pass some extra fields through to enable specific features. With the patch set, the equivalent function looks like this: archive_directory(bbarchiver *archiver, const char *path, int basepathlen, List *tablespaces, bool sendtblspclinks) The question "what should I do with the directories and files we find as we recurse?" is now answered by the choice of which bbarchiver to pass to the function, rather than by the values of sizeonly, manifest, and spcoid. That's not night and day, but I think it's better, especially as you imagine adding more features in the future. The really important part, for me, is that you can make the bbarchiver do anything you like without needing to make any more changes to this function. It just arranges to invoke your callbacks. You take it from there. One pretty major question that this patch set doesn't address is what the user interface for any of the hypothetical features mentioned above ought to look like, or how basebackup.c ought to support them. The syntax for the BASE_BACKUP command, like the contents of basebackup.c, has grown organically, and doesn't seem to be very scalable. Also, the wire protocol - a series of CopyData results which the client is entirely responsible for knowing how to interpret and about which the server provides only minimal information - doesn't much lend itself to extensibility. Some careful design work is likely needed in both areas, and this patch does not try to do any of it. I am quite interested in discussing those questions, but I felt that they weren't the most important problems to solve first. What do you all think? Thanks, -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company [1] http://postgr.es/m/CA+TgmoZubLXYR+Pd_gi3MVgyv5hQdLm-GBrVXkun-Lewaw12Kg@mail.gmail.com [2] http://postgr.es/m/CA+TgmoYr7+-0_vyQoHbTP5H3QGZFgfhnrn6ewDteF=kUqkG=Fw@mail.gmail.com [3] http://postgr.es/m/CA+TgmoZQCoCyPv6fGoovtPEZF98AXCwYDnSB0=p5XtxNY68r_A@mail.gmail.com and following [4] http://postgr.es/m/CA+TgmoYq+59SJ2zBbP891ngWPA9fymOqntqYcweSDYXS2a620A@mail.gmail.com [5] http://postgr.es/m/CA+TgmobWbfReO9-XFk8urR1K4wTNwqoHx_v56t7=T8KaiEoKNw@mail.gmail.com
pgsql-hackers by date: