Re: WIP/PoC for parallel backup - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: WIP/PoC for parallel backup
Msg-id: CA+TgmobLeEAkxph=bgd4f5wrcSTKdK=2Qq9puccWeuvqtWpLgA@mail.gmail.com
In response to: Re: WIP/PoC for parallel backup (Asif Rehman <asifr.rehman@gmail.com>)
List: pgsql-hackers
On Thu, Dec 12, 2019 at 10:20 AM Asif Rehman <asifr.rehman@gmail.com> wrote:
> I have updated the patches (v7 attached) and have taken care of all issues pointed by Jeevan, additionally
> ran the pgindent on each patch. Furthermore, Command names have been renamed as suggested and I
> have simplified the SendFiles function. Client can only request the regular files, any other kind such as
> directories or symlinks will be skipped, the client will be responsible for taking care of such.

Hi,

Patch 0001 of this series conflicts with my recent commit 303640199d0436c5e7acdf50b837a027b5726594; that commit was actually inspired by some previous study of 0001.

That being said, I think 0001 has the wrong idea. There's no reason that I can see why it should be correct to remove the PG_ENSURE_ERROR_CLEANUP calls from perform_base_backup(). It's true that if we register a long-lived before_shmem_exit hook, then the backup will get cleaned up even without the PG_ENSURE_ERROR_CLEANUP block, but there's also the question of the warning message. I think that our goal should be to emit the warning message about a backup being stopped too early if the user uses either pg_start_backup() or the new START_BACKUP command and does not end the backup with either pg_stop_backup() or the new STOP_BACKUP command -- but not if a single command that both starts and ends a backup, like BASE_BACKUP, is interrupted.

To accomplish that goal in the wake of 303640199d0436c5e7acdf50b837a027b5726594, we need to temporarily register do_pg_abort_backup() as a before_shmem_exit() handler using PG_ENSURE_ERROR_CLEANUP() during commands like BASE_BACKUP -- and for things like pg_start_backup() or the new START_BACKUP command, we just need to add a single call to register_persistent_abort_backup_handler(). So I think you can drop 0001, and then in the patch that actually introduces START_BACKUP, add the call to register_persistent_abort_backup_handler() before calling do_pg_start_backup(). Also in that patch, adjust the warning text that do_pg_abort_backup() emits to be more generic, e.g. "aborting backup due to backend exiting while a non-exclusive backup is in progress".
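What I'm imagining is roughly the following -- just an untested sketch, with StartBackup() standing in for whatever the handler for the new command ends up being called:

    /*
     * BASE_BACKUP: the backup starts and ends within a single command, so the
     * cleanup handler only needs to exist for the duration of the command,
     * and no warning is wanted if the command is interrupted.
     */
    static void
    perform_base_backup(basebackup_options *opt)
    {
        PG_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
        {
            /* ... start the backup, stream the data, stop the backup ... */
        }
        PG_END_ENSURE_ERROR_CLEANUP(do_pg_abort_backup, BoolGetDatum(false));
    }

    /*
     * START_BACKUP: the backup outlives this command, so register a handler
     * that persists until backend exit and emits the (reworded) warning if
     * the backup is never ended by STOP_BACKUP.
     */
    static void
    StartBackup(void)           /* placeholder signature */
    {
        register_persistent_abort_backup_handler();

        /* ... then call do_pg_start_backup() and send its results ... */
    }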
0003 creates three new functions, moving code from do_pg_start_backup() to a new function collectTablespaces() and from perform_base_backup() to new functions setup_throttle() and include_wal_files(). I'm skeptical about all of these changes. One general nitpick is that the way these function names are capitalized and punctuated does not seem to have been chosen very consistently; how about name_like_this() throughout? A bit more substantively:

- collectTablespaces() is factored out of do_pg_start_backup() so that it can also be used by SendFileList(), but that means that a client is going to invoke START_BACKUP, indirectly calling collectTablespaces(), and then immediately afterward the client is probably going to call SEND_FILE_LIST, which will again call collectTablespaces(). That does not appear to be super-great. For one thing, it's duplicate work, although because SendFileList() is going to pass infotbssize as false, it's not a lot of duplicated work. Also, what happens if the two calls to collectTablespaces() return different answers due to concurrent CREATE/DROP TABLESPACE commands? Maybe it would all work out fine, but it seems like there is at least the possibility of bugs if different parts of the backup have different notions of what tablespaces exist.

- setup_throttle() is factored out of perform_base_backup() so that it can be called in StartBackup() and StopBackup() and SendFiles(). This seems extremely odd. Why does it make any sense to give the user an option to activate throttling when *ending* a backup? And why does it make sense to give the user a chance to enable throttling *both* at the start of a backup *and also* for each individual file? If we're going to support throttling here, it seems like it should be either a backup-level property or a file-level property, not both.

- include_wal_files() is factored out of perform_base_backup() so that it can be called by StopBackup(). This seems like a poor design decision. The idea behind the BASE_BACKUP command is that you run that one command, and the server sends you everything. The idea in this new way of doing business is that the client requests the individual files it wants -- except for the WAL files, which are for some reason not requested individually but sent all together as part of the STOP_BACKUP response. It seems like it would be more consistent if the client were to decide which WAL files it needs and request them one by one, just as we do with other files.

I think there's a common theme to all of these complaints, which is that you haven't done enough to move things that are the responsibility of the backend in the BASE_BACKUP model to the frontend in this model. I started wondering, for example, whether it might not be better to have the client rather than the server construct the tablespace_map file. After all, the client needs to get the list of files anyway (hence SEND_FILE_LIST), and if it's got that then it knows almost enough to construct the tablespace map. The only additional thing it needs is the full pathname to which the link points. But it seems that we could fairly easily extend SEND_FILE_LIST to send, for files that are symbolic links, the target of the link, using a new column. Or alternatively, use a separate command, so that instead of just sending a single SEND_FILE_LIST command, the client might first ask for a tablespace list and then ask for a list of files within each tablespace (e.g. LIST_TABLESPACES, then LIST_FILES <oid> for each tablespace, with 0 for the main tablespace, perhaps). I'm not sure which way is better.

Similarly, for throttling, I have a hard time understanding how what you've got here is going to work reasonably. It looks like each client is just going to request whatever MAX_RATE the user specifies, but the result of that will be that the actual transfer rate is probably a multiple of the specified rate, approximately equal to the specified rate times the number of clients. That's probably not what the user wants. You could take the specified rate and divide it by the number of workers, but limiting each of 4 workers to a quarter of the rate will probably lead to a combined rate of less than the specified rate, because if one worker doesn't use all of the bandwidth to which it's entitled, or even exits earlier than the others, the other workers don't get to go any faster as a result.

Another problem is that, in the current approach, throttling applies to the backup as a whole, but in this approach, it is applied separately to each SEND_FILES command. In the current approach, if one file finishes a little faster or slower than anticipated, the next file in the tarball will be sent a little slower or faster to compensate; in this approach, each SEND_FILES command is throttled separately, so this property is lost. Furthermore, while BASE_BACKUP sends data continuously, this approach naturally involves pauses between commands. If files are large, that won't matter much, but if they're small and numerous, it will tend to cause the actual transfer rate to be less than the throttling rate.

One potential way to solve this problem is to move it to the client side. Instead of making it the server's job not to send data too fast, make it the client's job not to receive data too fast. Let the server backends write as fast as they want, and on the pg_basebackup side, have the threads coordinate with each other so that they don't read data faster than the configured rate. That's not quite the same thing, though, because the server can get ahead of the client by the size of the client's receive buffers plus whatever data is on the wire. I don't know whether that's a big enough problem to be worth caring about. If it is, then I think we need some server infrastructure to "group throttle" a group of cooperating backends.
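To make the client-side idea more concrete -- and this is only a sketch, assuming the workers end up being threads inside a single pg_basebackup process -- each worker could draw from one shared token bucket before consuming data it has read from its connection, so that the combined receive rate is capped at MAX_RATE no matter how many workers happen to be busy at any moment:

    #include <pthread.h>
    #include <stddef.h>
    #include <time.h>
    #include <unistd.h>

    typedef struct
    {
        pthread_mutex_t lock;
        double          max_rate;       /* MAX_RATE, in bytes per second */
        double          available;      /* bytes we may currently consume */
        struct timespec last_refill;
    } group_throttle;

    static void
    group_throttle_init(group_throttle *t, double max_rate_bytes_per_sec)
    {
        pthread_mutex_init(&t->lock, NULL);
        t->max_rate = max_rate_bytes_per_sec;
        t->available = 0;
        clock_gettime(CLOCK_MONOTONIC, &t->last_refill);
    }

    /*
     * Called by each worker before it consumes nbytes of received data.
     * Assumes nbytes is no bigger than about one second's worth of budget,
     * which is fine if workers read in smallish chunks.
     */
    static void
    group_throttle_consume(group_throttle *t, size_t nbytes)
    {
        for (;;)
        {
            struct timespec now;
            double          elapsed;

            pthread_mutex_lock(&t->lock);
            clock_gettime(CLOCK_MONOTONIC, &now);
            elapsed = (now.tv_sec - t->last_refill.tv_sec) +
                (now.tv_nsec - t->last_refill.tv_nsec) / 1e9;
            t->available += elapsed * t->max_rate;
            if (t->available > t->max_rate)     /* cap the burst at ~1 second */
                t->available = t->max_rate;
            t->last_refill = now;

            if (t->available >= (double) nbytes)
            {
                t->available -= (double) nbytes;
                pthread_mutex_unlock(&t->lock);
                return;
            }
            pthread_mutex_unlock(&t->lock);
            usleep(10 * 1000);      /* wait for the bucket to refill a bit */
        }
    }

A worker that stalls or finishes early simply stops taking tokens, so the remaining workers automatically absorb the unused budget, which avoids the divide-by-N problem; and because the bucket lives for the whole backup rather than for one SEND_FILES command, the compensation behavior of the current implementation is preserved. It does nothing about the receive-buffer slop mentioned above, of course.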
A general comment about 0004 is that it seems like you've proceeded by taking the code from perform_base_backup() and spreading it across several different functions without, necessarily, as much thought as is needed there. For instance, StartBackup() looks like just the beginning of perform_base_backup(). But why shouldn't it instead look like pg_start_backup() -- in fact, a simplified version that only handles the non-exclusive backup case? Is the extra stuff it's doing really appropriate? I've already complained about the tablespace-related stuff here and the throttling, but there's more. Setting statrelpath here will probably break if somebody tries to use SEND_FILES without first calling START_BACKUP. Sending the backup_label file here is oddly asymmetric, because that's done by pg_stop_backup(), not pg_start_backup(). And similarly, StopBackup() looks like it's just the end of perform_base_backup(), which is also pretty strange-looking. Again, I've already complained about include_wal_files() being part of this, but there's also:

+ /* ... and pg_control after everything else. */

...which (1) is an odd thing to say when this is the first thing this particular function is going to send and (2) is another example of a sloppy division of labor between client and server; apparently, the client is supposed to know not to request pg_control, because the server is going to send it unsolicited. There's no particular reason to make this a special case. The client could just request it last. Then the server code wouldn't need a special case, and you wouldn't have this odd logic split between the client and the server.

Overall, I think this needs a lot more work. The overall idea's not wrong, but there seem to be a very large number of details which, at least to me, do not seem to be correct.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company