Thread: recovery modules
I've attached a patch set that adds the restore_library, archive_cleanup_library, and recovery_end_library parameters to allow archive recovery via loadable modules. This is a follow-up to the archive_library parameter added in v15 [0] [1]. The motivation behind this change is similar to that of archive_library (e.g., robustness, performance). The recovery functions are provided via a similar interface to archive modules (i.e., an initialization function that returns the function pointers). Also, I've extended basic_archive to work as a restore_library, which makes it easy to demonstrate both archiving and recovery via a loadable module in a TAP test. A few miscellaneous design notes: * Unlike archive modules, recovery libraries cannot be changed at runtime. There isn't a safe way to unload a library, and archive libraries work around this restriction by restarting the archiver process. Since recovery libraries are loaded via the startup and checkpointer processes (which cannot be trivially restarted like the archiver), the same workaround is not feasible. * pg_rewind uses restore_command, but there isn't a straightforward path to support restore_library. I haven't addressed this in the attached patches, but perhaps this is a reason to allow specifying both restore_command and restore_library at the same time. pg_rewind would use restore_command, and the server would use restore_library. * I've combined the documentation to create one "Archive and Recovery Modules" chapter. They are similar enough that it felt silly to write a separate chapter for recovery modules. However, I've still split them up within the chapter, and they have separate initialization functions. This retains backward compatibility with v15 archive modules, keeps them logically separate, and hopefully hints at the functional differences. Even so, if you want to create one library for both archive and recovery, there is nothing stopping you. * Unlike archive modules, I didn't add any sort of "check" or "shutdown" callbacks. The recovery_end_library parameter makes a "shutdown" callback largely redundant, and I couldn't think of any use-case for a "check" callback. However, new callbacks could be added in the future if needed. * Unlike archive modules, restore_library and recovery_end_library may be loaded in single-user mode. I believe this works out-of-the-box, but it's an extra thing to be cognizant of. * If both the library and command parameter for a recovery action is specified, the server should fail to startup, but if a misconfiguration is detected after SIGHUP, we emit a WARNING and continue using the library. I originally thought about emitting an ERROR like the archiver does in this case, but failing recovery and stopping the server felt a bit too harsh. I'm curious what folks think about this. * Іt could be nice to rewrite pg_archivecleanup for use as an archive_cleanup_library, but I don't think that needs to be a part of this patch set. [0] https://postgr.es/m/668D2428-F73B-475E-87AE-F89D67942270%40amazon.com [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5ef1eef -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Hi, On 2022-12-27 11:24:49 -0800, Nathan Bossart wrote: > I've attached a patch set that adds the restore_library, > archive_cleanup_library, and recovery_end_library parameters to allow > archive recovery via loadable modules. This is a follow-up to the > archive_library parameter added in v15 [0] [1]. Why do we need N parameters for this? To me it seems more sensible to have one parameter that then allows a library to implement all these (potentially optionally). > * Unlike archive modules, recovery libraries cannot be changed at runtime. > There isn't a safe way to unload a library, and archive libraries work > around this restriction by restarting the archiver process. Since recovery > libraries are loaded via the startup and checkpointer processes (which > cannot be trivially restarted like the archiver), the same workaround is > not feasible. I don't think that's a convincing reason to not support configuration changes. Sure, libraries cannot be unloaded, but an unnecessarily loaded library is cheap. All that's needed is to redirect the relevant function calls. > * pg_rewind uses restore_command, but there isn't a straightforward path to > support restore_library. I haven't addressed this in the attached patches, > but perhaps this is a reason to allow specifying both restore_command and > restore_library at the same time. pg_rewind would use restore_command, and > the server would use restore_library. That seems problematic, leading to situations where one might not be able to use restore_command anymore, because it's not feasible to do segment-by-segment restoration. Greetings, Andres Freund
On Tue, Dec 27, 2022 at 02:11:11PM -0800, Andres Freund wrote: > On 2022-12-27 11:24:49 -0800, Nathan Bossart wrote: >> I've attached a patch set that adds the restore_library, >> archive_cleanup_library, and recovery_end_library parameters to allow >> archive recovery via loadable modules. This is a follow-up to the >> archive_library parameter added in v15 [0] [1]. > > Why do we need N parameters for this? To me it seems more sensible to have one > parameter that then allows a library to implement all these (potentially > optionally). The main reason is flexibility. Separate parameters allow using a library for one thing and a command for another, or different libraries for different things. If that isn't a use-case we wish to support, I don't mind combining all three into a single recovery_library parameter. >> * Unlike archive modules, recovery libraries cannot be changed at runtime. >> There isn't a safe way to unload a library, and archive libraries work >> around this restriction by restarting the archiver process. Since recovery >> libraries are loaded via the startup and checkpointer processes (which >> cannot be trivially restarted like the archiver), the same workaround is >> not feasible. > > I don't think that's a convincing reason to not support configuration > changes. Sure, libraries cannot be unloaded, but an unnecessarily loaded > library is cheap. All that's needed is to redirect the relevant function > calls. This might leave some stuff around (e.g., GUCs, background workers), but if that isn't a concern, I can adjust it to work as you describe. >> * pg_rewind uses restore_command, but there isn't a straightforward path to >> support restore_library. I haven't addressed this in the attached patches, >> but perhaps this is a reason to allow specifying both restore_command and >> restore_library at the same time. pg_rewind would use restore_command, and >> the server would use restore_library. > > That seems problematic, leading to situations where one might not be able to > use restore_command anymore, because it's not feasible to do > segment-by-segment restoration. I'm not following why this would make segment-by-segment restoration infeasible. Would you mind elaborating? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Hi, On 2022-12-27 14:37:11 -0800, Nathan Bossart wrote: > On Tue, Dec 27, 2022 at 02:11:11PM -0800, Andres Freund wrote: > > On 2022-12-27 11:24:49 -0800, Nathan Bossart wrote: > >> I've attached a patch set that adds the restore_library, > >> archive_cleanup_library, and recovery_end_library parameters to allow > >> archive recovery via loadable modules. This is a follow-up to the > >> archive_library parameter added in v15 [0] [1]. > > > > Why do we need N parameters for this? To me it seems more sensible to have one > > parameter that then allows a library to implement all these (potentially > > optionally). > > The main reason is flexibility. Separate parameters allow using a library > for one thing and a command for another, or different libraries for > different things. If that isn't a use-case we wish to support, I don't > mind combining all three into a single recovery_library parameter. I think the configuration complexity is a sufficient concern to not go that direction... > >> * Unlike archive modules, recovery libraries cannot be changed at runtime. > >> There isn't a safe way to unload a library, and archive libraries work > >> around this restriction by restarting the archiver process. Since recovery > >> libraries are loaded via the startup and checkpointer processes (which > >> cannot be trivially restarted like the archiver), the same workaround is > >> not feasible. > > > > I don't think that's a convincing reason to not support configuration > > changes. Sure, libraries cannot be unloaded, but an unnecessarily loaded > > library is cheap. All that's needed is to redirect the relevant function > > calls. > > This might leave some stuff around (e.g., GUCs, background workers), but if > that isn't a concern, I can adjust it to work as you describe. You can still have a shutdown hook re background workers. I don't think the GUCs matter, given that it's the startup/checkpointer processes. > >> * pg_rewind uses restore_command, but there isn't a straightforward path to > >> support restore_library. I haven't addressed this in the attached patches, > >> but perhaps this is a reason to allow specifying both restore_command and > >> restore_library at the same time. pg_rewind would use restore_command, and > >> the server would use restore_library. > > > > That seems problematic, leading to situations where one might not be able to > > use restore_command anymore, because it's not feasible to do > > segment-by-segment restoration. > > I'm not following why this would make segment-by-segment restoration > infeasible. Would you mind elaborating? Latency effects for example can make it infeasible to do segment-by-segment restoration infeasible performance wise. On the most extreme end, imagine WAL archived to tape or such. Greetings, Andres Freund
On Tue, Dec 27, 2022 at 02:45:30PM -0800, Andres Freund wrote: > On 2022-12-27 14:37:11 -0800, Nathan Bossart wrote: >> On Tue, Dec 27, 2022 at 02:11:11PM -0800, Andres Freund wrote: >> > On 2022-12-27 11:24:49 -0800, Nathan Bossart wrote: >> >> * pg_rewind uses restore_command, but there isn't a straightforward path to >> >> support restore_library. I haven't addressed this in the attached patches, >> >> but perhaps this is a reason to allow specifying both restore_command and >> >> restore_library at the same time. pg_rewind would use restore_command, and >> >> the server would use restore_library. >> > >> > That seems problematic, leading to situations where one might not be able to >> > use restore_command anymore, because it's not feasible to do >> > segment-by-segment restoration. >> >> I'm not following why this would make segment-by-segment restoration >> infeasible. Would you mind elaborating? > > Latency effects for example can make it infeasible to do segment-by-segment > restoration infeasible performance wise. On the most extreme end, imagine WAL > archived to tape or such. I'm sorry, I'm still lost here. Wouldn't restoration via library tend to improve latency? Is your point that clusters may end up depending on this improvement so much that a shell command would no longer be able to keep up? I might be creating a straw man, but this seems like less of a concern for pg_rewind since it isn't meant for continuous, ongoing restoration. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Tue, Dec 27, 2022 at 02:11:11PM -0800, Andres Freund wrote: > On 2022-12-27 11:24:49 -0800, Nathan Bossart wrote: >> * Unlike archive modules, recovery libraries cannot be changed at runtime. >> There isn't a safe way to unload a library, and archive libraries work >> around this restriction by restarting the archiver process. Since recovery >> libraries are loaded via the startup and checkpointer processes (which >> cannot be trivially restarted like the archiver), the same workaround is >> not feasible. > > I don't think that's a convincing reason to not support configuration > changes. Sure, libraries cannot be unloaded, but an unnecessarily loaded > library is cheap. All that's needed is to redirect the relevant function > calls. Agreed. That seems worth the cost to switching this stuff to be SIGHUP-able. >> * pg_rewind uses restore_command, but there isn't a straightforward path to >> support restore_library. I haven't addressed this in the attached patches, >> but perhaps this is a reason to allow specifying both restore_command and >> restore_library at the same time. pg_rewind would use restore_command, and >> the server would use restore_library. > > That seems problematic, leading to situations where one might not be able to > use restore_command anymore, because it's not feasible to do > segment-by-segment restoration. Do you mean that supporting restore_library in pg_rewind is a hard requirement? I fail to see why this should be the case. Note that having the possibility to do dlopen() calls in the frontend (aka libpq) for loading some callbacks is something I've been looking for in the past. Having consistency in terms of restrictions between library and command like for archives would make sense I guess (FWIW, I mentioned on the thread of d627ce3 that we'd better not put any restrictions for the archives). -- Michael
Attachment
Hi, On 2022-12-27 15:04:28 -0800, Nathan Bossart wrote: > I'm sorry, I'm still lost here. Wouldn't restoration via library tend to > improve latency? Is your point that clusters may end up depending on this > improvement so much that a shell command would no longer be able to keep > up? Yes. > I might be creating a straw man, but this seems like less of a concern > for pg_rewind since it isn't meant for continuous, ongoing restoration. pg_rewind is in the critical path of a bunch of HA scenarios, so I wouldn't say that restore performance isn't important... Greetings, Andres Freund
Here is a new patch set with the following changes: * The restore_library, archive_cleanup_library, and recovery_end_library parameters are merged into one parameter. * restore_library can now be changed via SIGHUP. To provide a way for modules to clean up when their callbacks are unloaded, I've introduced an optional shutdown callback. * Parameter misconfigurations are now always ERRORs. I'm less confident that we can get by with just a WARNING now that restore_library can be changed via SIGHUP, and this makes things more consistent with archive_library/command. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Here is a rebased patch set for cfbot. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Tue, Jan 03, 2023 at 09:59:17AM -0800, Nathan Bossart wrote: > Here is a rebased patch set for cfbot. I noticed that cfbot's Windows tests are failing because the backslashes in the archive directory path are causing escaping problems. Here is an attempt to fix that by converting all backslashes to forward slashes, which is what other tests (e.g., 025_stuck_on_old_timeline.pl) do. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Tue, Jan 03, 2023 at 11:05:38AM -0800, Nathan Bossart wrote: > I noticed that cfbot's Windows tests are failing because the backslashes in > the archive directory path are causing escaping problems. Here is an > attempt to fix that by converting all backslashes to forward slashes, which > is what other tests (e.g., 025_stuck_on_old_timeline.pl) do. + GetOldestRestartPoint(&restartRedoPtr, &restartTli); + XLByteToSeg(restartRedoPtr, restartSegNo, wal_segment_size); + XLogFileName(lastRestartPointFname, restartTli, restartSegNo, + wal_segment_size); + + shell_archive_cleanup(lastRestartPointFname); Hmm. Is passing down the file name used as a cutoff point the best interface for the modules? Perhaps passing down the redo LSN and its TLI would be a cleaner approach in terms of flexibility? I agree with letting the startup enforce these numbers as that can be easy to mess up for plugin authors, leading to critical problems. The same code pattern is repeated twice for the end-of-recovery callback and the cleanup commands when it comes to building the file name. Not critical, still not really nice. MODULES = basic_archive -PGFILEDESC = "basic_archive - basic archive module" +PGFILEDESC = "basic_archive - basic archive and recovery module" "basic_archive" does not reflect what this module does. Using one library simplifies the whole configuration picture and the tests, so perhaps something like basic_wal_module, or something like that, would be better long-term? -- Michael
Attachment
Thanks for taking a look. On Wed, Jan 11, 2023 at 04:53:39PM +0900, Michael Paquier wrote: > Hmm. Is passing down the file name used as a cutoff point the best > interface for the modules? Perhaps passing down the redo LSN and its > TLI would be a cleaner approach in terms of flexibility? I agree with > letting the startup enforce these numbers as that can be easy to mess > up for plugin authors, leading to critical problems. I'm having trouble thinking of any practical advantage of providing the redo LSN and TLI. If the main use-case is removing older archives as the documentation indicates, it seems better to provide the file name so that you can plug it straight into strcmp() to determine whether the file can be removed (i.e., what pg_archivecleanup does). If we provided the LSN and TLI instead, you'd either need to convert that into a WAL file name for strcmp(), or you'd need to convert the candidate file name into an LSN and TLI and compare against those. > "basic_archive" does not reflect what this module does. Using one > library simplifies the whole configuration picture and the tests, so > perhaps something like basic_wal_module, or something like that, would > be better long-term? I initially created a separate basic_restore module, but decided to fold it into basic_archive to simplify the patch and tests. I hesitated to rename it because it already exists in v15, and since it deals with creating and restoring archive files, the name still seemed somewhat accurate. That being said, I don't mind renaming it if that's what folks want. I've attached a new patch set that is rebased over c96de2c. There are no other changes. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Wed, Jan 11, 2023 at 11:29:01AM -0800, Nathan Bossart wrote: > I'm having trouble thinking of any practical advantage of providing the > redo LSN and TLI. If the main use-case is removing older archives as the > documentation indicates, it seems better to provide the file name so that > you can plug it straight into strcmp() to determine whether the file can be > removed (i.e., what pg_archivecleanup does). If we provided the LSN and > TLI instead, you'd either need to convert that into a WAL file name for > strcmp(), or you'd need to convert the candidate file name into an LSN and > TLI and compare against those. Logging was one thing that came immediately in mind, to let the module know the redo LSN and TLI the segment name was built from without having to recalculate it back. If you don't feel strongly about that, I am fine to discard this remark. It is not like this hook should be set in stone across major releases, in any case. > I initially created a separate basic_restore module, but decided to fold it > into basic_archive to simplify the patch and tests. I hesitated to rename > it because it already exists in v15, and since it deals with creating and > restoring archive files, the name still seemed somewhat accurate. That > being said, I don't mind renaming it if that's what folks want. I've done that in the past for pg_verify_checksums -> pg_checksums, so I would not mind renaming it so as it reflects better its role. (Being outvoted is fine for me if this suggestion sounds bad). Saying that, 0001 seems fine on its own (minus the redo LSN/TLI with the duplication for the segment name build), so I would be tempted to get this one done. My gut tells me that we'd better remove the duplication and just pass down the two fields to shell_archive_cleanup() and shell_recovery_end(), with the segment name given to ExecuteRecoveryCommand().. -- Michael
Attachment
On Thu, Jan 12, 2023 at 03:30:40PM +0900, Michael Paquier wrote: > On Wed, Jan 11, 2023 at 11:29:01AM -0800, Nathan Bossart wrote: >> I initially created a separate basic_restore module, but decided to fold it >> into basic_archive to simplify the patch and tests. I hesitated to rename >> it because it already exists in v15, and since it deals with creating and >> restoring archive files, the name still seemed somewhat accurate. That >> being said, I don't mind renaming it if that's what folks want. > > I've done that in the past for pg_verify_checksums -> pg_checksums, so > I would not mind renaming it so as it reflects better its role. > (Being outvoted is fine for me if this suggestion sounds bad). IMHO I don't think there's an urgent need to rename it, but if there's a better name that people like, I'm happy to do so. > Saying that, 0001 seems fine on its own (minus the redo LSN/TLI with > the duplication for the segment name build), so I would be tempted to > get this one done. My gut tells me that we'd better remove the > duplication and just pass down the two fields to > shell_archive_cleanup() and shell_recovery_end(), with the segment > name given to ExecuteRecoveryCommand().. I moved the duplicated logic to its own function in v6. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Thu, Jan 12, 2023 at 10:17:21AM -0800, Nathan Bossart wrote: > On Thu, Jan 12, 2023 at 03:30:40PM +0900, Michael Paquier wrote: > IMHO I don't think there's an urgent need to rename it, but if there's a > better name that people like, I'm happy to do so. Okay. >> Saying that, 0001 seems fine on its own (minus the redo LSN/TLI with >> the duplication for the segment name build), so I would be tempted to >> get this one done. My gut tells me that we'd better remove the >> duplication and just pass down the two fields to >> shell_archive_cleanup() and shell_recovery_end(), with the segment >> name given to ExecuteRecoveryCommand().. > > I moved the duplicated logic to its own function in v6. While looking at 0001, I have noticed one issue as of the following block in shell_restore(): + if (wait_result_is_signal(rc, SIGTERM)) + proc_exit(1); + + ereport(wait_result_is_any_signal(rc, true) ? FATAL : DEBUG2, + (errmsg("could not restore file \"%s\" from archive: %s", + file, wait_result_to_str(rc)))); This block of code would have been executed even if rc == 0, which is incorrect because the command would have succeeded. HEAD would not have done that, actually, as RestoreArchivedFile() would return before. I guess that you have not noticed it because this basically just generated incorrect DEBUG2 messages when rc == 0? One part that this slightly influences is the order of the reports when the command succeeds the follow-up stat() fails to check the size, where we reverse these two logs: DEBUG2, "could not restore" LOG/FATAL, "could not stat file" However, that does not really change the value of the information reported: if the stat() failed, the failure mode is the same except that we don't get the extra DEBUG2/"could not restore" message, which does not really matter except if your elevel is high enough for that and there is always the LOG for that.. Once this issue was fixed, nothing else stood out, so applied this part. -- Michael
Attachment
On Mon, Jan 16, 2023 at 04:36:01PM +0900, Michael Paquier wrote: > Once this issue was fixed, nothing else stood out, so applied this > part. Thanks! I've attached a rebased version of the rest of the patch set. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Mon, Jan 16, 2023 at 02:40:40PM -0800, Nathan Bossart wrote: > On Mon, Jan 16, 2023 at 04:36:01PM +0900, Michael Paquier wrote: > > Once this issue was fixed, nothing else stood out, so applied this > > part. > > Thanks! I've attached a rebased version of the rest of the patch set. When it comes to 0002, the only difference between the three code paths of shell_recovery_end(), shell_archive_cleanup() and shell_restore() is the presence of BuildRestoreCommand(). However this is now just a thin wrapper of replace_percent_placeholders() that does just one extra make_native_path() for the xlogpath. Could it be cleaner in the long term to remove entirely BuildRestoreCommand() and move the conversion of the xlogpath with make_native_path() one level higher in the stack? -- Michael
Attachment
On Tue, Jan 17, 2023 at 02:32:03PM +0900, Michael Paquier wrote: > Could it be cleaner in the long term to remove entirely > BuildRestoreCommand() and move the conversion of the xlogpath with > make_native_path() one level higher in the stack? Yeah, this seems cleaner. I removed BuildRestoreCommand() in v8. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Tue, Jan 17, 2023 at 10:23:56AM -0800, Nathan Bossart wrote: > Yeah, this seems cleaner. I removed BuildRestoreCommand() in v8. if (*sp == *lp) { - if (val) - { - appendStringInfoString(&result, val); - found = true; - } - /* If val is NULL, we will report an error. */ + appendStringInfoString(&result, val); + found = true; In 0002, this code block has been removed as an effect of the removal of BuildRestoreCommand(), because RestoreArchivedFile() needs to handle two flags with two values. The current design has the advantage to warn extension developers with an unexpected manipulation, as well, so I have kept the logic in percentrepl.c as-is. I was wondering also if ExecuteRecoveryCommand() should use a bits32 for its two boolean flags, but did not bother as it is static in shell_restore.c so ABI does not matter, even if there are three callers of it with 75% of the combinations possible (only false/true is not used). And 0002 is applied. -- Michael
Attachment
On Wed, Jan 18, 2023 at 11:29:20AM +0900, Michael Paquier wrote: > And 0002 is applied. Thanks. Here's a rebased version of the last patch. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Tue, Jan 17, 2023 at 08:44:27PM -0800, Nathan Bossart wrote: > Thanks. Here's a rebased version of the last patch. Thanks for the rebase. The final state of the documentation is as follows: 51. Archive and Recovery Modules 51.1. Archive Module Initialization Functions 51.2. Archive Module Callbacks 51.3. Recovery Module Initialization Functions 51.4. Recovery Module Callbacks I am not completely sure whether this grouping is the best thing to do. Wouldn't it be better to move that into two different sub-sections instead? One layout suggestion: 51. WAL Modules 51.1. Archive Modules 51.1.1. Archive Module Initialization Functions 51.1.2. Archive Module Callbacks 51.2. Recovery Modules 51.2.1. Recovery Module Initialization Functions 51.2.2. Recovery Module Callbacks Putting both of them in the same section sounds like a good idea per the symmetry that one would likely have between the code paths of archiving and recovery, so as they share some common knowledge. This kinds of comes back to the previous suggestion to rename basic_archive to something like wal_modules, that covers both archiving and recovery. I does not seem like this would overlap with RMGRs, which is much more internal anyway. (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("must specify restore_command when standby mode is not enabled"))); + errmsg("must specify restore_command or a restore_library that defines " + "a restore callback when standby mode is not enabled"))); This is long. Shouldn't this be split into an errdetail() to list all the options at hand? - if (XLogArchiveLibrary[0] != '\0' && XLogArchiveCommand[0] != '\0') - ereport(ERROR, - (errcode(ERRCODE_INVALID_PARAMETER_VALUE), - errmsg("both archive_command and archive_library set"), - errdetail("Only one of archive_command, archive_library may be set."))); + CheckMutuallyExclusiveGUCs(XLogArchiveLibrary, "archive_library", + XLogArchiveCommand, "archive_command"); The introduction of this routine could be a patch on its own, as it impacts the archiving path. + CheckMutuallyExclusiveGUCs(restoreLibrary, "restore_library", + archiveCleanupCommand, "archive_cleanup_command"); + if (strcmp(prevRestoreLibrary, restoreLibrary) != 0 || + strcmp(prevArchiveCleanupCommand, archiveCleanupCommand) != 0) + { + call_recovery_module_shutdown_cb(0, (Datum) 0); + LoadRecoveryCallbacks(); + } + + pfree(prevRestoreLibrary); + pfree(prevArchiveCleanupCommand); Hm.. The callers of CheckMutuallyExclusiveGUCs() with the new ERROR paths they introduce need a close lookup. As far as I can see this concerns four areas depending on the three restore commands (restore_command and recovery_end command for startup, archive_cleanup_command for the checkpointer): - Startup process initialization, as of validateRecoveryParameters() where the postmaster GUCs for the recovery target are evaluated. This one is an early stage which is fine. - Startup reloading, as of StartupRereadConfig(). This code could involve a WAL receiver restart depending on a change in the slot change or in primary_conninfo, and force at the same time an ERROR because of conflicting recovery library and command configuration. This one should be safe because pendingWalRcvRestart would just be considered later on by the startup process itself while waiting for WAL to become available. Still this could deserve a comment? Even if there is a misconfiguration, a reload on a standby would enforce a FATAL in the startup process, taking down the whole server. - Checkpointer initialization, as of CheckpointerMain(). A configuration failure in this code path, aka server startup, causes the server to loop infinitely on FATAL with the misconfiguration showing up all the time.. This is a problem. - Last comes the checkpointer GUC reloading, as of HandleCheckpointerInterrupts(), with a second problem. This introduces a failure path where ConfigReloadPending is processed at the same time as ShutdownRequestPending based on the way it is coded, interacting with what would be a normal shutdown in some cases? And actually, if you enforce a misconfiguration on reload, the checkpointer reports an error but it does not enforce a process restart, hence it keeps around the new, incorrect, configuration while waiting for a new checkpoint to happen once restore_library and archive_cleanup_command are set. This could lead to surprises, IMO. Upgrading to a FATAL in this code path triggers an infinite loop, like the startup path. If the archive_cleanup_command ballback of a restore library triggers a FATAL, it seems to me that it would continuously trigger a server restart, actually. Perhaps that's something to document, in comparison to the safe fallbacks of the shell command where we don't force an ERROR to give priority to the stability of the checkpointer. -- Michael
Attachment
On Mon, Jan 23, 2023 at 11:44:57AM +0900, Michael Paquier wrote: > Thanks for the rebase. Thanks for the detailed review. > The final state of the documentation is as follows: > 51. Archive and Recovery Modules > 51.1. Archive Module Initialization Functions > 51.2. Archive Module Callbacks > 51.3. Recovery Module Initialization Functions > 51.4. Recovery Module Callbacks > > I am not completely sure whether this grouping is the best thing to > do. Wouldn't it be better to move that into two different > sub-sections instead? One layout suggestion: > 51. WAL Modules > 51.1. Archive Modules > 51.1.1. Archive Module Initialization Functions > 51.1.2. Archive Module Callbacks > 51.2. Recovery Modules > 51.2.1. Recovery Module Initialization Functions > 51.2.2. Recovery Module Callbacks > > Putting both of them in the same section sounds like a good idea per > the symmetry that one would likely have between the code paths of > archiving and recovery, so as they share some common knowledge. > > This kinds of comes back to the previous suggestion to rename > basic_archive to something like wal_modules, that covers both > archiving and recovery. I does not seem like this would overlap with > RMGRs, which is much more internal anyway. I updated the documentation as you suggested, and I renamed basic_archive to basic_wal_module. > (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > - errmsg("must specify restore_command when standby mode is not enabled"))); > + errmsg("must specify restore_command or a restore_library that defines " > + "a restore callback when standby mode is not enabled"))); > This is long. Shouldn't this be split into an errdetail() to list all > the options at hand? Should the errmsg() be something like "recovery parameters misconfigured"? > - if (XLogArchiveLibrary[0] != '\0' && XLogArchiveCommand[0] != '\0') > - ereport(ERROR, > - (errcode(ERRCODE_INVALID_PARAMETER_VALUE), > - errmsg("both archive_command and archive_library set"), > - errdetail("Only one of archive_command, archive_library may be set."))); > + CheckMutuallyExclusiveGUCs(XLogArchiveLibrary, "archive_library", > + XLogArchiveCommand, "archive_command"); > > The introduction of this routine could be a patch on its own, as it > impacts the archiving path. I moved this to a separate patch. > - Startup reloading, as of StartupRereadConfig(). This code could > involve a WAL receiver restart depending on a change in the slot > change or in primary_conninfo, and force at the same time an ERROR > because of conflicting recovery library and command configuration. > This one should be safe because pendingWalRcvRestart would just be > considered later on by the startup process itself while waiting for > WAL to become available. Still this could deserve a comment? Even if > there is a misconfiguration, a reload on a standby would enforce a > FATAL in the startup process, taking down the whole server. Do you think the parameter checks should go before the WAL receiver restart logic? > - Checkpointer initialization, as of CheckpointerMain(). A > configuration failure in this code path, aka server startup, causes > the server to loop infinitely on FATAL with the misconfiguration > showing up all the time.. This is a problem. Perhaps this is a reason to move the parameter check in CheckpointerMain() to after the sigsetjmp() block. This should avoid full server restarts. Only the checkpointer process would loop with the ERROR. > - Last comes the checkpointer GUC reloading, as of > HandleCheckpointerInterrupts(), with a second problem. This > introduces a failure path where ConfigReloadPending is processed at > the same time as ShutdownRequestPending based on the way it is coded, > interacting with what would be a normal shutdown in some cases? And > actually, if you enforce a misconfiguration on reload, the > checkpointer reports an error but it does not enforce a process > restart, hence it keeps around the new, incorrect, configuration while > waiting for a new checkpoint to happen once restore_library and > archive_cleanup_command are set. This could lead to surprises, IMO. > Upgrading to a FATAL in this code path triggers an infinite loop, like > the startup path. If we move the parameter check in CheckpointerMain() as described above, the checkpointer should be unable to proceed with an incorrect configuration. For the normal shutdown part, do you think the ShutdownRequestPending block should be moved to before the ConfigReloadPending block in HandleCheckpointerInterrupts()? > If the archive_cleanup_command ballback of a restore library triggers > a FATAL, it seems to me that it would continuously trigger a server > restart, actually. Perhaps that's something to document, in > comparison to the safe fallbacks of the shell command where we don't > force an ERROR to give priority to the stability of the checkpointer. I'm not sure it's worth documenting that ereport(FATAL, ...) in the checkpointer process will cause a server restart. In most cases, an extension author would use ERROR, which, if we make the aforementioned changes, would at most cause the checkpointer to effectively restart. This is similar to archive modules where an ERROR causes only the archiver process to restart. Also, we document that recovery libraries are loaded in the startup and checkpointer processes, so IMO it should be relatively apparent that doing something like FATAL or proc_exit() is bad. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Mon, Jan 23, 2023 at 01:44:28PM -0800, Nathan Bossart wrote: > On Mon, Jan 23, 2023 at 11:44:57AM +0900, Michael Paquier wrote: > I updated the documentation as you suggested, and I renamed basic_archive > to basic_wal_module. Thanks. The renaming of basic_archive to basic_wal_module looked fine, so applied. While looking at the docs, I found a bit confusing that the main section of the WAL modules included the full description for the archive modules, so I have moved that into the sect2 dedicated to the archive modules instead, as of the attached. 0004 has been updated in consequence, with details about the recovery bits within its own sect2. >> (errcode(ERRCODE_INVALID_PARAMETER_VALUE), >> - errmsg("must specify restore_command when standby mode is not enabled"))); >> + errmsg("must specify restore_command or a restore_library that defines " >> + "a restore callback when standby mode is not enabled"))); >> This is long. Shouldn't this be split into an errdetail() to list all >> the options at hand? > > Should the errmsg() be something like "recovery parameters misconfigured"? Hmm. Here is an idea: - errmsg: "must specify restore option when standby mode is not enabled" - errdetail: "Either restore_command or restore_library need to be specified." >> - if (XLogArchiveLibrary[0] != '\0' && XLogArchiveCommand[0] != '\0') >> - ereport(ERROR, >> - (errcode(ERRCODE_INVALID_PARAMETER_VALUE), >> - errmsg("both archive_command and archive_library set"), >> - errdetail("Only one of archive_command, archive_library may be set."))); >> + CheckMutuallyExclusiveGUCs(XLogArchiveLibrary, "archive_library", >> + XLogArchiveCommand, "archive_command"); >> >> The introduction of this routine could be a patch on its own, as it >> impacts the archiving path. > > I moved this to a separate patch. While pondering about that, I found a bit sad that this only works for string GUCs, while it could be possible to do the same kind of checks for the other GUC types with a more generic routine? Not enum, obviously, but int, float, bool and real, with the a failure if both GUCs are set to non-default values? Also, and I may be missing something here, do we really need to pass the value of the parameters to check? Wouldn't it be enough to check for the case where both parameters are set to their non-default values after reloading? >> - Startup reloading, as of StartupRereadConfig(). This code could >> involve a WAL receiver restart depending on a change in the slot >> change or in primary_conninfo, and force at the same time an ERROR >> because of conflicting recovery library and command configuration. >> This one should be safe because pendingWalRcvRestart would just be >> considered later on by the startup process itself while waiting for >> WAL to become available. Still this could deserve a comment? Even if >> there is a misconfiguration, a reload on a standby would enforce a >> FATAL in the startup process, taking down the whole server. > > Do you think the parameter checks should go before the WAL receiver restart > logic? Yeah, switching the order makes the logic more robust IMO. >> - Checkpointer initialization, as of CheckpointerMain(). A >> configuration failure in this code path, aka server startup, causes >> the server to loop infinitely on FATAL with the misconfiguration >> showing up all the time.. This is a problem. > > Perhaps this is a reason to move the parameter check in CheckpointerMain() > to after the sigsetjmp() block. This should avoid full server restarts. > Only the checkpointer process would loop with the ERROR. The loop part is annoying.. I've never been a fan of adding this cross-value checks for the archiver modules in the first place, and it would make things much simpler in the checkpointer if we need to think about that as we want these values to be reloadable. Perhaps this could just be an exception where we just give priority on one over the other archive_cleanup_command? The startup process has a well-defined sequence after a failure, while the checkpointer is designed to remain robust. >> - Last comes the checkpointer GUC reloading, as of >> HandleCheckpointerInterrupts(), with a second problem. This >> introduces a failure path where ConfigReloadPending is processed at >> the same time as ShutdownRequestPending based on the way it is coded, >> interacting with what would be a normal shutdown in some cases? And >> actually, if you enforce a misconfiguration on reload, the >> checkpointer reports an error but it does not enforce a process >> restart, hence it keeps around the new, incorrect, configuration while >> waiting for a new checkpoint to happen once restore_library and >> archive_cleanup_command are set. This could lead to surprises, IMO. >> Upgrading to a FATAL in this code path triggers an infinite loop, like >> the startup path. > > If we move the parameter check in CheckpointerMain() as described above, > the checkpointer should be unable to proceed with an incorrect > configuration. For the normal shutdown part, do you think the > ShutdownRequestPending block should be moved to before the > ConfigReloadPending block in HandleCheckpointerInterrupts()? I would not touch this order. This could influence the setup a shutdown checkpoint relies on, for one. >> If the archive_cleanup_command ballback of a restore library triggers >> a FATAL, it seems to me that it would continuously trigger a server >> restart, actually. Perhaps that's something to document, in >> comparison to the safe fallbacks of the shell command where we don't >> force an ERROR to give priority to the stability of the checkpointer. > > I'm not sure it's worth documenting that ereport(FATAL, ...) in the > checkpointer process will cause a server restart. In most cases, an > extension author would use ERROR, which, if we make the aforementioned > changes, would at most cause the checkpointer to effectively restart. This > is similar to archive modules where an ERROR causes only the archiver > process to restart. Also, we document that recovery libraries are loaded > in the startup and checkpointer processes, so IMO it should be relatively > apparent that doing something like FATAL or proc_exit() is bad. Okay. Fine by me. This could always be amended later, as required. + if (recoveryRestoreCommand[0] == '\0') + RecoveryContext.restore_cb = NULL; + if (archiveCleanupCommand[0] == '\0') + RecoveryContext.archive_cleanup_cb = NULL; + if (recoveryEndCommand[0] == '\0') + RecoveryContext.recovery_end_cb = NULL; Could it be cleaner to put this knowledge directly in shell_restore.c with a fast-exit path after entering each callback? It does not strike me as a good thing to sprinkle more than necessary the knowledge about the commands. Another question that popped in my mind: could it be better to have two different shutdown callbacks for the checkpointer and the startup process? Having some tests for both, like shell_archive.c, would be nice, actually. -- Michael
Attachment
On Wed, Jan 25, 2023 at 04:34:21PM +0900, Michael Paquier wrote: > The loop part is annoying.. I've never been a fan of adding this > cross-value checks for the archiver modules in the first place, and it > would make things much simpler in the checkpointer if we need to think > about that as we want these values to be reloadable. Perhaps this > could just be an exception where we just give priority on one over the > other archive_cleanup_command? The startup process has a well-defined > sequence after a failure, while the checkpointer is designed to remain > robust. Yeah, there are some problems here. If we ERROR, we'll just bounce back to the sigsetjmp() block once a second, and we'll never pick up configuration reloads, shutdown signals, etc. If we FATAL, we'll just rapidly restart over and over. Given the dicussion about misconfigured archiving parameters [0], I doubt folks will be okay with giving priority to one or the other. I'm currently thinking that the checkpointer should set a flag and clear the recovery callbacks when a misconfiguration is detected. Anytime the checkpointer tries to use the archive-cleanup callback, a WARNING would be emitted. This is similar to an approach I proposed for archiving misconfigurations (that we didn't proceed with) [1]. Given the aformentioned problems, this approach might be more suitable for the checkpointer than it is for the archiver. Thoughts? [0] https://postgr.es/m/9ee5d180-2c32-a1ca-d3d7-63a723f68d9a%40enterprisedb.com [1] https://postgr.es/m/20220914222736.GA3042279%40nathanxps13 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Thu, Jan 26, 2023 at 09:40:58PM -0800, Nathan Bossart wrote: > On Wed, Jan 25, 2023 at 04:34:21PM +0900, Michael Paquier wrote: >> The loop part is annoying.. I've never been a fan of adding this >> cross-value checks for the archiver modules in the first place, and it >> would make things much simpler in the checkpointer if we need to think >> about that as we want these values to be reloadable. Perhaps this >> could just be an exception where we just give priority on one over the >> other archive_cleanup_command? The startup process has a well-defined >> sequence after a failure, while the checkpointer is designed to remain >> robust. > > Yeah, there are some problems here. If we ERROR, we'll just bounce back to > the sigsetjmp() block once a second, and we'll never pick up configuration > reloads, shutdown signals, etc. If we FATAL, we'll just rapidly restart > over and over. Given the dicussion about misconfigured archiving > parameters [0], I doubt folks will be okay with giving priority to one or > the other. > > I'm currently thinking that the checkpointer should set a flag and clear > the recovery callbacks when a misconfiguration is detected. Anytime the > checkpointer tries to use the archive-cleanup callback, a WARNING would be > emitted. This is similar to an approach I proposed for archiving > misconfigurations (that we didn't proceed with) [1]. Given the > aformentioned problems, this approach might be more suitable for the > checkpointer than it is for the archiver. The more I think about this, the more I wonder whether we really need to include archive_cleanup_command and recovery_end_command in recovery modules. Another weird thing with the checkpointer is that the restore_library will stay loaded long after recovery is finished, and it'll be loaded regardless of whether recovery is required in the first place. Of course, that typically won't cause any problems, and we could wait until we need to do archive cleanup to load the library (and call its shutdown callback when recovery is finished), but this strikes me as potentially more complexity than the feature is worth. Perhaps we should just focus on covering the restore_command functionality for now and add the rest later. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Thu, Jan 26, 2023 at 09:40:58PM -0800, Nathan Bossart wrote: > Yeah, there are some problems here. If we ERROR, we'll just bounce back to > the sigsetjmp() block once a second, and we'll never pick up configuration > reloads, shutdown signals, etc. If we FATAL, we'll just rapidly restart > over and over. Given the dicussion about misconfigured archiving > parameters [0], I doubt folks will be okay with giving priority to one or > the other. > > I'm currently thinking that the checkpointer should set a flag and clear > the recovery callbacks when a misconfiguration is detected. Anytime the > checkpointer tries to use the archive-cleanup callback, a WARNING would be > emitted. This is similar to an approach I proposed for archiving > misconfigurations (that we didn't proceed with) [1]. Given the > aformentioned problems, this approach might be more suitable for the > checkpointer than it is for the archiver. So, by doing that, archive_library would be ignored. What should be the checkpointer do when a aconfiguration error is found on misconfiguration? Would archive_cleanup_command be equally ignored or could there be a risk to see it used by the checkpointer? Ignoring it would be fine as the user intended the use of a library, rather than enforcing its use via a value one did not really want. So, on top of cleaning the callbacks, archive_cleanup_command should also be cleaned up in the checkpointer? Issuing one WARNING per checkpoint would be indeed much better than looping over and over, impacting the system reliability. Thoughts or comments from anyone? -- Michael
Attachment
Hi, On 2023-01-27 15:28:21 -0800, Nathan Bossart wrote: > The more I think about this, the more I wonder whether we really need to > include archive_cleanup_command and recovery_end_command in recovery > modules. I think it would be hard to write a good module that isn't just implementing the existing commands without it. Needing to clean up archives and reacting to the end of recovery seems a pretty core task. > Another weird thing with the checkpointer is that the restore_library will > stay loaded long after recovery is finished, and it'll be loaded regardless > of whether recovery is required in the first place. I don't see a problem with that. And I suspect we might even end up there for other reasons. I was briefly wondering whether it'd be worth trying to offload things like archive_cleanup_command from checkpointer to a different process, for robustness. But given that it's pretty much required for performance that the module runs in the startup process, that ship probably has sailed. Greetings, Andres Freund
Hi, On 2023-01-25 16:34:21 +0900, Michael Paquier wrote: > diff --git a/src/include/access/xlogarchive.h b/src/include/access/xlogarchive.h > index 299304703e..71c9b88165 100644 > --- a/src/include/access/xlogarchive.h > +++ b/src/include/access/xlogarchive.h > @@ -30,9 +30,45 @@ extern bool XLogArchiveIsReady(const char *xlog); > extern bool XLogArchiveIsReadyOrDone(const char *xlog); > extern void XLogArchiveCleanup(const char *xlog); > > -extern bool shell_restore(const char *file, const char *path, > - const char *lastRestartPointFileName); > -extern void shell_archive_cleanup(const char *lastRestartPointFileName); > -extern void shell_recovery_end(const char *lastRestartPointFileName); > +/* > + * Recovery module callbacks > + * > + * These callback functions should be defined by recovery libraries and > + * returned via _PG_recovery_module_init(). For more information about the > + * purpose of each callback, refer to the recovery modules documentation. > + */ > +typedef bool (*RecoveryRestoreCB) (const char *file, const char *path, > + const char *lastRestartPointFileName); > +typedef void (*RecoveryArchiveCleanupCB) (const char *lastRestartPointFileName); > +typedef void (*RecoveryEndCB) (const char *lastRestartPointFileName); > +typedef void (*RecoveryShutdownCB) (void); I think the signature of these forces bad coding practices, because there's no way to have state within a recovery module (as there's no parameter for it). It's possible we would eventually support multiple modules, e.g. restoring from shorter term file based archiving and from longer term archiving in some blob store. Then we'll regret not having a varible for this. > +typedef struct RecoveryModuleCallbacks > +{ > + RecoveryRestoreCB restore_cb; > + RecoveryArchiveCleanupCB archive_cleanup_cb; > + RecoveryEndCB recovery_end_cb; > + RecoveryShutdownCB shutdown_cb; > +} RecoveryModuleCallbacks; > + > +extern RecoveryModuleCallbacks RecoveryContext; I think that'll typically be interpreteted as a MemoryContext by readers. Also, why is this a global var? Exported too? > +/* > + * Type of the shared library symbol _PG_recovery_module_init that is looked up > + * when loading a recovery library. > + */ > +typedef void (*RecoveryModuleInit) (RecoveryModuleCallbacks *cb); I think this is a bad way to return callbacks. This way the RecoveryModuleCallbacks needs to be modifiable, which makes the job for the compiler harder (and isn't the greatest for security). I strongly encourage you to follow the model used e.g. by tableam. The init function should return a pointer to a *constant* struct. Which is compile-time initialized with the function pointers. See the bottom of heapam_handler.c for how that ends up looking. > +void > +LoadRecoveryCallbacks(void) > +{ > + RecoveryModuleInit init; > + > + /* > + * If the shell command is enabled, use our special initialization > + * function. Otherwise, load the library and call its > + * _PG_recovery_module_init(). > + */ > + if (restoreLibrary[0] == '\0') > + init = shell_restore_init; > + else > + init = (RecoveryModuleInit) > + load_external_function(restoreLibrary, "_PG_recovery_module_init", > + false, NULL); Why a special rule for shell, instead of just defaulting the GUC to it? > + /* > + * If using shell commands, remove callbacks for any commands that are not > + * set. > + */ > + if (restoreLibrary[0] == '\0') > + { > + if (recoveryRestoreCommand[0] == '\0') > + RecoveryContext.restore_cb = NULL; > + if (archiveCleanupCommand[0] == '\0') > + RecoveryContext.archive_cleanup_cb = NULL; > + if (recoveryEndCommand[0] == '\0') > + RecoveryContext.recovery_end_cb = NULL; I'd just mandate that these are implemented and that the module has to handle if it doesn't need to do anything. > + /* > + * Check for invalid combinations of the command/library parameters and > + * load the callbacks. > + */ > + CheckMutuallyExclusiveGUCs(restoreLibrary, "restore_library", > + recoveryRestoreCommand, "restore_command"); > + CheckMutuallyExclusiveGUCs(restoreLibrary, "restore_library", > + recoveryEndCommand, "recovery_end_command"); > + before_shmem_exit(call_recovery_module_shutdown_cb, 0); > + LoadRecoveryCallbacks(); This kind of sequence is duplicated into several places. Greetings, Andres Freund
On Fri, Jan 27, 2023 at 04:09:39PM -0800, Andres Freund wrote: > I think it would be hard to write a good module that isn't just implementing > the existing commands without it. Needing to clean up archives and reacting to > the end of recovery seems a pretty core task. FWIW, recovery_end_command is straight-forward as it is done by the startup process, so that's an easy take. You could split the cake into two parts, as well, aka first focus on restore_command and recovery_end_command as a first step, then we could try to figure out how archive_cleanup_command would fit in this picture with the checkpointer or a different process. There are a bunch of deployments where WAL archive retention is controlled by the age of the backups, not by the backend deciding when these should be removed as a checkpoint runs depending on the computed redo LSN, so recovery modules would still be useful with just coverage for restore_command and recovery_end_command. > I was briefly wondering whether it'd be worth trying to offload things like > archive_cleanup_command from checkpointer to a different process, for > robustness. But given that it's pretty much required for performance that the > module runs in the startup process, that ship probably has sailed. Yeah, agreed that this could be interesting. That could leverage the work of the checkpointer. Nathan has proposed a patch for that recently, as far as I recall, to offload some tasks from the startup and checkpointer processes: https://commitfest.postgresql.org/41/3448/ So that pretty much goes into the same area? -- Michael
Attachment
On Fri, Jan 27, 2023 at 04:23:19PM -0800, Andres Freund wrote: >> +typedef bool (*RecoveryRestoreCB) (const char *file, const char *path, >> + const char *lastRestartPointFileName); >> +typedef void (*RecoveryArchiveCleanupCB) (const char *lastRestartPointFileName); >> +typedef void (*RecoveryEndCB) (const char *lastRestartPointFileName); >> +typedef void (*RecoveryShutdownCB) (void); > > I think the signature of these forces bad coding practices, because there's no > way to have state within a recovery module (as there's no parameter for it). > > It's possible we would eventually support multiple modules, e.g. restoring > from shorter term file based archiving and from longer term archiving in some > blob store. Then we'll regret not having a varible for this. Are you suggesting that we add a "void *arg" to each one of these? Or put the arguments into a struct? Or something else? >> +extern RecoveryModuleCallbacks RecoveryContext; > > I think that'll typically be interpreteted as a MemoryContext by readers. How about RecoveryCallbacks? > Also, why is this a global var? Exported too? It's needed in xlog.c, xlogarchive.c, and xlogrecovery.c. Would you rather it be static to xlogarchive.c and provide accessors for the others? >> +/* >> + * Type of the shared library symbol _PG_recovery_module_init that is looked up >> + * when loading a recovery library. >> + */ >> +typedef void (*RecoveryModuleInit) (RecoveryModuleCallbacks *cb); > > I think this is a bad way to return callbacks. This way the > RecoveryModuleCallbacks needs to be modifiable, which makes the job for the > compiler harder (and isn't the greatest for security). > > I strongly encourage you to follow the model used e.g. by tableam. The init > function should return a pointer to a *constant* struct. Which is compile-time > initialized with the function pointers. > > See the bottom of heapam_handler.c for how that ends up looking. Hm. I used the existing strategy for archive modules and logical decoding output plugins here. I think it would be weird for the archive module and recovery module interfaces to look so different, but if that's okay, I can change it. >> +void >> +LoadRecoveryCallbacks(void) >> +{ >> + RecoveryModuleInit init; >> + >> + /* >> + * If the shell command is enabled, use our special initialization >> + * function. Otherwise, load the library and call its >> + * _PG_recovery_module_init(). >> + */ >> + if (restoreLibrary[0] == '\0') >> + init = shell_restore_init; >> + else >> + init = (RecoveryModuleInit) >> + load_external_function(restoreLibrary, "_PG_recovery_module_init", >> + false, NULL); > > Why a special rule for shell, instead of just defaulting the GUC to it? I'm not following this one. The default value of the restore_library GUC is an empty string, which means that the shell commands should be used. >> + /* >> + * If using shell commands, remove callbacks for any commands that are not >> + * set. >> + */ >> + if (restoreLibrary[0] == '\0') >> + { >> + if (recoveryRestoreCommand[0] == '\0') >> + RecoveryContext.restore_cb = NULL; >> + if (archiveCleanupCommand[0] == '\0') >> + RecoveryContext.archive_cleanup_cb = NULL; >> + if (recoveryEndCommand[0] == '\0') >> + RecoveryContext.recovery_end_cb = NULL; > > I'd just mandate that these are implemented and that the module has to handle > if it doesn't need to do anything. Wouldn't this just force module authors to write empty functions for the parts they don't need? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Hi, On 2023-01-27 16:59:10 -0800, Nathan Bossart wrote: > On Fri, Jan 27, 2023 at 04:23:19PM -0800, Andres Freund wrote: > >> +typedef bool (*RecoveryRestoreCB) (const char *file, const char *path, > >> + const char *lastRestartPointFileName); > >> +typedef void (*RecoveryArchiveCleanupCB) (const char *lastRestartPointFileName); > >> +typedef void (*RecoveryEndCB) (const char *lastRestartPointFileName); > >> +typedef void (*RecoveryShutdownCB) (void); > > > > I think the signature of these forces bad coding practices, because there's no > > way to have state within a recovery module (as there's no parameter for it). > > > > It's possible we would eventually support multiple modules, e.g. restoring > > from shorter term file based archiving and from longer term archiving in some > > blob store. Then we'll regret not having a varible for this. > > Are you suggesting that we add a "void *arg" to each one of these? Yes. Or pass a pointer to a struct with a "private_data" style field to all of them. > >> +extern RecoveryModuleCallbacks RecoveryContext; > > > > I think that'll typically be interpreteted as a MemoryContext by readers. > > How about RecoveryCallbacks? Callbacks is better than Context here imo, but I think 'Recovery' makes it sound like this actually performs WAL replay or such. Seems like it should be RestoreCallbacks at the very least? > > Also, why is this a global var? Exported too? > > It's needed in xlog.c, xlogarchive.c, and xlogrecovery.c. Would you rather > it be static to xlogarchive.c and provide accessors for the others? Maybe? Something about this feels wrong to me, but I can't entirely put my finger on it. > >> +/* > >> + * Type of the shared library symbol _PG_recovery_module_init that is looked up > >> + * when loading a recovery library. > >> + */ > >> +typedef void (*RecoveryModuleInit) (RecoveryModuleCallbacks *cb); > > > > I think this is a bad way to return callbacks. This way the > > RecoveryModuleCallbacks needs to be modifiable, which makes the job for the > > compiler harder (and isn't the greatest for security). > > > > I strongly encourage you to follow the model used e.g. by tableam. The init > > function should return a pointer to a *constant* struct. Which is compile-time > > initialized with the function pointers. > > > > See the bottom of heapam_handler.c for how that ends up looking. > > Hm. I used the existing strategy for archive modules and logical decoding > output plugins here. Unfortunately I didn't realize the problem when I was designing the output plugin interface. But there's probably too many users of it out there now to change it. The interface does at least provide a way to have its own "per instance" state, via the startup callback and LogicalDecodingContext->output_plugin_private. The worst interface in this area is index AMs - the handler returns a pointer to a palloced struct with callbacks. That then is copied into a new allocation in the relcache entry. We have hundreds to thousands of copies of what bthandler() sets up in memory. Without any sort of benefit. > I think it would be weird for the archive module and > recovery module interfaces to look so different, but if that's okay, I can > change it. I'm a bit sad about the archive module case - I wonder if we should change it now, there can't be many users of it out there. And I think it's more likely that we'll eventually want multiple archiving scripts to run concurrently - which will be quite hard with the current interface (no private state). Just btw: It's imo a bit awkward for the definition of the archiving plugin interface to be in pgarch.h: "Exports from postmaster/pgarch.c" doesn't describe that well. A dedicated header seems cleaner. > > >> +void > >> +LoadRecoveryCallbacks(void) > >> +{ > >> + RecoveryModuleInit init; > >> + > >> + /* > >> + * If the shell command is enabled, use our special initialization > >> + * function. Otherwise, load the library and call its > >> + * _PG_recovery_module_init(). > >> + */ > >> + if (restoreLibrary[0] == '\0') > >> + init = shell_restore_init; > >> + else > >> + init = (RecoveryModuleInit) > >> + load_external_function(restoreLibrary, "_PG_recovery_module_init", > >> + false, NULL); > > > > Why a special rule for shell, instead of just defaulting the GUC to it? > > I'm not following this one. The default value of the restore_library GUC > is an empty string, which means that the shell commands should be used. I was wondering why we implement "shell" via a separate mechanism from restore_library. I.e. a) why doesn't restore_library default to 'shell', instead of an empty string, b) why aren't restore_command et al implemented using a restore module. > >> + /* > >> + * If using shell commands, remove callbacks for any commands that are not > >> + * set. > >> + */ > >> + if (restoreLibrary[0] == '\0') > >> + { > >> + if (recoveryRestoreCommand[0] == '\0') > >> + RecoveryContext.restore_cb = NULL; > >> + if (archiveCleanupCommand[0] == '\0') > >> + RecoveryContext.archive_cleanup_cb = NULL; > >> + if (recoveryEndCommand[0] == '\0') > >> + RecoveryContext.recovery_end_cb = NULL; > > > > I'd just mandate that these are implemented and that the module has to handle > > if it doesn't need to do anything. > > Wouldn't this just force module authors to write empty functions for the > parts they don't need? Yes. But what's the point of a restore library that doesn't implement a restore command? Making some/all callbacks mandatory and validating mandatory callbacks are set, during load, IME makes it easier to evolve the interface over time, because problems become immediately apparent, rather than having to wait for a certain callback to be hit. It's not actually clear to me why another restore library shouldn't be able to use restore_command etc, given that we have the parameters. One quite useful module would be a version of the "shell" interface that runs multiple restore commands in parallel, assuming we'll need subsequent files as well. The fact that restore_command are not run in parallel, and that many useful restore commands have a fair bit of latency, is an issue. So a shell_parallel restore library would e.g. be useful? Greetings, Andres Freund
On Fri, Jan 27, 2023 at 05:55:42PM -0800, Andres Freund wrote: > On 2023-01-27 16:59:10 -0800, Nathan Bossart wrote: >> I think it would be weird for the archive module and >> recovery module interfaces to look so different, but if that's okay, I can >> change it. > > I'm a bit sad about the archive module case - I wonder if we should change it > now, there can't be many users of it out there. And I think it's more likely > that we'll eventually want multiple archiving scripts to run concurrently - > which will be quite hard with the current interface (no private state). I'm open to that. IIUC it wouldn't require too many changes to existing archive modules, and if it gets us closer to batching or parallelism, it's probably worth doing sooner than later. > I was wondering why we implement "shell" via a separate mechanism from > restore_library. I.e. a) why doesn't restore_library default to 'shell', > instead of an empty string, b) why aren't restore_command et al implemented > using a restore module. I think that's the long-term idea. For archive modules, there were concerns about backward compatibility [0]. [0] https://postgr.es/m/CABUevEx8cKy%3D%2BYQU_3NaeXnZV2bSB7Lk6EE%2B-FEcmE4JO4V1hg%40mail.gmail.com -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Fri, Jan 27, 2023 at 08:17:46PM -0800, Nathan Bossart wrote: > On Fri, Jan 27, 2023 at 05:55:42PM -0800, Andres Freund wrote: >> On 2023-01-27 16:59:10 -0800, Nathan Bossart wrote: >>> I think it would be weird for the archive module and >>> recovery module interfaces to look so different, but if that's okay, I can >>> change it. >> >> I'm a bit sad about the archive module case - I wonder if we should change it >> now, there can't be many users of it out there. And I think it's more likely >> that we'll eventually want multiple archiving scripts to run concurrently - >> which will be quite hard with the current interface (no private state). > > I'm open to that. IIUC it wouldn't require too many changes to existing > archive modules, and if it gets us closer to batching or parallelism, it's > probably worth doing sooner than later. Here is a work-in-progress patch set for adjusting the archive modules interface. Is this roughly what you had in mind? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Fri, Jan 27, 2023 at 10:27:29PM -0800, Nathan Bossart wrote: > Here is a work-in-progress patch set for adjusting the archive modules > interface. Is this roughly what you had in mind? I have been catching up with what is happening here. I can get behind the idea to use the term "callbacks" vs "context" for clarity, to move the callback definitions into their own header, and to add extra arguments to the callback functions for some private data. -void -_PG_archive_module_init(ArchiveModuleCallbacks *cb) +const ArchiveModuleCallbacks * +_PG_archive_module_init(void **arg) { AssertVariableIsOfType(&_PG_archive_module_init, ArchiveModuleInit); - cb->check_configured_cb = basic_archive_configured; - cb->archive_file_cb = basic_archive_file; + (*arg) = (void *) AllocSetContextCreate(TopMemoryContext, + "basic_archive", + ALLOCSET_DEFAULT_SIZES); + + return &basic_archive_callbacks; Now, I find this part, where we use a double pointer to allow the module initialization to create and give back a private area, rather confusing, and I think that this could be bug-prone, as well. Once you incorporate some data within the set of callbacks, isn't this stuff close to a "state" data, or just something that we could call only an "ArchiveModule"? Could it make more sense to have _PG_archive_module_init return a structure with everything rather than a separate in/out argument? Here is the idea, simply: typedef struct ArchiveModule { ArchiveCallbacks *routines; void *private_data; /* Potentially more here, like some flags? */ } ArchiveModule; That would be closer to the style of xlogreader.h, for example. All these choices should be documented in archive_module.h, at the end. -- Michael
Attachment
On Mon, Jan 30, 2023 at 04:51:38PM +0900, Michael Paquier wrote: > Now, I find this part, where we use a double pointer to allow the > module initialization to create and give back a private area, rather > confusing, and I think that this could be bug-prone, as well. Once > you incorporate some data within the set of callbacks, isn't this > stuff close to a "state" data, or just something that we could call > only an "ArchiveModule"? Could it make more sense to have > _PG_archive_module_init return a structure with everything rather than > a separate in/out argument? Here is the idea, simply: > typedef struct ArchiveModule { > ArchiveCallbacks *routines; > void *private_data; > /* Potentially more here, like some flags? */ > } ArchiveModule; Yeah, we could probably invent an ArchiveModuleContext struct. I think this is similar to how LogicalDecodingContext is used. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Hi, On 2023-01-30 16:51:38 +0900, Michael Paquier wrote: > On Fri, Jan 27, 2023 at 10:27:29PM -0800, Nathan Bossart wrote: > > Here is a work-in-progress patch set for adjusting the archive modules > > interface. Is this roughly what you had in mind? > > I have been catching up with what is happening here. I can get > behind the idea to use the term "callbacks" vs "context" for clarity, > to move the callback definitions into their own header, and to add > extra arguments to the callback functions for some private data. > > -void > -_PG_archive_module_init(ArchiveModuleCallbacks *cb) > +const ArchiveModuleCallbacks * > +_PG_archive_module_init(void **arg) > { > AssertVariableIsOfType(&_PG_archive_module_init, ArchiveModuleInit); > > - cb->check_configured_cb = basic_archive_configured; > - cb->archive_file_cb = basic_archive_file; > + (*arg) = (void *) AllocSetContextCreate(TopMemoryContext, > + "basic_archive", > + ALLOCSET_DEFAULT_SIZES); > + > + return &basic_archive_callbacks; > Now, I find this part, where we use a double pointer to allow the > module initialization to create and give back a private area, rather > confusing, and I think that this could be bug-prone, as well. I don't think _PG_archive_module_init() should actually allocate a memory context and do other similar initializations. Instead it should just return 'const ArchiveModuleCallbacks*', typically a single line. Allocations etc should happen in one of the callbacks. That way we can actually have multiple instances of a module. > Once > you incorporate some data within the set of callbacks, isn't this > stuff close to a "state" data, or just something that we could call > only an "ArchiveModule"? Could it make more sense to have > _PG_archive_module_init return a structure with everything rather than > a separate in/out argument? Here is the idea, simply: > typedef struct ArchiveModule { > ArchiveCallbacks *routines; > void *private_data; > /* Potentially more here, like some flags? */ > } ArchiveModule; I don't like this much. This still basically ends up with the module callbacks not being sufficient to instantiate an archive module. Greetings, Andres Freund
On Mon, Jan 30, 2023 at 11:48:10AM -0800, Andres Freund wrote: > I don't think _PG_archive_module_init() should actually allocate a memory > context and do other similar initializations. Instead it should just return > 'const ArchiveModuleCallbacks*', typically a single line. > > Allocations etc should happen in one of the callbacks. That way we can > actually have multiple instances of a module. I think we'd need to invent a startup callback for archive modules for this to work, but that's easy enough. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Mon, Jan 30, 2023 at 12:04:22PM -0800, Nathan Bossart wrote: > On Mon, Jan 30, 2023 at 11:48:10AM -0800, Andres Freund wrote: >> I don't think _PG_archive_module_init() should actually allocate a memory >> context and do other similar initializations. Instead it should just return >> 'const ArchiveModuleCallbacks*', typically a single line. >> >> Allocations etc should happen in one of the callbacks. That way we can >> actually have multiple instances of a module. > > I think we'd need to invent a startup callback for archive modules for this > to work, but that's easy enough. If you don't return some (void *) pointing to a private area that would be stored by the backend, allocated as part of the loading path, I agree that an extra callback is what makes the most sense, presumably called around the beginning of PgArchiverMain(). Doing this kind of one-time action in the file callback woud be weird.. -- Michael
Attachment
On Tue, Jan 31, 2023 at 08:13:11AM +0900, Michael Paquier wrote: > On Mon, Jan 30, 2023 at 12:04:22PM -0800, Nathan Bossart wrote: >> On Mon, Jan 30, 2023 at 11:48:10AM -0800, Andres Freund wrote: >>> I don't think _PG_archive_module_init() should actually allocate a memory >>> context and do other similar initializations. Instead it should just return >>> 'const ArchiveModuleCallbacks*', typically a single line. >>> >>> Allocations etc should happen in one of the callbacks. That way we can >>> actually have multiple instances of a module. >> >> I think we'd need to invent a startup callback for archive modules for this >> to work, but that's easy enough. > > If you don't return some (void *) pointing to a private area that > would be stored by the backend, allocated as part of the loading path, > I agree that an extra callback is what makes the most sense, > presumably called around the beginning of PgArchiverMain(). Doing > this kind of one-time action in the file callback woud be weird.. Okay, here is a new patch set with the aforementioned adjustments and documentation updates. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Tue, Jan 31, 2023 at 03:30:13PM -0800, Nathan Bossart wrote: > Okay, here is a new patch set with the aforementioned adjustments and > documentation updates. So, it looks like you have addressed the feedback received here, as of: - Rename of Context to Callback. - Move of the definition into their own header. - Introduction of a callback for the startup initialization. - Pass down a private state to each callback. I have a few minor comments. +/* + * Since the logic for archiving via a shell command is in the core server + * and does not need to be loaded via a shared library, it has a special + * initialization function. + */ +extern const ArchiveModuleCallbacks *shell_archive_init(void); Storing that in archive_module.h is not incorrect, still feels a bit unnatural. I would have used a separate header for clarity. It may not sound like a big deal, but we may want this separation if archive_module.h is used in some frontend code in the future. Perhaps that will never be the case, but I've seen many fancy (as in useful) proposals in the past when it comes to such things. static bool -shell_archive_configured(void) +shell_archive_configured(void *private_state) { return XLogArchiveCommand[0] != '\0'; Maybe check that in this context private_state should be NULL? The other two callbacks could use an assert, as well. - <function>_PG_archive_module_init</function>. This function is passed a - struct that needs to be filled with the callback function pointers for - individual actions. + <function>_PG_archive_module_init</function>. This function must return a + struct filled with the callback function pointers for individual actions. Worth mentioning the name of the structure, as of "This function must return a structure ArchiveModuleCallbacks filled with.." + The <function>startup_cb</function> callback is called shortly after the + module is loaded. This callback can be used to perform any additional + initialization required. If the archive module needs a state, it should + return a pointer to the state. This pointer will be passed to each of the + module's other callbacks via the <literal>void *private_state</literal> + argument. Not sure about the complexity of two sentences here. This could simply be: This function can return a pointer to an area of memory dedicated to the state of the archive module loaded. This pointer is passed to each of the module's other callbacks as the argument <literal>private_state</literal>. Side note: it looks like there is nothing in archive-modules.sgml telling that these modules are only loaded by the archiver process. -- Michael
Attachment
Hi, On 2023-01-31 15:30:13 -0800, Nathan Bossart wrote: > +/* > + * basic_archive_startup > + * > + * Creates the module's memory context. > + */ > +void * > +basic_archive_startup(void) > +{ > + return (void *) AllocSetContextCreate(TopMemoryContext, > + "basic_archive", > + ALLOCSET_DEFAULT_SIZES); > } I'd make basic_archive's private data a struct, with a member for the context, but it's not that important. I'd also be inclined to do the same for the private_state you're passing around for each module. Even if it's just to reduce the number of functions accepting void * - loosing compiler type checking isn't great. So maybe an ArchiveModuleState { void *private_data } that's passed to basic_archive_startup() and all the other callbacks. Greetings, Andres Freund
On Wed, Feb 01, 2023 at 03:54:26AM -0800, Andres Freund wrote: > I'd make basic_archive's private data a struct, with a member for the > context, but it's not that important. > > I'd also be inclined to do the same for the private_state you're passing > around for each module. Even if it's just to reduce the number of > functions accepting void * - loosing compiler type checking isn't great. > > So maybe an ArchiveModuleState { void *private_data } that's passed to > basic_archive_startup() and all the other callbacks. Here's a new patch set in which I've attempted to address this feedback and Michael's feedback. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Hi, On 2023-02-01 12:15:29 -0800, Nathan Bossart wrote: > Here's a new patch set in which I've attempted to address this feedback and > Michael's feedback. Looks better! > @@ -25,12 +34,14 @@ extern PGDLLIMPORT char *XLogArchiveLibrary; > * For more information about the purpose of each callback, refer to the > * archive modules documentation. > */ > -typedef bool (*ArchiveCheckConfiguredCB) (void); > -typedef bool (*ArchiveFileCB) (const char *file, const char *path); > -typedef void (*ArchiveShutdownCB) (void); > +typedef void (*ArchiveStartupCB) (ArchiveModuleState *state); > +typedef bool (*ArchiveCheckConfiguredCB) (ArchiveModuleState *state); > +typedef bool (*ArchiveFileCB) (const char *file, const char *path, ArchiveModuleState *state); > +typedef void (*ArchiveShutdownCB) (ArchiveModuleState *state); Personally I'd always pass ArchiveModuleState *state as the first arg, but it's not important. Greetings, Andres Freund
On Wed, Feb 01, 2023 at 01:06:06PM -0800, Andres Freund wrote: > On 2023-02-01 12:15:29 -0800, Nathan Bossart wrote: >> Here's a new patch set in which I've attempted to address this feedback and >> Michael's feedback. > > Looks better! Thanks! >> @@ -25,12 +34,14 @@ extern PGDLLIMPORT char *XLogArchiveLibrary; >> * For more information about the purpose of each callback, refer to the >> * archive modules documentation. >> */ >> -typedef bool (*ArchiveCheckConfiguredCB) (void); >> -typedef bool (*ArchiveFileCB) (const char *file, const char *path); >> -typedef void (*ArchiveShutdownCB) (void); >> +typedef void (*ArchiveStartupCB) (ArchiveModuleState *state); >> +typedef bool (*ArchiveCheckConfiguredCB) (ArchiveModuleState *state); >> +typedef bool (*ArchiveFileCB) (const char *file, const char *path, ArchiveModuleState *state); >> +typedef void (*ArchiveShutdownCB) (ArchiveModuleState *state); > > Personally I'd always pass ArchiveModuleState *state as the first arg, > but it's not important. Yeah, that's nicer. cfbot is complaining about a missing #include, so I need to send a new revision anyway. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Wed, Feb 01, 2023 at 01:23:26PM -0800, Nathan Bossart wrote: > Yeah, that's nicer. cfbot is complaining about a missing #include, so I > need to send a new revision anyway. Okay, the changes done here look straight-forward seen from here. I got one small-ish comment. +basic_archive_startup(ArchiveModuleState *state) +{ + BasicArchiveData *data = palloc0(sizeof(BasicArchiveData)); Perhaps this should use MemoryContextAlloc() rather than a plain palloc(). This should not matter based on the position where the startup callback is called, still that may be a pattern worth encouraging. -- Michael
Attachment
On Thu, Feb 02, 2023 at 02:34:17PM +0900, Michael Paquier wrote: > Okay, the changes done here look straight-forward seen from here. I > got one small-ish comment. > > +basic_archive_startup(ArchiveModuleState *state) > +{ > + BasicArchiveData *data = palloc0(sizeof(BasicArchiveData)); > > Perhaps this should use MemoryContextAlloc() rather than a plain > palloc(). This should not matter based on the position where the > startup callback is called, still that may be a pattern worth > encouraging. Good call. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Thu, Feb 02, 2023 at 11:37:00AM -0800, Nathan Bossart wrote: > On Thu, Feb 02, 2023 at 02:34:17PM +0900, Michael Paquier wrote: >> Okay, the changes done here look straight-forward seen from here. I >> got one small-ish comment. >> >> +basic_archive_startup(ArchiveModuleState *state) >> +{ >> + BasicArchiveData *data = palloc0(sizeof(BasicArchiveData)); >> >> Perhaps this should use MemoryContextAlloc() rather than a plain >> palloc(). This should not matter based on the position where the >> startup callback is called, still that may be a pattern worth >> encouraging. > > Good call. + ArchiveModuleCallbacks struct filled with the callback function pointers for This needs a structname markup. + can use <literal>state->private_data</literal> to store it. And here it would be structfield. As far as I can see, all the points raised about this redesign seem to have been addressed. Andres, any comments? -- Michael
Attachment
On Sat, Feb 04, 2023 at 11:59:20AM +0900, Michael Paquier wrote: > + ArchiveModuleCallbacks struct filled with the callback function pointers for > This needs a structname markup. > > + can use <literal>state->private_data</literal> to store it. > And here it would be structfield. fixed -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Sat, Feb 04, 2023 at 10:14:36AM -0800, Nathan Bossart wrote: > On Sat, Feb 04, 2023 at 11:59:20AM +0900, Michael Paquier wrote: >> + ArchiveModuleCallbacks struct filled with the callback function pointers for >> This needs a structname markup. >> >> + can use <literal>state->private_data</literal> to store it. >> And here it would be structfield. > > fixed Andres, did you have the change to look at that? I did look at it, but it may not address all the points you may have in mind. -- Michael
Attachment
Hi, On 2023-02-08 16:23:34 +0900, Michael Paquier wrote: > On Sat, Feb 04, 2023 at 10:14:36AM -0800, Nathan Bossart wrote: > > On Sat, Feb 04, 2023 at 11:59:20AM +0900, Michael Paquier wrote: > >> + ArchiveModuleCallbacks struct filled with the callback function pointers for > >> This needs a structname markup. > >> > >> + can use <literal>state->private_data</literal> to store it. > >> And here it would be structfield. > > > > fixed > > Andres, did you have the change to look at that? I did look at it, > but it may not address all the points you may have in mind. Yes, I think this looks pretty good now. One minor thing: I don't think we really need the AssertVariableIsOfType() for anything but the Init() one? Greetings, Andres Freund
On Wed, Feb 08, 2023 at 08:27:13AM -0800, Andres Freund wrote: > One minor thing: I don't think we really need the AssertVariableIsOfType() for > anything but the Init() one? This is another part that was borrowed from logical decoding output plugins. I'm not sure this adds much since f2b73c8 ("Add central declarations for dlsym()ed symbols"). Perhaps we should remove all of these assertions for functions that now have central declarations. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Hi, On 2023-02-08 09:27:05 -0800, Nathan Bossart wrote: > On Wed, Feb 08, 2023 at 08:27:13AM -0800, Andres Freund wrote: > > One minor thing: I don't think we really need the AssertVariableIsOfType() for > > anything but the Init() one? > > This is another part that was borrowed from logical decoding output > plugins. I know :(. It was needed in an earlier version of the output plugin interface, where all the different callbacks were looked up via dlsym(), but should have been removed after that. > I'm not sure this adds much since f2b73c8 ("Add central > declarations for dlsym()ed symbols"). Perhaps we should remove all of > these assertions for functions that now have central declarations. Most of them weren't needed even before that. And yes, I'd be for a patch to remove all of those assertions. Greetings, Andres Freund
On Wed, Feb 08, 2023 at 09:33:44AM -0800, Andres Freund wrote: > And yes, I'd be for a patch to remove all of those assertions. done -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Hi, On 2023-02-08 09:57:56 -0800, Nathan Bossart wrote: > On Wed, Feb 08, 2023 at 09:33:44AM -0800, Andres Freund wrote: > > And yes, I'd be for a patch to remove all of those assertions. > > done If you'd reorder it so that 0004 applies independently from the other changes, I'd just push that now. I was remembering additional AssertVariableIsOfType(), but it looks like we actually did remember to take them out when redesigning the output plugin interface... Greetings, Andres Freund
On Wed, Feb 08, 2023 at 10:24:18AM -0800, Andres Freund wrote: > If you'd reorder it so that 0004 applies independently from the other changes, > I'd just push that now. done -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On 2023-02-08 10:55:44 -0800, Nathan Bossart wrote: > done Pushed. Thanks!
On Wed, Feb 08, 2023 at 09:16:19PM -0800, Andres Freund wrote: > Pushed. Thanks! Thanks! -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
rebased for cfbot -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Hi, On 2023-02-09 11:39:17 -0800, Nathan Bossart wrote: > rebased for cfbot I think this nearly ready. Michael, are you planning to commit this? Personally I'd probably squash these into a single commit. > diff --git a/doc/src/sgml/archive-modules.sgml b/doc/src/sgml/archive-modules.sgml > index ef02051f7f..2db1b19216 100644 > --- a/doc/src/sgml/archive-modules.sgml > +++ b/doc/src/sgml/archive-modules.sgml > @@ -47,23 +47,30 @@ > normal library search path is used to locate the library. To provide the > required archive module callbacks and to indicate that the library is > actually an archive module, it needs to provide a function named > - <function>_PG_archive_module_init</function>. This function is passed a > - struct that needs to be filled with the callback function pointers for > - individual actions. > + <function>_PG_archive_module_init</function>. This function must return an > + <structname>ArchiveModuleCallbacks</structname> struct filled with the > + callback function pointers for individual actions. I'd probably mention that this should typically be of server lifetime / a 'static const' struct. Tableam documents this as follows: The result of the function must be a pointer to a struct of type <structname>TableAmRoutine</structname>, which contains everything that the core code needs to know to make use of the table access method. The return value needs to be of server lifetime, which is typically achieved by defining it as a <literal>static const</literal> variable in global scope > + > + <note> > + <para> > + <varname>archive_library</varname> is only loaded in the archiver process. > + </para> > + </note> > </sect1> That's not really related to any of the changes here, right? I'm not sure it's a good idea to document that. We e.g. probably should allow the library to check that the configuration is correct, at postmaster start, rather than later, at runtime. > <sect1 id="archive-module-callbacks"> > @@ -73,6 +80,20 @@ typedef void (*ArchiveModuleInit) (struct ArchiveModuleCallbacks *cb); > The server will call them as required to process each individual WAL file. > </para> > > + <sect2 id="archive-module-startup"> > + <title>Startup Callback</title> > + <para> > + The <function>startup_cb</function> callback is called shortly after the > + module is loaded. This callback can be used to perform any additional > + initialization required. If the archive module needs to have a state, it > + can use <structfield>state->private_data</structfield> to store it. s/needs to have a state/has state/? > @@ -83,7 +104,7 @@ typedef void (*ArchiveModuleInit) (struct ArchiveModuleCallbacks *cb); > assumes the module is configured. > > <programlisting> > -typedef bool (*ArchiveCheckConfiguredCB) (void); > +typedef bool (*ArchiveCheckConfiguredCB) (ArchiveModuleState *state); > </programlisting> > > If <literal>true</literal> is returned, the server will proceed with Hm. I wonder if ArchiveCheckConfiguredCB() should actually work without the state. We're not really doing anything yet, at that point, so it shouldn't really need state? The reason I'm wondering is that I think we should consider calling this from the GUC assignment hook, at least in postmaster. Which would make it more convenient to not have state, I think? > @@ -128,7 +149,7 @@ typedef bool (*ArchiveFileCB) (const char *file, const char *path); > these situations. > > <programlisting> > -typedef void (*ArchiveShutdownCB) (void); > +typedef void (*ArchiveShutdownCB) (ArchiveModuleState *state); > </programlisting> > </para> > </sect2> Perhaps mention that this needs to free state it allocated in the ArchiveModuleState, or it'll be leaked? Greetings, Andres Freund
On Thu, Feb 09, 2023 at 12:18:55PM -0800, Andres Freund wrote: > On 2023-02-09 11:39:17 -0800, Nathan Bossart wrote: > Personally I'd probably squash these into a single commit. done > I'd probably mention that this should typically be of server lifetime / a > 'static const' struct. Tableam documents this as follows: done >> + <note> >> + <para> >> + <varname>archive_library</varname> is only loaded in the archiver process. >> + </para> >> + </note> >> </sect1> > > That's not really related to any of the changes here, right? > > I'm not sure it's a good idea to document that. We e.g. probably should allow > the library to check that the configuration is correct, at postmaster start, > rather than later, at runtime. removed >> + <sect2 id="archive-module-startup"> >> + <title>Startup Callback</title> >> + <para> >> + The <function>startup_cb</function> callback is called shortly after the >> + module is loaded. This callback can be used to perform any additional >> + initialization required. If the archive module needs to have a state, it >> + can use <structfield>state->private_data</structfield> to store it. > > s/needs to have a state/has state/? done >> <programlisting> >> -typedef bool (*ArchiveCheckConfiguredCB) (void); >> +typedef bool (*ArchiveCheckConfiguredCB) (ArchiveModuleState *state); >> </programlisting> >> >> If <literal>true</literal> is returned, the server will proceed with > > Hm. I wonder if ArchiveCheckConfiguredCB() should actually work without the > state. We're not really doing anything yet, at that point, so it shouldn't > really need state? > > The reason I'm wondering is that I think we should consider calling this from > the GUC assignment hook, at least in postmaster. Which would make it more > convenient to not have state, I think? I agree that this callback should typically not need the state, but I'm not sure whether it fits into the assignment hook for archive_library. This callback is primarily meant for situations when you have archiving enabled, but your module isn't configured yet (e.g., archive_command is empty). In this case, we keep the WAL around, but we don't try to archive it until this hook returns true. It's up to each module to define that criteria. I can imagine someone introducing a GUC in their archive module that temporarily halts archiving via this callback. In that case, calling it via a GUC assignment hook probably won't work. In fact, checking whether archive_command is empty in that hook might not work either. >> <programlisting> >> -typedef void (*ArchiveShutdownCB) (void); >> +typedef void (*ArchiveShutdownCB) (ArchiveModuleState *state); >> </programlisting> >> </para> >> </sect2> > > Perhaps mention that this needs to free state it allocated in the > ArchiveModuleState, or it'll be leaked? done I left this out originally because the archiver exits shortly after calling this. However, if you have DSM segments or something, it's probably wise to make sure those are cleaned up. And I suppose we might not always exit immediately after this callback, so establishing the habit of freeing the state could be a good idea. In addition to updating this part of the docs, I wrote a shutdown callback for basic_archive that frees its state. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Thu, Feb 09, 2023 at 12:18:55PM -0800, Andres Freund wrote: > I think this nearly ready. Michael, are you planning to commit this? I could take a stab at it, now if you feel strongly about doing it yourselfof course feel free :) > Personally I'd probably squash these into a single commit. Same impression here. Agreed that all these had better be merged together, still keeping them separated made their review so much easier. -- Michael
Attachment
On Thu, Feb 09, 2023 at 02:48:26PM -0800, Nathan Bossart wrote: > On Thu, Feb 09, 2023 at 12:18:55PM -0800, Andres Freund wrote: >>> <programlisting> >>> -typedef bool (*ArchiveCheckConfiguredCB) (void); >>> +typedef bool (*ArchiveCheckConfiguredCB) (ArchiveModuleState *state); >>> </programlisting> >>> >>> If <literal>true</literal> is returned, the server will proceed with >> >> Hm. I wonder if ArchiveCheckConfiguredCB() should actually work without the >> state. We're not really doing anything yet, at that point, so it shouldn't >> really need state? >> >> The reason I'm wondering is that I think we should consider calling this from >> the GUC assignment hook, at least in postmaster. Which would make it more >> convenient to not have state, I think? > > I agree that this callback should typically not need the state, but I'm not > sure whether it fits into the assignment hook for archive_library. This > callback is primarily meant for situations when you have archiving enabled, > but your module isn't configured yet (e.g., archive_command is empty). In > this case, we keep the WAL around, but we don't try to archive it until > this hook returns true. It's up to each module to define that criteria. I > can imagine someone introducing a GUC in their archive module that > temporarily halts archiving via this callback. In that case, calling it > via a GUC assignment hook probably won't work. In fact, checking whether > archive_command is empty in that hook might not work either. Keeping the state in the configure check callback does not strike me as a problem, FWIW. >>> <programlisting> >>> -typedef void (*ArchiveShutdownCB) (void); >>> +typedef void (*ArchiveShutdownCB) (ArchiveModuleState *state); >>> </programlisting> >>> </para> >>> </sect2> >> >> Perhaps mention that this needs to free state it allocated in the >> ArchiveModuleState, or it'll be leaked? > > done > > I left this out originally because the archiver exits shortly after calling > this. However, if you have DSM segments or something, it's probably wise > to make sure those are cleaned up. And I suppose we might not always exit > immediately after this callback, so establishing the habit of freeing the > state could be a good idea. In addition to updating this part of the docs, > I wrote a shutdown callback for basic_archive that frees its state. This makes sense to me. Still, DSM segments had better be cleaned up with dsm_backend_shutdown(). + basic_archive_context = data->context; + if (CurrentMemoryContext == basic_archive_context) + MemoryContextSwitchTo(TopMemoryContext); + + if (MemoryContextIsValid(basic_archive_context)) + MemoryContextDelete(basic_archive_context); This is a bit confusing, because it means that we enter in the shutdown callback with one context, but exit it under TopMemoryContext. Are you sure that this will be OK when there could be multiple callbacks piled up with before_shmem_exit()? shmem_exit() has nothing specific to memory contexts. Is putting the new headers in src/include/postmaster/ the best move in the long term? Perhaps that's fine, but I was wondering whether a new location like archive/ would make more sense. pg_arch.h being in the postmaster section is fine. -- Michael
Attachment
On Mon, Feb 13, 2023 at 04:37:10PM +0900, Michael Paquier wrote: > + basic_archive_context = data->context; > + if (CurrentMemoryContext == basic_archive_context) > + MemoryContextSwitchTo(TopMemoryContext); > + > + if (MemoryContextIsValid(basic_archive_context)) > + MemoryContextDelete(basic_archive_context); > > This is a bit confusing, because it means that we enter in the > shutdown callback with one context, but exit it under > TopMemoryContext. Are you sure that this will be OK when there could > be multiple callbacks piled up with before_shmem_exit()? shmem_exit() > has nothing specific to memory contexts. Well, we can't free the memory context while we are in it, so we have to switch to another one. I agree that this is a bit confusing, though. On second thought, I'm not sure it's important to make sure the state is freed in the shutdown callback. It's only called just before the archiver process exits, so we're not really at risk of leaking anything. I suppose we might not always restart the archiver in this case, but I also don't anticipate that behavior changing in the near future. I think this callback is more useful for things like shutting down background workers. I went ahead and removed the shutdown callback from basic_archive and the note about leaking from the documentation. > Is putting the new headers in src/include/postmaster/ the best move in > the long term? Perhaps that's fine, but I was wondering whether a new > location like archive/ would make more sense. pg_arch.h being in the > postmaster section is fine. I moved them to archive/. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Hi, On 2023-02-13 14:56:47 -0800, Nathan Bossart wrote: > On Mon, Feb 13, 2023 at 04:37:10PM +0900, Michael Paquier wrote: > > + basic_archive_context = data->context; > > + if (CurrentMemoryContext == basic_archive_context) > > + MemoryContextSwitchTo(TopMemoryContext); > > + > > + if (MemoryContextIsValid(basic_archive_context)) > > + MemoryContextDelete(basic_archive_context); > > > > This is a bit confusing, because it means that we enter in the > > shutdown callback with one context, but exit it under > > TopMemoryContext. Are you sure that this will be OK when there could > > be multiple callbacks piled up with before_shmem_exit()? shmem_exit() > > has nothing specific to memory contexts. > > Well, we can't free the memory context while we are in it, so we have to > switch to another one. I agree that this is a bit confusing, though. Why would we be in that memory context? I'd just add an assert documenting we're not. > On second thought, I'm not sure it's important to make sure the state is > freed in the shutdown callback. It's only called just before the archiver > process exits, so we're not really at risk of leaking anything. I suppose > we might not always restart the archiver in this case, but I also don't > anticipate that behavior changing in the near future. I think this > callback is more useful for things like shutting down background workers. I think it's crucial. Otherwise we're just ossifying the design that there's just one archive module active at a time. > I went ahead and removed the shutdown callback from basic_archive and the > note about leaking from the documentation. -1 Greetings, Andres Freund
On Mon, Feb 13, 2023 at 03:37:33PM -0800, Andres Freund wrote: > On 2023-02-13 14:56:47 -0800, Nathan Bossart wrote: >> On Mon, Feb 13, 2023 at 04:37:10PM +0900, Michael Paquier wrote: >> > + basic_archive_context = data->context; >> > + if (CurrentMemoryContext == basic_archive_context) >> > + MemoryContextSwitchTo(TopMemoryContext); >> > + >> > + if (MemoryContextIsValid(basic_archive_context)) >> > + MemoryContextDelete(basic_archive_context); >> > >> > This is a bit confusing, because it means that we enter in the >> > shutdown callback with one context, but exit it under >> > TopMemoryContext. Are you sure that this will be OK when there could >> > be multiple callbacks piled up with before_shmem_exit()? shmem_exit() >> > has nothing specific to memory contexts. >> >> Well, we can't free the memory context while we are in it, so we have to >> switch to another one. I agree that this is a bit confusing, though. > > Why would we be in that memory context? I'd just add an assert documenting > we're not. > > >> On second thought, I'm not sure it's important to make sure the state is >> freed in the shutdown callback. It's only called just before the archiver >> process exits, so we're not really at risk of leaking anything. I suppose >> we might not always restart the archiver in this case, but I also don't >> anticipate that behavior changing in the near future. I think this >> callback is more useful for things like shutting down background workers. > > I think it's crucial. Otherwise we're just ossifying the design that there's > just one archive module active at a time. > > >> I went ahead and removed the shutdown callback from basic_archive and the >> note about leaking from the documentation. > > -1 Okay. I've added it back in v12 with the suggested adjustment for the memory context stuff. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Mon, Feb 13, 2023 at 04:55:58PM -0800, Nathan Bossart wrote: > Okay. I've added it back in v12 with the suggested adjustment for the > memory context stuff. Sorry for then noise, cfbot alerted me to a missing #include, which I've added in v13. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Mon, Feb 13, 2023 at 05:02:37PM -0800, Nathan Bossart wrote: > Sorry for then noise, cfbot alerted me to a missing #include, which I've > added in v13. + basic_archive_context = data->context; + Assert(CurrentMemoryContext != basic_archive_context); So this is what it means to document that we are not in the memory context we are freeing here. That seems good enough to me in this context. Tracking if one of CurrentMemoryContext's parents is the memory context that would be deleted would be another thing, but this does not apply here. I may tweak a bit the comments, but nothing more. And I don't think I have more to add. Andres, do you have anything you would like to mention? -- Michael
Attachment
On Wed, Feb 15, 2023 at 03:38:21PM +0900, Michael Paquier wrote: > On Mon, Feb 13, 2023 at 05:02:37PM -0800, Nathan Bossart wrote: >> Sorry for then noise, cfbot alerted me to a missing #include, which I've >> added in v13. Sorry for more noise. I noticed that I missed updating the IDENTIFICATION line for shell_archive.c. That's the only change in v14. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Hi, On 2023-02-15 10:44:07 -0800, Nathan Bossart wrote: > On Wed, Feb 15, 2023 at 03:38:21PM +0900, Michael Paquier wrote: > > I may tweak a bit the comments, but nothing more. And I don't think I > > have more to add. Andres, do you have anything you would like to > > mention? Just some minor comments below. None of them needs to be addressed. > @@ -144,10 +170,12 @@ basic_archive_configured(void) > * Archives one file. > */ > static bool > -basic_archive_file(const char *file, const char *path) > +basic_archive_file(ArchiveModuleState *state, const char *file, const char *path) > { > sigjmp_buf local_sigjmp_buf; Not related the things changed here, but this should never have been pushed down into individual archive modules. There's absolutely no way that we're going to keep this up2date and working correctly in random archive modules. And it would break if archive modules are ever called outside of pgarch.c. > +static void > +basic_archive_shutdown(ArchiveModuleState *state) > +{ > + BasicArchiveData *data = (BasicArchiveData *) (state->private_data); The parens around (state->private_data) are imo odd. > + basic_archive_context = data->context; > + Assert(CurrentMemoryContext != basic_archive_context); > + > + if (MemoryContextIsValid(basic_archive_context)) > + MemoryContextDelete(basic_archive_context); I guess I'd personally be paranoid and clean data->context after this. Obviously doesn't matter right now, but at some later date it could be that we'd error out after this point, and re-entered the shutdown callback. > + > +/* > + * Archive module callbacks > + * > + * These callback functions should be defined by archive libraries and returned > + * via _PG_archive_module_init(). ArchiveFileCB is the only required callback. > + * For more information about the purpose of each callback, refer to the > + * archive modules documentation. > + */ > +typedef void (*ArchiveStartupCB) (ArchiveModuleState *state); > +typedef bool (*ArchiveCheckConfiguredCB) (ArchiveModuleState *state); > +typedef bool (*ArchiveFileCB) (ArchiveModuleState *state, const char *file, const char *path); > +typedef void (*ArchiveShutdownCB) (ArchiveModuleState *state); > + > +typedef struct ArchiveModuleCallbacks > +{ > + ArchiveStartupCB startup_cb; > + ArchiveCheckConfiguredCB check_configured_cb; > + ArchiveFileCB archive_file_cb; > + ArchiveShutdownCB shutdown_cb; > +} ArchiveModuleCallbacks; If you wanted you could just define the callback types in the struct now, as we don't need asserts for the types. Greetings, Andres Freund
Thanks for reviewing. On Thu, Feb 16, 2023 at 11:29:56AM -0800, Andres Freund wrote: > On 2023-02-15 10:44:07 -0800, Nathan Bossart wrote: >> @@ -144,10 +170,12 @@ basic_archive_configured(void) >> * Archives one file. >> */ >> static bool >> -basic_archive_file(const char *file, const char *path) >> +basic_archive_file(ArchiveModuleState *state, const char *file, const char *path) >> { >> sigjmp_buf local_sigjmp_buf; > > Not related the things changed here, but this should never have been pushed > down into individual archive modules. There's absolutely no way that we're > going to keep this up2date and working correctly in random archive > modules. And it would break if archive modules are ever called outside of > pgarch.c. Yeah. IIRC I did briefly try to avoid this, but the difficulty was that each module will have its own custom cleanup logic. There's no requirement that a module creates an exception handler, but I imagine that any sufficiently complex one will. In any case, I agree that it's worth trying to pull this out of the individual modules. >> +static void >> +basic_archive_shutdown(ArchiveModuleState *state) >> +{ >> + BasicArchiveData *data = (BasicArchiveData *) (state->private_data); > > The parens around (state->private_data) are imo odd. Oops, removed. >> + basic_archive_context = data->context; >> + Assert(CurrentMemoryContext != basic_archive_context); >> + >> + if (MemoryContextIsValid(basic_archive_context)) >> + MemoryContextDelete(basic_archive_context); > > I guess I'd personally be paranoid and clean data->context after > this. Obviously doesn't matter right now, but at some later date it could be > that we'd error out after this point, and re-entered the shutdown callback. Done. >> +/* >> + * Archive module callbacks >> + * >> + * These callback functions should be defined by archive libraries and returned >> + * via _PG_archive_module_init(). ArchiveFileCB is the only required callback. >> + * For more information about the purpose of each callback, refer to the >> + * archive modules documentation. >> + */ >> +typedef void (*ArchiveStartupCB) (ArchiveModuleState *state); >> +typedef bool (*ArchiveCheckConfiguredCB) (ArchiveModuleState *state); >> +typedef bool (*ArchiveFileCB) (ArchiveModuleState *state, const char *file, const char *path); >> +typedef void (*ArchiveShutdownCB) (ArchiveModuleState *state); >> + >> +typedef struct ArchiveModuleCallbacks >> +{ >> + ArchiveStartupCB startup_cb; >> + ArchiveCheckConfiguredCB check_configured_cb; >> + ArchiveFileCB archive_file_cb; >> + ArchiveShutdownCB shutdown_cb; >> +} ArchiveModuleCallbacks; > > If you wanted you could just define the callback types in the struct now, as > we don't need asserts for the types. This crossed my mind. I thought it was nice to have a declaration for each callback that we can copy into the docs, but I'm sure we could do without it, too. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Hi, On 2023-02-16 12:15:12 -0800, Nathan Bossart wrote: > Thanks for reviewing. > > On Thu, Feb 16, 2023 at 11:29:56AM -0800, Andres Freund wrote: > > On 2023-02-15 10:44:07 -0800, Nathan Bossart wrote: > >> @@ -144,10 +170,12 @@ basic_archive_configured(void) > >> * Archives one file. > >> */ > >> static bool > >> -basic_archive_file(const char *file, const char *path) > >> +basic_archive_file(ArchiveModuleState *state, const char *file, const char *path) > >> { > >> sigjmp_buf local_sigjmp_buf; > > > > Not related the things changed here, but this should never have been pushed > > down into individual archive modules. There's absolutely no way that we're > > going to keep this up2date and working correctly in random archive > > modules. And it would break if archive modules are ever called outside of > > pgarch.c. > > Yeah. IIRC I did briefly try to avoid this, but the difficulty was that > each module will have its own custom cleanup logic. It can use PG_TRY/CATCH for that, if the top-level sigsetjmp is in pgarch.c. Or you could add a cleanup callback to the API, to be called after the top-level cleanup in pgarch.c. I'm quite baffled by: /* Close any files left open by copy_file() or compare_files() */ AtEOSubXact_Files(false, InvalidSubTransactionId, InvalidSubTransactionId); in basic_archive_file(). It seems *really* off to call AtEOSubXact_Files() completely outside the context of a transaction environment. And it only does the thing you want because you pass parameters that aren't actually valid in the normal use in AtEOSubXact_Files(). I really don't understand how that's supposed to be ok. Greetings, Andres Freund
On Thu, Feb 16, 2023 at 01:17:54PM -0800, Andres Freund wrote: > On 2023-02-16 12:15:12 -0800, Nathan Bossart wrote: >> On Thu, Feb 16, 2023 at 11:29:56AM -0800, Andres Freund wrote: >> > On 2023-02-15 10:44:07 -0800, Nathan Bossart wrote: >> >> @@ -144,10 +170,12 @@ basic_archive_configured(void) >> >> * Archives one file. >> >> */ >> >> static bool >> >> -basic_archive_file(const char *file, const char *path) >> >> +basic_archive_file(ArchiveModuleState *state, const char *file, const char *path) >> >> { >> >> sigjmp_buf local_sigjmp_buf; >> > >> > Not related the things changed here, but this should never have been pushed >> > down into individual archive modules. There's absolutely no way that we're >> > going to keep this up2date and working correctly in random archive >> > modules. And it would break if archive modules are ever called outside of >> > pgarch.c. >> >> Yeah. IIRC I did briefly try to avoid this, but the difficulty was that >> each module will have its own custom cleanup logic. > > It can use PG_TRY/CATCH for that, if the top-level sigsetjmp is in pgarch.c. > Or you could add a cleanup callback to the API, to be called after the > top-level cleanup in pgarch.c. Yeah, that seems workable. > I'm quite baffled by: > /* Close any files left open by copy_file() or compare_files() */ > AtEOSubXact_Files(false, InvalidSubTransactionId, InvalidSubTransactionId); > > in basic_archive_file(). It seems *really* off to call AtEOSubXact_Files() > completely outside the context of a transaction environment. And it only does > the thing you want because you pass parameters that aren't actually valid in > the normal use in AtEOSubXact_Files(). I really don't understand how that's > supposed to be ok. Hm. Should copy_file() and compare_files() have PG_FINALLY blocks that attempt to close the files instead? What would you recommend? -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
On Thu, Feb 16, 2023 at 01:17:54PM -0800, Andres Freund wrote: > On 2023-02-16 12:15:12 -0800, Nathan Bossart wrote: >> On Thu, Feb 16, 2023 at 11:29:56AM -0800, Andres Freund wrote: >>> Not related the things changed here, but this should never have been pushed >>> down into individual archive modules. There's absolutely no way that we're >>> going to keep this up2date and working correctly in random archive >>> modules. And it would break if archive modules are ever called outside of >>> pgarch.c. Hmm, yes. That's a bad idea to copy the error handling stack. The maintenance of this code could quickly go wrong. All that had better be put into their own threads, IMO, to bring more visibility on these subjects. > I'm quite baffled by: > /* Close any files left open by copy_file() or compare_files() */ > AtEOSubXact_Files(false, InvalidSubTransactionId, InvalidSubTransactionId); > > in basic_archive_file(). It seems *really* off to call AtEOSubXact_Files() > completely outside the context of a transaction environment. And it only does > the thing you want because you pass parameters that aren't actually valid in > the normal use in AtEOSubXact_Files(). I really don't understand how that's > supposed to be ok. As does this part, probably with a backpatch.. Saying that, I have spent more time on the revamped version of the archive modules and it was already doing a lot, so I have applied it as it covered all the points discussed.. -- Michael
Attachment
Hi, On 2023-02-16 13:58:10 -0800, Nathan Bossart wrote: > On Thu, Feb 16, 2023 at 01:17:54PM -0800, Andres Freund wrote: > > I'm quite baffled by: > > /* Close any files left open by copy_file() or compare_files() */ > > AtEOSubXact_Files(false, InvalidSubTransactionId, InvalidSubTransactionId); > > > > in basic_archive_file(). It seems *really* off to call AtEOSubXact_Files() > > completely outside the context of a transaction environment. And it only does > > the thing you want because you pass parameters that aren't actually valid in > > the normal use in AtEOSubXact_Files(). I really don't understand how that's > > supposed to be ok. > > Hm. Should copy_file() and compare_files() have PG_FINALLY blocks that > attempt to close the files instead? What would you recommend? I don't fully now, it's not entirely clear to me what the goals here were. I think you'd likely need to do a bit of infrastructure work to do this sanely. So far we just didn't have the need to handle files being released in a way like you want to do there. I suspect a good direction would be to use resource owners. Add a separate set of functions that release files on resource owner release. Most of the infrastructure is there already, for temporary files (c.f. OpenTemporaryFile()). Then that resource owner could be reset in case of error. I'm not even sure that erroring out is a reasonable way to implement copy_file(), compare_files(), particularly because you want to return via a return code from basic_archive_files(). Greetings, Andres Freund
On Fri, Feb 17, 2023 at 05:01:47PM +0900, Michael Paquier wrote: > All that had better > be put into their own threads, IMO, to bring more visibility on these > subjects. I created a new thread for these [0]. > Saying that, I have spent more time on the revamped version of the > archive modules and it was already doing a lot, so I have applied > it as it covered all the points discussed.. Thanks! [0] https://postgr.es/m/20230217215624.GA3131134%40nathanxps13 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Here is a new revision of the restore modules patch set. In this patch set, the interface looks similar to the recent archive modules redesign, and there are separate callbacks for retrieving different types of files. I've attempted to address all the feedback I've received, but there was a lot scattered across different threads, so it's possible I've missed something. Note that 0001 is the stopgap fix for restore_command that's being tracked elsewhere [0]. I was careful to avoid repeating the recent mistake with the SIGTERM handling. This patch set is still a little rough around the edges, but I wanted to post it in case folks had general thoughts about the structure, interface, etc. This implementation restores files synchronously one-by-one just like archive modules, but in the future, I would like to add asynchronous/parallel/batching support. My intent is for this work to move us closer to that. [0] https://postgr.es/m/20230214174755.GA1348509%40nathanxps13 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
- v12-0001-stopgap-fix-for-restore_command.patch
- v12-0002-introduce-routine-for-checking-mutually-exclusiv.patch
- v12-0003-refactor-code-for-restoring-via-shell.patch
- v12-0004-rename-archive-modules.sgml-to-archive-and-resto.patch
- v12-0005-restructure-archive-modules-docs-in-preparation-.patch
- v12-0006-introduce-restore_library.patch
Here is a rebased version of the restore modules patch set. I swapped the patch for the stopgap fix for restore_command with the latest version [0], and I marked the restore/ headers as installable (as was recently done for archive/ [1]). There are no other changes. [0] https://postgr.es/m/20230301224751.GA1823946%40nathanxps13 [1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=6ad5793 -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
- v13-0001-Move-extra-code-out-of-the-Pre-PostRestoreComman.patch
- v13-0002-Don-t-proc_exit-in-startup-s-SIGTERM-handler-if-.patch
- v13-0003-introduce-routine-for-checking-mutually-exclusiv.patch
- v13-0004-refactor-code-for-restoring-via-shell.patch
- v13-0005-rename-archive-modules.sgml-to-archive-and-resto.patch
- v13-0006-restructure-archive-modules-docs-in-preparation-.patch
- v13-0007-introduce-restore_library.patch
I noticed that the new TAP test for basic_archive was failing intermittently for cfbot. It looks like the query for checking that the post-backup WAL is restored sometimes executes before archive recovery is complete (because hot_standby is on). To fix this, I adjusted the test to use poll_query_until instead. There are no other changes in v14. I first tried to set hot_standby to off on the restored node so that the query wouldn't run until archive recovery completed. This seemed like it would work because start() useѕ "pg_ctl --wait", which has the following note in the docs: Startup is considered complete when the PID file indicates that the server is ready to accept connections. However, that's not what happens when hot_standby is off. In that case, the postmaster.pid file is updated with PM_STATUS_STANDBY once recovery starts, which wait_for_postmaster_start() interprets as "ready." I see this was reported before [0], but that discussion fizzled out. IIUC it was done this way to avoid infinite waits when hot_standby is off and standby mode is enabled. I could be missing something obvious, but that doesn't seem necessary when hot_standby is off and recovery mode is enabled because recovery should end at some point (never mind the halting problem). I'm still digging into this and may spin off a new thread if I can conjure up a proposal. [0] https://postgr.es/m/CAMkU%3D1wrMqPggnEfszE-c3PPLmKgRK17_qr7tmxBECYEbyV-4Q%40mail.gmail.com -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
- v14-0001-Move-extra-code-out-of-the-Pre-PostRestoreComman.patch
- v14-0002-Don-t-proc_exit-in-startup-s-SIGTERM-handler-if-.patch
- v14-0003-introduce-routine-for-checking-mutually-exclusiv.patch
- v14-0004-refactor-code-for-restoring-via-shell.patch
- v14-0005-rename-archive-modules.sgml-to-archive-and-resto.patch
- v14-0006-restructure-archive-modules-docs-in-preparation-.patch
- v14-0007-introduce-restore_library.patch
On Tue, Mar 14, 2023 at 09:13:09PM -0700, Nathan Bossart wrote: > However, that's not what happens when hot_standby is off. In that case, > the postmaster.pid file is updated with PM_STATUS_STANDBY once recovery > starts, which wait_for_postmaster_start() interprets as "ready." I see > this was reported before [0], but that discussion fizzled out. IIUC it was > done this way to avoid infinite waits when hot_standby is off and standby > mode is enabled. I could be missing something obvious, but that doesn't > seem necessary when hot_standby is off and recovery mode is enabled because > recovery should end at some point (never mind the halting problem). I'm > still digging into this and may spin off a new thread if I can conjure up a > proposal. > > [0] https://postgr.es/m/CAMkU%3D1wrMqPggnEfszE-c3PPLmKgRK17_qr7tmxBECYEbyV-4Q%40mail.gmail.com These days, knowing hot_standby is on by default, and that users would recover up to the end-of-backup record of just use read replicas, do we have a strong case for keeping this GUC parameter at all? It does not strike me that we really need to change a five-year-old behavior if there has been few complaints about it. I agree that it is confusing as it stands, but the long-term simplifications may be worth it in the recovery code (aka less booleans needed to track the flow of the startup process, and less confusion around that). -- Michael
Attachment
On Fri, Feb 17, 2023 at 11:41:32AM -0800, Andres Freund wrote: > I don't fully now, it's not entirely clear to me what the goals here were. I > think you'd likely need to do a bit of infrastructure work to do this > sanely. So far we just didn't have the need to handle files being released in > a way like you want to do there. > > I suspect a good direction would be to use resource owners. Add a separate set > of functions that release files on resource owner release. Most of the > infrastructure is there already, for temporary files > (c.f. OpenTemporaryFile()). Yes, perhaps. I've had good experience with these when it comes to avoid leakages when releasing resources, particularly for resources allocated by external libraries (cough, OpenSSL, cough). And there was some work to make these more scalable, for example. At this stage of the CF, it seems pretty clear to me that this should be pushed to v17, so moved to next CF. -- Michael
Attachment
rebased -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
- v15-0001-Move-extra-code-out-of-the-Pre-PostRestoreComman.patch
- v15-0002-Don-t-proc_exit-in-startup-s-SIGTERM-handler-if-.patch
- v15-0003-introduce-routine-for-checking-mutually-exclusiv.patch
- v15-0004-refactor-code-for-restoring-via-shell.patch
- v15-0005-rename-archive-modules.sgml-to-archive-and-resto.patch
- v15-0006-restructure-archive-modules-docs-in-preparation-.patch
- v15-0007-introduce-restore_library.patch
rebased -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
rebased -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Here is another rebase. Given the size of this patch set and the lack of review, I am going to punt this one to v18. -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
rebased for cfbot -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
another rebase for cfbot -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
rebased -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
rebased -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
rebased -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
Given the lack of interest, I plan to mark the commitfest entry for this patch set as "Withdrawn" shortly. -- nathan