Re: archive modules loose ends - Mailing list pgsql-hackers
From: Andres Freund
Subject: Re: archive modules loose ends
Msg-id: 20231113233528.tmocodjrypchzvti@awork3.anarazel.de
In response to: Re: archive modules loose ends (Nathan Bossart <nathandbossart@gmail.com>)
Responses: Re: archive modules loose ends
List: pgsql-hackers
Hi,

On 2023-11-13 16:42:31 -0600, Nathan Bossart wrote:
> There seems to be no interest in this patch, so I plan to withdraw it from
> the commitfest system by the end of the month unless such interest
> materializes.

I think it might just have arrived too shortly before the feature freeze to
be worth looking at right then, and it didn't really attract attention again
until now. I'm so far behind on keeping up with the list that I rarely end
up looking far back for things I'd like to have answered... Sorry.

I think it's somewhat important to fix this - having a dedicated "recover
from error" implementation in a bunch of extension modules seems quite
likely to cause problems down the line, when another type of resource needs
to be dealt with after errors. I think many non-toy implementations would
e.g. need to release lwlocks in case of errors (e.g. because they use a
shared hashtable to queue jobs for workers or such).

> On Fri, Feb 17, 2023 at 01:56:24PM -0800, Nathan Bossart wrote:
> > The first one is the requirement that archive module authors create their
> > own exception handlers if they want to make use of ERROR. Ideally, there
> > would be a handler in pgarch.c so that authors wouldn't need to deal with
> > this. I do see some previous discussion about this [1] in which I expressed
> > concerns about memory management. Looking at this again, I may have been
> > overthinking it. IIRC I was thinking about creating a memory context that
> > would be switched into for only the archiving callback (and reset
> > afterwards), but that might not be necessary. Instead, we could rely on
> > module authors to handle this. One example is basic_archive, which
> > maintains its own memory context. Alternatively, authors could simply
> > pfree() anything that was allocated.
> >
> > Furthermore, by moving the exception handling to pgarch.c, module authors
> > can begin using PG_TRY, etc. in their archiving callbacks, which simplifies
> > things a bit. I've attached a work-in-progress patch for this change.
>
> I took another look at this, and I think I remembered what I was worried
> about with memory management. One example is the built-in shell archiving.
> Presently, whenever there is an ERROR during archiving via shell, it gets
> bumped up to FATAL because the archiver operates at the bottom of the
> exception stack. Consequently, there's no need to worry about managing
> memory contexts to ensure that palloc'd memory is cleaned up after an
> error. With the attached patch, we no longer call the archiving callback
> while we're at the bottom of the exception stack, so ERRORs no longer get
> bumped up to FATALs, and any palloc'd memory won't be freed.
>
> I see two main options for dealing with this. One option is to simply have
> shell_archive (and any other archive modules out there) maintain its own
> memory context like basic_archive does. This ends up requiring a whole lot
> of duplicate code between the two built-in modules, though. Another option
> is to have the archiver manage a memory context that it resets after every
> invocation of the archiving callback, ERROR or not.

I think passing in a short-lived memory context is a lot nicer to deal with.
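For illustration, here is a minimal sketch of what a callback might look like
under that approach. The callback name and destination path are made up, and
it assumes pgarch.c switches into a per-call memory context and resets it
after every invocation, whether the callback returns or throws ERROR - that
reset-after-call behavior is the proposal being discussed, not existing API:

    #include "postgres.h"

    #include "archive/archive_module.h"
    #include "storage/copydir.h"
    #include "utils/psprintf.h"

    /*
     * Hypothetical archive callback.  Because the archiver resets the
     * per-call context after we return or ereport(ERROR), the psprintf()
     * result never needs an explicit pfree(), and no PG_TRY is required.
     */
    static bool
    my_archive_file(ArchiveModuleState *state, const char *file, const char *path)
    {
        /* allocation lands in the archiver-provided per-call context */
        char       *destination = psprintf("/mnt/archive/%s", file);

        /* may ereport(ERROR); pgarch.c would catch it and retry the file */
        copy_file(path, destination);

        return true;
    }

Registration through ArchiveModuleCallbacks.archive_file_cb would be
unchanged; the only new assumption is the reset-after-call context.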
> This has the advantage of avoiding code duplication and simplifying things
> for the built-in modules, but any external modules that rely on palloc'd
> state being long-lived would need to be adjusted to manage their own
> long-lived context. (This would need to be appropriately documented.)

Alternatively we could provide a longer-lived memory context in
ArchiveModuleState, set up by the generic infrastructure. That context would
obviously still need to be explicitly utilized by a module, but no duplicated
setup code would be required.

> /*
>  * check_archive_directory
>  *
> @@ -172,67 +147,19 @@ basic_archive_configured(ArchiveModuleState *state)
> static bool
> basic_archive_file(ArchiveModuleState *state, const char *file, const char *path)
> {
> ...
> +    PG_TRY();
> +    {
> +        /* Archive the file! */
> +        basic_archive_file_internal(file, path);
> +    }
> +    PG_CATCH();
>     {
> -        /* Since not using PG_TRY, must reset error stack by hand */
> -        error_context_stack = NULL;
> -
> -        /* Prevent interrupts while cleaning up */
> -        HOLD_INTERRUPTS();
> -
> -        /* Report the error and clear ErrorContext for next time */
> -        EmitErrorReport();
> -        FlushErrorState();
> -
>         /* Close any files left open by copy_file() or compare_files() */
> -        AtEOSubXact_Files(false, InvalidSubTransactionId, InvalidSubTransactionId);
> -
> -        /* Reset our memory context and switch back to the original one */
> -        MemoryContextSwitchTo(oldcontext);
> -        MemoryContextReset(basic_archive_context);
> -
> -        /* Remove our exception handler */
> -        PG_exception_stack = NULL;
> +        AtEOXact_Files(false);
>
> -        /* Now we can allow interrupts again */
> -        RESUME_INTERRUPTS();
> -
> -        /* Report failure so that the archiver retries this file */
> -        return false;
> +        PG_RE_THROW();
>     }

I think we should just have the AtEOXact_Files() in pgarch.c, then no
PG_TRY/CATCH is needed here. At the moment I think just about every possible
use of an archive module would require using files, so there doesn't seem
much of a reason to not handle it in pgarch.c.

I'd probably reset a few other subsystems at the same time (there's probably
more):
- disable_all_timeouts()
- LWLockReleaseAll()
- ConditionVariableCancelSleep()
- pgstat_report_wait_end()
- ReleaseAuxProcessResources()
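As a rough sketch of that direction (the function name is invented, and the
call list is taken from the suggestions above rather than from a reviewed
patch), the centralized cleanup in pgarch.c's error path could look like:

    #include "postgres.h"

    #include "storage/condition_variable.h"
    #include "storage/fd.h"
    #include "storage/lwlock.h"
    #include "utils/resowner.h"
    #include "utils/timeout.h"
    #include "utils/wait_event.h"

    /*
     * Sketch of cleanup pgarch.c could run after catching an ERROR thrown by
     * an archive module callback, so individual modules don't need their own
     * recovery code.
     */
    static void
    pgarch_cleanup_after_error(void)
    {
        /* Close any files the callback (or copy_file() etc.) left open. */
        AtEOXact_Files(false);

        /* Release resources an ERROR may have left behind. */
        disable_all_timeouts(false);
        LWLockReleaseAll();
        ConditionVariableCancelSleep();
        pgstat_report_wait_end();
        ReleaseAuxProcessResources(false);
    }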
> @@ -511,7 +519,58 @@ pgarch_archiveXlog(char *xlog)
>     snprintf(activitymsg, sizeof(activitymsg), "archiving %s", xlog);
>     set_ps_display(activitymsg);
>
> -    ret = ArchiveCallbacks->archive_file_cb(archive_module_state, xlog, pathname);
> +    oldcontext = MemoryContextSwitchTo(archive_context);
> +
> +    /*
> +     * Since the archiver operates at the bottom of the exception stack,
> +     * ERRORs turn into FATALs and cause the archiver process to restart.
> +     * However, using ereport(ERROR, ...) when there are problems is easy to
> +     * code and maintain.  Therefore, we create our own exception handler to
> +     * catch ERRORs and return false instead of restarting the archiver
> +     * whenever there is a failure.
> +     */
> +    if (sigsetjmp(local_sigjmp_buf, 1) != 0)
> +    {
> +        /* Since not using PG_TRY, must reset error stack by hand */
> +        error_context_stack = NULL;
> +
> +        /* Prevent interrupts while cleaning up */
> +        HOLD_INTERRUPTS();
> +
> +        /* Report the error and clear ErrorContext for next time */
> +        EmitErrorReport();
> +        MemoryContextSwitchTo(oldcontext);
> +        FlushErrorState();
> +
> +        /* Flush any leaked data */
> +        MemoryContextReset(archive_context);
> +
> +        /* Remove our exception handler */
> +        PG_exception_stack = NULL;
> +
> +        /* Now we can allow interrupts again */
> +        RESUME_INTERRUPTS();
> +
> +        /* Report failure so that the archiver retries this file */
> +        ret = false;
> +    }
> +    else
> +    {
> +        /* Enable our exception handler */
> +        PG_exception_stack = &local_sigjmp_buf;
> +
> +        /* Archive the file! */
> +        ret = ArchiveCallbacks->archive_file_cb(archive_module_state,
> +                                                xlog, pathname);
> +
> +        /* Remove our exception handler */
> +        PG_exception_stack = NULL;
> +
> +        /* Reset our memory context and switch back to the original one */
> +        MemoryContextSwitchTo(oldcontext);
> +        MemoryContextReset(archive_context);
> +    }

It could be worth setting up an errcontext providing the module and file
that's being processed. I personally find that at least as important as
setting up a ps string detailing the log file... But I guess that could be a
separate patch.

It'd be nice to add a comment explaining why pgarch_archiveXlog() is the
right place to handle errors.
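A minimal sketch of that errcontext idea (the callback name and message
wording are invented here) would be to push an error context callback around
the module call in pgarch_archiveXlog():

    #include "postgres.h"

    #include "utils/elog.h"

    /* hypothetical error context callback; arg is the file being archived */
    static void
    archive_file_error_callback(void *arg)
    {
        errcontext("archiving write-ahead log file \"%s\"", (const char *) arg);
    }

    /*
     * Usage inside pgarch_archiveXlog(), around the archive_file_cb call:
     *
     *    ErrorContextCallback errcallback;
     *
     *    errcallback.callback = archive_file_error_callback;
     *    errcallback.arg = (void *) xlog;
     *    errcallback.previous = error_context_stack;
     *    error_context_stack = &errcallback;
     *
     *    ret = ArchiveCallbacks->archive_file_cb(archive_module_state,
     *                                            xlog, pathname);
     *
     *    error_context_stack = errcallback.previous;
     */

Including the configured module name (the archive_library setting) in the
message would cover the other half of the suggestion.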
Greetings,

Andres Freund