Re: archive modules loose ends - Mailing list pgsql-hackers

From Nathan Bossart
Subject Re: archive modules loose ends
Date
Msg-id 20231121043044.GA3521465@nathanxps13
Whole thread Raw
In response to Re: archive modules loose ends  (Andres Freund <andres@anarazel.de>)
Responses Re: archive modules loose ends
List pgsql-hackers
On Mon, Nov 13, 2023 at 03:35:28PM -0800, Andres Freund wrote:
> On 2023-11-13 16:42:31 -0600, Nathan Bossart wrote:
>> There seems to be no interest in this patch, so I plan to withdraw it from
>> the commitfest system by the end of the month unless such interest
>> materializes.
> 
> I think it might just have arrived too shortly before the feature freeze to be
> worth looking at at the time, and then it didn't really re-raise attention
> until now.  I'm so far behind on keeping up with the list that I rarely end up
> looking far back for things I'd like to have answered... Sorry.

No worries.  I appreciate the review.

>> I see two main options for dealing with this.  One option is to simply have
>> shell_archive (and any other archive modules out there) maintain its own
>> memory context like basic_archive does.  This ends up requiring a whole lot
>> of duplicate code between the two built-in modules, though.  Another option
>> is to have the archiver manage a memory context that it resets after every
>> invocation of the archiving callback, ERROR or not.
> 
> I think passing in a short-lived memory context is a lot nicer to deal with.

Cool.

>> This has the advantage of avoiding code duplication and simplifying things
>> for the built-in modules, but any external modules that rely on palloc'd
>> state being long-lived would need to be adjusted to manage their own
>> long-lived context.  (This would need to be appropriately documented.)
> 
> Alternatively we could provide a longer-lived memory context in
> ArchiveModuleState, set up by the genric infrastructure. That context would
> obviously still need to be explicitly utilized by a module, but no duplicated
> setup code would be required.

Sure.  Right now, I'm not sure there's too much need for that.  A module
could just throw stuff in TopMemoryContext, and you probably wouldn't have
any leaks because the archiver just restarts on any ERROR or
archive_library change.  But that's probably not a pattern we want to
encourage long-term.  I'll jot this down for a follow-up patch idea.

> I think we should just have the AtEOXact_Files() in pgarch.c, then no
> PG_TRY/CATCH is needed here. At the moment I think just about every possible
> use of an archive modules would require using files, so there doesn't seem
> much of a reason to not handle it in pgarch.c.

WFM

> I'd probably reset a few other subsystems at the same time (there's probably
> more):
> - disable_all_timeouts()
> - LWLockReleaseAll()
> - ConditionVariableCancelSleep()
> - pgstat_report_wait_end()
> - ReleaseAuxProcessResources()

I looked around a bit and thought AtEOXact_HashTables() belonged here as
well.  I'll probably give this one another pass to see if there's anything
else obvious.

> It could be worth setting up an errcontext providing the module and file
> that's being processed. I personally find that at least as important as
> setting up a ps string detailing the log file... But I guess that could be a
> separate patch.

Indeed.  Right now we rely on the module to emit sufficiently-detailed
logs, but it'd be nice if they got that for free.

> It'd be nice to add a comment explaining why pgarch_archiveXlog() is the right
> place to handle errors.

Will do.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com



pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Synchronizing slots from primary to standby
Next
From: "Zhijie Hou (Fujitsu)"
Date:
Subject: RE: Synchronizing slots from primary to standby