Thread: recovery modules

recovery modules

From

Nathan Bossart

Date:

27 December 2022, 19:24:49

I've attached a patch set that adds the restore_library,
archive_cleanup_library, and recovery_end_library parameters to allow
archive recovery via loadable modules.  This is a follow-up to the
archive_library parameter added in v15 [0] [1].

The motivation behind this change is similar to that of archive_library
(e.g., robustness, performance).  The recovery functions are provided via a
similar interface to archive modules (i.e., an initialization function that
returns the function pointers).  Also, I've extended basic_archive to work
as a restore_library, which makes it easy to demonstrate both archiving and
recovery via a loadable module in a TAP test.

A few miscellaneous design notes:

* Unlike archive modules, recovery libraries cannot be changed at runtime.
There isn't a safe way to unload a library, and archive libraries work
around this restriction by restarting the archiver process.  Since recovery
libraries are loaded via the startup and checkpointer processes (which
cannot be trivially restarted like the archiver), the same workaround is
not feasible.

* pg_rewind uses restore_command, but there isn't a straightforward path to
support restore_library.  I haven't addressed this in the attached patches,
but perhaps this is a reason to allow specifying both restore_command and
restore_library at the same time.  pg_rewind would use restore_command, and
the server would use restore_library.

* I've combined the documentation to create one "Archive and Recovery
Modules" chapter.  They are similar enough that it felt silly to write a
separate chapter for recovery modules.  However, I've still split them up
within the chapter, and they have separate initialization functions.  This
retains backward compatibility with v15 archive modules, keeps them
logically separate, and hopefully hints at the functional differences.
Even so, if you want to create one library for both archive and recovery,
there is nothing stopping you.

* Unlike archive modules, I didn't add any sort of "check" or "shutdown"
callbacks.  The recovery_end_library parameter makes a "shutdown" callback
largely redundant, and I couldn't think of any use-case for a "check"
callback.  However, new callbacks could be added in the future if needed.

* Unlike archive modules, restore_library and recovery_end_library may be
loaded in single-user mode.  I believe this works out-of-the-box, but it's
an extra thing to be cognizant of.

* If both the library and command parameter for a recovery action is
specified, the server should fail to startup, but if a misconfiguration is
detected after SIGHUP, we emit a WARNING and continue using the library.  I
originally thought about emitting an ERROR like the archiver does in this
case, but failing recovery and stopping the server felt a bit too harsh.
I'm curious what folks think about this.

* Іt could be nice to rewrite pg_archivecleanup for use as an
archive_cleanup_library, but I don't think that needs to be a part of this
patch set.

[0] https://postgr.es/m/668D2428-F73B-475E-87AE-F89D67942270%40amazon.com
[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=5ef1eef

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Andres Freund

Date:

27 December 2022, 22:11:11

Hi,

On 2022-12-27 11:24:49 -0800, Nathan Bossart wrote:
> I've attached a patch set that adds the restore_library,
> archive_cleanup_library, and recovery_end_library parameters to allow
> archive recovery via loadable modules.  This is a follow-up to the
> archive_library parameter added in v15 [0] [1].

Why do we need N parameters for this? To me it seems more sensible to have one
parameter that then allows a library to implement all these (potentially
optionally).

> * Unlike archive modules, recovery libraries cannot be changed at runtime.
> There isn't a safe way to unload a library, and archive libraries work
> around this restriction by restarting the archiver process.  Since recovery
> libraries are loaded via the startup and checkpointer processes (which
> cannot be trivially restarted like the archiver), the same workaround is
> not feasible.

I don't think that's a convincing reason to not support configuration
changes. Sure, libraries cannot be unloaded, but an unnecessarily loaded
library is cheap. All that's needed is to redirect the relevant function
calls.

> * pg_rewind uses restore_command, but there isn't a straightforward path to
> support restore_library.  I haven't addressed this in the attached patches,
> but perhaps this is a reason to allow specifying both restore_command and
> restore_library at the same time.  pg_rewind would use restore_command, and
> the server would use restore_library.

That seems problematic, leading to situations where one might not be able to
use restore_command anymore, because it's not feasible to do
segment-by-segment restoration.

Greetings,

Andres Freund

Re: recovery modules

From

Nathan Bossart

Date:

27 December 2022, 22:37:11

On Tue, Dec 27, 2022 at 02:11:11PM -0800, Andres Freund wrote:
> On 2022-12-27 11:24:49 -0800, Nathan Bossart wrote:
>> I've attached a patch set that adds the restore_library,
>> archive_cleanup_library, and recovery_end_library parameters to allow
>> archive recovery via loadable modules.  This is a follow-up to the
>> archive_library parameter added in v15 [0] [1].
> 
> Why do we need N parameters for this? To me it seems more sensible to have one
> parameter that then allows a library to implement all these (potentially
> optionally).

The main reason is flexibility.  Separate parameters allow using a library
for one thing and a command for another, or different libraries for
different things.  If that isn't a use-case we wish to support, I don't
mind combining all three into a single recovery_library parameter.

>> * Unlike archive modules, recovery libraries cannot be changed at runtime.
>> There isn't a safe way to unload a library, and archive libraries work
>> around this restriction by restarting the archiver process.  Since recovery
>> libraries are loaded via the startup and checkpointer processes (which
>> cannot be trivially restarted like the archiver), the same workaround is
>> not feasible.
> 
> I don't think that's a convincing reason to not support configuration
> changes. Sure, libraries cannot be unloaded, but an unnecessarily loaded
> library is cheap. All that's needed is to redirect the relevant function
> calls.

This might leave some stuff around (e.g., GUCs, background workers), but if
that isn't a concern, I can adjust it to work as you describe.

>> * pg_rewind uses restore_command, but there isn't a straightforward path to
>> support restore_library.  I haven't addressed this in the attached patches,
>> but perhaps this is a reason to allow specifying both restore_command and
>> restore_library at the same time.  pg_rewind would use restore_command, and
>> the server would use restore_library.
> 
> That seems problematic, leading to situations where one might not be able to
> use restore_command anymore, because it's not feasible to do
> segment-by-segment restoration.

I'm not following why this would make segment-by-segment restoration
infeasible.  Would you mind elaborating?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: recovery modules

From

Andres Freund

Date:

27 December 2022, 22:45:30

Hi,

On 2022-12-27 14:37:11 -0800, Nathan Bossart wrote:
> On Tue, Dec 27, 2022 at 02:11:11PM -0800, Andres Freund wrote:
> > On 2022-12-27 11:24:49 -0800, Nathan Bossart wrote:
> >> I've attached a patch set that adds the restore_library,
> >> archive_cleanup_library, and recovery_end_library parameters to allow
> >> archive recovery via loadable modules.  This is a follow-up to the
> >> archive_library parameter added in v15 [0] [1].
> > 
> > Why do we need N parameters for this? To me it seems more sensible to have one
> > parameter that then allows a library to implement all these (potentially
> > optionally).
> 
> The main reason is flexibility.  Separate parameters allow using a library
> for one thing and a command for another, or different libraries for
> different things.  If that isn't a use-case we wish to support, I don't
> mind combining all three into a single recovery_library parameter.

I think the configuration complexity is a sufficient concern to not go that
direction...


> >> * Unlike archive modules, recovery libraries cannot be changed at runtime.
> >> There isn't a safe way to unload a library, and archive libraries work
> >> around this restriction by restarting the archiver process.  Since recovery
> >> libraries are loaded via the startup and checkpointer processes (which
> >> cannot be trivially restarted like the archiver), the same workaround is
> >> not feasible.
> > 
> > I don't think that's a convincing reason to not support configuration
> > changes. Sure, libraries cannot be unloaded, but an unnecessarily loaded
> > library is cheap. All that's needed is to redirect the relevant function
> > calls.
> 
> This might leave some stuff around (e.g., GUCs, background workers), but if
> that isn't a concern, I can adjust it to work as you describe.

You can still have a shutdown hook re background workers. I don't think the
GUCs matter, given that it's the startup/checkpointer processes.


> >> * pg_rewind uses restore_command, but there isn't a straightforward path to
> >> support restore_library.  I haven't addressed this in the attached patches,
> >> but perhaps this is a reason to allow specifying both restore_command and
> >> restore_library at the same time.  pg_rewind would use restore_command, and
> >> the server would use restore_library.
> > 
> > That seems problematic, leading to situations where one might not be able to
> > use restore_command anymore, because it's not feasible to do
> > segment-by-segment restoration.
> 
> I'm not following why this would make segment-by-segment restoration
> infeasible.  Would you mind elaborating?

Latency effects for example can make it infeasible to do segment-by-segment
restoration infeasible performance wise. On the most extreme end, imagine WAL
archived to tape or such.

Greetings,

Andres Freund

Re: recovery modules

From

Nathan Bossart

Date:

27 December 2022, 23:04:28

On Tue, Dec 27, 2022 at 02:45:30PM -0800, Andres Freund wrote:
> On 2022-12-27 14:37:11 -0800, Nathan Bossart wrote:
>> On Tue, Dec 27, 2022 at 02:11:11PM -0800, Andres Freund wrote:
>> > On 2022-12-27 11:24:49 -0800, Nathan Bossart wrote:
>> >> * pg_rewind uses restore_command, but there isn't a straightforward path to
>> >> support restore_library.  I haven't addressed this in the attached patches,
>> >> but perhaps this is a reason to allow specifying both restore_command and
>> >> restore_library at the same time.  pg_rewind would use restore_command, and
>> >> the server would use restore_library.
>> > 
>> > That seems problematic, leading to situations where one might not be able to
>> > use restore_command anymore, because it's not feasible to do
>> > segment-by-segment restoration.
>> 
>> I'm not following why this would make segment-by-segment restoration
>> infeasible.  Would you mind elaborating?
> 
> Latency effects for example can make it infeasible to do segment-by-segment
> restoration infeasible performance wise. On the most extreme end, imagine WAL
> archived to tape or such.

I'm sorry, I'm still lost here.  Wouldn't restoration via library tend to
improve latency?  Is your point that clusters may end up depending on this
improvement so much that a shell command would no longer be able to keep
up?  I might be creating a straw man, but this seems like less of a concern
for pg_rewind since it isn't meant for continuous, ongoing restoration.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: recovery modules

From

Andres Freund

Date:

28 December 2022, 00:43:07

Hi,

On 2022-12-27 15:04:28 -0800, Nathan Bossart wrote:
> I'm sorry, I'm still lost here.  Wouldn't restoration via library tend to
> improve latency?  Is your point that clusters may end up depending on this
> improvement so much that a shell command would no longer be able to keep
> up?

Yes.


> I might be creating a straw man, but this seems like less of a concern
> for pg_rewind since it isn't meant for continuous, ongoing restoration.

pg_rewind is in the critical path of a bunch of HA scenarios, so I wouldn't
say that restore performance isn't important...

Greetings,

Andres Freund

Re: recovery modules

From

Nathan Bossart

Date:

29 December 2022, 19:43:26

Here is a new patch set with the following changes:

* The restore_library, archive_cleanup_library, and recovery_end_library
parameters are merged into one parameter.

* restore_library can now be changed via SIGHUP.  To provide a way for
modules to clean up when their callbacks are unloaded, I've introduced an
optional shutdown callback.

* Parameter misconfigurations are now always ERRORs.  I'm less confident
that we can get by with just a WARNING now that restore_library can be
changed via SIGHUP, and this makes things more consistent with
archive_library/command.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Nathan Bossart

Date:

03 January 2023, 17:59:17

Here is a rebased patch set for cfbot.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Nathan Bossart

Date:

03 January 2023, 19:05:38

On Tue, Jan 03, 2023 at 09:59:17AM -0800, Nathan Bossart wrote:
> Here is a rebased patch set for cfbot.

I noticed that cfbot's Windows tests are failing because the backslashes in
the archive directory path are causing escaping problems.  Here is an
attempt to fix that by converting all backslashes to forward slashes, which
is what other tests (e.g., 025_stuck_on_old_timeline.pl) do.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Nathan Bossart

Date:

11 January 2023, 19:29:01

Thanks for taking a look.

On Wed, Jan 11, 2023 at 04:53:39PM +0900, Michael Paquier wrote:
> Hmm.  Is passing down the file name used as a cutoff point the best
> interface for the modules?  Perhaps passing down the redo LSN and its
> TLI would be a cleaner approach in terms of flexibility?  I agree with
> letting the startup enforce these numbers as that can be easy to mess
> up for plugin authors, leading to critical problems.

I'm having trouble thinking of any practical advantage of providing the
redo LSN and TLI.  If the main use-case is removing older archives as the
documentation indicates, it seems better to provide the file name so that
you can plug it straight into strcmp() to determine whether the file can be
removed (i.e., what pg_archivecleanup does).  If we provided the LSN and
TLI instead, you'd either need to convert that into a WAL file name for
strcmp(), or you'd need to convert the candidate file name into an LSN and
TLI and compare against those.

> "basic_archive" does not reflect what this module does.  Using one
> library simplifies the whole configuration picture and the tests, so
> perhaps something like basic_wal_module, or something like that, would
> be better long-term?

I initially created a separate basic_restore module, but decided to fold it
into basic_archive to simplify the patch and tests.  I hesitated to rename
it because it already exists in v15, and since it deals with creating and
restoring archive files, the name still seemed somewhat accurate.  That
being said, I don't mind renaming it if that's what folks want.

I've attached a new patch set that is rebased over c96de2c.  There are no
other changes.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Nathan Bossart

Date:

12 January 2023, 18:17:21

On Thu, Jan 12, 2023 at 03:30:40PM +0900, Michael Paquier wrote:
> On Wed, Jan 11, 2023 at 11:29:01AM -0800, Nathan Bossart wrote:
>> I initially created a separate basic_restore module, but decided to fold it
>> into basic_archive to simplify the patch and tests.  I hesitated to rename
>> it because it already exists in v15, and since it deals with creating and
>> restoring archive files, the name still seemed somewhat accurate.  That
>> being said, I don't mind renaming it if that's what folks want.
> 
> I've done that in the past for pg_verify_checksums -> pg_checksums, so
> I would not mind renaming it so as it reflects better its role.
> (Being outvoted is fine for me if this suggestion sounds bad).

IMHO I don't think there's an urgent need to rename it, but if there's a
better name that people like, I'm happy to do so.

> Saying that, 0001 seems fine on its own (minus the redo LSN/TLI with
> the duplication for the segment name build), so I would be tempted to
> get this one done.  My gut tells me that we'd better remove the
> duplication and just pass down the two fields to
> shell_archive_cleanup() and shell_recovery_end(), with the segment
> name given to ExecuteRecoveryCommand()..

I moved the duplicated logic to its own function in v6.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Nathan Bossart

Date:

16 January 2023, 22:40:40

On Mon, Jan 16, 2023 at 04:36:01PM +0900, Michael Paquier wrote:
> Once this issue was fixed, nothing else stood out, so applied this
> part.

Thanks!  I've attached a rebased version of the rest of the patch set.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Nathan Bossart

Date:

17 January 2023, 18:23:56

On Tue, Jan 17, 2023 at 02:32:03PM +0900, Michael Paquier wrote:
> Could it be cleaner in the long term to remove entirely
> BuildRestoreCommand() and move the conversion of the xlogpath with
> make_native_path() one level higher in the stack?

Yeah, this seems cleaner.  I removed BuildRestoreCommand() in v8.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Nathan Bossart

Date:

18 January 2023, 04:44:27

On Wed, Jan 18, 2023 at 11:29:20AM +0900, Michael Paquier wrote:
> And 0002 is applied.

Thanks.  Here's a rebased version of the last patch.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

v9-0001-Allow-recovery-via-loadable-modules.patch

Re: recovery modules

From

Nathan Bossart

Date:

23 January 2023, 21:44:28

On Mon, Jan 23, 2023 at 11:44:57AM +0900, Michael Paquier wrote:
> Thanks for the rebase.

Thanks for the detailed review.

> The final state of the documentation is as follows:
> 51. Archive and Recovery Modules
>     51.1. Archive Module Initialization Functions
>     51.2. Archive Module Callbacks
>     51.3. Recovery Module Initialization Functions
>     51.4. Recovery Module Callbacks
> 
> I am not completely sure whether this grouping is the best thing to
> do.  Wouldn't it be better to move that into two different
> sub-sections instead?  One layout suggestion:
> 51. WAL Modules
>   51.1. Archive Modules
>     51.1.1. Archive Module Initialization Functions
>     51.1.2. Archive Module Callbacks
>   51.2. Recovery Modules
>     51.2.1. Recovery Module Initialization Functions
>     51.2.2. Recovery Module Callbacks
> 
> Putting both of them in the same section sounds like a good idea per
> the symmetry that one would likely have between the code paths of
> archiving and recovery, so as they share some common knowledge.
> 
> This kinds of comes back to the previous suggestion to rename
> basic_archive to something like wal_modules, that covers both
> archiving and recovery.  I does not seem like this would overlap with
> RMGRs, which is much more internal anyway.

I updated the documentation as you suggested, and I renamed basic_archive
to basic_wal_module.

>      (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> -     errmsg("must specify restore_command when standby mode is not enabled")));
> +     errmsg("must specify restore_command or a restore_library that defines "
> +            "a restore callback when standby mode is not enabled")));
> This is long.  Shouldn't this be split into an errdetail() to list all
> the options at hand?

Should the errmsg() be something like "recovery parameters misconfigured"?

> -   if (XLogArchiveLibrary[0] != '\0' && XLogArchiveCommand[0] != '\0')
> -       ereport(ERROR,
> -               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
> -                errmsg("both archive_command and archive_library set"),
> -                errdetail("Only one of archive_command, archive_library may be set.")));
> +   CheckMutuallyExclusiveGUCs(XLogArchiveLibrary, "archive_library",
> +                              XLogArchiveCommand, "archive_command");
> 
> The introduction of this routine could be a patch on its own, as it
> impacts the archiving path.

I moved this to a separate patch.

> - Startup reloading, as of StartupRereadConfig().  This code could
> involve a WAL receiver restart depending on a change in the slot
> change or in primary_conninfo, and force at the same time an ERROR
> because of conflicting recovery library and command configuration.
> This one should be safe because pendingWalRcvRestart would just be
> considered later on by the startup process itself while waiting for
> WAL to become available.  Still this could deserve a comment?  Even if
> there is a misconfiguration, a reload on a standby would enforce a
> FATAL in the startup process, taking down the whole server.

Do you think the parameter checks should go before the WAL receiver restart
logic?

> - Checkpointer initialization, as of CheckpointerMain().  A
> configuration failure in this code path, aka server startup, causes
> the server to loop infinitely on FATAL with the misconfiguration
> showing up all the time..  This is a problem.

Perhaps this is a reason to move the parameter check in CheckpointerMain()
to after the sigsetjmp() block.  This should avoid full server restarts.
Only the checkpointer process would loop with the ERROR.

> - Last comes the checkpointer GUC reloading, as of
> HandleCheckpointerInterrupts(), with a second problem.  This
> introduces a failure path where ConfigReloadPending is processed at
> the same time as ShutdownRequestPending based on the way it is coded,
> interacting with what would be a normal shutdown in some cases?  And
> actually, if you enforce a misconfiguration on reload, the
> checkpointer reports an error but it does not enforce a process
> restart, hence it keeps around the new, incorrect, configuration while
> waiting for a new checkpoint to happen once restore_library and
> archive_cleanup_command are set.  This could lead to surprises, IMO.
> Upgrading to a FATAL in this code path triggers an infinite loop, like
> the startup path.

If we move the parameter check in CheckpointerMain() as described above,
the checkpointer should be unable to proceed with an incorrect
configuration.  For the normal shutdown part, do you think the
ShutdownRequestPending block should be moved to before the
ConfigReloadPending block in HandleCheckpointerInterrupts()?

> If the archive_cleanup_command ballback of a restore library triggers
> a FATAL, it seems to me that it would continuously trigger a server
> restart, actually.  Perhaps that's something to document, in
> comparison to the safe fallbacks of the shell command where we don't
> force an ERROR to give priority to the stability of the checkpointer.

I'm not sure it's worth documenting that ereport(FATAL, ...) in the
checkpointer process will cause a server restart.  In most cases, an
extension author would use ERROR, which, if we make the aforementioned
changes, would at most cause the checkpointer to effectively restart.  This
is similar to archive modules where an ERROR causes only the archiver
process to restart.  Also, we document that recovery libraries are loaded
in the startup and checkpointer processes, so IMO it should be relatively
apparent that doing something like FATAL or proc_exit() is bad.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Michael Paquier

Date:

25 January 2023, 07:34:21

On Mon, Jan 23, 2023 at 01:44:28PM -0800, Nathan Bossart wrote:
> On Mon, Jan 23, 2023 at 11:44:57AM +0900, Michael Paquier wrote:
> I updated the documentation as you suggested, and I renamed basic_archive
> to basic_wal_module.

Thanks.  The renaming of basic_archive to basic_wal_module looked
fine, so applied.

While looking at the docs, I found a bit confusing that the main
section of the WAL modules included the full description for the
archive modules, so I have moved that into the sect2 dedicated to the
archive modules instead, as of the attached.  0004 has been updated in
consequence, with details about the recovery bits within its own
sect2.

>>      (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> -     errmsg("must specify restore_command when standby mode is not enabled")));
>> +     errmsg("must specify restore_command or a restore_library that defines "
>> +            "a restore callback when standby mode is not enabled")));
>> This is long.  Shouldn't this be split into an errdetail() to list all
>> the options at hand?
>
> Should the errmsg() be something like "recovery parameters misconfigured"?

Hmm.  Here is an idea:
- errmsg: "must specify restore option when standby mode is not enabled"
- errdetail: "Either restore_command or restore_library need to be
specified."

>> -   if (XLogArchiveLibrary[0] != '\0' && XLogArchiveCommand[0] != '\0')
>> -       ereport(ERROR,
>> -               (errcode(ERRCODE_INVALID_PARAMETER_VALUE),
>> -                errmsg("both archive_command and archive_library set"),
>> -                errdetail("Only one of archive_command, archive_library may be set.")));
>> +   CheckMutuallyExclusiveGUCs(XLogArchiveLibrary, "archive_library",
>> +                              XLogArchiveCommand, "archive_command");
>>
>> The introduction of this routine could be a patch on its own, as it
>> impacts the archiving path.
>
> I moved this to a separate patch.

While pondering about that, I found a bit sad that this only works for
string GUCs, while it could be possible to do the same kind of checks
for the other GUC types with a more generic routine?  Not enum,
obviously, but int, float, bool and real, with the a failure if both
GUCs are set to non-default values?  Also, and I may be missing
something here, do we really need to pass the value of the parameters
to check?  Wouldn't it be enough to check for the case where both
parameters are set to their non-default values after reloading?

>> - Startup reloading, as of StartupRereadConfig().  This code could
>> involve a WAL receiver restart depending on a change in the slot
>> change or in primary_conninfo, and force at the same time an ERROR
>> because of conflicting recovery library and command configuration.
>> This one should be safe because pendingWalRcvRestart would just be
>> considered later on by the startup process itself while waiting for
>> WAL to become available.  Still this could deserve a comment?  Even if
>> there is a misconfiguration, a reload on a standby would enforce a
>> FATAL in the startup process, taking down the whole server.
>
> Do you think the parameter checks should go before the WAL receiver restart
> logic?

Yeah, switching the order makes the logic more robust IMO.

>> - Checkpointer initialization, as of CheckpointerMain().  A
>> configuration failure in this code path, aka server startup, causes
>> the server to loop infinitely on FATAL with the misconfiguration
>> showing up all the time..  This is a problem.
>
> Perhaps this is a reason to move the parameter check in CheckpointerMain()
> to after the sigsetjmp() block.  This should avoid full server restarts.
> Only the checkpointer process would loop with the ERROR.

The loop part is annoying..  I've never been a fan of adding this
cross-value checks for the archiver modules in the first place, and it
would make things much simpler in the checkpointer if we need to think
about that as we want these values to be reloadable.  Perhaps this
could just be an exception where we just give priority on one over the
other archive_cleanup_command?  The startup process has a well-defined
sequence after a failure, while the checkpointer is designed to remain
robust.

>> - Last comes the checkpointer GUC reloading, as of
>> HandleCheckpointerInterrupts(), with a second problem.  This
>> introduces a failure path where ConfigReloadPending is processed at
>> the same time as ShutdownRequestPending based on the way it is coded,
>> interacting with what would be a normal shutdown in some cases?  And
>> actually, if you enforce a misconfiguration on reload, the
>> checkpointer reports an error but it does not enforce a process
>> restart, hence it keeps around the new, incorrect, configuration while
>> waiting for a new checkpoint to happen once restore_library and
>> archive_cleanup_command are set.  This could lead to surprises, IMO.
>> Upgrading to a FATAL in this code path triggers an infinite loop, like
>> the startup path.
>
> If we move the parameter check in CheckpointerMain() as described above,
> the checkpointer should be unable to proceed with an incorrect
> configuration.  For the normal shutdown part, do you think the
> ShutdownRequestPending block should be moved to before the
> ConfigReloadPending block in HandleCheckpointerInterrupts()?

I would not touch this order.  This could influence the setup a
shutdown checkpoint relies on, for one.

>> If the archive_cleanup_command ballback of a restore library triggers
>> a FATAL, it seems to me that it would continuously trigger a server
>> restart, actually.  Perhaps that's something to document, in
>> comparison to the safe fallbacks of the shell command where we don't
>> force an ERROR to give priority to the stability of the checkpointer.
>
> I'm not sure it's worth documenting that ereport(FATAL, ...) in the
> checkpointer process will cause a server restart.  In most cases, an
> extension author would use ERROR, which, if we make the aforementioned
> changes, would at most cause the checkpointer to effectively restart.  This
> is similar to archive modules where an ERROR causes only the archiver
> process to restart.  Also, we document that recovery libraries are loaded
> in the startup and checkpointer processes, so IMO it should be relatively
> apparent that doing something like FATAL or proc_exit() is bad.

Okay.  Fine by me.  This could always be amended later, as required.

+       if (recoveryRestoreCommand[0] == '\0')
+           RecoveryContext.restore_cb = NULL;
+       if (archiveCleanupCommand[0] == '\0')
+           RecoveryContext.archive_cleanup_cb = NULL;
+       if (recoveryEndCommand[0] == '\0')
+           RecoveryContext.recovery_end_cb = NULL;

Could it be cleaner to put this knowledge directly in shell_restore.c
with a fast-exit path after entering each callback?  It does not
strike me as a good thing to sprinkle more than necessary the
knowledge about the commands.

Another question that popped in my mind: could it be better to have
two different shutdown callbacks for the checkpointer and the startup
process?  Having some tests for both, like shell_archive.c, would be
nice, actually.
--
Michael

Attachment

Re: recovery modules

From

Nathan Bossart

Date:

27 January 2023, 05:40:58

On Wed, Jan 25, 2023 at 04:34:21PM +0900, Michael Paquier wrote:
> The loop part is annoying..  I've never been a fan of adding this
> cross-value checks for the archiver modules in the first place, and it
> would make things much simpler in the checkpointer if we need to think
> about that as we want these values to be reloadable.  Perhaps this
> could just be an exception where we just give priority on one over the
> other archive_cleanup_command?  The startup process has a well-defined
> sequence after a failure, while the checkpointer is designed to remain
> robust.

Yeah, there are some problems here.  If we ERROR, we'll just bounce back to
the sigsetjmp() block once a second, and we'll never pick up configuration
reloads, shutdown signals, etc.  If we FATAL, we'll just rapidly restart
over and over.  Given the dicussion about misconfigured archiving
parameters [0], I doubt folks will be okay with giving priority to one or
the other.

I'm currently thinking that the checkpointer should set a flag and clear
the recovery callbacks when a misconfiguration is detected.  Anytime the
checkpointer tries to use the archive-cleanup callback, a WARNING would be
emitted.  This is similar to an approach I proposed for archiving
misconfigurations (that we didn't proceed with) [1].  Given the
aformentioned problems, this approach might be more suitable for the
checkpointer than it is for the archiver.

Thoughts?

[0] https://postgr.es/m/9ee5d180-2c32-a1ca-d3d7-63a723f68d9a%40enterprisedb.com
[1] https://postgr.es/m/20220914222736.GA3042279%40nathanxps13

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: recovery modules

From

Nathan Bossart

Date:

27 January 2023, 23:28:21

On Thu, Jan 26, 2023 at 09:40:58PM -0800, Nathan Bossart wrote:
> On Wed, Jan 25, 2023 at 04:34:21PM +0900, Michael Paquier wrote:
>> The loop part is annoying..  I've never been a fan of adding this
>> cross-value checks for the archiver modules in the first place, and it
>> would make things much simpler in the checkpointer if we need to think
>> about that as we want these values to be reloadable.  Perhaps this
>> could just be an exception where we just give priority on one over the
>> other archive_cleanup_command?  The startup process has a well-defined
>> sequence after a failure, while the checkpointer is designed to remain
>> robust.
> 
> Yeah, there are some problems here.  If we ERROR, we'll just bounce back to
> the sigsetjmp() block once a second, and we'll never pick up configuration
> reloads, shutdown signals, etc.  If we FATAL, we'll just rapidly restart
> over and over.  Given the dicussion about misconfigured archiving
> parameters [0], I doubt folks will be okay with giving priority to one or
> the other.
> 
> I'm currently thinking that the checkpointer should set a flag and clear
> the recovery callbacks when a misconfiguration is detected.  Anytime the
> checkpointer tries to use the archive-cleanup callback, a WARNING would be
> emitted.  This is similar to an approach I proposed for archiving
> misconfigurations (that we didn't proceed with) [1].  Given the
> aformentioned problems, this approach might be more suitable for the
> checkpointer than it is for the archiver.

The more I think about this, the more I wonder whether we really need to
include archive_cleanup_command and recovery_end_command in recovery
modules.  Another weird thing with the checkpointer is that the
restore_library will stay loaded long after recovery is finished, and it'll
be loaded regardless of whether recovery is required in the first place.
Of course, that typically won't cause any problems, and we could wait until
we need to do archive cleanup to load the library (and call its shutdown
callback when recovery is finished), but this strikes me as potentially
more complexity than the feature is worth.  Perhaps we should just focus on
covering the restore_command functionality for now and add the rest later.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: recovery modules

From

Andres Freund

Date:

28 January 2023, 00:09:39

Hi,

On 2023-01-27 15:28:21 -0800, Nathan Bossart wrote:
> The more I think about this, the more I wonder whether we really need to
> include archive_cleanup_command and recovery_end_command in recovery
> modules.

I think it would be hard to write a good module that isn't just implementing
the existing commands without it. Needing to clean up archives and reacting to
the end of recovery seems a pretty core task.

> Another weird thing with the checkpointer is that the restore_library will
> stay loaded long after recovery is finished, and it'll be loaded regardless
> of whether recovery is required in the first place.

I don't see a problem with that. And I suspect we might even end up there
for other reasons.

I was briefly wondering whether it'd be worth trying to offload things like
archive_cleanup_command from checkpointer to a different process, for
robustness. But given that it's pretty much required for performance that the
module runs in the startup process, that ship probably has sailed.

Greetings,

Andres Freund

Re: recovery modules

From

Andres Freund

Date:

28 January 2023, 00:23:19

Hi,

On 2023-01-25 16:34:21 +0900, Michael Paquier wrote:
> diff --git a/src/include/access/xlogarchive.h b/src/include/access/xlogarchive.h
> index 299304703e..71c9b88165 100644
> --- a/src/include/access/xlogarchive.h
> +++ b/src/include/access/xlogarchive.h
> @@ -30,9 +30,45 @@ extern bool XLogArchiveIsReady(const char *xlog);
>  extern bool XLogArchiveIsReadyOrDone(const char *xlog);
>  extern void XLogArchiveCleanup(const char *xlog);
>  
> -extern bool shell_restore(const char *file, const char *path,
> -                          const char *lastRestartPointFileName);
> -extern void shell_archive_cleanup(const char *lastRestartPointFileName);
> -extern void shell_recovery_end(const char *lastRestartPointFileName);
> +/*
> + * Recovery module callbacks
> + *
> + * These callback functions should be defined by recovery libraries and
> + * returned via _PG_recovery_module_init().  For more information about the
> + * purpose of each callback, refer to the recovery modules documentation.
> + */
> +typedef bool (*RecoveryRestoreCB) (const char *file, const char *path,
> +                                   const char *lastRestartPointFileName);
> +typedef void (*RecoveryArchiveCleanupCB) (const char *lastRestartPointFileName);
> +typedef void (*RecoveryEndCB) (const char *lastRestartPointFileName);
> +typedef void (*RecoveryShutdownCB) (void);

I think the signature of these forces bad coding practices, because there's no
way to have state within a recovery module (as there's no parameter for it).


It's possible we would eventually support multiple modules, e.g. restoring
from shorter term file based archiving and from longer term archiving in some
blob store. Then we'll regret not having a varible for this.


> +typedef struct RecoveryModuleCallbacks
> +{
> +    RecoveryRestoreCB restore_cb;
> +    RecoveryArchiveCleanupCB archive_cleanup_cb;
> +    RecoveryEndCB recovery_end_cb;
> +    RecoveryShutdownCB shutdown_cb;
> +} RecoveryModuleCallbacks;
> +
> +extern RecoveryModuleCallbacks RecoveryContext;

I think that'll typically be interpreteted as a MemoryContext by readers.

Also, why is this a global var? Exported too?


> +/*
> + * Type of the shared library symbol _PG_recovery_module_init that is looked up
> + * when loading a recovery library.
> + */
> +typedef void (*RecoveryModuleInit) (RecoveryModuleCallbacks *cb);

I think this is a bad way to return callbacks. This way the
RecoveryModuleCallbacks needs to be modifiable, which makes the job for the
compiler harder (and isn't the greatest for security).

I strongly encourage you to follow the model used e.g. by tableam. The init
function should return a pointer to a *constant* struct. Which is compile-time
initialized with the function pointers.

See the bottom of heapam_handler.c for how that ends up looking.


> +void
> +LoadRecoveryCallbacks(void)
> +{
> +    RecoveryModuleInit init;
> +
> +    /*
> +     * If the shell command is enabled, use our special initialization
> +     * function.  Otherwise, load the library and call its
> +     * _PG_recovery_module_init().
> +     */
> +    if (restoreLibrary[0] == '\0')
> +        init = shell_restore_init;
> +    else
> +        init = (RecoveryModuleInit)
> +            load_external_function(restoreLibrary, "_PG_recovery_module_init",
> +                                   false, NULL);

Why a special rule for shell, instead of just defaulting the GUC to it?


> +    /*
> +     * If using shell commands, remove callbacks for any commands that are not
> +     * set.
> +     */
> +    if (restoreLibrary[0] == '\0')
> +    {
> +        if (recoveryRestoreCommand[0] == '\0')
> +            RecoveryContext.restore_cb = NULL;
> +        if (archiveCleanupCommand[0] == '\0')
> +            RecoveryContext.archive_cleanup_cb = NULL;
> +        if (recoveryEndCommand[0] == '\0')
> +            RecoveryContext.recovery_end_cb = NULL;

I'd just mandate that these are implemented and that the module has to handle
if it doesn't need to do anything.



> +    /*
> +     * Check for invalid combinations of the command/library parameters and
> +     * load the callbacks.
> +     */
> +    CheckMutuallyExclusiveGUCs(restoreLibrary, "restore_library",
> +                               recoveryRestoreCommand, "restore_command");
> +    CheckMutuallyExclusiveGUCs(restoreLibrary, "restore_library",
> +                               recoveryEndCommand, "recovery_end_command");
> +    before_shmem_exit(call_recovery_module_shutdown_cb, 0);
> +    LoadRecoveryCallbacks();

This kind of sequence is duplicated into several places.

Greetings,

Andres Freund

Re: recovery modules

From

Nathan Bossart

Date:

28 January 2023, 00:59:10

On Fri, Jan 27, 2023 at 04:23:19PM -0800, Andres Freund wrote:
>> +typedef bool (*RecoveryRestoreCB) (const char *file, const char *path,
>> +                                   const char *lastRestartPointFileName);
>> +typedef void (*RecoveryArchiveCleanupCB) (const char *lastRestartPointFileName);
>> +typedef void (*RecoveryEndCB) (const char *lastRestartPointFileName);
>> +typedef void (*RecoveryShutdownCB) (void);
> 
> I think the signature of these forces bad coding practices, because there's no
> way to have state within a recovery module (as there's no parameter for it).
> 
> It's possible we would eventually support multiple modules, e.g. restoring
> from shorter term file based archiving and from longer term archiving in some
> blob store. Then we'll regret not having a varible for this.

Are you suggesting that we add a "void *arg" to each one of these?  Or put
the arguments into a struct?  Or something else?

>> +extern RecoveryModuleCallbacks RecoveryContext;
> 
> I think that'll typically be interpreteted as a MemoryContext by readers.

How about RecoveryCallbacks?

> Also, why is this a global var? Exported too?

It's needed in xlog.c, xlogarchive.c, and xlogrecovery.c.  Would you rather
it be static to xlogarchive.c and provide accessors for the others?

>> +/*
>> + * Type of the shared library symbol _PG_recovery_module_init that is looked up
>> + * when loading a recovery library.
>> + */
>> +typedef void (*RecoveryModuleInit) (RecoveryModuleCallbacks *cb);
> 
> I think this is a bad way to return callbacks. This way the
> RecoveryModuleCallbacks needs to be modifiable, which makes the job for the
> compiler harder (and isn't the greatest for security).
> 
> I strongly encourage you to follow the model used e.g. by tableam. The init
> function should return a pointer to a *constant* struct. Which is compile-time
> initialized with the function pointers.
> 
> See the bottom of heapam_handler.c for how that ends up looking.

Hm.  I used the existing strategy for archive modules and logical decoding
output plugins here.  I think it would be weird for the archive module and
recovery module interfaces to look so different, but if that's okay, I can
change it.

>> +void
>> +LoadRecoveryCallbacks(void)
>> +{
>> +    RecoveryModuleInit init;
>> +
>> +    /*
>> +     * If the shell command is enabled, use our special initialization
>> +     * function.  Otherwise, load the library and call its
>> +     * _PG_recovery_module_init().
>> +     */
>> +    if (restoreLibrary[0] == '\0')
>> +        init = shell_restore_init;
>> +    else
>> +        init = (RecoveryModuleInit)
>> +            load_external_function(restoreLibrary, "_PG_recovery_module_init",
>> +                                   false, NULL);
> 
> Why a special rule for shell, instead of just defaulting the GUC to it?

I'm not following this one.  The default value of the restore_library GUC
is an empty string, which means that the shell commands should be used.

>> +    /*
>> +     * If using shell commands, remove callbacks for any commands that are not
>> +     * set.
>> +     */
>> +    if (restoreLibrary[0] == '\0')
>> +    {
>> +        if (recoveryRestoreCommand[0] == '\0')
>> +            RecoveryContext.restore_cb = NULL;
>> +        if (archiveCleanupCommand[0] == '\0')
>> +            RecoveryContext.archive_cleanup_cb = NULL;
>> +        if (recoveryEndCommand[0] == '\0')
>> +            RecoveryContext.recovery_end_cb = NULL;
> 
> I'd just mandate that these are implemented and that the module has to handle
> if it doesn't need to do anything.

Wouldn't this just force module authors to write empty functions for the
parts they don't need?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: recovery modules

From

Andres Freund

Date:

28 January 2023, 01:55:42

Hi,

On 2023-01-27 16:59:10 -0800, Nathan Bossart wrote:
> On Fri, Jan 27, 2023 at 04:23:19PM -0800, Andres Freund wrote:
> >> +typedef bool (*RecoveryRestoreCB) (const char *file, const char *path,
> >> +                                   const char *lastRestartPointFileName);
> >> +typedef void (*RecoveryArchiveCleanupCB) (const char *lastRestartPointFileName);
> >> +typedef void (*RecoveryEndCB) (const char *lastRestartPointFileName);
> >> +typedef void (*RecoveryShutdownCB) (void);
> > 
> > I think the signature of these forces bad coding practices, because there's no
> > way to have state within a recovery module (as there's no parameter for it).
> > 
> > It's possible we would eventually support multiple modules, e.g. restoring
> > from shorter term file based archiving and from longer term archiving in some
> > blob store. Then we'll regret not having a varible for this.
> 
> Are you suggesting that we add a "void *arg" to each one of these?

Yes.  Or pass a pointer to a struct with a "private_data" style field to all
of them.

> >> +extern RecoveryModuleCallbacks RecoveryContext;
> > 
> > I think that'll typically be interpreteted as a MemoryContext by readers.
> 
> How about RecoveryCallbacks?

Callbacks is better than Context here imo, but I think 'Recovery' makes it
sound like this actually performs WAL replay or such. Seems like it should be
RestoreCallbacks at the very least?

> > Also, why is this a global var? Exported too?
> 
> It's needed in xlog.c, xlogarchive.c, and xlogrecovery.c.  Would you rather
> it be static to xlogarchive.c and provide accessors for the others?

Maybe? Something about this feels wrong to me, but I can't entirely put my
finger on it.

> >> +/*
> >> + * Type of the shared library symbol _PG_recovery_module_init that is looked up
> >> + * when loading a recovery library.
> >> + */
> >> +typedef void (*RecoveryModuleInit) (RecoveryModuleCallbacks *cb);
> > 
> > I think this is a bad way to return callbacks. This way the
> > RecoveryModuleCallbacks needs to be modifiable, which makes the job for the
> > compiler harder (and isn't the greatest for security).
> > 
> > I strongly encourage you to follow the model used e.g. by tableam. The init
> > function should return a pointer to a *constant* struct. Which is compile-time
> > initialized with the function pointers.
> > 
> > See the bottom of heapam_handler.c for how that ends up looking.
> 
> Hm.  I used the existing strategy for archive modules and logical decoding
> output plugins here.

Unfortunately I didn't realize the problem when I was designing the output
plugin interface. But there's probably too many users of it out there now to
change it.

The interface does at least provide a way to have its own "per instance"
state, via the startup callback and LogicalDecodingContext->output_plugin_private.

The worst interface in this area is index AMs - the handler returns a pointer
to a palloced struct with callbacks. That then is copied into a new allocation
in the relcache entry. We have hundreds to thousands of copies of what
bthandler() sets up in memory. Without any sort of benefit.

> I think it would be weird for the archive module and
> recovery module interfaces to look so different, but if that's okay, I can
> change it.

I'm a bit sad about the archive module case - I wonder if we should change it
now, there can't be many users of it out there. And I think it's more likely
that we'll eventually want multiple archiving scripts to run concurrently -
which will be quite hard with the current interface (no private state).

Just btw: It's imo a bit awkward for the definition of the archiving plugin
interface to be in pgarch.h: "Exports from postmaster/pgarch.c" doesn't
describe that well. A dedicated header seems cleaner.

> 
> >> +void
> >> +LoadRecoveryCallbacks(void)
> >> +{
> >> +    RecoveryModuleInit init;
> >> +
> >> +    /*
> >> +     * If the shell command is enabled, use our special initialization
> >> +     * function.  Otherwise, load the library and call its
> >> +     * _PG_recovery_module_init().
> >> +     */
> >> +    if (restoreLibrary[0] == '\0')
> >> +        init = shell_restore_init;
> >> +    else
> >> +        init = (RecoveryModuleInit)
> >> +            load_external_function(restoreLibrary, "_PG_recovery_module_init",
> >> +                                   false, NULL);
> > 
> > Why a special rule for shell, instead of just defaulting the GUC to it?
> 
> I'm not following this one.  The default value of the restore_library GUC
> is an empty string, which means that the shell commands should be used.

I was wondering why we implement "shell" via a separate mechanism from
restore_library. I.e. a) why doesn't restore_library default to 'shell',
instead of an empty string, b) why aren't restore_command et al implemented
using a restore module.

> >> +    /*
> >> +     * If using shell commands, remove callbacks for any commands that are not
> >> +     * set.
> >> +     */
> >> +    if (restoreLibrary[0] == '\0')
> >> +    {
> >> +        if (recoveryRestoreCommand[0] == '\0')
> >> +            RecoveryContext.restore_cb = NULL;
> >> +        if (archiveCleanupCommand[0] == '\0')
> >> +            RecoveryContext.archive_cleanup_cb = NULL;
> >> +        if (recoveryEndCommand[0] == '\0')
> >> +            RecoveryContext.recovery_end_cb = NULL;
> > 
> > I'd just mandate that these are implemented and that the module has to handle
> > if it doesn't need to do anything.
> 
> Wouldn't this just force module authors to write empty functions for the
> parts they don't need?

Yes. But what's the point of a restore library that doesn't implement a
restore command?  Making some/all callbacks mandatory and validating mandatory
callbacks are set, during load, IME makes it easier to evolve the interface
over time, because problems become immediately apparent, rather than having to
wait for a certain callback to be hit.

It's not actually clear to me why another restore library shouldn't be able to
use restore_command etc, given that we have the parameters.  One quite useful
module would be a version of the "shell" interface that runs multiple restore
commands in parallel, assuming we'll need subsequent files as well.

The fact that restore_command are not run in parallel, and that many useful
restore commands have a fair bit of latency, is an issue. So a shell_parallel
restore library would e.g. be useful?

Greetings,

Andres Freund

Re: recovery modules

From

Nathan Bossart

Date:

28 January 2023, 04:17:46

On Fri, Jan 27, 2023 at 05:55:42PM -0800, Andres Freund wrote:
> On 2023-01-27 16:59:10 -0800, Nathan Bossart wrote:
>> I think it would be weird for the archive module and
>> recovery module interfaces to look so different, but if that's okay, I can
>> change it.
> 
> I'm a bit sad about the archive module case - I wonder if we should change it
> now, there can't be many users of it out there. And I think it's more likely
> that we'll eventually want multiple archiving scripts to run concurrently -
> which will be quite hard with the current interface (no private state).

I'm open to that.  IIUC it wouldn't require too many changes to existing
archive modules, and if it gets us closer to batching or parallelism, it's
probably worth doing sooner than later.

> I was wondering why we implement "shell" via a separate mechanism from
> restore_library. I.e. a) why doesn't restore_library default to 'shell',
> instead of an empty string, b) why aren't restore_command et al implemented
> using a restore module.

I think that's the long-term idea.  For archive modules, there were
concerns about backward compatibility [0].

[0] https://postgr.es/m/CABUevEx8cKy%3D%2BYQU_3NaeXnZV2bSB7Lk6EE%2B-FEcmE4JO4V1hg%40mail.gmail.com

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: recovery modules

From

Nathan Bossart

Date:

28 January 2023, 06:27:29

On Fri, Jan 27, 2023 at 08:17:46PM -0800, Nathan Bossart wrote:
> On Fri, Jan 27, 2023 at 05:55:42PM -0800, Andres Freund wrote:
>> On 2023-01-27 16:59:10 -0800, Nathan Bossart wrote:
>>> I think it would be weird for the archive module and
>>> recovery module interfaces to look so different, but if that's okay, I can
>>> change it.
>> 
>> I'm a bit sad about the archive module case - I wonder if we should change it
>> now, there can't be many users of it out there. And I think it's more likely
>> that we'll eventually want multiple archiving scripts to run concurrently -
>> which will be quite hard with the current interface (no private state).
> 
> I'm open to that.  IIUC it wouldn't require too many changes to existing
> archive modules, and if it gets us closer to batching or parallelism, it's
> probably worth doing sooner than later.

Here is a work-in-progress patch set for adjusting the archive modules
interface.  Is this roughly what you had in mind?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Nathan Bossart

Date:

30 January 2023, 19:38:20

On Mon, Jan 30, 2023 at 04:51:38PM +0900, Michael Paquier wrote:
> Now, I find this part, where we use a double pointer to allow the
> module initialization to create and give back a private area, rather
> confusing, and I think that this could be bug-prone, as well.  Once
> you incorporate some data within the set of callbacks, isn't this
> stuff close to a "state" data, or just something that we could call
> only an "ArchiveModule"?  Could it make more sense to have
> _PG_archive_module_init return a structure with everything rather than
> a separate in/out argument?  Here is the idea, simply:
> typedef struct ArchiveModule {
>     ArchiveCallbacks *routines;
>     void *private_data;
>     /* Potentially more here, like some flags? */
> } ArchiveModule;

Yeah, we could probably invent an ArchiveModuleContext struct.  I think
this is similar to how LogicalDecodingContext is used.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: recovery modules

From

Andres Freund

Date:

30 January 2023, 19:48:10

Hi,

On 2023-01-30 16:51:38 +0900, Michael Paquier wrote:
> On Fri, Jan 27, 2023 at 10:27:29PM -0800, Nathan Bossart wrote:
> > Here is a work-in-progress patch set for adjusting the archive modules
> > interface.  Is this roughly what you had in mind?
> 
> I have been catching up with what is happening here.  I can get
> behind the idea to use the term "callbacks" vs "context" for clarity,
> to move the callback definitions into their own header, and to add
> extra arguments to the callback functions for some private data.
> 
> -void
> -_PG_archive_module_init(ArchiveModuleCallbacks *cb)
> +const ArchiveModuleCallbacks *
> +_PG_archive_module_init(void **arg)
>  {
>     AssertVariableIsOfType(&_PG_archive_module_init, ArchiveModuleInit);
>  
> -   cb->check_configured_cb = basic_archive_configured;
> -   cb->archive_file_cb = basic_archive_file;
> +   (*arg) = (void *) AllocSetContextCreate(TopMemoryContext,
> +                                           "basic_archive",
> +                                           ALLOCSET_DEFAULT_SIZES);
> +
> +   return &basic_archive_callbacks;

> Now, I find this part, where we use a double pointer to allow the
> module initialization to create and give back a private area, rather
> confusing, and I think that this could be bug-prone, as well.

I don't think _PG_archive_module_init() should actually allocate a memory
context and do other similar initializations. Instead it should just return
'const ArchiveModuleCallbacks*', typically a single line.

Allocations etc should happen in one of the callbacks. That way we can
actually have multiple instances of a module.


>  Once
> you incorporate some data within the set of callbacks, isn't this
> stuff close to a "state" data, or just something that we could call
> only an "ArchiveModule"?  Could it make more sense to have
> _PG_archive_module_init return a structure with everything rather than
> a separate in/out argument?  Here is the idea, simply:
> typedef struct ArchiveModule {
>     ArchiveCallbacks *routines;
>     void *private_data;
>     /* Potentially more here, like some flags? */
> } ArchiveModule;

I don't like this much. This still basically ends up with the module callbacks
not being sufficient to instantiate an archive module.

Greetings,

Andres Freund

Re: recovery modules

From

Nathan Bossart

Date:

30 January 2023, 20:04:22

On Mon, Jan 30, 2023 at 11:48:10AM -0800, Andres Freund wrote:
> I don't think _PG_archive_module_init() should actually allocate a memory
> context and do other similar initializations. Instead it should just return
> 'const ArchiveModuleCallbacks*', typically a single line.
> 
> Allocations etc should happen in one of the callbacks. That way we can
> actually have multiple instances of a module.

I think we'd need to invent a startup callback for archive modules for this
to work, but that's easy enough.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: recovery modules

From

Nathan Bossart

Date:

31 January 2023, 23:30:13

On Tue, Jan 31, 2023 at 08:13:11AM +0900, Michael Paquier wrote:
> On Mon, Jan 30, 2023 at 12:04:22PM -0800, Nathan Bossart wrote:
>> On Mon, Jan 30, 2023 at 11:48:10AM -0800, Andres Freund wrote:
>>> I don't think _PG_archive_module_init() should actually allocate a memory
>>> context and do other similar initializations. Instead it should just return
>>> 'const ArchiveModuleCallbacks*', typically a single line.
>>> 
>>> Allocations etc should happen in one of the callbacks. That way we can
>>> actually have multiple instances of a module.
>> 
>> I think we'd need to invent a startup callback for archive modules for this
>> to work, but that's easy enough.
> 
> If you don't return some (void *) pointing to a private area that
> would be stored by the backend, allocated as part of the loading path,
> I agree that an extra callback is what makes the most sense,
> presumably called around the beginning of PgArchiverMain().  Doing
> this kind of one-time action in the file callback woud be weird..

Okay, here is a new patch set with the aforementioned adjustments and
documentation updates.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Andres Freund

Date:

01 February 2023, 11:54:26

Hi,

On 2023-01-31 15:30:13 -0800, Nathan Bossart wrote:

> +/*
> + * basic_archive_startup
> + *
> + * Creates the module's memory context.
> + */
> +void *
> +basic_archive_startup(void)
> +{
> +    return (void *) AllocSetContextCreate(TopMemoryContext,
> +                                          "basic_archive",
> +                                          ALLOCSET_DEFAULT_SIZES);
>  }

I'd make basic_archive's private data a struct, with a member for the
context, but it's not that important.

I'd also be inclined to do the same for the private_state you're passing
around for each module. Even if it's just to reduce the number of
functions accepting void * - loosing compiler type checking isn't great.

So maybe an ArchiveModuleState { void *private_data } that's passed to
basic_archive_startup() and all the other callbacks.

Greetings,

Andres Freund

Re: recovery modules

From

Nathan Bossart

Date:

01 February 2023, 20:15:29

On Wed, Feb 01, 2023 at 03:54:26AM -0800, Andres Freund wrote:
> I'd make basic_archive's private data a struct, with a member for the
> context, but it's not that important.
> 
> I'd also be inclined to do the same for the private_state you're passing
> around for each module. Even if it's just to reduce the number of
> functions accepting void * - loosing compiler type checking isn't great.
> 
> So maybe an ArchiveModuleState { void *private_data } that's passed to
> basic_archive_startup() and all the other callbacks.

Here's a new patch set in which I've attempted to address this feedback and
Michael's feedback.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Andres Freund

Date:

01 February 2023, 21:06:06

Hi,

On 2023-02-01 12:15:29 -0800, Nathan Bossart wrote:
> Here's a new patch set in which I've attempted to address this feedback and
> Michael's feedback.

Looks better!


> @@ -25,12 +34,14 @@ extern PGDLLIMPORT char *XLogArchiveLibrary;
>   * For more information about the purpose of each callback, refer to the
>   * archive modules documentation.
>   */
> -typedef bool (*ArchiveCheckConfiguredCB) (void);
> -typedef bool (*ArchiveFileCB) (const char *file, const char *path);
> -typedef void (*ArchiveShutdownCB) (void);
> +typedef void (*ArchiveStartupCB) (ArchiveModuleState *state);
> +typedef bool (*ArchiveCheckConfiguredCB) (ArchiveModuleState *state);
> +typedef bool (*ArchiveFileCB) (const char *file, const char *path, ArchiveModuleState *state);
> +typedef void (*ArchiveShutdownCB) (ArchiveModuleState *state);

Personally I'd always pass ArchiveModuleState *state as the first arg,
but it's not important.

Greetings,

Andres Freund

Re: recovery modules

From

Nathan Bossart

Date:

01 February 2023, 21:23:26

On Wed, Feb 01, 2023 at 01:06:06PM -0800, Andres Freund wrote:
> On 2023-02-01 12:15:29 -0800, Nathan Bossart wrote:
>> Here's a new patch set in which I've attempted to address this feedback and
>> Michael's feedback.
> 
> Looks better!

Thanks!

>> @@ -25,12 +34,14 @@ extern PGDLLIMPORT char *XLogArchiveLibrary;
>>   * For more information about the purpose of each callback, refer to the
>>   * archive modules documentation.
>>   */
>> -typedef bool (*ArchiveCheckConfiguredCB) (void);
>> -typedef bool (*ArchiveFileCB) (const char *file, const char *path);
>> -typedef void (*ArchiveShutdownCB) (void);
>> +typedef void (*ArchiveStartupCB) (ArchiveModuleState *state);
>> +typedef bool (*ArchiveCheckConfiguredCB) (ArchiveModuleState *state);
>> +typedef bool (*ArchiveFileCB) (const char *file, const char *path, ArchiveModuleState *state);
>> +typedef void (*ArchiveShutdownCB) (ArchiveModuleState *state);
> 
> Personally I'd always pass ArchiveModuleState *state as the first arg,
> but it's not important.

Yeah, that's nicer.  cfbot is complaining about a missing #include, so I
need to send a new revision anyway.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Nathan Bossart

Date:

02 February 2023, 19:37:00

On Thu, Feb 02, 2023 at 02:34:17PM +0900, Michael Paquier wrote:
> Okay, the changes done here look straight-forward seen from here.  I
> got one small-ish comment.
> 
> +basic_archive_startup(ArchiveModuleState *state)
> +{
> +   BasicArchiveData *data = palloc0(sizeof(BasicArchiveData));
> 
> Perhaps this should use MemoryContextAlloc() rather than a plain
> palloc().  This should not matter based on the position where the
> startup callback is called, still that may be a pattern worth
> encouraging.

Good call.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Nathan Bossart

Date:

04 February 2023, 18:14:36

On Sat, Feb 04, 2023 at 11:59:20AM +0900, Michael Paquier wrote:
> +   ArchiveModuleCallbacks struct filled with the callback function pointers for
> This needs a structname markup.
> 
> +   can use <literal>state->private_data</literal> to store it.
> And here it would be structfield.

fixed

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Andres Freund

Date:

08 February 2023, 16:27:13

Hi,

On 2023-02-08 16:23:34 +0900, Michael Paquier wrote:
> On Sat, Feb 04, 2023 at 10:14:36AM -0800, Nathan Bossart wrote:
> > On Sat, Feb 04, 2023 at 11:59:20AM +0900, Michael Paquier wrote:
> >> +   ArchiveModuleCallbacks struct filled with the callback function pointers for
> >> This needs a structname markup.
> >> 
> >> +   can use <literal>state->private_data</literal> to store it.
> >> And here it would be structfield.
> > 
> > fixed
> 
> Andres, did you have the change to look at that?  I did look at it,
> but it may not address all the points you may have in mind.

Yes, I think this looks pretty good now.

One minor thing: I don't think we really need the AssertVariableIsOfType() for
anything but the Init() one?

Greetings,

Andres Freund

Re: recovery modules

From

Nathan Bossart

Date:

08 February 2023, 17:27:05

On Wed, Feb 08, 2023 at 08:27:13AM -0800, Andres Freund wrote:
> One minor thing: I don't think we really need the AssertVariableIsOfType() for
> anything but the Init() one?

This is another part that was borrowed from logical decoding output
plugins.  I'm not sure this adds much since f2b73c8 ("Add central
declarations for dlsym()ed symbols").  Perhaps we should remove all of
these assertions for functions that now have central declarations.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: recovery modules

From

Andres Freund

Date:

08 February 2023, 17:33:44

Hi,

On 2023-02-08 09:27:05 -0800, Nathan Bossart wrote:
> On Wed, Feb 08, 2023 at 08:27:13AM -0800, Andres Freund wrote:
> > One minor thing: I don't think we really need the AssertVariableIsOfType() for
> > anything but the Init() one?
> 
> This is another part that was borrowed from logical decoding output
> plugins.

I know :(. It was needed in an earlier version of the output plugin interface,
where all the different callbacks were looked up via dlsym(), but should have
been removed after that.

> I'm not sure this adds much since f2b73c8 ("Add central
> declarations for dlsym()ed symbols").  Perhaps we should remove all of
> these assertions for functions that now have central declarations.

Most of them weren't needed even before that.

And yes, I'd be for a patch to remove all of those assertions.

Greetings,

Andres Freund

Re: recovery modules

From

Nathan Bossart

Date:

08 February 2023, 17:57:56

On Wed, Feb 08, 2023 at 09:33:44AM -0800, Andres Freund wrote:
> And yes, I'd be for a patch to remove all of those assertions.

done

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Andres Freund

Date:

08 February 2023, 18:24:18

Hi,

On 2023-02-08 09:57:56 -0800, Nathan Bossart wrote:
> On Wed, Feb 08, 2023 at 09:33:44AM -0800, Andres Freund wrote:
> > And yes, I'd be for a patch to remove all of those assertions.
> 
> done

If you'd reorder it so that 0004 applies independently from the other changes,
I'd just push that now.

I was remembering additional AssertVariableIsOfType(), but it looks like we
actually did remember to take them out when redesigning the output plugin
interface...

Greetings,

Andres Freund

Re: recovery modules

From

Nathan Bossart

Date:

08 February 2023, 18:55:44

On Wed, Feb 08, 2023 at 10:24:18AM -0800, Andres Freund wrote:
> If you'd reorder it so that 0004 applies independently from the other changes,
> I'd just push that now.

done

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Andres Freund

Date:

09 February 2023, 05:16:19

On 2023-02-08 10:55:44 -0800, Nathan Bossart wrote:
> done

Pushed. Thanks!

Re: recovery modules

From

Nathan Bossart

Date:

09 February 2023, 17:24:58

On Wed, Feb 08, 2023 at 09:16:19PM -0800, Andres Freund wrote:
> Pushed. Thanks!

Thanks!

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: recovery modules

From

Nathan Bossart

Date:

09 February 2023, 19:39:17

rebased for cfbot

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Andres Freund

Date:

09 February 2023, 20:18:55

Hi,

On 2023-02-09 11:39:17 -0800, Nathan Bossart wrote:
> rebased for cfbot

I think this nearly ready. Michael, are you planning to commit this?

Personally I'd probably squash these into a single commit.

> diff --git a/doc/src/sgml/archive-modules.sgml b/doc/src/sgml/archive-modules.sgml
> index ef02051f7f..2db1b19216 100644
> --- a/doc/src/sgml/archive-modules.sgml
> +++ b/doc/src/sgml/archive-modules.sgml
> @@ -47,23 +47,30 @@
>     normal library search path is used to locate the library.  To provide the
>     required archive module callbacks and to indicate that the library is
>     actually an archive module, it needs to provide a function named
> -   <function>_PG_archive_module_init</function>.  This function is passed a
> -   struct that needs to be filled with the callback function pointers for
> -   individual actions.
> +   <function>_PG_archive_module_init</function>.  This function must return an
> +   <structname>ArchiveModuleCallbacks</structname> struct filled with the
> +   callback function pointers for individual actions.

I'd probably mention that this should typically be of server lifetime / a
'static const' struct. Tableam documents this as follows:

  The result of the function must be a pointer to a struct of type
  <structname>TableAmRoutine</structname>, which contains everything that the
  core code needs to know to make use of the table access method. The return
  value needs to be of server lifetime, which is typically achieved by
  defining it as a <literal>static const</literal> variable in global
  scope

> +
> +  <note>
> +   <para>
> +    <varname>archive_library</varname> is only loaded in the archiver process.
> +   </para>
> +  </note>
>   </sect1>

That's not really related to any of the changes here, right?

I'm not sure it's a good idea to document that. We e.g. probably should allow
the library to check that the configuration is correct, at postmaster start,
rather than later, at runtime.

>   <sect1 id="archive-module-callbacks">
> @@ -73,6 +80,20 @@ typedef void (*ArchiveModuleInit) (struct ArchiveModuleCallbacks *cb);
>     The server will call them as required to process each individual WAL file.
>    </para>
>  
> +  <sect2 id="archive-module-startup">
> +   <title>Startup Callback</title>
> +   <para>
> +    The <function>startup_cb</function> callback is called shortly after the
> +    module is loaded.  This callback can be used to perform any additional
> +    initialization required.  If the archive module needs to have a state, it
> +    can use <structfield>state->private_data</structfield> to store it.

s/needs to have a state/has state/?

> @@ -83,7 +104,7 @@ typedef void (*ArchiveModuleInit) (struct ArchiveModuleCallbacks *cb);
>      assumes the module is configured.
>  
>  <programlisting>
> -typedef bool (*ArchiveCheckConfiguredCB) (void);
> +typedef bool (*ArchiveCheckConfiguredCB) (ArchiveModuleState *state);
>  </programlisting>
>  
>      If <literal>true</literal> is returned, the server will proceed with

Hm. I wonder if ArchiveCheckConfiguredCB() should actually work without the
state. We're not really doing anything yet, at that point, so it shouldn't
really need state?

The reason I'm wondering is that I think we should consider calling this from
the GUC assignment hook, at least in postmaster. Which would make it more
convenient to not have state, I think?

> @@ -128,7 +149,7 @@ typedef bool (*ArchiveFileCB) (const char *file, const char *path);
>      these situations.
>  
>  <programlisting>
> -typedef void (*ArchiveShutdownCB) (void);
> +typedef void (*ArchiveShutdownCB) (ArchiveModuleState *state);
>  </programlisting>
>     </para>
>    </sect2>

Perhaps mention that this needs to free state it allocated in the
ArchiveModuleState, or it'll be leaked?

Greetings,

Andres Freund

Re: recovery modules

From

Nathan Bossart

Date:

09 February 2023, 22:48:26

On Thu, Feb 09, 2023 at 12:18:55PM -0800, Andres Freund wrote:
> On 2023-02-09 11:39:17 -0800, Nathan Bossart wrote:
> Personally I'd probably squash these into a single commit.

done

> I'd probably mention that this should typically be of server lifetime / a
> 'static const' struct. Tableam documents this as follows:

done

>> +  <note>
>> +   <para>
>> +    <varname>archive_library</varname> is only loaded in the archiver process.
>> +   </para>
>> +  </note>
>>   </sect1>
> 
> That's not really related to any of the changes here, right?
> 
> I'm not sure it's a good idea to document that. We e.g. probably should allow
> the library to check that the configuration is correct, at postmaster start,
> rather than later, at runtime.

removed

>> +  <sect2 id="archive-module-startup">
>> +   <title>Startup Callback</title>
>> +   <para>
>> +    The <function>startup_cb</function> callback is called shortly after the
>> +    module is loaded.  This callback can be used to perform any additional
>> +    initialization required.  If the archive module needs to have a state, it
>> +    can use <structfield>state->private_data</structfield> to store it.
> 
> s/needs to have a state/has state/?

done

>>  <programlisting>
>> -typedef bool (*ArchiveCheckConfiguredCB) (void);
>> +typedef bool (*ArchiveCheckConfiguredCB) (ArchiveModuleState *state);
>>  </programlisting>
>>  
>>      If <literal>true</literal> is returned, the server will proceed with
> 
> Hm. I wonder if ArchiveCheckConfiguredCB() should actually work without the
> state. We're not really doing anything yet, at that point, so it shouldn't
> really need state?
> 
> The reason I'm wondering is that I think we should consider calling this from
> the GUC assignment hook, at least in postmaster. Which would make it more
> convenient to not have state, I think?

I agree that this callback should typically not need the state, but I'm not
sure whether it fits into the assignment hook for archive_library.  This
callback is primarily meant for situations when you have archiving enabled,
but your module isn't configured yet (e.g., archive_command is empty).  In
this case, we keep the WAL around, but we don't try to archive it until
this hook returns true.  It's up to each module to define that criteria.  I
can imagine someone introducing a GUC in their archive module that
temporarily halts archiving via this callback.  In that case, calling it
via a GUC assignment hook probably won't work.  In fact, checking whether
archive_command is empty in that hook might not work either.

>>  <programlisting>
>> -typedef void (*ArchiveShutdownCB) (void);
>> +typedef void (*ArchiveShutdownCB) (ArchiveModuleState *state);
>>  </programlisting>
>>     </para>
>>    </sect2>
> 
> Perhaps mention that this needs to free state it allocated in the
> ArchiveModuleState, or it'll be leaked?

done

I left this out originally because the archiver exits shortly after calling
this.  However, if you have DSM segments or something, it's probably wise
to make sure those are cleaned up.  And I suppose we might not always exit
immediately after this callback, so establishing the habit of freeing the
state could be a good idea.  In addition to updating this part of the docs,
I wrote a shutdown callback for basic_archive that frees its state.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

v10-0001-restructure-archive-modules-API.patch

Re: recovery modules

From

Nathan Bossart

Date:

13 February 2023, 22:56:47

On Mon, Feb 13, 2023 at 04:37:10PM +0900, Michael Paquier wrote:
> +   basic_archive_context = data->context;
> +   if (CurrentMemoryContext == basic_archive_context)
> +       MemoryContextSwitchTo(TopMemoryContext);
> +
> +   if (MemoryContextIsValid(basic_archive_context))
> +       MemoryContextDelete(basic_archive_context);
> 
> This is a bit confusing, because it means that we enter in the
> shutdown callback with one context, but exit it under
> TopMemoryContext.  Are you sure that this will be OK when there could
> be multiple callbacks piled up with before_shmem_exit()?  shmem_exit()
> has nothing specific to memory contexts.

Well, we can't free the memory context while we are in it, so we have to
switch to another one.  I agree that this is a bit confusing, though.

On second thought, I'm not sure it's important to make sure the state is
freed in the shutdown callback.  It's only called just before the archiver
process exits, so we're not really at risk of leaking anything.  I suppose
we might not always restart the archiver in this case, but I also don't
anticipate that behavior changing in the near future.  I think this
callback is more useful for things like shutting down background workers.

I went ahead and removed the shutdown callback from basic_archive and the
note about leaking from the documentation.

> Is putting the new headers in src/include/postmaster/ the best move in
> the long term?  Perhaps that's fine, but I was wondering whether a new
> location like archive/ would make more sense.  pg_arch.h being in the
> postmaster section is fine.

I moved them to archive/.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

v11-0001-restructure-archive-modules-API.patch

Re: recovery modules

From

Andres Freund

Date:

13 February 2023, 23:37:33

Hi,

On 2023-02-13 14:56:47 -0800, Nathan Bossart wrote:
> On Mon, Feb 13, 2023 at 04:37:10PM +0900, Michael Paquier wrote:
> > +   basic_archive_context = data->context;
> > +   if (CurrentMemoryContext == basic_archive_context)
> > +       MemoryContextSwitchTo(TopMemoryContext);
> > +
> > +   if (MemoryContextIsValid(basic_archive_context))
> > +       MemoryContextDelete(basic_archive_context);
> > 
> > This is a bit confusing, because it means that we enter in the
> > shutdown callback with one context, but exit it under
> > TopMemoryContext.  Are you sure that this will be OK when there could
> > be multiple callbacks piled up with before_shmem_exit()?  shmem_exit()
> > has nothing specific to memory contexts.
> 
> Well, we can't free the memory context while we are in it, so we have to
> switch to another one.  I agree that this is a bit confusing, though.

Why would we be in that memory context? I'd just add an assert documenting
we're not.


> On second thought, I'm not sure it's important to make sure the state is
> freed in the shutdown callback.  It's only called just before the archiver
> process exits, so we're not really at risk of leaking anything.  I suppose
> we might not always restart the archiver in this case, but I also don't
> anticipate that behavior changing in the near future.  I think this
> callback is more useful for things like shutting down background workers.

I think it's crucial. Otherwise we're just ossifying the design that there's
just one archive module active at a time.


> I went ahead and removed the shutdown callback from basic_archive and the
> note about leaking from the documentation.

-1


Greetings,

Andres Freund

Re: recovery modules

From

Nathan Bossart

Date:

14 February 2023, 00:55:58

On Mon, Feb 13, 2023 at 03:37:33PM -0800, Andres Freund wrote:
> On 2023-02-13 14:56:47 -0800, Nathan Bossart wrote:
>> On Mon, Feb 13, 2023 at 04:37:10PM +0900, Michael Paquier wrote:
>> > +   basic_archive_context = data->context;
>> > +   if (CurrentMemoryContext == basic_archive_context)
>> > +       MemoryContextSwitchTo(TopMemoryContext);
>> > +
>> > +   if (MemoryContextIsValid(basic_archive_context))
>> > +       MemoryContextDelete(basic_archive_context);
>> > 
>> > This is a bit confusing, because it means that we enter in the
>> > shutdown callback with one context, but exit it under
>> > TopMemoryContext.  Are you sure that this will be OK when there could
>> > be multiple callbacks piled up with before_shmem_exit()?  shmem_exit()
>> > has nothing specific to memory contexts.
>> 
>> Well, we can't free the memory context while we are in it, so we have to
>> switch to another one.  I agree that this is a bit confusing, though.
> 
> Why would we be in that memory context? I'd just add an assert documenting
> we're not.
> 
> 
>> On second thought, I'm not sure it's important to make sure the state is
>> freed in the shutdown callback.  It's only called just before the archiver
>> process exits, so we're not really at risk of leaking anything.  I suppose
>> we might not always restart the archiver in this case, but I also don't
>> anticipate that behavior changing in the near future.  I think this
>> callback is more useful for things like shutting down background workers.
> 
> I think it's crucial. Otherwise we're just ossifying the design that there's
> just one archive module active at a time.
> 
> 
>> I went ahead and removed the shutdown callback from basic_archive and the
>> note about leaking from the documentation.
> 
> -1

Okay.  I've added it back in v12 with the suggested adjustment for the
memory context stuff.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

v12-0001-restructure-archive-modules-API.patch

Re: recovery modules

From

Nathan Bossart

Date:

14 February 2023, 01:02:37

On Mon, Feb 13, 2023 at 04:55:58PM -0800, Nathan Bossart wrote:
> Okay.  I've added it back in v12 with the suggested adjustment for the
> memory context stuff.

Sorry for then noise, cfbot alerted me to a missing #include, which I've
added in v13.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

v13-0001-restructure-archive-modules-API.patch

Re: recovery modules

From

Nathan Bossart

Date:

15 February 2023, 18:44:07

On Wed, Feb 15, 2023 at 03:38:21PM +0900, Michael Paquier wrote:
> On Mon, Feb 13, 2023 at 05:02:37PM -0800, Nathan Bossart wrote:
>> Sorry for then noise, cfbot alerted me to a missing #include, which I've
>> added in v13.

Sorry for more noise.  I noticed that I missed updating the IDENTIFICATION
line for shell_archive.c.  That's the only change in v14.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

v14-0001-restructure-archive-modules-API.patch

Re: recovery modules

From

Andres Freund

Date:

16 February 2023, 19:29:56

Hi,

On 2023-02-15 10:44:07 -0800, Nathan Bossart wrote:
> On Wed, Feb 15, 2023 at 03:38:21PM +0900, Michael Paquier wrote:
> > I may tweak a bit the comments, but nothing more.  And I don't think I
> > have more to add.  Andres, do you have anything you would like to
> > mention?

Just some minor comments below. None of them needs to be addressed.


> @@ -144,10 +170,12 @@ basic_archive_configured(void)
>   * Archives one file.
>   */
>  static bool
> -basic_archive_file(const char *file, const char *path)
> +basic_archive_file(ArchiveModuleState *state, const char *file, const char *path)
>  {
>      sigjmp_buf    local_sigjmp_buf;

Not related the things changed here, but this should never have been pushed
down into individual archive modules. There's absolutely no way that we're
going to keep this up2date and working correctly in random archive
modules. And it would break if archive modules are ever called outside of
pgarch.c.


> +static void
> +basic_archive_shutdown(ArchiveModuleState *state)
> +{
> +    BasicArchiveData *data = (BasicArchiveData *) (state->private_data);

The parens around (state->private_data) are imo odd.

> +    basic_archive_context = data->context;
> +    Assert(CurrentMemoryContext != basic_archive_context);
> +
> +    if (MemoryContextIsValid(basic_archive_context))
> +        MemoryContextDelete(basic_archive_context);

I guess I'd personally be paranoid and clean data->context after
this. Obviously doesn't matter right now, but at some later date it could be
that we'd error out after this point, and re-entered the shutdown callback.


> +
> +/*
> + * Archive module callbacks
> + *
> + * These callback functions should be defined by archive libraries and returned
> + * via _PG_archive_module_init().  ArchiveFileCB is the only required callback.
> + * For more information about the purpose of each callback, refer to the
> + * archive modules documentation.
> + */
> +typedef void (*ArchiveStartupCB) (ArchiveModuleState *state);
> +typedef bool (*ArchiveCheckConfiguredCB) (ArchiveModuleState *state);
> +typedef bool (*ArchiveFileCB) (ArchiveModuleState *state, const char *file, const char *path);
> +typedef void (*ArchiveShutdownCB) (ArchiveModuleState *state);
> +
> +typedef struct ArchiveModuleCallbacks
> +{
> +    ArchiveStartupCB startup_cb;
> +    ArchiveCheckConfiguredCB check_configured_cb;
> +    ArchiveFileCB archive_file_cb;
> +    ArchiveShutdownCB shutdown_cb;
> +} ArchiveModuleCallbacks;

If you wanted you could just define the callback types in the struct now, as
we don't need asserts for the types.

Greetings,

Andres Freund

Re: recovery modules

From

Nathan Bossart

Date:

16 February 2023, 20:15:12

Thanks for reviewing.

On Thu, Feb 16, 2023 at 11:29:56AM -0800, Andres Freund wrote:
> On 2023-02-15 10:44:07 -0800, Nathan Bossart wrote:
>> @@ -144,10 +170,12 @@ basic_archive_configured(void)
>>   * Archives one file.
>>   */
>>  static bool
>> -basic_archive_file(const char *file, const char *path)
>> +basic_archive_file(ArchiveModuleState *state, const char *file, const char *path)
>>  {
>>      sigjmp_buf    local_sigjmp_buf;
> 
> Not related the things changed here, but this should never have been pushed
> down into individual archive modules. There's absolutely no way that we're
> going to keep this up2date and working correctly in random archive
> modules. And it would break if archive modules are ever called outside of
> pgarch.c.

Yeah.  IIRC I did briefly try to avoid this, but the difficulty was that
each module will have its own custom cleanup logic.  There's no requirement
that a module creates an exception handler, but I imagine that any
sufficiently complex one will.  In any case, I agree that it's worth trying
to pull this out of the individual modules.

>> +static void
>> +basic_archive_shutdown(ArchiveModuleState *state)
>> +{
>> +    BasicArchiveData *data = (BasicArchiveData *) (state->private_data);
> 
> The parens around (state->private_data) are imo odd.

Oops, removed.

>> +    basic_archive_context = data->context;
>> +    Assert(CurrentMemoryContext != basic_archive_context);
>> +
>> +    if (MemoryContextIsValid(basic_archive_context))
>> +        MemoryContextDelete(basic_archive_context);
> 
> I guess I'd personally be paranoid and clean data->context after
> this. Obviously doesn't matter right now, but at some later date it could be
> that we'd error out after this point, and re-entered the shutdown callback.

Done.

>> +/*
>> + * Archive module callbacks
>> + *
>> + * These callback functions should be defined by archive libraries and returned
>> + * via _PG_archive_module_init().  ArchiveFileCB is the only required callback.
>> + * For more information about the purpose of each callback, refer to the
>> + * archive modules documentation.
>> + */
>> +typedef void (*ArchiveStartupCB) (ArchiveModuleState *state);
>> +typedef bool (*ArchiveCheckConfiguredCB) (ArchiveModuleState *state);
>> +typedef bool (*ArchiveFileCB) (ArchiveModuleState *state, const char *file, const char *path);
>> +typedef void (*ArchiveShutdownCB) (ArchiveModuleState *state);
>> +
>> +typedef struct ArchiveModuleCallbacks
>> +{
>> +    ArchiveStartupCB startup_cb;
>> +    ArchiveCheckConfiguredCB check_configured_cb;
>> +    ArchiveFileCB archive_file_cb;
>> +    ArchiveShutdownCB shutdown_cb;
>> +} ArchiveModuleCallbacks;
> 
> If you wanted you could just define the callback types in the struct now, as
> we don't need asserts for the types.

This crossed my mind.  I thought it was nice to have a declaration for each
callback that we can copy into the docs, but I'm sure we could do without
it, too.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

v15-0001-restructure-archive-modules-API.patch

Re: recovery modules

From

Andres Freund

Date:

16 February 2023, 21:17:54

Hi,

On 2023-02-16 12:15:12 -0800, Nathan Bossart wrote:
> Thanks for reviewing.
> 
> On Thu, Feb 16, 2023 at 11:29:56AM -0800, Andres Freund wrote:
> > On 2023-02-15 10:44:07 -0800, Nathan Bossart wrote:
> >> @@ -144,10 +170,12 @@ basic_archive_configured(void)
> >>   * Archives one file.
> >>   */
> >>  static bool
> >> -basic_archive_file(const char *file, const char *path)
> >> +basic_archive_file(ArchiveModuleState *state, const char *file, const char *path)
> >>  {
> >>      sigjmp_buf    local_sigjmp_buf;
> > 
> > Not related the things changed here, but this should never have been pushed
> > down into individual archive modules. There's absolutely no way that we're
> > going to keep this up2date and working correctly in random archive
> > modules. And it would break if archive modules are ever called outside of
> > pgarch.c.
> 
> Yeah.  IIRC I did briefly try to avoid this, but the difficulty was that
> each module will have its own custom cleanup logic.

It can use PG_TRY/CATCH for that, if the top-level sigsetjmp is in pgarch.c.
Or you could add a cleanup callback to the API, to be called after the
top-level cleanup in pgarch.c.


I'm quite baffled by:
        /* Close any files left open by copy_file() or compare_files() */
        AtEOSubXact_Files(false, InvalidSubTransactionId, InvalidSubTransactionId);

in basic_archive_file(). It seems *really* off to call AtEOSubXact_Files()
completely outside the context of a transaction environment. And it only does
the thing you want because you pass parameters that aren't actually valid in
the normal use in AtEOSubXact_Files().  I really don't understand how that's
supposed to be ok.

Greetings,

Andres Freund

Re: recovery modules

From

Nathan Bossart

Date:

16 February 2023, 21:58:10

On Thu, Feb 16, 2023 at 01:17:54PM -0800, Andres Freund wrote:
> On 2023-02-16 12:15:12 -0800, Nathan Bossart wrote:
>> On Thu, Feb 16, 2023 at 11:29:56AM -0800, Andres Freund wrote:
>> > On 2023-02-15 10:44:07 -0800, Nathan Bossart wrote:
>> >> @@ -144,10 +170,12 @@ basic_archive_configured(void)
>> >>   * Archives one file.
>> >>   */
>> >>  static bool
>> >> -basic_archive_file(const char *file, const char *path)
>> >> +basic_archive_file(ArchiveModuleState *state, const char *file, const char *path)
>> >>  {
>> >>      sigjmp_buf    local_sigjmp_buf;
>> > 
>> > Not related the things changed here, but this should never have been pushed
>> > down into individual archive modules. There's absolutely no way that we're
>> > going to keep this up2date and working correctly in random archive
>> > modules. And it would break if archive modules are ever called outside of
>> > pgarch.c.
>> 
>> Yeah.  IIRC I did briefly try to avoid this, but the difficulty was that
>> each module will have its own custom cleanup logic.
> 
> It can use PG_TRY/CATCH for that, if the top-level sigsetjmp is in pgarch.c.
> Or you could add a cleanup callback to the API, to be called after the
> top-level cleanup in pgarch.c.

Yeah, that seems workable.

> I'm quite baffled by:
>         /* Close any files left open by copy_file() or compare_files() */
>         AtEOSubXact_Files(false, InvalidSubTransactionId, InvalidSubTransactionId);
> 
> in basic_archive_file(). It seems *really* off to call AtEOSubXact_Files()
> completely outside the context of a transaction environment. And it only does
> the thing you want because you pass parameters that aren't actually valid in
> the normal use in AtEOSubXact_Files().  I really don't understand how that's
> supposed to be ok.

Hm.  Should copy_file() and compare_files() have PG_FINALLY blocks that
attempt to close the files instead?  What would you recommend?

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: recovery modules

From

Andres Freund

Date:

17 February 2023, 19:41:32

Hi,

On 2023-02-16 13:58:10 -0800, Nathan Bossart wrote:
> On Thu, Feb 16, 2023 at 01:17:54PM -0800, Andres Freund wrote:
> > I'm quite baffled by:
> >         /* Close any files left open by copy_file() or compare_files() */
> >         AtEOSubXact_Files(false, InvalidSubTransactionId, InvalidSubTransactionId);
> > 
> > in basic_archive_file(). It seems *really* off to call AtEOSubXact_Files()
> > completely outside the context of a transaction environment. And it only does
> > the thing you want because you pass parameters that aren't actually valid in
> > the normal use in AtEOSubXact_Files().  I really don't understand how that's
> > supposed to be ok.
> 
> Hm.  Should copy_file() and compare_files() have PG_FINALLY blocks that
> attempt to close the files instead?  What would you recommend?

I don't fully now, it's not entirely clear to me what the goals here were. I
think you'd likely need to do a bit of infrastructure work to do this
sanely. So far we just didn't have the need to handle files being released in
a way like you want to do there.

I suspect a good direction would be to use resource owners. Add a separate set
of functions that release files on resource owner release. Most of the
infrastructure is there already, for temporary files
(c.f. OpenTemporaryFile()).

Then that resource owner could be reset in case of error.

I'm not even sure that erroring out is a reasonable way to implement
copy_file(), compare_files(), particularly because you want to return via a
return code from basic_archive_files().

Greetings,

Andres Freund

Re: recovery modules

From

Nathan Bossart

Date:

17 February 2023, 22:32:43

On Fri, Feb 17, 2023 at 05:01:47PM +0900, Michael Paquier wrote:
> All that had better
> be put into their own threads, IMO, to bring more visibility on these
> subjects.

I created a new thread for these [0].

> Saying that, I have spent more time on the revamped version of the
> archive modules and it was already doing a lot, so I have applied
> it as it covered all the points discussed..

Thanks!

[0] https://postgr.es/m/20230217215624.GA3131134%40nathanxps13

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Re: recovery modules

From

Nathan Bossart

Date:

17 February 2023, 22:53:42

Here is a new revision of the restore modules patch set.  In this patch
set, the interface looks similar to the recent archive modules redesign,
and there are separate callbacks for retrieving different types of files.
I've attempted to address all the feedback I've received, but there was a
lot scattered across different threads, so it's possible I've missed
something.  Note that 0001 is the stopgap fix for restore_command that's
being tracked elsewhere [0].  I was careful to avoid repeating the recent
mistake with the SIGTERM handling.

This patch set is still a little rough around the edges, but I wanted to
post it in case folks had general thoughts about the structure, interface,
etc.  This implementation restores files synchronously one-by-one just like
archive modules, but in the future, I would like to add
asynchronous/parallel/batching support.  My intent is for this work to move
us closer to that.

[0] https://postgr.es/m/20230214174755.GA1348509%40nathanxps13

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Nathan Bossart

Date:

10 March 2023, 22:02:56

Here is a rebased version of the restore modules patch set.  I swapped the
patch for the stopgap fix for restore_command with the latest version [0],
and I marked the restore/ headers as installable (as was recently done for
archive/ [1]).  There are no other changes.

[0] https://postgr.es/m/20230301224751.GA1823946%40nathanxps13
[1] https://git.postgresql.org/gitweb/?p=postgresql.git;a=commit;h=6ad5793

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Nathan Bossart

Date:

15 March 2023, 04:13:09

I noticed that the new TAP test for basic_archive was failing
intermittently for cfbot. It looks like the query for checking that the
post-backup WAL is restored sometimes executes before archive recovery is
complete (because hot_standby is on). To fix this, I adjusted the test to
use poll_query_until instead. There are no other changes in v14.

I first tried to set hot_standby to off on the restored node so that the
query wouldn't run until archive recovery completed. This seemed like it
would work because start() useѕ "pg_ctl --wait", which has the following
note in the docs:

Startup is considered complete when the PID file indicates that the
server is ready to accept connections.

However, that's not what happens when hot_standby is off. In that case,
the postmaster.pid file is updated with PM_STATUS_STANDBY once recovery
starts, which wait_for_postmaster_start() interprets as "ready." I see
this was reported before [0], but that discussion fizzled out. IIUC it was
done this way to avoid infinite waits when hot_standby is off and standby
mode is enabled. I could be missing something obvious, but that doesn't
seem necessary when hot_standby is off and recovery mode is enabled because
recovery should end at some point (never mind the halting problem). I'm
still digging into this and may spin off a new thread if I can conjure up a
proposal.

[0] https://postgr.es/m/CAMkU%3D1wrMqPggnEfszE-c3PPLmKgRK17_qr7tmxBECYEbyV-4Q%40mail.gmail.com

--
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Michael Paquier

Date:

16 March 2023, 01:10:51

On Tue, Mar 14, 2023 at 09:13:09PM -0700, Nathan Bossart wrote:
> However, that's not what happens when hot_standby is off.  In that case,
> the postmaster.pid file is updated with PM_STATUS_STANDBY once recovery
> starts, which wait_for_postmaster_start() interprets as "ready."  I see
> this was reported before [0], but that discussion fizzled out.  IIUC it was
> done this way to avoid infinite waits when hot_standby is off and standby
> mode is enabled.  I could be missing something obvious, but that doesn't
> seem necessary when hot_standby is off and recovery mode is enabled because
> recovery should end at some point (never mind the halting problem).  I'm
> still digging into this and may spin off a new thread if I can conjure up a
> proposal.
>
> [0] https://postgr.es/m/CAMkU%3D1wrMqPggnEfszE-c3PPLmKgRK17_qr7tmxBECYEbyV-4Q%40mail.gmail.com

These days, knowing hot_standby is on by default, and that users would
recover up to the end-of-backup record of just use read replicas, do
we have a strong case for keeping this GUC parameter at all?  It does
not strike me that we really need to change a five-year-old behavior
if there has been few complaints about it.  I agree that it is
confusing as it stands, but the long-term simplifications may be worth
it in the recovery code (aka less booleans needed to track the flow of
the startup process, and less confusion around that).
--
Michael

Attachment

signature.asc

Re: recovery modules

From

Michael Paquier

Date:

06 April 2023, 00:34:45

On Fri, Feb 17, 2023 at 11:41:32AM -0800, Andres Freund wrote:
> I don't fully now, it's not entirely clear to me what the goals here were. I
> think you'd likely need to do a bit of infrastructure work to do this
> sanely. So far we just didn't have the need to handle files being released in
> a way like you want to do there.
>
> I suspect a good direction would be to use resource owners. Add a separate set
> of functions that release files on resource owner release. Most of the
> infrastructure is there already, for temporary files
> (c.f. OpenTemporaryFile()).

Yes, perhaps.  I've had good experience with these when it comes to
avoid leakages when releasing resources, particularly for resources
allocated by external libraries (cough, OpenSSL, cough).  And there
was some work to make these more scalable, for example.

At this stage of the CF, it seems pretty clear to me that this should
be pushed to v17, so moved to next CF.
--
Michael

Attachment

signature.asc

Re: recovery modules

From

Nathan Bossart

Date:

17 April 2023, 16:19:14

rebased

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Nathan Bossart

Date:

23 October 2023, 20:08:01

rebased

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Nathan Bossart

Date:

11 January 2024, 16:56:33

rebased

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Nathan Bossart

Date:

06 March 2024, 16:46:46

Here is another rebase.  Given the size of this patch set and the lack of
review, I am going to punt this one to v18.

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Nathan Bossart

Date:

15 March 2024, 14:16:07

rebased for cfbot

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Nathan Bossart

Date:

27 March 2024, 02:40:50

another rebase for cfbot

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Nathan Bossart

Date:

10 April 2024, 21:51:49

rebased

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Nathan Bossart

Date:

15 May 2024, 21:10:39

rebased

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Nathan Bossart

Date:

21 May 2024, 02:52:34

rebased

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: recovery modules

From

Nathan Bossart

Date:

19 July 2024, 19:37:08

Given the lack of interest, I plan to mark the commitfest entry for this
patch set as "Withdrawn" shortly.

-- 
nathan