Thread: fix archive module shutdown callback
Hi hackers, Presently, when an archive module sets up a shutdown callback, it will be called upon ERROR/FATAL (via PG_ENSURE_ERROR_CLEANUP), when the archive library changes (via HandlePgArchInterrupts()), and upon normal shutdown. There are a couple of problems with this: * HandlePgArchInterrupts() calls the shutdown callback directly before proc_exit(). However, the PG_ENSURE_ERROR_CLEANUP surrounding the call to pgarch_MainLoop() sets up a before_shmem_exit callback that also calls the shutdown callback. This means that the shutdown callback will be called twice whenever archive_library is changed via SIGHUP. * PG_ENSURE_ERROR_CLEANUP is intended for both ERROR and FATAL. However, the archiver operates at the bottom of the exception stack, so ERRORs are treated as FATALs. This means that PG_ENSURE_ERROR_CLEANUP is excessive. We only need to set up the before_shmem_exit callback. To fix, the attached patch removes the use of PG_ENSURE_ERROR_CLEANUP and the call to the shutdown callback in HandlePgArchInterrupts() in favor of just setting up a before_shmem_exit callback in LoadArchiveLibrary(). -- Nathan Bossart Amazon Web Services: https://aws.amazon.com
Attachment
On Sun, Oct 16, 2022 at 3:43 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > Hi hackers, > > Presently, when an archive module sets up a shutdown callback, it will be > called upon ERROR/FATAL (via PG_ENSURE_ERROR_CLEANUP), when the archive > library changes (via HandlePgArchInterrupts()), and upon normal shutdown. > There are a couple of problems with this: Yeah. > * HandlePgArchInterrupts() calls the shutdown callback directly before > proc_exit(). However, the PG_ENSURE_ERROR_CLEANUP surrounding the call to > pgarch_MainLoop() sets up a before_shmem_exit callback that also calls the > shutdown callback. This means that the shutdown callback will be called > twice whenever archive_library is changed via SIGHUP. > > * PG_ENSURE_ERROR_CLEANUP is intended for both ERROR and FATAL. However, > the archiver operates at the bottom of the exception stack, so ERRORs are > treated as FATALs. This means that PG_ENSURE_ERROR_CLEANUP is excessive. > We only need to set up the before_shmem_exit callback. > > To fix, the attached patch removes the use of PG_ENSURE_ERROR_CLEANUP and > the call to the shutdown callback in HandlePgArchInterrupts() in favor of > just setting up a before_shmem_exit callback in LoadArchiveLibrary(). We could have used a flag in call_archive_module_shutdown_callback() to avoid it being called multiple times, but having it as before_shmem_exit () callback without direct calls to it is the right approach IMO. +1 to remove PG_ENSURE_ERROR_CLEANUP and PG_END_ENSURE_ERROR_CLEANUP. Is the shutdown callback meant to be called only after the archive library is loaded? The documentation [1] says that it just gets called before the archiver process exits. If this is true, can we just place before_shmem_exit(call_archive_module_shutdown_callback, 0); in PgArchiverMain() after on_shmem_exit(pgarch_die, 0);? Also, I've noticed other 3 threads and CF entries all related to 'archive modules' feature. IMO, it could be better to have all of them under a single thread and single CF entry to reduce reviewers/committers' efforts and seek more thoughts about all of the fixes. https://commitfest.postgresql.org/40/3933/ https://commitfest.postgresql.org/40/3950/ https://commitfest.postgresql.org/40/3948/ [1] <sect2 id="archive-module-shutdown"> <title>Shutdown Callback</title> <para> The <function>shutdown_cb</function> callback is called when the archiver process exits (e.g., after an error) or the value of <xref linkend="guc-archive-library"/> changes. If no <function>shutdown_cb</function> is defined, no special action is taken in these situations. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
On Sun, Oct 16, 2022 at 01:39:14PM +0530, Bharath Rupireddy wrote: > Is the shutdown callback meant to be called only after the archive > library is loaded? The documentation [1] says that it just gets called > before the archiver process exits. If this is true, can we just place > before_shmem_exit(call_archive_module_shutdown_callback, 0); in > PgArchiverMain() after on_shmem_exit(pgarch_die, 0);? I am not sure to understand what you mean here. The shutdown callback is available once the archiver process has loaded the library defined in archive_library (assuming it is itself in shared_preload_libraries) and you cannot call something that does not exist yet. So, yes, you could define the call to before_shmem_exit() a bit earlier because that would be a no-op until the library is loaded, but at the end that would be just registering a callback that would do nothing useful in a larger window, aka until the library is loaded. FWIW, I think that the documentation should clarify that the shutdown callback is called before shmem exit. That's important. Another thing is that the shutdown callback is used by neither shell_archive.c nor basic_archive. We could do something about that, actually, say by plugging an elog(DEBUG1) in shutdown_cb for shell_archive.c to inform that the archiver is going down? basic_archive could do that, but we already use shell_archive.c in a bunch of tests, and this would need just log_min_messages=debug1 or lower, so.. > Also, I've noticed other 3 threads and CF entries all related to > 'archive modules' feature. IMO, it could be better to have all of them > under a single thread and single CF entry to reduce > reviewers/committers' efforts and seek more thoughts about all of the > fixes. I don't mind discussing each point separately as the first thread dealing with archive modules is already very long, so the current way of doing things makes sure to attract the correct type of attention for each problem, IMO. -- Michael
Attachment
At Mon, 17 Oct 2022 13:51:52 +0900, Michael Paquier <michael@paquier.xyz> wrote in > On Sun, Oct 16, 2022 at 01:39:14PM +0530, Bharath Rupireddy wrote: > > Is the shutdown callback meant to be called only after the archive > > library is loaded? The documentation [1] says that it just gets called > > before the archiver process exits. If this is true, can we just place > > before_shmem_exit(call_archive_module_shutdown_callback, 0); in > > PgArchiverMain() after on_shmem_exit(pgarch_die, 0);? > > I am not sure to understand what you mean here. The shutdown callback > is available once the archiver process has loaded the library defined > in archive_library (assuming it is itself in shared_preload_libraries) > and you cannot call something that does not exist yet. So, yes, you I guess that the "callback" there means the callback-caller function (call_archive_module_shutdown_callback), which in turn is set as a callback... > could define the call to before_shmem_exit() a bit earlier because > that would be a no-op until the library is loaded, but at the end that > would be just registering a callback that would do nothing useful in a > larger window, aka until the library is loaded. I thought that Bharath's point is to use before_shmem_exit() instead of PG_ENSURE_ERROR_CLEANUP(). The place doesn't seem significant but if we use before_shmem_exit(), it would be cleaner to place it adjecent to on_sheme_exit() call. > FWIW, I think that the documentation should clarify that the shutdown > callback is called before shmem exit. That's important. Sure. What the shutdown callback can do differs by shared memory access. > Another thing is that the shutdown callback is used by neither > shell_archive.c nor basic_archive. We could do something about that, > actually, say by plugging an elog(DEBUG1) in shutdown_cb for > shell_archive.c to inform that the archiver is going down? > basic_archive could do that, but we already use shell_archive.c in a > bunch of tests, and this would need just log_min_messages=debug1 or > lower, so.. +1 for inserting DEBUG1. > > Also, I've noticed other 3 threads and CF entries all related to > > 'archive modules' feature. IMO, it could be better to have all of them > > under a single thread and single CF entry to reduce > > reviewers/committers' efforts and seek more thoughts about all of the > > fixes. > > I don't mind discussing each point separately as the first thread > dealing with archive modules is already very long, so the current way > of doing things makes sure to attract the correct type of attention > for each problem, IMO. I tend to agree to this. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
At Sun, 16 Oct 2022 13:39:14 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in > On Sun, Oct 16, 2022 at 3:43 AM Nathan Bossart <nathandbossart@gmail.com> wrote: > > > > Hi hackers, > > > > Presently, when an archive module sets up a shutdown callback, it will be > > called upon ERROR/FATAL (via PG_ENSURE_ERROR_CLEANUP), when the archive > > library changes (via HandlePgArchInterrupts()), and upon normal shutdown. > > There are a couple of problems with this: > > Yeah. > > > * HandlePgArchInterrupts() calls the shutdown callback directly before > > proc_exit(). However, the PG_ENSURE_ERROR_CLEANUP surrounding the call to > > pgarch_MainLoop() sets up a before_shmem_exit callback that also calls the > > shutdown callback. This means that the shutdown callback will be called > > twice whenever archive_library is changed via SIGHUP. > > > > * PG_ENSURE_ERROR_CLEANUP is intended for both ERROR and FATAL. However, > > the archiver operates at the bottom of the exception stack, so ERRORs are > > treated as FATALs. This means that PG_ENSURE_ERROR_CLEANUP is excessive. > > We only need to set up the before_shmem_exit callback. > > > > To fix, the attached patch removes the use of PG_ENSURE_ERROR_CLEANUP and > > the call to the shutdown callback in HandlePgArchInterrupts() in favor of > > just setting up a before_shmem_exit callback in LoadArchiveLibrary(). > > We could have used a flag in call_archive_module_shutdown_callback() > to avoid it being called multiple times, but having it as > before_shmem_exit () callback without direct calls to it is the right > approach IMO. > > +1 to remove PG_ENSURE_ERROR_CLEANUP and PG_END_ENSURE_ERROR_CLEANUP. That prevents archiver process from cleanly shut down when something wrong happnes outside the interuppt handler. In the frist place, why do we need to call the cleanup callback directly in the handler? We can let the handler return something instead to tell the pgarch_MainLoop to exit immediately on the normal path. > Is the shutdown callback meant to be called only after the archive > library is loaded? The documentation [1] says that it just gets called > before the archiver process exits. If this is true, can we just place > before_shmem_exit(call_archive_module_shutdown_callback, 0); in > PgArchiverMain() after on_shmem_exit(pgarch_die, 0);? +1 for using before_shmem_exit. regards. -- Kyotaro Horiguchi NTT Open Source Software Center
On Mon, Oct 17, 2022 at 02:30:52PM +0900, Kyotaro Horiguchi wrote: > At Mon, 17 Oct 2022 13:51:52 +0900, Michael Paquier <michael@paquier.xyz> wrote in >> I am not sure to understand what you mean here. The shutdown callback >> is available once the archiver process has loaded the library defined >> in archive_library (assuming it is itself in shared_preload_libraries) >> and you cannot call something that does not exist yet. So, yes, you > > I guess that the "callback" there means the callback-caller function > (call_archive_module_shutdown_callback), which in turn is set as a > callback... A callback in a callback in a callback. >> could define the call to before_shmem_exit() a bit earlier because >> that would be a no-op until the library is loaded, but at the end that >> would be just registering a callback that would do nothing useful in a >> larger window, aka until the library is loaded. > > I thought that Bharath's point is to use before_shmem_exit() instead > of PG_ENSURE_ERROR_CLEANUP(). The place doesn't seem significant but > if we use before_shmem_exit(), it would be cleaner to place it > adjecent to on_sheme_exit() call. Removing PG_ENSURE_ERROR_CLEANUP() and relying on before_shmem_exit() is fine by me, that's what I imply upthread. -- Michael
Attachment
On Mon, Oct 17, 2022 at 11:17 AM Michael Paquier <michael@paquier.xyz> wrote: > > On Mon, Oct 17, 2022 at 02:30:52PM +0900, Kyotaro Horiguchi wrote: > > Removing PG_ENSURE_ERROR_CLEANUP() and relying on before_shmem_exit() > is fine by me, that's what I imply upthread. Hm. Here's a v2 patch that addresses review comments. In addition to making it a before_shmem_exit() callback, this patch also does the following things: 1) Renames call_archive_module_shutdown_callback() to be more meaningful and generic as before_shmem_exit() callback. 2) Clarifies when the archive module shutdown callback gets called in documentation. 3) Defines a shutdown callback that just emits a log message in shell_archive.c and tests it. Please review it further. -- Bharath Rupireddy PostgreSQL Contributors Team RDS Open Source Databases Amazon Web Services: https://aws.amazon.com
Attachment
On Tue, Oct 18, 2022 at 02:38:07PM +0530, Bharath Rupireddy wrote: > 2) Clarifies when the archive module shutdown callback gets called in > documentation. I have looked at that, and it was actually confusing as the callback would also be called on reload if archive_library changes, but the update somewhat outlines that this would happen only on postmaster shutdown. > 3) Defines a shutdown callback that just emits a log message in > shell_archive.c and tests it. The test had a few issues: - No need to wait for postmaster.pid in the test, as pg_ctl does this job. - The reload can be time-sensitive on slow machines, so I have added a query run to make sure that the reload happens before stopping the server. - slurp_file() was feeding on the full log file of standby2, but we should load it from the log location before stopping the server, even if log_min_messages was updated only at the end of the test. And done, after tweaking a few more things. -- Michael