Thread: fix archive module shutdown callback

fix archive module shutdown callback

From
Nathan Bossart
Date:
Hi hackers,

Presently, when an archive module sets up a shutdown callback, it will be
called upon ERROR/FATAL (via PG_ENSURE_ERROR_CLEANUP), when the archive
library changes (via HandlePgArchInterrupts()), and upon normal shutdown.
There are a couple of problems with this:

* HandlePgArchInterrupts() calls the shutdown callback directly before
proc_exit().  However, the PG_ENSURE_ERROR_CLEANUP surrounding the call to
pgarch_MainLoop() sets up a before_shmem_exit callback that also calls the
shutdown callback.  This means that the shutdown callback will be called
twice whenever archive_library is changed via SIGHUP.

* PG_ENSURE_ERROR_CLEANUP is intended for both ERROR and FATAL.  However,
the archiver operates at the bottom of the exception stack, so ERRORs are
treated as FATALs.  This means that PG_ENSURE_ERROR_CLEANUP is excessive.
We only need to set up the before_shmem_exit callback.

To fix, the attached patch removes the use of PG_ENSURE_ERROR_CLEANUP and
the call to the shutdown callback in HandlePgArchInterrupts() in favor of
just setting up a before_shmem_exit callback in LoadArchiveLibrary().

-- 
Nathan Bossart
Amazon Web Services: https://aws.amazon.com

Attachment

Re: fix archive module shutdown callback

From
Bharath Rupireddy
Date:
On Sun, Oct 16, 2022 at 3:43 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
>
> Hi hackers,
>
> Presently, when an archive module sets up a shutdown callback, it will be
> called upon ERROR/FATAL (via PG_ENSURE_ERROR_CLEANUP), when the archive
> library changes (via HandlePgArchInterrupts()), and upon normal shutdown.
> There are a couple of problems with this:

Yeah.

> * HandlePgArchInterrupts() calls the shutdown callback directly before
> proc_exit().  However, the PG_ENSURE_ERROR_CLEANUP surrounding the call to
> pgarch_MainLoop() sets up a before_shmem_exit callback that also calls the
> shutdown callback.  This means that the shutdown callback will be called
> twice whenever archive_library is changed via SIGHUP.
>
> * PG_ENSURE_ERROR_CLEANUP is intended for both ERROR and FATAL.  However,
> the archiver operates at the bottom of the exception stack, so ERRORs are
> treated as FATALs.  This means that PG_ENSURE_ERROR_CLEANUP is excessive.
> We only need to set up the before_shmem_exit callback.
>
> To fix, the attached patch removes the use of PG_ENSURE_ERROR_CLEANUP and
> the call to the shutdown callback in HandlePgArchInterrupts() in favor of
> just setting up a before_shmem_exit callback in LoadArchiveLibrary().

We could have used a flag in call_archive_module_shutdown_callback()
to avoid it being called multiple times, but having it as
before_shmem_exit () callback without direct calls to it is the right
approach IMO.

+1 to remove PG_ENSURE_ERROR_CLEANUP and PG_END_ENSURE_ERROR_CLEANUP.

Is the shutdown callback meant to be called only after the archive
library is loaded? The documentation [1] says that it just gets called
before the archiver process exits. If this is true, can we just place
before_shmem_exit(call_archive_module_shutdown_callback, 0); in
PgArchiverMain() after on_shmem_exit(pgarch_die, 0);?

Also, I've noticed other 3 threads and CF entries all related to
'archive modules' feature. IMO, it could be better to have all of them
under a single thread and single CF entry to reduce
reviewers/committers' efforts and seek more thoughts about all of the
fixes.

https://commitfest.postgresql.org/40/3933/
https://commitfest.postgresql.org/40/3950/
https://commitfest.postgresql.org/40/3948/

[1]
  <sect2 id="archive-module-shutdown">
   <title>Shutdown Callback</title>
   <para>
    The <function>shutdown_cb</function> callback is called when the archiver
    process exits (e.g., after an error) or the value of
    <xref linkend="guc-archive-library"/> changes.  If no
    <function>shutdown_cb</function> is defined, no special action is taken in
    these situations.

-- 
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com



Re: fix archive module shutdown callback

From
Michael Paquier
Date:
On Sun, Oct 16, 2022 at 01:39:14PM +0530, Bharath Rupireddy wrote:
> Is the shutdown callback meant to be called only after the archive
> library is loaded? The documentation [1] says that it just gets called
> before the archiver process exits. If this is true, can we just place
> before_shmem_exit(call_archive_module_shutdown_callback, 0); in
> PgArchiverMain() after on_shmem_exit(pgarch_die, 0);?

I am not sure to understand what you mean here.  The shutdown callback
is available once the archiver process has loaded the library defined
in archive_library (assuming it is itself in shared_preload_libraries)
and you cannot call something that does not exist yet.  So, yes, you
could define the call to before_shmem_exit() a bit earlier because
that would be a no-op until the library is loaded, but at the end that
would be just registering a callback that would do nothing useful in a
larger window, aka until the library is loaded.

FWIW, I think that the documentation should clarify that the shutdown
callback is called before shmem exit.  That's important.

Another thing is that the shutdown callback is used by neither
shell_archive.c nor basic_archive.  We could do something about that,
actually, say by plugging an elog(DEBUG1) in shutdown_cb for
shell_archive.c to inform that the archiver is going down?
basic_archive could do that, but we already use shell_archive.c in a
bunch of tests, and this would need just log_min_messages=debug1 or
lower, so..

> Also, I've noticed other 3 threads and CF entries all related to
> 'archive modules' feature. IMO, it could be better to have all of them
> under a single thread and single CF entry to reduce
> reviewers/committers' efforts and seek more thoughts about all of the
> fixes.

I don't mind discussing each point separately as the first thread
dealing with archive modules is already very long, so the current way
of doing things makes sure to attract the correct type of attention
for each problem, IMO.
--
Michael

Attachment

Re: fix archive module shutdown callback

From
Kyotaro Horiguchi
Date:
At Mon, 17 Oct 2022 13:51:52 +0900, Michael Paquier <michael@paquier.xyz> wrote in 
> On Sun, Oct 16, 2022 at 01:39:14PM +0530, Bharath Rupireddy wrote:
> > Is the shutdown callback meant to be called only after the archive
> > library is loaded? The documentation [1] says that it just gets called
> > before the archiver process exits. If this is true, can we just place
> > before_shmem_exit(call_archive_module_shutdown_callback, 0); in
> > PgArchiverMain() after on_shmem_exit(pgarch_die, 0);?
> 
> I am not sure to understand what you mean here.  The shutdown callback
> is available once the archiver process has loaded the library defined
> in archive_library (assuming it is itself in shared_preload_libraries)
> and you cannot call something that does not exist yet.  So, yes, you

I guess that the "callback" there means the callback-caller function
(call_archive_module_shutdown_callback), which in turn is set as a
callback...

> could define the call to before_shmem_exit() a bit earlier because
> that would be a no-op until the library is loaded, but at the end that
> would be just registering a callback that would do nothing useful in a
> larger window, aka until the library is loaded.

I thought that Bharath's point is to use before_shmem_exit() instead
of PG_ENSURE_ERROR_CLEANUP(). The place doesn't seem significant but
if we use before_shmem_exit(), it would be cleaner to place it
adjecent to on_sheme_exit() call.

> FWIW, I think that the documentation should clarify that the shutdown
> callback is called before shmem exit.  That's important.

Sure. What the shutdown callback can do differs by shared memory
access.

> Another thing is that the shutdown callback is used by neither
> shell_archive.c nor basic_archive.  We could do something about that,
> actually, say by plugging an elog(DEBUG1) in shutdown_cb for
> shell_archive.c to inform that the archiver is going down?
> basic_archive could do that, but we already use shell_archive.c in a 
> bunch of tests, and this would need just log_min_messages=debug1 or
> lower, so..

+1 for inserting DEBUG1.

> > Also, I've noticed other 3 threads and CF entries all related to
> > 'archive modules' feature. IMO, it could be better to have all of them
> > under a single thread and single CF entry to reduce
> > reviewers/committers' efforts and seek more thoughts about all of the
> > fixes.
> 
> I don't mind discussing each point separately as the first thread
> dealing with archive modules is already very long, so the current way
> of doing things makes sure to attract the correct type of attention
> for each problem, IMO.

I tend to agree to this.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: fix archive module shutdown callback

From
Kyotaro Horiguchi
Date:
At Sun, 16 Oct 2022 13:39:14 +0530, Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> wrote in 
> On Sun, Oct 16, 2022 at 3:43 AM Nathan Bossart <nathandbossart@gmail.com> wrote:
> >
> > Hi hackers,
> >
> > Presently, when an archive module sets up a shutdown callback, it will be
> > called upon ERROR/FATAL (via PG_ENSURE_ERROR_CLEANUP), when the archive
> > library changes (via HandlePgArchInterrupts()), and upon normal shutdown.
> > There are a couple of problems with this:
> 
> Yeah.
> 
> > * HandlePgArchInterrupts() calls the shutdown callback directly before
> > proc_exit().  However, the PG_ENSURE_ERROR_CLEANUP surrounding the call to
> > pgarch_MainLoop() sets up a before_shmem_exit callback that also calls the
> > shutdown callback.  This means that the shutdown callback will be called
> > twice whenever archive_library is changed via SIGHUP.
> >
> > * PG_ENSURE_ERROR_CLEANUP is intended for both ERROR and FATAL.  However,
> > the archiver operates at the bottom of the exception stack, so ERRORs are
> > treated as FATALs.  This means that PG_ENSURE_ERROR_CLEANUP is excessive.
> > We only need to set up the before_shmem_exit callback.
> >
> > To fix, the attached patch removes the use of PG_ENSURE_ERROR_CLEANUP and
> > the call to the shutdown callback in HandlePgArchInterrupts() in favor of
> > just setting up a before_shmem_exit callback in LoadArchiveLibrary().
> 
> We could have used a flag in call_archive_module_shutdown_callback()
> to avoid it being called multiple times, but having it as
> before_shmem_exit () callback without direct calls to it is the right
> approach IMO.
>
> +1 to remove PG_ENSURE_ERROR_CLEANUP and PG_END_ENSURE_ERROR_CLEANUP.

That prevents archiver process from cleanly shut down when something
wrong happnes outside the interuppt handler.  In the frist place, why
do we need to call the cleanup callback directly in the handler?  We
can let the handler return something instead to tell the
pgarch_MainLoop to exit immediately on the normal path.

> Is the shutdown callback meant to be called only after the archive
> library is loaded? The documentation [1] says that it just gets called
> before the archiver process exits. If this is true, can we just place
> before_shmem_exit(call_archive_module_shutdown_callback, 0); in
> PgArchiverMain() after on_shmem_exit(pgarch_die, 0);?

+1 for using before_shmem_exit.

regards.

-- 
Kyotaro Horiguchi
NTT Open Source Software Center



Re: fix archive module shutdown callback

From
Michael Paquier
Date:
On Mon, Oct 17, 2022 at 02:30:52PM +0900, Kyotaro Horiguchi wrote:
> At Mon, 17 Oct 2022 13:51:52 +0900, Michael Paquier <michael@paquier.xyz> wrote in
>> I am not sure to understand what you mean here.  The shutdown callback
>> is available once the archiver process has loaded the library defined
>> in archive_library (assuming it is itself in shared_preload_libraries)
>> and you cannot call something that does not exist yet.  So, yes, you
>
> I guess that the "callback" there means the callback-caller function
> (call_archive_module_shutdown_callback), which in turn is set as a
> callback...

A callback in a callback in a callback.

>> could define the call to before_shmem_exit() a bit earlier because
>> that would be a no-op until the library is loaded, but at the end that
>> would be just registering a callback that would do nothing useful in a
>> larger window, aka until the library is loaded.
>
> I thought that Bharath's point is to use before_shmem_exit() instead
> of PG_ENSURE_ERROR_CLEANUP(). The place doesn't seem significant but
> if we use before_shmem_exit(), it would be cleaner to place it
> adjecent to on_sheme_exit() call.

Removing PG_ENSURE_ERROR_CLEANUP() and relying on before_shmem_exit()
is fine by me, that's what I imply upthread.
--
Michael

Attachment

Re: fix archive module shutdown callback

From
Bharath Rupireddy
Date:
On Mon, Oct 17, 2022 at 11:17 AM Michael Paquier <michael@paquier.xyz> wrote:
>
> On Mon, Oct 17, 2022 at 02:30:52PM +0900, Kyotaro Horiguchi wrote:
>
> Removing PG_ENSURE_ERROR_CLEANUP() and relying on before_shmem_exit()
> is fine by me, that's what I imply upthread.

Hm. Here's a v2 patch that addresses review comments. In addition to
making it a before_shmem_exit() callback, this patch also does the
following things:
1) Renames call_archive_module_shutdown_callback() to be more
meaningful and generic as before_shmem_exit() callback.
2) Clarifies when the archive module shutdown callback gets called in
documentation.
3) Defines a shutdown callback that just emits a log message in
shell_archive.c and tests it.

Please review it further.

--
Bharath Rupireddy
PostgreSQL Contributors Team
RDS Open Source Databases
Amazon Web Services: https://aws.amazon.com

Attachment

Re: fix archive module shutdown callback

From
Michael Paquier
Date:
On Tue, Oct 18, 2022 at 02:38:07PM +0530, Bharath Rupireddy wrote:
> 2) Clarifies when the archive module shutdown callback gets called in
> documentation.

I have looked at that, and it was actually confusing as the callback
would also be called on reload if archive_library changes, but the
update somewhat outlines that this would happen only on postmaster
shutdown.

> 3) Defines a shutdown callback that just emits a log message in
> shell_archive.c and tests it.

The test had a few issues:
- No need to wait for postmaster.pid in the test, as pg_ctl does this
job.
- The reload can be time-sensitive on slow machines, so I have added a
query run to make sure that the reload happens before stopping the
server.
- slurp_file() was feeding on the full log file of standby2, but we
should load it from the log location before stopping the server, even
if log_min_messages was updated only at the end of the test.

And done, after tweaking a few more things.
--
Michael

Attachment