Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks
Date
Msg-id CA+TgmobBH1SGmQY6G9x4nth03bJXyg2FXEbmNqs-_aRp1peAgQ@mail.gmail.com
Whole thread Raw
In response to Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks
List pgsql-hackers
On Mon, Aug 10, 2020 at 10:44 AM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I agree that doing nothing seems like a bad idea.  My concern about
> allowing non-LIFO callback removal is that it seems to me that such
> usage patterns have likely got *other* bugs, so we should discourage
> that.  These callbacks don't exist in a vacuum: they reflect that
> the mainline code path has set up, or torn down, important state.
> Non-LIFO usage requires very strong assumptions that the states in
> question are not interdependent, and that's something I'd rather not
> rely on if we don't have to.

I have mixed feelings about this. On the one hand, I think saying that
it requires "very strong assumptions" about interdependence makes it
sound scarier than it is -- not that your statement is wrong, but that
in most cases we kinda know whether that's true or not, and the idea
that you shouldn't remove a callback upon which some later callback
might be depending is not too difficult for anybody to understand. On
the other hand, I think that in most cases we ought to be discouraging
people who are trying to do per-subsystem cleanup from using
before_shmem_exit() at all. I think such callbacks ought to be done
using on_shmem_exit() if they involve shared memory or on_proc_exit()
if they do not. Those functions don't have cancel_blah_exit()
variants, and I don't think they should: the right way to code those
things is not to remove the callbacks from the stack when they're no
longer needed, but rather to code the callbacks so that they will do
nothing if no work is required, leaving them permanently registered.

And the main reason why I think that such callbacks should be
registered using on_shmem_exit() or on_proc_exit() rather than
before_shmem_exit() is because of (1) the interdependency issue you
raise and (2) the fact that cancel_before_shmem_exit doesn't do what
people tend to think it does. For an example of (1), look at
ShutdownPostgres() and RemoveTempRelationsCallback(). The latter
aborts out of any transaction and then starts a new one to drop your
temp schema. But that might fail, so ShutdownPostgres() also needs to
be prepared to AbortOutOfAnyTransaction(). If you inserted more
cleanup steps that were thematically similar to those, each one of
them would also need to begin with AbortOutOfAnyTransaction(). That
suggests that this whole thing is a bit under-engineered. Some of the
other before_shmem_exit() callbacks don't start with that incantation,
but that's because they are per-subsystem callbacks that likely ought
to be using on_shmem_exit() rather than actually being the same sort
of thing.

Perhaps we really have four categories here:
(1) Temporary handlers for PG_ENSURE_ERROR_CLEANUP().
(2) High-level cleanup that needs to run after aborting out of the
current transaction.
(3) Per-subsystem shutdown for shared memory stuff.
(4) Per-subsystem shutdown for backend-private stuff.

Right now we blend (1), (2), and some of (3) together, but we could
try to create a cleaner line. We could redefine before_shmem_exit() as
being exactly #2, and abort out of any transaction before calling each
step, and document that you shouldn't use it unless you need that
behavior. And we could have a separate stack for #1 that is explicitly
LIFO and not intended for any other use. But then again maybe that's
overkill. What I do think we should do, after thinking about it more,
is discourage the casual use of before_shmem_exit() for things where
on_shmem_exit() or on_proc_exit() would be just as good. I think
that's what would avoid the most problems here.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: pendingOps table is not cleared with fsync=off
Next
From: Tom Lane
Date:
Subject: Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks