Dividing line between before_shmem_exit and on_shmem_exit? - Mailing list pgsql-hackers
From | Andres Freund |
---|---|
Subject | Dividing line between before_shmem_exit and on_shmem_exit? |
Date | |
Msg-id | 20210803023612.iziacxk5syn2r4ut@alap3.anarazel.de Whole thread Raw |
List | pgsql-hackers |
Hi, I don't think I really understand what decides what should use before_shmem_exit() and what into on_shmem_exit(). The existing comments say: /* ---------------------------------------------------------------- * before_shmem_exit * * Register early callback to perform user-level cleanup, * e.g. transaction abort, before we begin shutting down * low-level subsystems. * ---------------------------------------------------------------- */ /* ---------------------------------------------------------------- * on_shmem_exit * * Register ordinary callback to perform low-level shutdown * (e.g. releasing our PGPROC); run after before_shmem_exit * callbacks and before on_proc_exit callbacks. * ---------------------------------------------------------------- */ but I don't find it super obvious when a subsystem is "low-level" or "user-level". Which is not helped by a fairly inconsistent use of the mechanism. E.g. ParallelWorkerShutdown() is on_*, ShutdownPostgres, ShutdownAuxiliaryProcess are before_* and I can't really make sense of that. Nor do I really understand why dsm_backend_shutdown() is scheduled to be exactly between those phases? That seems to indicate that no "low-level" subsystem could ever use DSM, but I don't really understand that either. I realize it has to be reliably be re-invoked in case of errors as the comment in shmem_exit() explains - but does not really explain why that's the right moment to shut down DSM. I suspect it's mostly pragmatic, because it has to happen before we tear down "really fundamental" subsystems, but there's no easy way to call dsm_backend_shutdown() repeatedly at that point? The concrete reason I am asking is that for the shared memory stats patch I needed to adjust a few on/before determinations. The fundamental reason for that is that the shared memory stats patch uses DSM (since stats are variably sized), which means that anything that could report stats needs to happen in before_shmem_exit(). To make things work in a halfway reasonable fashion I needed to adjust: 1) Single user mode using on_shmem_exit(ShutdownXLOG()) to before*. From the stats side that makes sense - clearly checkpoints can produce stat worthy stuff, so it has to happen before the stats subsystem is inactive. I don't know if it's "low-level" in the sense referenced in the comments above though. 2) Change ParallelWorkerShutdown to before*. That seems consistent with other *Shutdown* routines, so ... 3) Have ParallelWorkerShutdown() also trigger detaching from its dsm segment. Some on_dsm_detach() callbacks trigger stats (e.g. sharedfileset.c causes fd.c to do ReportTemporaryFileUsage()), which still need to be reported. Unfortunately ordering on_dsm_detach() callbacks is *not* sufficient, because dsa.c needs to be able to create new dsm segments if the amount of stats grow - and those will be put at the head of dsm_segment_list, which in turn will lead to dsm_backend_shutdown() to detach from that newly added segment last. Oops. Pragmatically I can live with all of these changes and I don't think they should be blockers for the shmem stats patch. However, this situation feels deeply unsatisfying. We have to be very careful about nesting initialization of subsystems correctly (see e.g. comments for ShutdownPostgres's registration), there's no way to order dsm callbacks in a proper order, on_shmem_exit is used for things ranging from pgss_shmem_shutdown over ProcKill to IpcMemoryDelete, ... IOW, the reason that I am (and I think others) are struggling with before/on_shmem_exit is that the division is not really good enough. I'm not even convinced that dynamic registration of such callbacks is a good idea. At least for the important in core stuff it might be better to have a centrally maintained ordering of callbacks, plus a bit of state describing how far teardown has progressed (to deal with failing callback issues). Greetings, Andres Freund
pgsql-hackers by date: