Widening the perspective a bit, our current approach for such error-handling / shutdown functions seems ... not maintainable. In particular we have a substantial number of top-level sigsetjmp() handlers that have slightly different recovery codepaths. When adding a new process type, how is one supposed to even know what function one needs to call?
PostgresMain() has this comment: /* * NOTE: if you are tempted to add more code in this if-block, * consider the high probability that it should be in * AbortTransaction() instead. The only stuff done directly here * should be stuff that is guaranteed to apply *only* for outer-level * error recovery, such as adjusting the FE/BE protocol status. */
But a) that clearly hasn't worked, given how long the following block is, and b) can't really work that well, because we have plenty processes that don't use transaction, therefore putting cleanup in AbortTransaction() doesn't go that far.
I don't quite know how it should look like, but it seems pretty obvious that it should be more unified than it is. My gut feeling is that we should have a single error recovery function that has a flags argument guiding which subsystems should be reset and which shouldn't.
+1 for the idea. I am interested in writing a patch for the same if you would like.
As you mentioned, these code blocks under the if (sigsetjmp(local_sigjmp_buf, 1) != 0) statement have a very similar set of function calls, depending on the process type—whether it's an auxiliary process, background process, or client backend. There are also similarities across these types; for instance, all of them register a ProcKill callback as an on_shmem_exit callback..
Currently, we perform LWLockReleaseAll() at the before_shmem_exit stage, such as in the ShutdownAuxiliaryProcess() callback for auxiliary processes and ShutdownPostgres() for client backends.
However, there are some inconsistencies: 1. The client backend does not call LWLockReleaseAll() until ProcKill(), if it is not in a transaction. 2. The background processes and AutovacuumLauncher do not register a before_shmem_callback for calling LWLockReleaseAll() but may invoke it directly within the sigsetjmp() blocks.
3. Some auxiliary processes also call LWLockReleaseAll() early in the sigsetjmp() block.
Seems we should just reorder the sequence in ProcKill() to first call LWLockReleaseAll()
It would be too late to call it in ProcKill, since dsm_backend_shutdown() would
detach segments containing the LWLocks before this. Using a before_shmem_exit callback or handling it in the initial steps of sigsetjmp might be more suitable.