Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks - Mailing list pgsql-hackers
From | Bharath Rupireddy |
---|---|
Subject | Re: Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks |
Date | |
Msg-id | CALj2ACXJgJ9LtrgFQK4C2sd1r26TyLb1EFx_VXRw1LouYZ0czg@mail.gmail.com Whole thread Raw |
In response to | Issue with cancel_before_shmem_exit while searching to remove a particular registered exit callbacks (Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com>) |
List | pgsql-hackers |
On Tue, Jul 7, 2020 at 10:24 PM Andres Freund <andres@anarazel.de> wrote: > > On 2020-07-07 09:44:41 -0400, Tom Lane wrote: > > Bharath Rupireddy <bharath.rupireddyforpostgres@gmail.com> writes: > > > Firstly, pg_start_backup registers nonexclusive_base_backup_cleanup as > > > on_shmem_exit call back which will > > > add this function to the before_shmem_exit_list array which is > > > supposed to be removed on pg_stop_backup > > > so that we can do the pending cleanup and issue a warning for each > > > pg_start_backup for which we did not call > > > the pg_stop backup. Now, I executed a query for which JIT is picked > > > up, then the the llvm compiler inserts it's > > > own exit callback i.e. llvm_shutdown at the end of > > > before_shmem_exit_list array. Now, I executed pg_stop_backup > > > and call to cancel_before_shmem_exit() is made with the expectation > > > that the nonexclusive_base_backup_cleanup > > > callback is removed from before_shmem_exit_list array. > > > > I'm of the opinion that the JIT code is abusing this mechanism, and the > > right thing to do is fix that. > > What are you proposing? For now we could easily enough work around this > by just making it a on_proc_exit() callback, but that doesn't really > change the fundamental issue imo. > > > The restriction you propose to remove is not just there out of > > laziness, it's an expectation about what safe use of this mechanism > > would involve. Un-ordered removal of callbacks seems pretty risky: it > > would mean that whatever cleanup is needed is not going to be done in > > LIFO order. > I quickly searched(in HEAD) what all the callbacks are getting registered to before_shmem_exit_list, with the intention to see if they also call corresponding cancel_before_shmem_exit() after their intended usage is done. For few of the callbacks there is no cancel_before_shmem_exit(). This seems expected; those callbacks ought to be executed before shmem exit. These callbacks are(let say SET 1): ShutdownPostgres, logicalrep_worker_onexit, llvm_shutdown, Async_UnlistenOnExit, RemoveTempRelationsCallback, ShutdownAuxiliaryProcess, do_pg_abort_backup in xlog.c (this callback exist only in v13 or later), AtProcExit_Twophase. Which means, once they are into the before_shmem_exit_list array, in some order, they are never going to be removed from it as they don't have corresponding cancel_before_shmem_exit() and the relative order of execution remains the same. And there are other callbacks that are getting registered to before_shmem_exit_list array(let say SET 2): apw_detach_shmem, _bt_end_vacuum_callback, pg_start_backup_callback, pg_stop_backup_callback, createdb_failure_callback, movedb_failure_callback, do_pg_abort_backup(in basebackup.c). They all have corresponding cancel_before_shmem_exit() to unregister/remove the callbacks from before_shmem_exit_list array. I think the callbacks that have no cancel_before_shmem_exit()(SET 1) may have to be executed in the LIFO order: it makes sense to execute ShutdownPostgres at the end after let's say other callbacks in SET 1. And the SET 2 callbacks have cancel_before_shmem_exit() with the only intention that there's no need to call the callbacks on the before_shmem_exit(), since they are not needed, and try to remove from the before_shmem_exit_list array and may fail, if any other callback gets registered in between. If I'm not wrong with the above points, we must enhance cancel_before_shmem_exit() or have cancel_before_shmem_exit_v2() (as mentioned in my below response). > > Maybe I am confused, but isn't it <13's pg_start_backup() that's > violating the protocol much more clearly than the JIT code? Given that > it relies on there not being any callbacks registered between two SQL > function calls? I mean, what it does is basically: > > 1) before_shmem_exit(nonexclusive_base_backup_cleanup... > 2) arbitrary code executed for a long time > 3) cancel_before_shmem_exit(nonexclusive_base_backup_cleanup... > > which pretty obviously can't at all deal with any other > before_shmem_exit callbacks being registered in 2). Won't this be a > problem for every other before_shmem_exit callback that we register > on-demand? Say Async_UnlistenOnExit, RemoveTempRelationsCallback? > Yes, for versions <13's, clearly pg_start_backup causes the problem and the issue can also be reproduced with Async_UnlistenOnExit, RemoveTempRelationsCallback coming in between pg_start_backup and pg_stop_backup. We can have it fixed in a few ways: 1) enhance cancel_before_shmem_exit() as attached in the original patch. 2) have existing cancel_before_shmem_exit(), whenever called for nonexclusive_base_backup_cleanup(), we can look for the entire array instead of just the last entry. 3) have a separate function, say, cancel_before_shmem_exit_v2(), that searches for the entire before_shmem_exit_list array(the logic proposed in this patch) so that it will not disturb the existing cancel_before_shmem_exit(). 4) or try to have the pg_start_backup code that exists in after > 13 versions. If okay to have cancel_before_shmem_exit_v2() for versions < 13's, with the searching for the entire array instead of just the last element to fix the abort issue, maybe we can have this function in version 13 and latest as well(at least with disable mode, something like #if 0 ... #endif), so that in case if any of similar issues arise we could just quickly reuse. If the before_shmem_exit_list array is to be used in LIFO order, do we have some comment/usage guideline mentioned in the ipc.c/.h? I didn't find one. Thoughts? With Regards, Bharath Rupireddy. EnterpriseDB: http://www.enterprisedb.com
pgsql-hackers by date: