Srinath Reddy <srinath2133@gmail.com> writes:
> when we kill postmaster using kill -9 and start immediately it crashes with
> FATAL: pre-existing shared memory block (key 2495405, ID 360501) is still
> in use
"Doctor, it hurts when I do this!"
"So don't do that!"
This is not a supported way of shutting down the postmaster, and it never will be. Use SIGINT, or SIGQUIT if you are in a desperate hurry and are willing to have the next startup take longer.
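For reference, the supported shutdown signals can be sent from a script. The following is only a minimal sketch: the helper names are made up for illustration, and it assumes the usual layout in which the first line of postmaster.pid in the data directory holds the postmaster's PID. The mode names follow the smart/fast/immediate modes of pg_ctl stop.

```python
import os
import signal
from pathlib import Path

# Shutdown modes and the signals the postmaster treats them as.
SHUTDOWN_SIGNALS = {
    "smart": signal.SIGTERM,      # wait for sessions to finish
    "fast": signal.SIGINT,        # disconnect sessions, shut down cleanly
    "immediate": signal.SIGQUIT,  # abort at once; next startup runs recovery
}


def parse_postmaster_pid(pid_file_text):
    """Return the PID stored on the first line of postmaster.pid contents."""
    return int(pid_file_text.splitlines()[0].strip())


def stop_postmaster(data_dir, mode="fast"):
    """Send the signal for `mode` to the postmaster of `data_dir` (hypothetical helper)."""
    pid_text = (Path(data_dir) / "postmaster.pid").read_text()
    os.kill(parse_postmaster_pid(pid_text), SHUTDOWN_SIGNALS[mode])
```

Calling stop_postmaster(data_dir, "immediate") with your own data directory would be the "desperate hurry" case; unlike kill -9, it still lets the postmaster tell its children to exit.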
I was actually trying to recreate a power-outage scenario using node->kill9() followed by node->start() in a custom TAP test; that is how I found this crash.
I think the specific reason you are seeing this is that it takes nonzero time for the postmaster's orphaned child processes to notice that the postmaster is gone and terminate. As long as any of those children remain, the shared memory block will have a nonzero reference count. The new postmaster sees that and refuses to start, for the very sound reason that it risks data corruption if it brings up a new set of worker processes while any of the old ones are still running.
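To make the reference count visible from a test, one could poll the kernel's attach count for the segment before restarting. This is a rough, Linux-only sketch and not anything PostgreSQL provides: it assumes /proc/sysvipc/shm is available with "shmid" and "nattch" columns, and the function names are invented for illustration.

```python
import time
from pathlib import Path

PROC_SHM = Path("/proc/sysvipc/shm")  # Linux-only table of System V segments


def nattch_for(shmid, table_text):
    """Return the attach count for `shmid`, or None if it is not listed."""
    lines = table_text.splitlines()
    if not lines:
        return None
    header = lines[0].split()
    id_col = header.index("shmid")
    nattch_col = header.index("nattch")
    for line in lines[1:]:
        fields = line.split()
        if len(fields) <= max(id_col, nattch_col):
            continue  # skip short or malformed rows
        if int(fields[id_col]) == shmid:
            return int(fields[nattch_col])
    return None


def wait_until_unused(shmid, timeout=10.0, poll_interval=0.1):
    """Poll until no process is attached to `shmid`; False if `timeout` expires first."""
    deadline = time.monotonic() + timeout
    while True:
        count = nattch_for(shmid, PROC_SHM.read_text())
        if count is None or count == 0:
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval)
```

With the ID from the error message above, wait_until_unused(360501) would return once the orphaned children have all exited, at which point a new postmaster should no longer see the block as in use.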
regards, tom lane
I am guessing that by "reference count" of the shared memory block you mean shm_nattch, right? I think it is incremented by 1 when a process attaches to the shmem segment using shmat(). In the Postgres case it is the postmaster that attaches, when it creates the shmem segment, and it detaches when its on_shmem_exit callbacks are called, which happens if it exits properly but not if it dies suddenly (as is the case with kill -9). Only on detach is shm_nattch decremented by 1. AFAIK the child processes get to use the shmem segment but never attach or detach, so they do not affect shm_nattch. So, since shm_nattch is not 0, PGSharedMemoryAttach thinks the shmem segment is still attached and in use.