On Tue, Feb 12, 2019 at 10:15 PM Sergei Kornilov <sk@zsrv.org> wrote:
> I still have error with parallel_leader_participation = off.
Justin very kindly set up a virtual machine similar to the one where
he'd seen the problem so I could experiment with it. Eventually I
also managed to reproduce it locally, and have finally understood the
problem.
It doesn't happen on master (hence some of my initial struggle to
reproduce it) because of commit 197e4af9, which added srandom() to set
a different seed for each parallel worker. Perhaps you see where
this is going already...
The problem is that a DSM handle (i.e. a random number) can be reused
for a new segment immediately after the old segment's shared memory
object has been destroyed but before its DSM slot has been released.
Now two DSM slots have the same handle, and dsm_attach() can be
confused by the old slot and give up.
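To make the race concrete, here is a toy model of the control slot table. It is a sketch, not the dsm.c code. All names (slot_t, model_create(), model_attach(), model_setup_race()) are hypothetical. The refcount convention is assumed to follow dsm.c loosely: 0 means free, 1 means the segment is going away, and a new segment starts at 2. The "keep scanning" branch shows one way to avoid the failure and is not necessarily what the draft patch does.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Toy model of the DSM control slot table (hypothetical names). */
#define NSLOTS 4

typedef uint32_t handle_t;

typedef struct
{
    handle_t handle;
    uint32_t refcnt;    /* 0 = free, 1 = segment going away, >1 = live */
} slot_t;

typedef struct
{
    slot_t items[NSLOTS];
} control_t;

/* Put a new segment with handle h into the first free slot.
 * refcnt starts at 2 (assumed: control segment + creating backend). */
static int
model_create(control_t *ctl, handle_t h)
{
    for (int i = 0; i < NSLOTS; i++)
    {
        if (ctl->items[i].refcnt == 0)
        {
            ctl->items[i].handle = h;
            ctl->items[i].refcnt = 2;
            return i;
        }
    }
    return -1;
}

/* Look up handle h.  With give_up_on_dying, stop at the first matching
 * slot whose segment is going away (the failure described above);
 * otherwise skip such slots and keep looking.  Returns slot or -1. */
static int
model_attach(control_t *ctl, handle_t h, bool give_up_on_dying)
{
    for (int i = 0; i < NSLOTS; i++)
    {
        slot_t *s = &ctl->items[i];

        if (s->refcnt == 0)         /* unused slot */
            continue;
        if (s->handle != h)         /* some other segment */
            continue;
        if (s->refcnt == 1)         /* same handle, but old and dying */
        {
            if (give_up_on_dying)
                return -1;
            continue;
        }
        s->refcnt++;                /* found the live segment */
        return i;
    }
    return -1;
}

/* Reproduce the window: the old segment's object is destroyed
 * (refcnt 1) but its slot still holds h, and a new segment that
 * reused h has just been placed in another slot. */
static void
model_setup_race(control_t *ctl, handle_t h)
{
    *ctl = (control_t) {0};
    int old = model_create(ctl, h);
    ctl->items[old].refcnt = 1;     /* destroyed, slot not yet released */
    int fresh = model_create(ctl, h);
    assert(fresh >= 0 && fresh != old);
}
```

After model_setup_race(&ctl, 42), the old segment sits in slot 0 and the new one in slot 1. A lookup that gives up on the first dying match returns -1 even though a live segment with handle 42 exists. A lookup that skips dying slots finds slot 1.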
Here's a draft patch to fix that. It also clears the handle in a case
where it wasn't previously cleared, but that wasn't strictly
necessary. It just made debugging less confusing.
--
Thomas Munro
http://www.enterprisedb.com