On Tue, Dec 24, 2019 at 3:10 PM Thomas Munro <thomas.munro@gmail.com> wrote:
> On Sat, Dec 21, 2019 at 2:10 PM Shawn Debnath <sdn@amazon.com> wrote:
> > On Fri, Dec 20, 2019 at 12:05:34PM +1300, Thomas Munro wrote:
> > > I think we should probably just remove the unusual ResetLatch() call,
> > > rather than adding a CFI(). See attached. Thoughts?
> >
> > I agree: removing the ResetLatch() and having the wait event code deal
> > with it is a better way to go. I wonder why the reset was done in the
> > first place?
I have pushed this on master only.
> Thanks for the review. I was preparing to commit this today, but I
> think I've spotted another latch protocol problem in the stable
> branches only and I'd to get some more eyeballs on it. I bet it's
> really hard to hit, but ConditionVariableSleep()'s return path (ie
> when the CV has been signalled) forgets that the the latch is
> multiplexed and latches are merged: just because it was set by
> ConditionVariableSignal() doesn't mean it wasn't *also* set by die()
> or some other interrupt, and yet at the point we return, we've reset
> the latch and not run CFI(), and there's nothing to say the caller
> won't then immediately wait on the latch in some other code path, and
> sleep like a log despite the earlier delivery of (say) SIGTERM. If
> I'm right about that, I think the solution is to move that CFI down in
> the stable branches (which you already did for master in commit
> 1321509f).
I'm not going to back-patch this (ie the back-branches version from my
previous email) without more discussion, but I still think it's subtly
broken.