On 25/07/17 01:33, Alvaro Herrera wrote:
> Alvaro Herrera wrote:
>> I'm back at looking into this again, after a rather exhausting week. I
>> think I have a better grasp of what is going on in this code now,
>
> Here's an updated patch, which I intend to push tomorrow.
>
I think the ConditionVariablePrepareToSleep() call in
ReplicationSlotAcquire() needs to be done inside the mutex lock
otherwise there is tiny race condition where the process which has slot
acquired might release the slot between the mutex unlock and the
ConditionVariablePrepareToSleep() call which means we'll never get
signaled and ConditionVariableSleep() will wait forever.
Otherwise the patch looks good to me.
As a side note, the ConditionVariablePrepareToSleep()'s comment could be
improved because currently it says the only advantage is that we skip
double-test in the beginning of ConditionVariableSleep(). But that's not
true, it's essential for preventing race conditions like the one above
because it puts the current process into waiting list so we can be sure
it will be signaled on broadcast once ConditionVariablePrepareToSleep()
has been called.
-- Petr Jelinek http://www.2ndQuadrant.com/ PostgreSQL Development, 24x7 Support, Training &
Services