Re: Condition variable live lock - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Condition variable live lock
Msg-id 20171229193843.tsbob2xsjfmxkq2g@alap3.anarazel.de
In response to Re: Condition variable live lock  (Thomas Munro <thomas.munro@enterprisedb.com>)
Responses Re: Condition variable live lock  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2017-12-29 12:16:20 +1300, Thomas Munro wrote:
> Here is one way to fix it: track the wait queue size and use that
> number to limit the wakeup loop.  See attached.
>
> That's unbackpatchable though, because it changes the size of struct
> ConditionVariable, potentially breaking extensions compiled against an
> earlier point release.  Maybe this problem won't really cause problems
> in v10 anyway?  It requires a particular interaction pattern that
> barrier.c produces but more typical client code might not: the awoken
> backends keep re-adding themselves because they're waiting for
> everyone (including the waker) to do something, but the waker is stuck
> in that broadcast loop.

Hm, I'm not quite convinced by this approach. Partially because of the
backpatch issue you mention, partially because using the list length as
a limit doesn't seem quite nice.

Given that the proclist_contains() checks in condition_variable.c are
already racy, I think it might be feasible to collect all procnos to
signal while holding the spinlock, and then signal all of them in one
go.
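
To make the idea concrete, here is a rough, single-threaded sketch of
"collect under the lock, signal afterwards". It is a toy model, not
the code in condition_variable.c: ToyCV, toy_lock(), toy_signal() and
MAX_PROCS are all made-up names, the "spinlock" is just a flag checked
with assertions, and the requeue argument only imitates awoken
backends re-adding themselves as barrier.c clients do.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MAX_PROCS 64            /* hypothetical upper bound on procnos */

/* Toy condition variable: a lock flag plus a queue of waiting procnos. */
typedef struct ToyCV
{
    bool    locked;             /* stands in for the spinlock */
    int     waiters[MAX_PROCS];
    size_t  nwaiters;
} ToyCV;

/* How often each procno has been signalled (for illustration only). */
static int signals_sent[MAX_PROCS];

static void toy_lock(ToyCV *cv)   { assert(!cv->locked); cv->locked = true; }
static void toy_unlock(ToyCV *cv) { assert(cv->locked);  cv->locked = false; }

static void
toy_cv_init(ToyCV *cv)
{
    cv->locked = false;
    cv->nwaiters = 0;
}

/* Append procno to the wait queue; returns false if the queue is full. */
static bool
toy_cv_add_waiter(ToyCV *cv, int procno)
{
    bool    ok;

    toy_lock(cv);
    ok = cv->nwaiters < MAX_PROCS;
    if (ok)
        cv->waiters[cv->nwaiters++] = procno;
    toy_unlock(cv);
    return ok;
}

/*
 * Hypothetical wakeup.  With requeue set, the "awoken" process puts itself
 * straight back on the queue, which needs the lock: that only works because
 * the broadcaster has already released it.
 */
static void
toy_signal(ToyCV *cv, int procno, bool requeue)
{
    assert(procno >= 0 && procno < MAX_PROCS);
    signals_sent[procno]++;
    if (requeue)
        (void) toy_cv_add_waiter(cv, procno);
}

/* Snapshot all waiters under the lock, then signal them without it. */
static size_t
toy_cv_broadcast(ToyCV *cv, bool requeue)
{
    int     to_wake[MAX_PROCS];
    size_t  n;
    size_t  i;

    toy_lock(cv);
    n = cv->nwaiters;
    for (i = 0; i < n; i++)
        to_wake[i] = cv->waiters[i];
    cv->nwaiters = 0;           /* the queue is empty from here on */
    toy_unlock(cv);

    /* Bounded by the snapshot, so re-added waiters cannot extend this loop. */
    for (i = 0; i < n; i++)
        toy_signal(cv, to_wake[i], requeue);
    return n;
}
```

In this model a broadcast wakes exactly the processes that were queued
when the lock was taken. Anything that re-adds itself afterwards waits
for the next broadcast, so the waker cannot get stuck in the loop.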

Obviously it'd be nicer to not hold a spinlock while looping, but that
seems like something we can't fix in the back branches. [insert rant
about never using spinlocks unless there are very, very clear and
convincing reasons].

- Andres

