Re: Condition variable live lock - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Condition variable live lock
Date
Msg-id CA+TgmoZykfhWOQWqFcCHL2a-=QVigssd0og1x2AU6jtJsLN-iw@mail.gmail.com
Whole thread Raw
In response to Re: Condition variable live lock  (Andres Freund <andres@anarazel.de>)
Responses Re: Condition variable live lock
List pgsql-hackers
On Fri, Dec 29, 2017 at 2:38 PM, Andres Freund <andres@anarazel.de> wrote:
> Hm, I'm not quite convinced by this approach. Partially because of the
> backpatch issue you mention, partially because using the list length as
> a limit doesn't seem quite nice.

Seems OK to me.  Certainly better than your competing proposal.

> Given that the proclist_contains() checks in condition_variable.c are
> already racy, I think it might be feasible to collect all procnos to
> signal while holding the spinlock, and then signal all of them in one
> go.

That doesn't seem very nice at all.  Not only does it violate the
coding rule against looping while holding a spinlock, but it seems
that it would require allocating memory while holding one, which is a
non-starter.

> Obviously it'd be nicer to not hold a spinlock while looping, but that
> seems like something we can't fix in the back branches. [insert rant
> about never using spinlocks unless there's very very clear convicing
> reasons].

I don't think that's a coding rule that I'd be prepared to endorse.
We've routinely used spinlocks for years in cases where the critical
section was very short, just to keep the overhead down.  I think it
works fine in that case, although I admit that I failed to appreciate
how unpleasant the livelock possibilities were in this case.

It's not clear to me that we entirely need a back-patchable fix for
this.  It could be that parallel index scan can have the same issue,
but I'm not aware of any user complaints.  Parallel bitmap heap only
ever waits once so it's probably fine.  If we do need a back-patchable
fix, I suppose slock_t mutex could be replaced by pg_atomic_uint32
state.  I think that would avoid changing the size of the structure on
common platforms, though obscure systems with spinlocks > 4 bytes
might be affected.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: [HACKERS] MERGE SQL Statement for PG11
Next
From: Robert Haas
Date:
Subject: Re: [HACKERS] MERGE SQL Statement for PG11