Re: group locking: incomplete patch, just for discussion - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: group locking: incomplete patch, just for discussion
Date
Msg-id CAM3SWZQ6Xh8+cMSFiuM3_GnTVBCo4rPZCMcN1adeDLKF31fx1g@mail.gmail.com
Whole thread Raw
In response to Re: group locking: incomplete patch, just for discussion  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Sun, Nov 23, 2014 at 3:59 PM, Andres Freund <andres@2ndquadrant.com> wrote:
>> >> The heavyweight locking issue is really killing me, though.
>> >
>> > I don't really understand why you're not content with just detecting
>> > deadlocks for now. Everything else seems like bells and whistles for
>> > later.
>>
>> I don't think it's OK for a parallel query to just randomly deadlock.
>
> I think that should be the end goal, yes.
>
>> It's OK for parallel query to excuse itself politely and retire from
>> the field, but I don't think it's OK for it to say, oh, sure, I can
>> parallelize that for you, and then fail by deadlocking with itself.
>
> But I also think that right now it doesn't necessarily need to be the
> focus immediately.

I am in strong agreement with Robert that unprincipled deadlocks are
not okay. Now, what that actually means in practice might be kind of
hard to delineate.

Take my ON CONFLICT UPDATE patch. The locking there is optimistic, and
so in theory there are risks of lock starvation [1]. There is no
axiomatic reason why a single backend cannot be starved of a lock in
the event of a perfect storm of conflicts. Similarly, the Lehman and
Yao paper makes what can only be called a plausibility argument around
how livelocks are not a concern in practice [2] when an indefinite
number of "move right" operations are required after a page split;
having written a bunch of theorems around the correctness of other
aspects of their algorithm, the argument here is more or less: "Come
on, that'll never happen!". Concurrency is hard.

If you want another example of this kind of thing, then there's how
Oracle deals with READ COMMITTED row conflicts -- the statement is
rolled back and restarted. Who is to say with certainty that it'll
make any progress the second time? Or the third or forth?

Note that the optimistic locking stuff was actually my solution to the
"unprincipled deadlocks" problem, a solution that, as I say, also
implies the theoretical risk of lock starvation (but not livelock). I
think I would also have found it acceptable if there was a solution
that reduced the original risk, the risk of *deadlocks* to merely a
theoretical one, as opposed to a very real one (that could be shown
with a simple testcase).

> This is a topic that I think will be much easier to
> approach if some demonstration of actual parallel users would be
> available. Doesn't have to be committable or such, but I think it'll
> help to shape how this has to look.  I think unrecognized deadlocks will
> make things too annoying to use, but I don't think it needs to be
> guaranteed deadlock free or such.

I couldn't agree more. The greater point about livelock, which I
believe applies here as well, is that there are different standards by
which we might consider that unprincipled deadlocks won't happen.
Basically, it seems inevitable that a practical standard sometimes
must be applied. If a well informed person, say myself or Andres,
cannot make any given practical implementation deadlock (or
alternatively be starved for locks, if we're talking about an
"optimistic locking" work-around to unprincipled deadlocks that
introduces that risk), then I guess that means that there are no
unprincipled deadlocks/livelocks, and I for one will probably find
that acceptable given enough testing.

I realize that that might sound terribly loose (I probably forgot a
few caveats), but what it boils down to is that we *don't* have the
tools to analyze certain types of problems like this, or at least I
don't think we do, because no one does. The closest thing we have is
some well written custom stress testing, and lots of scrutiny. Of
course I'd prefer a solution that is provably correct from first
principles (in other words, I'd prefer to use algorithms that are in
some formal sense non-blocking [3] or something along those lines),
but that will often be cost prohibitive or otherwise illusive when
designing concurrent algorithms.

I think I agree with you both.

[1] https://wiki.postgresql.org/wiki/UPSERT#Theoretical_lock_starvation_hazards
[2] http://www.cs.cornell.edu/courses/cs4411/2009sp/blink.pdf ,
Section 6.4 Livelock
[3] https://en.wikipedia.org/wiki/Non-blocking_algorithm
-- 
Peter Geoghegan



pgsql-hackers by date:

Previous
From: Richard Frith-Macdonald
Date:
Subject: Re: Elusive segfault with 9.3.5 & query cancel
Next
From: Andrew Dunstan
Date:
Subject: Re: logical column ordering