Re: Postgres stucks in deadlock detection - Mailing list pgsql-hackers

From Юрий Соколов
Subject Re: Postgres stucks in deadlock detection
Date
Msg-id CAL-rCA1CVze9Y8uqJTH2vCffCvggcWQO6UQqaJnV9Q60NnJiyQ@mail.gmail.com
In response to Re: Postgres stucks in deadlock detection  (Andres Freund <andres@anarazel.de>)
Responses Re: Postgres stucks in deadlock detection  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List pgsql-hackers
Fri, 13 Apr 2018, 21:10 Andres Freund <andres@anarazel.de>:
Hi,

On 2018-04-13 19:13:07 +0300, Konstantin Knizhnik wrote:
> On 13.04.2018 18:41, Andres Freund wrote:
> > On 2018-04-13 16:43:09 +0300, Konstantin Knizhnik wrote:
> > > Updated patch is attached.
> > > + /*
> > > +  * Ensure that only one backend is checking for deadlock.
> > > +  * Otherwise under high load cascade of deadlock timeout expirations can cause stuck of Postgres.
> > > +  */
> > > + if (!pg_atomic_test_set_flag(&ProcGlobal->activeDeadlockCheck))
> > > + {
> > > +         enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
> > > +         return;
> > > + }
> > > + inside_deadlock_check = true;
> > I can't see that ever being accepted.  This means there's absolutely no
> > bound for deadlock checks happening even under light concurrency, even
> > if there's no contention for a large fraction of the time.
>
> It may cause problems only if
> 1. There is large number of active sessions
> 2. They perform deadlock-prone queries (so no attempts to avoid deadlocks at
> application level)
> 3. Deadlock timeout is set to be very small (10 msec?)

That's just not true.


> Otherwise, either the probability that all backends repeatedly try to
> check for deadlocks concurrently is very small (and can be reduced further
> by using a random timeout for subsequent deadlock checks), or the system
> cannot function normally in any case because a large number of clients
> fall into deadlock.

Operating systems batch wakeups.


> I completely agree that there are plenty of different approaches, but IMHO
> the currently used strategy is the worst one, because it can stall system
> even if there are not deadlocks at all.


> I always think that deadlock is a programmer's error rather than normal
> situation. May be it is wrong assumption

It is.


> So before implementing some complicated solution to the problem (too slow
> deadlock detection), I think that first it is necessary to understand
> whether there is such a problem at all and under which workload it can happen.

Sure. I'm not saying that you shouldn't experiment with a patch like the
one you sent. What I am saying is that that can't be the actual solution
that will be integrated.

What about my version?
It still performs deadlock detection every time, but it tries to detect a deadlock with a shared lock first, and only if a real deadlock looks possible does it recheck with an exclusive lock.

Even a shared lock stalls active transactions somewhat, but in the catastrophic situation where many backends check for a nonexistent deadlock at the same time, it greatly reduces the pause.

Regards, 
Yura. 
