Re: Postgres stucks in deadlock detection - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Postgres stucks in deadlock detection
Date
Msg-id 20180413180948.rj5e3bsxhilvdccr@alap3.anarazel.de
In response to Re: Postgres stucks in deadlock detection  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses Re: Postgres stucks in deadlock detection
List pgsql-hackers
Hi,

On 2018-04-13 19:13:07 +0300, Konstantin Knizhnik wrote:
> On 13.04.2018 18:41, Andres Freund wrote:
> > On 2018-04-13 16:43:09 +0300, Konstantin Knizhnik wrote:
> > > Updated patch is attached.
> > > +    /*
> > > +     * Ensure that only one backend is checking for deadlock.
> > > +     * Otherwise under high load cascade of deadlock timeout expirations can cause stuck of Postgres.
> > > +     */
> > > +    if (!pg_atomic_test_set_flag(&ProcGlobal->activeDeadlockCheck))
> > > +    {
> > > +        enable_timeout_after(DEADLOCK_TIMEOUT, DeadlockTimeout);
> > > +        return;
> > > +    }
> > > +    inside_deadlock_check = true;
> > I can't see that ever being accepted.  This means there's absolutely no
> > bound for deadlock checks happening even under light concurrency, even
> > if there's no contention for a large fraction of the time.
> 
> It may cause problems only if:
> 1. There is a large number of active sessions
> 2. They perform deadlock-prone queries (so no attempt is made to avoid
> deadlocks at the application level)
> 3. The deadlock timeout is set to be very small (10 msec?)

That's just not true.


> Otherwise, either the probability that all backends are repeatedly
> trying to check for deadlocks concurrently is very small (and can be reduced
> even further by using a random timeout for subsequent deadlock checks), or
> the system cannot function normally in any case because a large number of
> clients fall into deadlock.

Operating systems batch wakeups.


> I completely agree that there are plenty of different approaches, but IMHO
> the currently used strategy is the worst one, because it can stall the system
> even if there are no deadlocks at all.


> I have always thought that a deadlock is a programmer's error rather than a
> normal situation. Maybe that is a wrong assumption

It is.


> So before implementing some complicated solution to the problem (too slow
> deadlock detection), I think that it is first necessary to understand
> whether there is such a problem at all and under which workload it can happen.

Sure. I'm not saying that you shouldn't experiment with a patch like the
one you sent. What I am saying is that that can't be the actual solution
that will be integrated.

Greetings,

Andres Freund
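
For readers following the thread, here is a minimal sketch of the gating
pattern in the quoted patch and of the objection raised against it. It is not
code from the patch or from PostgreSQL: it uses C11 atomic_flag in place of
pg_atomic_test_set_flag, the names active_deadlock_check,
try_begin_deadlock_check, end_deadlock_check and timeouts_until_check are
hypothetical, and the "schedule" array is an invented stand-in for whatever
other backends happen to be doing. The point it tries to show is that a waiter
which loses the test-and-set simply re-arms its timer, so nothing limits how
many timeouts can pass before it actually gets to run a check.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for ProcGlobal->activeDeadlockCheck in the patch. */
static atomic_flag active_deadlock_check = ATOMIC_FLAG_INIT;

/*
 * Try to become the single backend allowed to run the deadlock check.
 * C11 atomic_flag_test_and_set returns the previous value, so "false"
 * means this caller is the one that set the flag.
 */
static bool
try_begin_deadlock_check(void)
{
    return !atomic_flag_test_and_set(&active_deadlock_check);
}

/* Release the gate once the check is finished. */
static void
end_deadlock_check(void)
{
    atomic_flag_clear(&active_deadlock_check);
}

/*
 * Deterministic model of one waiter (assumption: a single-threaded
 * simulation, not real backends).  other_holds[i] says whether some other
 * backend already holds the gate when the waiter's i-th timeout fires.
 * Returns how many timeouts were wasted before the waiter could check,
 * or -1 if it never got the gate within n expirations.
 */
static int
timeouts_until_check(const bool *other_holds, int n)
{
    int deferrals = 0;

    assert(n >= 0);
    for (int i = 0; i < n; i++)
    {
        if (other_holds[i])
            atomic_flag_test_and_set(&active_deadlock_check); /* other backend won */

        if (try_begin_deadlock_check())
        {
            /* the real deadlock check would run here */
            end_deadlock_check();
            return deferrals;
        }

        /* Lost the race: the patch re-arms DEADLOCK_TIMEOUT and returns. */
        deferrals++;
        end_deadlock_check();   /* the other backend finishes before the next expiry */
    }
    return -1;
}
```

With a schedule of {true, true, false} the waiter is deferred twice before it
checks; with a schedule that is always true it never checks at all, which is
the unbounded case the reply is concerned about.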

