Re: How to cripple a postgres server - Mailing list pgsql-general

From Tom Lane
Subject Re: How to cripple a postgres server
Msg-id 17723.1022627329@sss.pgh.pa.us
In response to Re: How to cripple a postgres server  (Stephen Robert Norris <srn@commsecure.com.au>)
Responses Re: How to cripple a postgres server
List pgsql-general
Stephen Robert Norris <srn@commsecure.com.au> writes:
> I've already strace'ed the idle backend, and I can see the SIGUSR2 being
> delivered just before everything goes bad.

>> Yes, but what happens after that?

> The strace stops until I manually kill the connecting process - the
> machine stops in general until then (vmstat 1 stops producing output,
> shells stop responding ...). So who knows what happens :(

Hmm, I hadn't quite understood that you were complaining of a
system-wide lockup and not just Postgres getting wedged.  I think the
chances are very good that this *is* a kernel bug.  In any case, no
self-respecting kernel hacker would be happy with the notion that
a completely unprivileged user program can lock up the whole machine.
So even if Postgres has got a problem, the kernel is clearly failing
to defend itself adequately.

Are you able to reproduce the problem with fewer than 800 backends?
How about if you try it on a smaller machine?

Another entertaining experiment would be to release 800 queries at once
by some other means.  For example, on connection 1 do
    BEGIN; LOCK TABLE foo;
then issue a "SELECT COUNT(*) FROM foo" on each other connection,
and finally COMMIT on connection 1.  If that produces similar misbehavior,
then I think the shared-invalidation (SI) overrun mechanism is probably
not to blame.
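The lock-release recipe above could be scripted roughly as follows. This is a minimal sketch, assuming Python with the psycopg2 driver (neither is mentioned in the thread), the table name foo from the example, and a hypothetical DSN. It only reproduces the experiment and does not diagnose anything. Before issuing COMMIT, it polls pg_locks until every reader is actually queued behind the lock, so that all queries really are released at the same moment.

```python
import threading
import time


def experiment_plan(n_waiters, table="foo"):
    """Return the (session, SQL) steps in issue order.

    Session 0 holds the lock; sessions 1..n_waiters are the queued readers.
    """
    steps = [(0, "BEGIN"), (0, f"LOCK TABLE {table}")]
    steps += [(i, f"SELECT COUNT(*) FROM {table}") for i in range(1, n_waiters + 1)]
    steps.append((0, "COMMIT"))
    return steps


def _run_query(conn, sql):
    # Blocks until the holder's lock is released; psycopg2 drops the GIL while waiting.
    with conn.cursor() as cur:
        cur.execute(sql)
        cur.fetchall()


def run_experiment(dsn, n_waiters=800, table="foo", timeout=60.0):
    # Assumed driver; imported lazily so experiment_plan() works without a database.
    import psycopg2

    holder = psycopg2.connect(dsn)
    holder.autocommit = True  # we send BEGIN/COMMIT ourselves, as in the recipe
    waiters = [psycopg2.connect(dsn) for _ in range(n_waiters)]
    hcur = holder.cursor()
    threads = []
    try:
        hcur.execute("BEGIN")
        hcur.execute(f"LOCK TABLE {table}")  # ACCESS EXCLUSIVE mode by default

        sql = f"SELECT COUNT(*) FROM {table}"
        threads = [threading.Thread(target=_run_query, args=(c, sql)) for c in waiters]
        for t in threads:
            t.start()

        # Wait (up to `timeout` seconds) until every reader is blocked on the lock.
        deadline = time.monotonic() + timeout
        while True:
            hcur.execute(
                "SELECT count(*) FROM pg_locks "
                "WHERE NOT granted AND relation = %s::regclass",
                (table,),
            )
            if hcur.fetchone()[0] >= n_waiters or time.monotonic() > deadline:
                break
            time.sleep(0.1)

        hcur.execute("COMMIT")  # releases all queued SELECTs at once
    finally:
        holder.close()  # also ends the transaction (dropping the lock) if COMMIT was never reached
        for t in threads:
            t.join()
        for c in waiters:
            c.close()
```

Usage would be something like `run_experiment("dbname=test", n_waiters=800)`, with max_connections set high enough for n_waiters + 1 sessions. Lowering n_waiters is a cheap way to test whether the lockup also happens with fewer than 800 backends.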

> ... Sometimes, the
> SIGUSR2 does just create a very brief load spike (vmstat shows >500
> processes on the run queue, but the next second everything is back to
> normal and no unusual amount of CPU is consumed).

That's the behavior I'd expect.  We need to figure out what's different
between that case and the cases where it locks up.

            regards, tom lane
