On Tue, 2002-05-28 at 23:29, Tom Lane wrote:
> Stephen Robert Norris <srn@commsecure.com.au> writes:
> > I've already strace'ed the idle backend, and I can see the SIGUSR2 being
> > delivered just before everything goes bad.
>
> Yes, but what happens after that?
The strace output stops until I manually kill the connecting process; the
whole machine stalls until then (vmstat 1 stops producing output, shells
stop responding...). So who knows what happens :(
>
> If you don't see anything obvious by examining a single process, maybe
> strace'ing the postmaster + all descendant processes would offer a
> better viewpoint.
>
> > What resource would you think idle backends might be exhausting?
>
> Difficult to say. I suspect your normal load doesn't have *all* the
> backends trying to run queries at once. But in any case the SIGUSR2
> event should only produce a momentary spike in load, AFAICS.
>
> regards, tom lane
I agree (about not having 800 simultaneous queries). Sometimes, the
SIGUSR2 does just create a very brief load spike (vmstat shows >500
processes on the run queue, but the next second everything is back to
normal and no unusual amount of CPU is consumed).
To me, this more or less rules out a kernel problem, unless it's something
triggered at a specific number of processes (e.g. 700 is bad, 699 is
OK).
Stephen