On Tue, 2002-05-28 at 14:24, Tom Lane wrote:
> Stephen Robert Norris <srn@commsecure.com.au> writes:
> > One big difference, though, is that with the vacuum problem, the CPU
> > used is almost all (99%) system time; loading up the db with lots of
> > queries increases user time mostly, with little system time...
>
> Hmm, that's a curious point; leaves one wondering about possible kernel
> bugs.
>
> > In any event, it seems a bug that merely having connections open causes
> > this problem! They aren't even in transactions...
>
> If the problem is that you've launched far more backends than the system
> can really support, I'd have no hesitation in writing it off as user
> error. "Idle" processes are not without cost. But at this point
> I can't tell whether that's the case, or whether you're looking at a
> genuine performance bug in either Postgres or the kernel.
>
> Can you run strace (or truss or kernel-call-tracer-of-your-choice) on
> the postmaster, and also on one of the putatively idle backends, so
> we can see some more data about what's happening?
>
> regards, tom lane

I've already strace'ed the idle backend, and I can see the SIGUSR2 being
delivered just before everything goes bad.

What resource would you think idle backends might be exhausting?

On the production system, the problem doesn't happen under load (of
about 60-80 non-trivial queries/second) but does happen when the system
is largely idle. The number of connections is exactly the same in both
cases...

Stephen