Josh Berkus <josh@agliodbs.com> writes:
>> I think you are probably looking at the same problem previously reported
>> by Josh Berkus among others.
> That would be interesting. Previously we've only demonstrated the
> problem on long-running queries, but I suppose it could also affect
> massive concurrent query access.

Well, the test cases we used were designed to get the system into a
tight loop of grabbing and releasing shared buffers --- a long-running
index scan is certainly one of the best ways to do that, but there are
others.

I hadn't focused before on the point that Jason is launching a new
connection for every query. In that scenario I think the bulk of the
cycles are going to go into loading the per-backend catalog caches with
the system catalog rows that are needed to parse and plan the query.
The catalog fetches to get those rows are effectively mini-queries
with preset indexscan plans, so it's not hard to believe that they'd be
hitting the BufMgrLock nearly as hard as a tight indexscan loop. Once
all the pages needed are cached in shared buffers, there are no I/O delays
to break the loop, and so you could indeed get into the context swap
storm regime we saw before.

I concur with the thought that using persistent connections might go a
long way towards alleviating his problem.
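
To put rough numbers on that, here is a toy Python model of the effect.
It is only a sketch and not PostgreSQL code: the Backend class, the
CATALOG_ROWS_NEEDED list, and the counter are all hypothetical names made
up for illustration, and the choice of five catalogs per query is an
assumption. It counts only the catalog fetches that miss the per-backend
cache, and ignores the buffer accesses made by the query itself.

```python
# Toy model only. Backend, CATALOG_ROWS_NEEDED and run_query are
# hypothetical names; they do not correspond to PostgreSQL internals.

# Assumption: every query needs rows from these five system catalogs
# before it can be parsed and planned.
CATALOG_ROWS_NEEDED = ["pg_class", "pg_attribute", "pg_type",
                       "pg_operator", "pg_proc"]


class Backend:
    """One server process; its catalog cache starts empty on connect."""

    def __init__(self):
        self.catalog_cache = set()
        # Each access stands in for one trip through the shared buffer
        # manager, which is where the lock contention would arise.
        self.shared_buffer_accesses = 0

    def run_query(self):
        for row in CATALOG_ROWS_NEEDED:
            if row not in self.catalog_cache:
                # Cache miss: modelled as one mini index scan against
                # shared buffers to fetch the catalog row.
                self.shared_buffer_accesses += 1
                self.catalog_cache.add(row)


def connection_per_query(n_queries):
    """Open a fresh backend (cold cache) for every query."""
    total = 0
    for _ in range(n_queries):
        backend = Backend()
        backend.run_query()
        total += backend.shared_buffer_accesses
    return total


def persistent_connection(n_queries):
    """Connect once and reuse the same backend (warm cache)."""
    backend = Backend()
    for _ in range(n_queries):
        backend.run_query()
    return backend.shared_buffer_accesses


if __name__ == "__main__":
    print(connection_per_query(1000))   # 5000
    print(persistent_connection(1000))  # 5
```

In this model the connection-per-query case repeats the cache-loading
fetches for every query, while the persistent case pays for them once.
The real ratio would depend on how many catalog rows a given query
touches, which the sketch does not try to estimate.
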
regards, tom lane