Re: Proposal of tunable fix for scalability of 8.4 - Mailing list pgsql-performance

From Greg Smith
Subject Re: Proposal of tunable fix for scalability of 8.4
Msg-id alpine.GSO.2.01.0903122152120.16050@westnet.com
In response to Re: Proposal of tunable fix for scalability of 8.4  (Scott Carey <scott@richrelevance.com>)
List pgsql-performance
On Thu, 12 Mar 2009, Scott Carey wrote:

> Furthermore, if the problem was due to too much concurrency in the
> database with active connections, it's hard to see how changing the lock
> code would change the result the way it did?

What I wonder about is whether the locking mechanism is accidentally turning
into a CPU resource scheduling problem on this benchmark.  If the
connections were pooled instead, control over that scheduling would be
more explicit, because connections would map more directly onto physical
CPUs.  What if the fall-off happens because the combined working set of all
the active connections simply exceeds the total CPU cache available once the
number of active connections gets big enough?  The real problem could be
that the connections waiting on ProcArray are just falling out of cache,
so that when they do wake up they take a while to reload their working set
and keep going.
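
To make "connections would map more directly onto physical CPUs" concrete, here is a minimal sketch of what a pooler does: a fixed number of workers (a hypothetical stand-in for pooled database connections, defaulting to the CPU count) drain a shared queue of client requests, so no matter how many clients pile up, at most pool_size of them are ever active at once. run_pooled and its parameters are illustrative names, not the API of PostgreSQL or of any real pooler, and since Python threads share the GIL this only models the admission limit, not real parallel execution.

```python
import os
import queue
import threading


def run_pooled(tasks, pool_size=None):
    """Run every callable in `tasks` through a fixed pool of workers.

    Hypothetical sketch: each worker plays the role of one pooled
    connection; pool_size defaults to the CPU count. Returns
    (results in task order, peak number of tasks active at once).
    """
    if pool_size is None:
        pool_size = os.cpu_count() or 1
    work = queue.Queue()
    for i, task in enumerate(tasks):
        work.put((i, task))

    results = [None] * len(tasks)
    lock = threading.Lock()
    active = 0
    peak = 0

    def worker():
        nonlocal active, peak
        while True:
            try:
                i, task = work.get_nowait()
            except queue.Empty:
                return  # queue drained; this "connection" goes idle
            with lock:
                active += 1
                peak = max(peak, active)
            try:
                results[i] = task()
            finally:
                with lock:
                    active -= 1

    threads = [threading.Thread(target=worker) for _ in range(pool_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results, peak


if __name__ == "__main__":
    # 1000 client requests, but never more than 8 "connections" busy.
    jobs = [lambda n=n: sum(range(n)) for n in range(1000)]
    out, peak = run_pooled(jobs, pool_size=8)
    print(len(out), "requests served, at most", peak, "active at once")
```

The point of the design is that the queue, not the lock manager, decides who runs next, which is the "more explicit" scheduling control described above.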

I wouldn't actually bet anything on that theory though, or on any of the
others offered here.  I find it dangerous to wander into performance
bottleneck analysis presuming you already know what's going on.  The bigger
issue here is that Jignesh is using a configuration known to be
problematic (lots of connections), which introduces some uncertainty
about the true root cause.  Whether or not that concern is well-founded, it
still hurts his case.

And to step back for a second, after reading up on it again I see that
Sun's internal iGen-OLTP benchmark "stresses lock management and
connectivity"[1], which makes me wonder even more than I did before about
how specific this fix is to this workload.

[1] http://blogs.sun.com/bmseer/entry/t2000_adds_database_leadership_to

> First just run a test with a tiny delay (5ms? 0?) and fewer users to
> compare.  If your theory that a connection pooler would help is right,
> that test would provide higher throughput with a low user count and not
> be lock limited.

If the symptoms stay the same but are just scaled to a much lower
connection count, that might help rule out some types of context switching
and caching problems from the list of most likely suspects.  Might as well
make it 0ms to minimize the number of connections needed.
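
Why zero think time minimizes the connection count follows from Little's law for a closed-loop benchmark. Assuming each simulated user alternates between a transaction with mean response time R and a think time Z, throughput is X = N / (R + Z), and the mean number of connections actually busy in the database is X * R = N * R / (R + Z). With Z = 0 every connection is busy, so the same in-flight load needs only about N * R / (R + Z) users instead of N. A small sketch of that arithmetic; the function name and all inputs are hypothetical, not taken from Jignesh's runs:

```python
def busy_connections(users, response_time_s, think_time_s):
    """Mean number of users with a transaction in flight, by Little's law.

    Assumes a closed-loop driver: each user runs one transaction (mean
    response time R seconds), then thinks for Z seconds before the next.
    Throughput X = N / (R + Z); concurrency inside the database = X * R.
    """
    if response_time_s <= 0 or think_time_s < 0:
        raise ValueError("need R > 0 and Z >= 0")
    throughput = users / (response_time_s + think_time_s)  # transactions/sec
    return throughput * response_time_s


# Hypothetical inputs: 1000 users, R = 50 ms, Z = 200 ms. Only about 200
# of them are inside the database at any moment, so a zero-think-time run
# with roughly 200 users offers a similar in-flight load.
print(busy_connections(1000, 0.05, 0.2))
```

Note that R itself rises as contention grows, so this is a first-order estimate for sizing the comparison run, not a prediction of its results.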

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD
