Re: "large" spam tables and performance: postgres memory parameters - Mailing list pgsql-performance

From Kevin Grittner
Subject Re: "large" spam tables and performance: postgres memory parameters
Date
Msg-id 4B45C714020000250002DEF4@gw.wicourts.gov
In response to "large" spam tables and performance: postgres memory parameters  (Gary Warner <gar@cis.uab.edu>)
List pgsql-performance
Welcome out of the shadows, Gary!  ;-)

Gary Warner <gar@cis.uab.edu> wrote:

> My biggest question mark there really has to do with how many
> users I have and how that might alter the results.

In benchmarks I've run with our software load, I've found that I get
best throughput when I use a connection pool which limits the active
database transaction count to (2 * CPU_count) + effective_spindles.
CPU_count should be fairly obvious; effective_spindles is
essentially "what's the maximum number of random read requests your
disk subsystem can productively handle concurrently?"
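For instance (numbers invented purely for illustration, not taken
from your hardware): an 8-core server in front of an array that can
productively handle about a dozen concurrent random reads would start
around (2 * 8) + 12 = 28 active transactions.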

One or two others commented that their benchmark results seemed to
fit with that formula.  I don't know just how far to trust it as a
generalization, but in the absence of anything else, it probably
isn't a horrible rule of thumb.  If you expect to have more logical
connections than that number, you might want to establish a
connection pool which limits to that number.  Be sure that if it is
"full" when a request to start a transaction comes in, the request
queues instead of failing.
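If it helps to see that shape in code, here's a minimal sketch of a
pool that queues rather than fails, using psycopg2 and a semaphore;
the connection string and the limit of 28 are placeholders, not
anything I know about your setup:

    import threading
    from contextlib import contextmanager

    from psycopg2 import pool

    MAX_ACTIVE = 28  # (2 * CPU_count) + effective_spindles for this box

    # getconn() raises an error when the pool is exhausted, so gate the
    # checkouts with a semaphore; extra callers block here and wait.
    _slots = threading.BoundedSemaphore(MAX_ACTIVE)
    _pool = pool.ThreadedConnectionPool(1, MAX_ACTIVE,
                                        "dbname=spam user=gar")  # placeholder DSN

    @contextmanager
    def pooled_connection():
        _slots.acquire()            # queue instead of failing when "full"
        conn = _pool.getconn()
        try:
            yield conn
            conn.commit()
        except Exception:
            conn.rollback()
            raise
        finally:
            _pool.putconn(conn)
            _slots.release()

A worker would then run its transaction inside
"with pooled_connection() as conn:", and the 29th simultaneous
request simply waits for a free slot instead of bouncing.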

To convince yourself that this really helps, picture a hypothetical
machine which can only make progress on one request at a time, but
will task-switch among as many requests as are presented.  Then
picture 100 requests being presented simultaneously, each of which
needs one second of time to complete.  Even without figuring in the
overhead of task switching or the cache effects, it's clear that a
connection "pool" of one connection improve response time by almost
50% with no cost in throughput. When you factor in the very real
costs of having large numbers of requests competing, both throughput
and response time win with connection pooling above some threshold.
Of course, with multiple CPUs, multiple spindles, network latency,
etc., the pool should be large enough to tend to keep them all busy.
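To put numbers on that hypothetical: time-sliced fairly, all 100
requests finish together at about the 100-second mark, so the average
response time is roughly 100 seconds; run one at a time, they finish
at 1, 2, ... 100 seconds, for an average of about 50.5 seconds.  In
both cases the last result arrives at 100 seconds, so throughput is
unchanged.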

That said, the exact point at which a connection pool gives optimal
performance depends on so many factors that the only way to *really*
get it right is to test with a realistic load through your actual
software.  The above is just intended to suggest a reasonable
starting point.

-Kevin
