Brian Fehrle <brianf@consistentstate.com> wrote:
> PostgreSQL 8.4.1 on x86_64-unknown-linux-gnu
Please upgrade to the latest bug fix release of PostgreSQL:
http://www.postgresql.org/support/versioning
To see what bug and security fixes you're missing, look at release
notes for 8.4.2 to 8.4.9 here:
http://www.postgresql.org/docs/8.4/static/release.html
There have been improvements in your areas of concern in 9.0 and
9.1, so you might want to start planning a major release upgrade.
That's not as painful as it used to be, with pg_upgrade.
> bgwriter_lru_maxpages | 500
FWIW, we do set this to 1000.
> max_connections | 2000
This is probably your biggest problem. Unless you've got 1000 CPUs
on this box, you should use a connection pooler which is
transaction-oriented, limits the number of database connections, and
queues requests for a new transaction when all connections are in
use. This will almost certainly improve throughput and limit
latency problems. You do not need 2000 connections to support 2000
cnocurrent users; such a setting will make it harder to provide 2000
concurrent users with decent and consistent performance.
> effective_cache_size | 16GB
Given your other settings, this seems likely to be low. I normally
add the cache space reported by the OS to the shared_buffers
setting.
> shared_buffers | 16GB
This is probably at least twice what it should be. If you are
having problems with backends writing too many buffers and problems
with clusters of I/O congestion, you might want to drop it to the
0.5 to 2.0 GB range.
> wal_buffers | 8MB
Might as well go to 16MB.
> work_mem | 64MB
Each of your connections can allocate this much space, potentially
several times, at the same moment. Unless you really have a monster
machine, 64MB * 2000 connections is just asking for out of memory
failures at unpredictable peak load times.
> One main thing is trying to lower 'spikey' disk IO so that
> performance is more consistent at any given time.
The advice above should help with that.
-Kevin