Degrading PostgreSQL 8.4 write performance

From
Kabu Taah
Date:

Load testing of PostgreSQL 8.4 for OLTP application suitability showed that database throughput degraded significantly over time, from thousands of write transactions per second to almost zero. Write transactions here mean insert/update/delete transactions. The load driver executed SQL queries against the database from parallel threads and used prepared statements and connection pooling. Postgres performance degraded within a couple of minutes of the first test run, and the problem was reproducible with only 2 parallel client threads. Subsequent test executions showed degraded throughput from the very beginning.

The degradation affects only write transactions - select-only transactions are not affected. After some time, or after a server restart, the behaviour repeats: the test achieves high throughput and then degrades again. Linux top does not show any postgres processes doing significant work; CPU usage during the test after degradation is <1%, and I/O waits look normal as well.

The machine used for the test:
Red Hat Enterprise Linux AS release 4 (Nahant Update 6)
8 CPU @ 2GHz
16GB RAM
WAL and data are on separate SSD drives

The server is initially configured for dedicated OLTP transaction processing:

Options changed from default:
max_connections = 150
shared_buffers = 4GB
wal_buffers = 16MB
checkpoint_segments = 80
maintenance_work_mem = 2GB

Modified kernel params:
kernel.shmmax = 8932986880
kernel.shmall = 2180905
kernel.sem = 500 64000 200 256

Neither disabling nor tuning autovacuum made any difference.

Any suggestions?





Re: Degrading PostgreSQL 8.4 write performance

From
Merlin Moncure
Date:
On Fri, Jun 17, 2011 at 7:48 AM, Kabu Taah <kabuutah@hot.ee> wrote:
> Load testing of postgresql 8.4 for OLTP application suitability showed that
> throughput of the database significantly degraded over time from thousands
> of write transactions per second to almost zero. Write transactions are in
> given case insert/update/delete database transactions. The load driver used
> for testing the database executed SQL queries in parallel threads and used
> prepared statement and connection pooling. Postgres performance degraded in
> a couple of minutes after the first run of the test, and the problem was
> reproducible with only 2 parallel client threads. Subsequent test executions
> showed degraded throughput since the beginning. The degradation has been
> detected only in case of write transactions - select transactions were not
> affected. After some time or after server restart the problem is
> reproducible - test achieves high throughput and then degrades again. Linux
> top does not show any postgres processes performing any significant work,
> CPU usage during the test after degradation is <1%, io waits are also
> normal.

There are a ton of potential causes of this.  The problem could be in
your code, the database driver, etc.  The first step is to try to
isolate a query that is not running properly and to benchmark it with
EXPLAIN ANALYZE.  Being able to reproduce the problem in pgbench
would explain a lot as well.
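
For example (the table, column, and database names below are just
placeholders, not anything from the original test):

    BEGIN;
    EXPLAIN ANALYZE
      UPDATE accounts SET balance = balance - 100 WHERE account_id = 42;
    ROLLBACK;   -- EXPLAIN ANALYZE actually executes the statement

and, to take your own driver out of the picture, a stock pgbench run:

    pgbench -i -s 100 benchdb      # initialize the pgbench tables
    pgbench -c 2 -T 600 benchdb    # 2 clients, 10 minutes; watch tps over time

If pgbench holds steady while your own workload degrades, look at the
client side; if pgbench falls off the same way, it's something on the
server.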

merlin

Re: Degrading PostgreSQL 8.4 write performance

From
Greg Smith
Date:
On 06/17/2011 08:48 AM, Kabu Taah wrote:
>
> Load testing of postgresql 8.4 for OLTP application suitability showed
> that throughput of the database significantly degraded over time from
> thousands of write transactions per second to almost zero...Postgres
> performance degraded in a couple of minutes after the first run of the
> test, and the problem was reproducible with only 2 parallel client
> threads.
>

When you write with PostgreSQL, things that are modified ("dirtied") in
its cache are written out to the operating system write cache.
Eventually that data gets written to disk by the OS; sometimes it takes
care of that on its own, and in other cases the periodic database
checkpoints (at least every 5 minutes by default) force it out.

It's possible to get a false idea that thousands of transactions per
second is possible for a few minutes when benchmarking something,
because of how write caches work.  The first few thousand transactions
are going to fill up the following caches:

-Space for dirty data in shared_buffers
-Operating system write cache space
-Non-volatile cache on any RAID controller being used
-Non-volatile cache on any drives you have (some SSDs have these)

Once all of those caches are full, you are seeing the true write
throughput of the server.  And it's not unusual for that to be 1/10 or
less of the rate you saw when all the caches were empty, and writes to
disk weren't actually happening; they were just queued up.
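
One rough way to see this from inside the database (a generic check,
nothing specific to this report) is to sample pg_stat_bgwriter before
and after a test run:

    SELECT checkpoints_timed, checkpoints_req,
           buffers_checkpoint, buffers_clean, buffers_backend
      FROM pg_stat_bgwriter;

The difference between two samples tells you how many buffers were
written by checkpoints, by the background writer, and by the backends
themselves over the course of the run.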

You can watch Linux's cache fill up like this:

watch cat /proc/meminfo

Keep your eye on the "Dirty:" line.  It's going to rise for a while, and
I'll bet your server performance dives once that reaches 10% of the
total RAM in the server.
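
If the full meminfo output is too noisy, a variant like this (just a
convenience, assuming your watch and grep support these options) shows
only the interesting lines:

    watch -n 5 'grep -E "^(Dirty|Writeback):" /proc/meminfo'

With 16GB of RAM, 10% works out to roughly 1.6GB, so that's about the
Dirty: figure to look out for.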

Also, turn on "log_checkpoints" in the server configuration.  You'll
also discover there's a drop in performance that begins the minute you
see one of those start.  Performance while a checkpoint is happening is
the true server performance; sometimes you get a burst that's much
higher outside of that, but you can't count on that.
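
That's a one-line change and can be applied with a reload rather than a
restart:

    # postgresql.conf
    log_checkpoints = on

then "pg_ctl reload" (or send the postmaster a SIGHUP).  The checkpoint
complete lines in the log show how many buffers were written and how
long the write and sync phases took.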

The RedHat 4 kernel is so old at this point, I'm not even sure exactly
how to tune it for SSDs.  You really should be running RedHat 6 if you
want to take advantage of disks that fast.

--
Greg Smith   2ndQuadrant US    greg@2ndQuadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books


Re: Degrading PostgreSQL 8.4 write performance

From
"Pierre C"
Date:
>                 Load testing of postgresql 8.4 for OLTP application
> suitability showed that throughput of the
> database significantly degraded over time from thousands of write
> transactions per second to almost zero.

A typical postgres benchmarking gotcha is:

- you start with empty tables
- the benchmark fills them
- query plans which were prepared based on stats of empty (or very small)
tables become totally obsolete when the table sizes grow
- therefore everything becomes very slow as the tables grow

So you should disconnect/reconnect or issue a DISCARD ALL periodically on
each connection, and of course periodically do some VACUUM ANALYZE (or
have autovacuum do that for you).
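
A minimal sketch of what that looks like from the client side (the
table name here is just an example):

    -- run now and then on each pooled connection, between transactions:
    DISCARD ALL;              -- drops cached plans, temp tables, session state

    -- and periodically, so the planner sees up-to-date statistics:
    VACUUM ANALYZE accounts;

DISCARD ALL can't run inside a transaction block, so the pool has to
issue it between transactions; some poolers can do that automatically
when a connection is returned to the pool.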