strange pgbench results (as if blocked at the end) - Mailing list pgsql-performance

From Tomas Vondra
Subject strange pgbench results (as if blocked at the end)
Msg-id 2f9ba8ed6e4ed11d9a6d236cb4d2a2ec.squirrel@sq.gransy.com
List pgsql-performance
Hi,

I've run a lot of pgbench tests recently (trying to compare various fs,
block sizes etc.), and I've noticed several really strange results.

Each benchmark consists of three simple steps:

1) set-up the database
2) read-only run (10 clients, 5 minutes)
3) read-write run (10 clients, 5 minutes)

with a short read-only warm-up (1 client, 1 minute) before each run.
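
For readers who want to reproduce a cycle like this, here is a minimal sketch that builds the pgbench command lines for the steps above. The database name `bench`, the scale factor, and the use of `-l` on both measured runs are assumptions for illustration, not the settings used in the tests described here. The flags themselves (`-i`, `-s`, `-S`, `-c`, `-T`, `-l`) are standard pgbench options.

```python
import shlex


def pgbench_commands(db="bench", scale=100):
    """Build the pgbench command lines for one benchmark cycle.

    db and scale are hypothetical placeholders; the original post
    does not state the database name or scale factor.
    """
    # Read-only warm-up: 1 client, 1 minute, run before each measured run.
    warmup = ["pgbench", "-S", "-c", "1", "-T", "60", db]
    return [
        # 1) set up the database
        ["pgbench", "-i", "-s", str(scale), db],
        warmup,
        # 2) read-only run: 10 clients, 5 minutes, per-transaction log
        ["pgbench", "-S", "-c", "10", "-T", "300", "-l", db],
        warmup,
        # 3) read-write run: 10 clients, 5 minutes, per-transaction log
        ["pgbench", "-c", "10", "-T", "300", "-l", db],
    ]


if __name__ == "__main__":
    for cmd in pgbench_commands():
        print(shlex.join(cmd))
```

The script only prints the commands; passing each list to `subprocess.run` would execute them.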

I've run nearly 200 of these, and in about 10 cases I got something that
looks like this:

http://www.fuzzy.cz/tmp/pgbench/tps.png
http://www.fuzzy.cz/tmp/pgbench/latency.png

i.e. it runs just fine for about 3:40 and then something goes wrong. The
bench should take 5 minutes, but it somehow locks up, does nothing for
about 2 minutes and then all the clients end at the same time. So instead
of 5 minutes the run actually takes about 6:40.

The question is what went wrong - AFAIK there's nothing else running on
the machine that could cause this. I'm looking for possible culprits -
I'll try to repeat this run and see if it happens again.

The pgbench log is available here (notice the 10 lines at the end, those
are the 10 blocked clients) along with the postgres.log:

http://www.fuzzy.cz/tmp/pgbench/pgbench.log.gz
http://www.fuzzy.cz/tmp/pgbench/pg.log
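
A stall like this can be located in the per-transaction log with a short script. The sketch below assumes the log format documented for `pgbench -l` in this release series (`client_id transaction_no time file_no time_epoch time_us`, with `time` being the transaction latency in microseconds); the function names are made up for this example.

```python
from collections import Counter


def parse_line(line):
    """Return (completion_time_in_seconds, latency_in_us) for one log line.

    Assumes fields: client_id transaction_no time file_no time_epoch time_us
    """
    fields = line.split()
    latency_us = int(fields[2])
    completed = int(fields[4]) + int(fields[5]) / 1e6
    return completed, latency_us


def longest_gap(lines):
    """Return (gap_seconds, gap_start) for the longest pause between
    two consecutive transaction completions across all clients."""
    times = sorted(parse_line(l)[0] for l in lines if l.strip())
    best = (0.0, None)
    for prev, cur in zip(times, times[1:]):
        if cur - prev > best[0]:
            best = (cur - prev, prev)
    return best


def tps_per_second(lines):
    """Count completed transactions per wall-clock second."""
    counts = Counter(int(parse_line(l)[0]) for l in lines if l.strip())
    return dict(counts)
```

To apply it to the compressed log, the lines can be read with `gzip.open("pgbench.log.gz", "rt")` and passed to `longest_gap`. A gap of roughly two minutes followed by a burst of very high latencies would match the pattern in the graphs.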

Ignore the "immediate shutdown request" warning (once the benchmark is
over, I don't need it anymore). Besides that there's just a bunch of
"pgstat wait timeout" warnings (which makes sense, because the pgbench run
does a lot of I/O).

I'd understand a slowdown, but why does it block?

I'm using PostgreSQL 9.0.4, the machine has 2GB of RAM and 1GB of shared
buffers. I admit the machine might be configured a bit differently (e.g.
smaller shared buffers) but I've seen just about 10 such strange results
out of 200 runs, so I doubt this is the cause.

I was thinking about something like autovacuum, but I'd expect that to
happen much more frequently (same config, same workload, etc.). And it
happens with just some file systems.

For example for ext3/writeback, the STDDEV(latency) looks like this
(x-axis represents PostgreSQL block size, y-axis fs block size):

  http://www.fuzzy.cz/tmp/pgbench/ext3-writeback.png

while for ext4/journal:

  http://www.fuzzy.cz/tmp/pgbench/ext4-journal.png
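
For reference, a per-run STDDEV(latency) value of the kind plotted in these charts could be computed from the pgbench log roughly as follows. This is only a sketch: it assumes the third log field is the latency in microseconds and uses the sample standard deviation (which is what PostgreSQL's STDDEV aggregate computes); how the charts above were actually produced is not stated here.

```python
import statistics


def latency_stddev_ms(lines):
    """Sample standard deviation of transaction latency, in milliseconds.

    Assumes each non-empty line is a pgbench -l record whose third
    field is the latency in microseconds. Needs at least two records.
    """
    latencies_ms = [int(l.split()[2]) / 1000.0 for l in lines if l.strip()]
    return statistics.stdev(latencies_ms)
```

A single run with a two-minute stall contributes a handful of extremely large latencies, so it stands out sharply in this statistic even when the average tps looks only moderately worse.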

thanks
Tomas

