On 6/11/07, Vladimir Stankovic <V.Stankovic@city.ac.uk> wrote:
> Hi all,
>
> It seems that I have an issue with the performance of a PostgreSQL server.
>
> I'm running write-intensive, TPC-C like tests. The workload consist of
> 150 to 200 thousand transactions. The performance varies dramatically,
> between 5 and more than 9 hours (I don't have the exact figure for the
> longest experiment). Initially the server is relatively fast. It
> finishes the first batch of 50k transactions in an hour. This is
> probably due to the fact that the database is RAM-resident during this
> interval. As soon as the database grows bigger than the RAM the
> performance, not surprisingly, degrades, because of the slow disks.
> My problem is that the performance is rather variable, and to me
> non-deterministic. A 150k test can finish in approx. 3h30mins but
> conversely it can take more than 5h to complete.
> Preferably I would like to see *steady-state* performance (where my
> interpretation of the steady-state is that the average
> throughput/response time does not change over time). Is the steady-state
> achievable despite the MVCC and the inherent non-determinism between
> experiments? What could be the reasons for the variable performance?
> - misconfiguration of the PG parameters (e.g. autovacuum does not cope
> with the dead tuples on the MVCC architecture)
> - file fragmentation
> - index bloat
> - ???
vmstat is telling you that the server is i/o bound. an iostat will
tell be helpful to tell you where things are binding up...either the
data volume, wal volume, or both. I suspect your sorts are spilling
to disk which is likely the cause of the variable performance,
interacting with autovacuum. Another possibility is vacuum is bogging
you down. look for pg_tmp folders inside the database tree to see if
this is happening. Also you want to see if your server is swapping.
first, I'd suggest bumping maintenance_work_mem to 256mb. I'd also
suggest bumping work_mem higher, but you are going to have to
calculate how far to go based on how many active queries with sort are
going to fire simultaneously. It can be a fine line because your a
bit underpowered memory but your database is small as well. bumping
work_mem but throwing your server into swap solves nothing.
merlin