Bryan Vest <bvest@bright.net> writes:
> Currently the database is at a size of 87.6Gig. A Vacuum Analyze runs every
> night and has been taking 4 or 5 hours to complete. Everything seems to run
> fine for a while, then at some point the load goes through the roof and the
> iowait % also goes way up. It will recover after a little bit and then do the
> same thing all over again. When this happens access to the web based user
> interface slows way down for our customers. Any input for improvements to this
> config would be appreciated. Thanks.
While others have pointed out problems with the config, I don't think any of
them explains this irregular behaviour. From what you're describing, the
response time is ok most of the time except during these particular bursts?
Do they occur at regular intervals? Is it possible it's just the
checkpointing? Can you see which volumes the i/o traffic is on? Is it on the
local transaction log files or is it on the data files? Does the write i/o
spike upwards, or is it just a storm of read i/o? Also, incidentally, is it
possible you have a cron job running vacuum and don't realize it?
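For what it's worth, here's a rough sketch of how I'd go about answering the
i/o and cron questions; the exact device layout and the postgres user are
assumptions about your setup:

    # watch per-device i/o every 5 seconds and note which volume
    # (pg_xlog/WAL vs. data files) the read or write spike lands on
    iostat -x 5

    # look for a vacuum scheduled in the postgres user's crontab
    crontab -l -u postgres | grep -i vacuum
    # ...and in the system-wide cron entries
    grep -ri vacuum /etc/cron* 2>/dev/null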
If it happens at irregular intervals, then it could be a single bad query
that's causing the problem. One bad query could cause a sequential scan across
your 87G and push a lot of data out of the cache. I imagine this would be made
even worse by the shared_buffers being out of whack.
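Once you've identified a suspect (see the logging suggestion below), EXPLAIN
will confirm whether it's falling back to a sequential scan; the table and
predicate here are made up purely for illustration:

    EXPLAIN SELECT * FROM orders WHERE note LIKE '%refund%';
    -- a top plan line like "Seq Scan on orders" over a big table
    -- means every block gets read, evicting cached data as it goes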
You might start by checking the easiest thing first: set
log_min_duration_statement to something high and slowly lower it until it's
printing a handful of queries during the heaviest period.
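Something along these lines in postgresql.conf, followed by a reload, would do
it; 5000ms is just a starting guess to lower from:

    # log any statement that runs longer than 5 seconds (value in ms)
    log_min_duration_statement = 5000

    bash-2.05b# pg_ctl reload -D /var/lib/postgres/data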
You could also look for a pgsql_tmp directory, which indicates a disk sort is
happening; that would mean some query is trying to sort a lot of data. You
might have to lower sort_mem to a conservative value before you can see that,
though.
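For instance (sort_mem is per-sort and measured in kilobytes, so this caps
in-memory sorts at roughly 4MB; the right number for your box is a judgment
call):

    # sorts needing more than ~4MB spill to pgsql_tmp on disk,
    # where you can actually see them
    sort_mem = 4096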
The pgsql_tmp directory appears (and disappears?) as needed; it looks
something like this:
bash-2.05b# ls /var/lib/postgres/data/base/17150/pgsql_tmp
pgsql_tmp22184.0
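Running du against that directory during one of the load spikes would show
whether a big disk sort coincides with the burst (the 17150 database oid is
from the example above; yours will differ):

    bash-2.05b# du -sk /var/lib/postgres/data/base/17150/pgsql_tmp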
--
greg