On Mon, 2004-12-06 at 23:54, Mark Wong wrote:
> On Mon, Dec 06, 2004 at 11:44:22PM +0000, Simon Riggs wrote:
> > On the graphs... why do the graphs for Proc Utilisation, Index Scans
> > etc, only show first 300 secs of a 3600 sec long run? Are those axes
> > correct? (I understand seeing the ramp-up is important, I just want to
> > check the time axis).
> >
> > What do you think the periodicity is on those graphs that has an order
> > of around 10 secs if that axis is correct?
> >
> > That's about every 400 transactions. Anybody?
>
> Whoops! Those are supposed to be minutes. The granularity of the
> intervals has always been 1 minute (60 seconds). I wonder why I put
> seconds on the charts... I'll fix that too.
OK... at least the results are starting to be coherent now...
All the graphs now show we have three main states:
1. normal running
2. a periodicity of about 7 events per 5 mins: one every ~42 secs
3. a periodicity of about 5 mins
State 1 gives good performance: most transactions complete in < 2 secs.
State 2 gives a mild performance reduction, blocking transactions for ~7 secs.
State 3 gives a bad performance reduction, blocking transactions for ~12 secs.
We think effect 3 is a checkpoint...
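For what it's worth, the stock settings would produce exactly that period, assuming the run used the defaults (values below are the shipped defaults, not taken from the run's actual config):

```
# postgresql.conf (shipped defaults, assumed for this run)
checkpoint_timeout = 300     # secs between automatic checkpoints = 5 mins
checkpoint_segments = 3      # WAL segments before a checkpoint is forced
```

If WAL volume is low enough that checkpoint_segments never triggers first, checkpoints fire on the 5-minute timer, matching the periodicity of effect 3.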
Not sure, as yet, what is causing effect 2. It's not related to the
kernel, but it is related to user CPU and I/O waits, and it affects all
tables in proportion to their overall I/O usage. There is some evidence
that it becomes more pronounced as CPU utilisation peaks, possibly
increasing slightly in frequency once this occurs: maybe the cache
filling?
Conjecture: effect 2 is caused by insufficient bgwriter activity. The
bgwriter writes fewer dirty blocks than users are dirtying, so the
cache slowly fills with dirty blocks. Then user processes must write
their own dirty blocks to disk before they can continue. [Not sure
about this, but the conjecture at least covers: the weird periodicity,
the I/O effect, the cache-full effect, and the lack of impact of
bgwriter parameter changes]
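To illustrate the conjecture only, here is a toy model with entirely made-up numbers (none of these values come from the run or from PostgreSQL internals): if user backends dirty blocks at a steady rate the bgwriter can't keep up with, the dirty fraction of the cache grows linearly until users must flush for themselves, giving stalls at a fixed period of flush_batch / (dirty_rate - clean_rate) seconds, i.e. the kind of regular short blocking we're seeing.

```python
# Toy model of the conjecture, NOT PostgreSQL code. All constants are
# hypothetical, chosen so the stall period comes out near the observed ~42s.
CACHE_BUFFERS = 1000      # size of the shared buffer cache, in blocks
DIRTY_PER_SEC = 50        # blocks dirtied by user backends per second
BGWRITER_PER_SEC = 30     # blocks cleaned by the bgwriter per second
USER_FLUSH_BATCH = 800    # blocks users flush themselves once the cache fills

def stall_times(duration_secs):
    """Return the seconds at which user backends must stop and write."""
    dirty, stalls = 0, []
    for t in range(duration_secs):
        dirty += DIRTY_PER_SEC - BGWRITER_PER_SEC  # net +20 dirty blocks/sec
        if dirty >= CACHE_BUFFERS:                 # cache full of dirty blocks
            dirty -= USER_FLUSH_BATCH              # users write their own blocks
            stalls.append(t)
    return stalls

# Steady state: stalls every USER_FLUSH_BATCH / 20 = 40 secs.
print(stall_times(300))
```

With these made-up rates the model settles into one stall every 40 secs, in the same ballpark as the ~42 sec periodicity of effect 2; a real test would be to raise the bgwriter's cleaning rate and see whether the period lengthens or the stalls disappear.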
--
Best Regards, Simon Riggs