Re: hanging for 30sec when checkpointing - Mailing list pgsql-admin

From gjm@caledoncard.com (Greg Mennie)
Subject Re: hanging for 30sec when checkpointing
Msg-id a806dcd9.0402110625.3190f48c@posting.google.com
In response to hanging for 30sec when checkpointing  (Shane Wright <me@shanewright.co.uk>)
List pgsql-admin
me@shanewright.co.uk (Shane Wright) wrote in message news:<40202216.4010608@shanewright.co.uk>...
> Hi,
>
> I'm running a reasonably sized (~30GB) 7.3.4 database on Linux and I'm
> getting some weird performance at times.
>
> When the db is under medium-heavy load, it periodically spawns a
> 'checkpoint subprocess' which runs for between 15 seconds and a minute.
> Ok, fair enough, the only problem is the whole box becomes pretty much
> unresponsive during this time - from what I can gather it's because it
> writes out roughly 1MB (vmstat says ~1034 blocks) per second until it's done.
>
> Other processes can continue to run (e.g. vmstat) but other things do
> not (other queries, mostly running 'ps fax', etc).  So everything gets
> stacked up till the checkpoint finishes and all is well again, until
> the next time...

I am having a similar problem and this is what I've found so far:

During the checkpoint the volume of data that's written isn't very
high and it goes on for a fairly long time (up to 20 seconds) at a
rate that appears to be well below our disk array's potential.  The
volume of data written is usually 1-5 MB/sec on an array that we've
tested to sustain over 50 MB/sec  (sequential writes, of course).
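
To put those figures side by side, here is a rough back-of-the-envelope
sketch.  It assumes the write rate holds steady for the whole checkpoint,
which the message does not claim; the function names are illustrative only.

```python
# Figures quoted in the message above; function names are illustrative.
CHECKPOINT_SECONDS = 20      # "up to 20 seconds"
WRITE_RATE_MB_S = (1, 5)     # observed 1-5 MB/sec during the checkpoint
SEQUENTIAL_MB_S = 50         # array tested at over 50 MB/sec sequential


def checkpoint_volume_mb(rate_mb_s, seconds):
    """Total data written if the rate held steady for the whole checkpoint."""
    return rate_mb_s * seconds


def fraction_of_sequential(rate_mb_s, sequential_mb_s):
    """Observed write rate as a fraction of the tested sequential rate."""
    return rate_mb_s / sequential_mb_s


if __name__ == "__main__":
    for rate in WRITE_RATE_MB_S:
        volume = checkpoint_volume_mb(rate, CHECKPOINT_SECONDS)
        share = fraction_of_sequential(rate, SEQUENTIAL_MB_S)
        print(f"{rate} MB/sec for {CHECKPOINT_SECONDS} s: "
              f"{volume} MB, {share:.0%} of the sequential rate")
```

Under that assumption a checkpoint moves roughly 20-100 MB while using at
most 2-10% of the tested sequential rate, so raw bandwidth does not look
like the limiting factor.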

It turns out that what's going on is that the command queue for the
RAID array (3Ware RAID card) is filling up during the checkpoint and
is staying at the max (254 commands) for most of the checkpoint.  The
odd lucky insert appears to work, but is extremely slow.  In our case,
the WAL files are on the same array as the data files, so everything
grinds to a halt.

The machine we're running it on is a dual processor box with 2GB RAM.
Since most database read operations are being satisfied from the
cache, reading processes don't seem to be affected during the pauses.

I suspect that checkpointing more often could help, since each burst
of commands on the disk channel would be shorter.  (The checkpoint
interval is currently 300 seconds.)
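
A minimal sketch of that reasoning, under the simplifying assumption that
the work done by a checkpoint is proportional to the time since the
previous one.  That is an assumption, not a measured result: pages dirtied
several times between checkpoints are only written once, so the real
saving is likely smaller.  The function names and the 60-second interval
are hypothetical.

```python
def scaled_burst_seconds(burst_s, old_interval_s, new_interval_s):
    """Estimated burst length after changing the checkpoint interval,
    assuming work per checkpoint grows linearly with the interval."""
    return burst_s * new_interval_s / old_interval_s


def stalled_fraction(burst_s, interval_s):
    """Share of wall-clock time spent inside checkpoint bursts."""
    return burst_s / interval_s


if __name__ == "__main__":
    old_interval, burst = 300, 20   # figures from the message
    new_interval = 60               # hypothetical shorter interval
    new_burst = scaled_burst_seconds(burst, old_interval, new_interval)
    print(f"burst: {burst} s -> {new_burst} s")
    print(f"stalled share before: {stalled_fraction(burst, old_interval):.1%}")
    print(f"stalled share after:  {stalled_fraction(new_burst, new_interval):.1%}")
```

In this model each pause shrinks in proportion to the interval, but the
share of time spent in pauses stays the same; the gain would be shorter
stalls, not less total checkpoint work.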

I have found that the checkpoint after a vacuum is the worst.  This
was the original problem which led to the investigation.

Besides more frequent checkpoints, I am at a loss as to what to do
about this.  Any help would be appreciated.

Thanks,

Greg
