Re: io storm on checkpoints, postgresql 8.2.4, linux - Mailing list pgsql-performance

From Dmitry Potapov
Subject Re: io storm on checkpoints, postgresql 8.2.4, linux
Msg-id 878c83960708230737j6103e177h8f23d5bd4676828c@mail.gmail.com
In response to Re: io storm on checkpoints, postgresql 8.2.4, linux  (Greg Smith <gsmith@gregsmith.com>)
Responses Re: io storm on checkpoints, postgresql 8.2.4, linux  (Greg Smith <gsmith@gregsmith.com>)
List pgsql-performance
2007/8/23, Greg Smith <gsmith@gregsmith.com>:
> On Wed, 22 Aug 2007, Dmitry Potapov wrote:
>
> If you do end up following up with this via the Linux kernel mailing list,
> please pass that link along.  I've been meaning to submit it to them and
> wait for the flood of e-mail telling me what I screwed up, that will go
> better if you tell them about it instead of me.

I'm planning to do so, but first I need to look at the PostgreSQL source and developer documentation to find out exactly how IO is done, so that I can explain the issue to the Linux kernel people. That will take some time; I'll post a link here when I'm done.


>> looks to me as an elegant solution. Is there some other way to fix this
>> issue without disabling pagecache and the IO smoothing it was designed
>> to perform?
>
> I spent a couple of months trying and decided it was impossible.  Your
> analysis of the issue is completely accurate; lowering
> dirty_background_ratio to 0 makes the system much less efficient, but it's
> the only way to make the stalls go completely away.
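
To put rough numbers on that trade-off, here is a minimal Python sketch of how much dirty data the page cache can accumulate before background writeback starts at a given dirty_background_ratio. It is an approximation only: the kernel applies the ratio to the memory it considers available rather than to total RAM, and the function name, the 16 GiB figure (the size of the production box mentioned in this thread) and the sample ratios are assumptions chosen for illustration.

```python
GIB = 1024 ** 3


def background_dirty_threshold_bytes(mem_bytes: int, ratio_percent: int) -> int:
    """Rough amount of dirty page cache allowed before background writeback starts.

    Approximation: treats the ratio as a percentage of mem_bytes, whereas the
    kernel applies it to the memory it considers available for caching.
    """
    if not 0 <= ratio_percent <= 100:
        raise ValueError("ratio_percent must be between 0 and 100")
    return mem_bytes * ratio_percent // 100


if __name__ == "__main__":
    # Hypothetical ratios for comparison; 0 is the setting discussed above.
    for ratio in (10, 5, 1, 0):
        threshold = background_dirty_threshold_bytes(16 * GIB, ratio)
        print(f"dirty_background_ratio={ratio:>2}: ~{threshold / GIB:.2f} GiB")
```

With a ratio of 0 the threshold is zero, so under this simple model dirty pages become eligible for writeback as soon as they appear, which is consistent with the reduced efficiency described above.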

By the way, does PostgreSQL have a similar stall problem on FreeBSD or other OSes? If it doesn't, it would be interesting to study their approach to IO smoothing.

> I contributed some help toward fixing the issue in the upcoming 8.3
> instead; there's a new checkpoint writing process aimed to ease the exact
> problem you're running into there, see the new
> checkpoint_completion_target tunable at
> http://developer.postgresql.org/pgdocs/postgres/wal-configuration.html
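
As a rough illustration of the idea behind checkpoint_completion_target, the Python sketch below estimates the average write rate when a checkpoint's dirty buffers are spread over a fraction of the checkpoint interval instead of being written as fast as possible. The function name and the sample numbers are assumptions for illustration; the sketch assumes time-triggered checkpoints and 8 kB pages, and it is not a description of how the server itself schedules writes.

```python
PAGE_BYTES = 8192  # assumed PostgreSQL block size (the usual default)


def paced_write_rate(dirty_pages: int, checkpoint_interval_s: float,
                     completion_target: float) -> float:
    """Average pages per second needed to finish within the target fraction."""
    if dirty_pages < 0:
        raise ValueError("dirty_pages must not be negative")
    if checkpoint_interval_s <= 0:
        raise ValueError("checkpoint_interval_s must be positive")
    if not 0 < completion_target <= 1:
        raise ValueError("completion_target must be in (0, 1]")
    # The writes are spread over interval * target seconds.
    return dirty_pages / (checkpoint_interval_s * completion_target)


if __name__ == "__main__":
    # Hypothetical workload: 30000 dirty pages, a checkpoint every 300 seconds.
    for target in (0.1, 0.5, 0.9):
        rate = paced_write_rate(30000, 300, target)
        print(f"target={target}: {rate:.0f} pages/s, "
              f"{rate * PAGE_BYTES / 1e6:.1f} MB/s")
```

A larger target spreads the same number of pages over more time, so the sustained write rate drops and there is less of a burst for the kernel to absorb at once.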

> If you could figure out how to run some tests to see if the problem clears
> up for you using the new technique, that would be valuable feedback for
> the development team for the upcoming 8.3 beta.  Probably more productive
> use of your time than going crazy trying to fix the issue in 8.2.4.

We have a tool here to record and replay the exact workload we have on a real production system; the only problem is getting a spare 16 GB box. I can get a server with 8 GB of RAM and nearly the same storage setup for testing purposes. I hope it will be able to carry the production load, so that I can compare 8.2.4 and 8.3devel on the same box, in the same situation. Are there any other changes in 8.3devel that could affect the results of such a test? I didn't really follow the 8.3 development process :(

--
Regards,
            Dmitry
