Re: Controlling Load Distributed Checkpoints - Mailing list pgsql-hackers

From Greg Smith
Subject Re: Controlling Load Distributed Checkpoints
Msg-id Pine.GSO.4.64.0706071403560.2676@westnet.com
In response to Re: Controlling Load Distributed Checkpoints  (Heikki Linnakangas <heikki@enterprisedb.com>)
List pgsql-hackers
On Thu, 7 Jun 2007, Heikki Linnakangas wrote:

> So there's two extreme ways you can use LDC:
> 1. Finish the checkpoint as soon as possible, without disturbing other 
> activity too much
> 2. Disturb other activity as little as possible, as long as the 
> checkpoint finishes in a reasonable time.
> Are both interesting use cases, or is it enough to cater for just one of 
> them? I think 2 is easier to tune.

The motivation for the (1) case is that you've got a system that's 
dirtying the buffer cache very fast in normal use, where even the 
background writer is hard pressed to keep the buffer pool clean.  The 
checkpoint is the most powerful and efficient way to clean up many dirty 
buffers out of such a buffer cache in a short period of time so that 
you're back to having room to work in again.  In that situation, since 
there are many buffers to write out, you'll also be suffering greatly from 
fsync pauses.  Being able to synchronize writes a little better with the 
underlying OS to smooth those out is a huge help.

I'm completely biased because of the workloads I've been dealing with 
recently, but I consider (2) so much easier to tune for that it's barely 
worth worrying about.  If your system is so underloaded that you can let 
the checkpoints take their own sweet time, I'd ask if you have enough 
going on that you're suffering very much from checkpoint performance 
issues anyway.  I'm used to being in a situation where if you don't push 
out checkpoint data as fast as physically possible, you end up fighting 
with the client backends for write bandwidth once the LRU point moves past 
the point the checkpoint has already written out to.  I'm not sure how 
much always running the LRU background writer will improve that situation.

> On a Linux system, one way to model it is that the OS flushes dirty buffers 
> to disk at the same rate as we write them, but delayed by 
> dirty_expire_centisecs. That should hold if the writes are spread out enough.

If they're really spread out, sure.  There is congestion avoidance code 
inside the Linux kernel that, under load, makes dirty_expire_centisecs not 
quite work the way it is described.  All you can say in the general case 
is that when dirty_expire_centisecs has passed, the kernel badly wants to 
write the buffers out as quickly as possible; the actual write could still 
happen many seconds after the expiration time on a busy system, or on one 
with slow I/O.
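
For reference, the idealized model quoted above can be written down as a toy 
calculation.  This is only a sketch of that model, not of what the kernel 
actually does: it assumes every write is flushed exactly expire_secs after 
it was issued and ignores the congestion behavior entirely.  The function 
name and the one-second granularity are made up for illustration.

```python
def dirty_backlog(writes_per_sec, expire_secs):
    """Idealized model: data written at second t is flushed exactly
    expire_secs later, so the amount still dirty at second t is the
    sum of the writes issued during the last expire_secs seconds.

    writes_per_sec: amount written in each one-second slot (any unit).
    Returns the modeled dirty amount for each slot, in the same unit.
    """
    backlog = []
    for t in range(len(writes_per_sec)):
        start = max(0, t - expire_secs + 1)
        backlog.append(sum(writes_per_sec[start:t + 1]))
    return backlog
```

With evenly spread writes the modeled backlog levels off at write rate times 
the delay; a single burst stays dirty for the whole delay and then drops 
away at once, which is the case where the real kernel is least likely to 
follow the model.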

On every system where I've tested Postgres write performance, I've found 
that the memory-based parameters like dirty_background_ratio were really 
driving write behavior, and I almost ignore the expire timeout now.  
Plotting the "Dirty:" value in /proc/meminfo as you're running tests 
is extremely informative for figuring out what Linux is really doing 
underneath the database writes.
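
A minimal sketch of collecting that "Dirty:" value for plotting, assuming a 
Linux system and Python; the function names, the one-second interval and 
the sample count are arbitrary choices for illustration, not anything from 
the patch under discussion.

```python
import time

def parse_meminfo_kb(text, field="Dirty"):
    """Return the value (in kB) of one /proc/meminfo field, or None if absent."""
    prefix = field + ":"
    for line in text.splitlines():
        if line.startswith(prefix):
            # Lines look like "Dirty:              1234 kB"
            return int(line[len(prefix):].split()[0])
    return None

def sample_dirty(interval=1.0, count=10, path="/proc/meminfo"):
    """Yield (elapsed_seconds, dirty_kb) pairs; Linux only."""
    start = time.time()
    for _ in range(count):
        with open(path) as f:
            yield time.time() - start, parse_meminfo_kb(f.read())
        time.sleep(interval)
```

Running sample_dirty() alongside a test and feeding the pairs to any 
plotting tool gives the kind of graph described above.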

The influence of the congestion code is why I made the comment about 
watching how long writes are taking to gauge how fast you can dump data 
onto the disks.  When you're suffering from one of the congestion 
mechanisms, the initial writes start blocking, even before the fsync. 
That behavior is almost undocumented outside of the relevant kernel source 
code.
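
One rough way to watch for that from user space is to time each write() 
separately from the final fsync().  The sketch below is only an 
illustration in Python, not how the checkpoint code does it; the function 
name is hypothetical, and the 8192-byte block size and block count are 
assumptions chosen to resemble database page writes.

```python
import os
import tempfile
import time

def time_writes(directory=None, block_size=8192, count=256):
    """Write `count` blocks to a scratch file, timing each write()
    call and the final fsync() separately.

    Returns (list_of_write_seconds, fsync_seconds).
    """
    block = b"\0" * block_size
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        write_times = []
        for _ in range(count):
            t0 = time.perf_counter()
            os.write(fd, block)
            write_times.append(time.perf_counter() - t0)
        t0 = time.perf_counter()
        os.fsync(fd)
        fsync_time = time.perf_counter() - t0
    finally:
        os.close(fd)
        os.unlink(path)
    return write_times, fsync_time
```

If the individual write times start climbing well before the fsync, the 
writes themselves are blocking, which is the symptom described above.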

--
* Greg Smith gsmith@gregsmith.com http://www.gregsmith.com Baltimore, MD

