Re: Load distributed checkpoint - Mailing list pgsql-hackers

From Inaam Rana
Subject Re: Load distributed checkpoint
Msg-id 833c669b0612080417r5c3fefbaja6b63857c8ce2890@mail.gmail.com
In response to Re: Load distributed checkpoint  (Ron Mayer <rm_pg@cheapcomplexdevices.com>)
List pgsql-hackers


On 12/7/06, Ron Mayer <rm_pg@cheapcomplexdevices.com> wrote:
Takayuki Tsunakawa wrote:
> Hello, Itagaki-san
>> Checkpoint consists of the following four steps, and the major
>> performance
>> problem is 2nd step. All dirty buffers are written without interval
>> in it.
>> 1. Query information (REDO pointer, next XID etc.)
>> 2. Write dirty pages in buffer pool
>> 3. Flush all modified files
>> 4. Update control file
>
> Hmm. Isn't it possible that step 3 affects the performance greatly?
> I'm sorry if you have already identified step 2 as disturbing
> backends.
>
> As you know, PostgreSQL does not transfer the data to disk when
> write()ing. Actual transfer occurs when fsync()ing at checkpoints,
> unless the filesystem cache runs short. So, disk is overworked at
> fsync()s.

It seems to me that virtual memory settings of the OS will determine
if step 2 or step 3 causes much of the actual disk I/O.

In particular, on Linux, things like /proc/sys/vm/dirty_expire_centisecs

dirty_expire_centisecs will have little, if any, effect on a box with a consistent workload. Under uniform load the bgwriter keeps pushing buffers into the filesystem cache, which results in pages being evicted/flushed to disk. Aging the pages out more quickly can lower the cap on dirty pages, but it won't/can't handle the sudden spike at checkpoint time.

and dirty_writeback_centisecs

Again, on a system that hits IO chokes at checkpoints, pdflush is presumably already working like crazy at that time. Reducing the gap between its wakeup calls will probably have very little impact on checkpoint performance.

and possibly dirty_background_ratio

I have seen this put a real cap on the number of dirty pages during normal running. As regards checkpoints, this again seems to have little effect.

The problem with checkpoints is that we are dealing with two starkly different types of IO load. The larger the number of shared_buffers, the greater the spike in IO activity at checkpoint. AFAICS no specific vm tunable can smooth out checkpoint spikes by itself. There has to be some intelligence in the bgwriter to even the load out.

would affect this.  If those numbers are high, ISTM most write()s
from step 2 would wait for the flush in step 3.  If I understand
correctly, if the dirty_expire_centisecs number is low, most write()s
from step 2 would happen before step 3 because of the pdflush daemons.
I expect other OS's would have different but similar knobs to tune this.

It seems to me that the most portable way postgresql could force
the I/O to be balanced would be to insert otherwise unnecessary
fsync()s into step 2; but that it might (not sure why) be better
to handle this through OS-specific tuning outside of postgres.

---------------------------(end of broadcast)---------------------------
TIP 2: Don't 'kill -9' the postmaster
