
From: Simon Riggs
Subject: Re: Partitioned checkpointing
Msg-id: CANP8+jKHDrwDD5Qc4dRYo2mNKoeLkTvF7QFDbnh0oiqAfVZ67A@mail.gmail.com
In response to: Re: Partitioned checkpointing (Fabien COELHO <coelho@cri.ensmp.fr>)
Responses: Re: Partitioned checkpointing (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
           Re: Partitioned checkpointing (Fabien COELHO <coelho@cri.ensmp.fr>)
           Re: Partitioned checkpointing (Takashi Horikawa <t-horikawa@aj.jp.nec.com>)
List: pgsql-hackers
On 11 September 2015 at 09:07, Fabien COELHO <coelho@cri.ensmp.fr> wrote:
 
> Some general comments:

Thanks for the summary, Fabien.
 
> I understand that what this patch does is cut the checkpoint of buffers into 16 partitions, each addressing 1/16 of the buffers, and each with its own WAL-log entry, pacing, fsync and so on.

> I'm not sure why it would be much better; I agree that it may have some small positive influence on performance, but I'm afraid it may also degrade performance in some conditions. So I think that a better understanding of why performance improves, and a focus on that, could help obtain a more systematic gain.

I think it's a good idea to partition the checkpoint, but not to do it this way.

Splitting with N=16 does nothing to guarantee the partitions are equally sized, so there would likely be an imbalance that would reduce the effectiveness of the patch.
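
For concreteness, the scheme described above amounts to something like the rough sketch below. NBuffers is PostgreSQL's shared-buffer count; sync_one_buffer() and fsync_touched_files() are hypothetical stand-ins for the backend's internals, not the patch's actual code.

#define CKPT_PARTITIONS 16

static void
checkpoint_partitioned(void)
{
    int     part;

    for (part = 0; part < CKPT_PARTITIONS; part++)
    {
        /* slice boundaries: a fixed 1/16 of buffer ids, which says
         * nothing about how many of those buffers are dirty -- hence
         * the likely imbalance between partitions */
        int     start = (int) (((long) NBuffers * part) / CKPT_PARTITIONS);
        int     end = (int) (((long) NBuffers * (part + 1)) / CKPT_PARTITIONS);
        int     buf;

        for (buf = start; buf < end; buf++)
            sync_one_buffer(buf);   /* write it out if dirty, with pacing */

        /* one fsync round per slice: a file whose buffers span many
         * slices gets fsynced many times over the whole checkpoint */
        fsync_touched_files();
    }
}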
 
> This method interacts with the current proposal to improve checkpointer behavior by avoiding random I/O, but the two could be combined.

> I'm wondering whether the benefits you see are linked to the file flushing behavior induced by fsyncing more often, in which case it is quite close to the "flushing" part of the current "checkpoint continuous flushing" patch, and could be redundant with, or less efficient than, what is done there, especially as tests have shown that the effect of flushing is *much* better on sorted buffers.

> Another proposal around, suggested by Andres Freund I think, is that the checkpointer could fsync files while checkpointing rather than waiting for the end of the checkpoint. I think that this may also be one of the reasons why your patch brings a benefit, but Andres's approach would be more systematic, because there would be no need to fsync files several times (basically your patch issues 16 fsyncs per file). This suggests that the "partitioning" should be done at a lower level, from within CheckPointBuffers, which would take care of fsyncing files some time after writing buffers to them is finished.
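
A hedged sketch of that lower-level arrangement, assuming the dirty buffers have first been sorted by file (as in the sorted-flushing work), so each file's writes form one contiguous run. Every name here (PendingWrite, write_buffer, fsync_relfile) is illustrative only:

typedef struct PendingWrite
{
    int     file_id;    /* relation file the buffer belongs to */
    int     buf_id;     /* shared buffer to write */
} PendingWrite;

static void
checkpoint_buffers_sorted(PendingWrite *w, int n)
{
    int     i;

    for (i = 0; i < n; i++)
    {
        write_buffer(w[i].buf_id);

        /* end of this file's run? fsync it now, exactly once, so the
         * fsyncs are spread across the checkpoint instead of all being
         * issued at the end */
        if (i == n - 1 || w[i + 1].file_id != w[i].file_id)
            fsync_relfile(w[i].file_id);
    }
}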

The idea of doing a partial pass through shared buffers, writing only a fraction of the dirty buffers and then fsyncing them, is a good one.

The key point is that we spread out the fsyncs across the whole checkpoint period.

I think we should be writing out all buffers for a particular file in one pass, then issuing one fsync per file. More than one fsync per file seems a bad idea.

So we'd need logic like this (a rough code sketch follows the list):
1. Run through shared buffers and analyze the files contained in there
2. Assign files to one of N batches so we can make N roughly equal sized mini-checkpoints
3. Make N passes through shared buffers, writing out files assigned to each batch as we go
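
In outline, and with every name below hypothetical (chosen only to make the shape concrete, not taken from any existing patch):

#define N_BATCHES 16

typedef struct FileStats
{
    int     file_id;    /* relation file */
    int     ndirty;     /* dirty buffers counted in step 1 */
    int     batch;      /* batch assigned in step 2 */
} FileStats;

static void
mini_checkpoints(void)
{
    FileStats  *files;
    int         nfiles;
    int         b;
    int         f;

    /* 1. analyze: one scan of shared buffers, tallying dirty buffers
     *    per file */
    nfiles = collect_dirty_file_stats(&files);

    /* 2. assign: put each file into the currently least-loaded batch,
     *    so the N batches carry roughly equal numbers of dirty buffers */
    assign_files_to_batches(files, nfiles, N_BATCHES);

    /* 3. write: one pass per batch; each file is written in full and
     *    fsynced exactly once, spreading the fsyncs across the whole
     *    checkpoint period */
    for (b = 0; b < N_BATCHES; b++)
    {
        for (f = 0; f < nfiles; f++)
        {
            if (files[f].batch != b)
                continue;
            write_all_dirty_buffers_for_file(files[f].file_id);
            fsync_relfile(files[f].file_id);
        }
    }
}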

--
Simon Riggs                http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
