Re: Partitioned checkpointing - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Partitioned checkpointing
Msg-id 20150911162942.GD4996@alap3.anarazel.de
In response to Re: Partitioned checkpointing  (Tomas Vondra <tomas.vondra@2ndquadrant.com>)
Responses Re: Partitioned checkpointing  (Takashi Horikawa <t-horikawa@aj.jp.nec.com>)
List pgsql-hackers
Hi,

Partitioned checkpoints have the significant disadvantage that they
increase random write IO by the number of passes, which is a bad idea,
*especially* on SSDs.

> >So we'd need logic like this
> >1. Run through shared buffers and analyze the files contained in there
> >2. Assign files to one of N batches so we can make N roughly equal sized
> >mini-checkpoints
> >3. Make N passes through shared buffers, writing out files assigned to
> >each batch as we go

That's essentially what Fabien's sorting patch does by sorting all
writes.
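
To make the comparison concrete, here is a rough Python sketch (not code
from either patch). It assumes dirty buffers can be modelled as
(file, block) tuples; assign_batches, batched_write_order and
sorted_write_order are hypothetical names invented for this illustration.
The first two functions follow the three quoted steps, the last one is a
single global sort.

```python
from collections import Counter

def assign_batches(dirty, n_batches):
    """Steps 1 and 2: count dirty buffers per file, then give each file
    (largest first) to the batch that currently holds the fewest buffers."""
    counts = Counter(f for f, _ in dirty)
    loads = [0] * n_batches
    assignment = {}
    for f, c in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        b = loads.index(min(loads))
        assignment[f] = b
        loads[b] += c
    return assignment

def batched_write_order(dirty, n_batches):
    """Step 3: make N passes over the buffers, emitting in each pass only
    the buffers whose file belongs to that pass's batch."""
    assignment = assign_batches(dirty, n_batches)
    order = []
    for b in range(n_batches):
        order.extend(buf for buf in dirty if assignment[buf[0]] == b)
    return order

def sorted_write_order(dirty):
    """Alternative: one global sort by (file, block)."""
    return sorted(dirty)

# Made-up sample data: three files with 3, 2 and 1 dirty buffers.
dirty = [("b", 7), ("a", 3), ("c", 1), ("a", 1), ("b", 2), ("a", 2)]
```

Both orderings write the same set of buffers. The batched variant only
groups them by file and needs N scans, while the sorted variant needs one
sort and additionally orders the blocks within each file.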

> What I think might work better is actually keeping the write/fsync phases we
> have now, but instead of postponing the fsyncs until the next checkpoint we
> might spread them after the writes. So with target=0.5 we'd do the writes in
> the first half, then the fsyncs in the other half. Of course, we should sort
> the data like you propose, and issue the fsyncs in the same order (so that
> the OS has time to write them to the devices).

I think the approach in Fabien's patch, which keeps the amount of dirty
data that remains to be flushed small by forcing early cache flushes, is
better. Having gigabytes worth of dirty data in the OS page cache can
have a massive negative impact, completely independent of fsyncs.
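
As a minimal sketch of the "flush early" idea (again not the patch's
code): write pages as usual, but ask for a flush every time a fixed
amount of unflushed data has accumulated, so the OS never holds much
dirty data for the file. The function name write_with_early_flush, the
flush_after_bytes parameter and the 8 kB page size are assumptions made
for this example, and os.fsync is only a portable stand-in for whatever
flush hint a real implementation would issue.

```python
import os
import tempfile

def write_with_early_flush(fd, pages, flush_after_bytes, flush=os.fsync):
    """Write each page to fd and call flush(fd) whenever at least
    flush_after_bytes of unflushed data have accumulated.
    Returns the number of flush calls made."""
    pending = 0
    flushes = 0
    for page in pages:
        os.write(fd, page)
        pending += len(page)
        if pending >= flush_after_bytes:
            flush(fd)
            flushes += 1
            pending = 0
    if pending:
        # Flush whatever is left after the last full chunk.
        flush(fd)
        flushes += 1
    return flushes

PAGE = b"\0" * 8192  # assumed page size for the example

with tempfile.TemporaryFile() as f:
    n_flushes = write_with_early_flush(
        f.fileno(), [PAGE] * 10, flush_after_bytes=4 * len(PAGE)
    )
```

With ten pages and a threshold of four pages, at most four pages of
unflushed data exist at any point, instead of all ten waiting for one
flush at the end.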

> I wonder how much the original paper (written in 1996) is effectively
> obsoleted by spread checkpoints, but the benchmark results posted by
> Horikawa-san suggest there's a possible gain. But perhaps partitioning the
> checkpoints is not the best approach?

I think it's likely that the patch will have only a very small effect if
applied on top of Fabien's patch (which will require some massaging, I'm
sure).

Greetings,

Andres Freund


