Re: Load Distributed Checkpoints, take 3 - Mailing list pgsql-patches

From Heikki Linnakangas
Subject Re: Load Distributed Checkpoints, take 3
Date
Msg-id 467C16BE.3080809@enterprisedb.com
In response to Re: Load Distributed Checkpoints, take 3  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Load Distributed Checkpoints, take 3  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-patches
Tom Lane wrote:
> Maybe I misread the patch, but I thought that if someone requested an
> immediate checkpoint, the checkpoint-in-progress would effectively flip
> to immediate mode.  So that could be handled by offering an immediate vs
> extended checkpoint option in pg_start_backup.  I'm not sure it's a
> problem though, since as previously noted you probably want
> pg_start_backup to be noninvasive.  Also, one could do a manual
> CHECKPOINT command then immediately pg_start_backup if one wanted
> as-fast-as-possible (CHECKPOINT requests immediate checkpoint, right?)

Yeah, that's possible.
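To make the mode switch Tom describes concrete, here is a toy model in Python. It is not the patch's code: the class, its fields and the after_write hook are hypothetical, and the pause a spread checkpoint would take between writes is only counted, never actually slept.

```python
# Toy model of a load-distributed checkpoint that an immediate request can
# upgrade mid-flight. Not PostgreSQL code; all names are hypothetical.
from dataclasses import dataclass, field


@dataclass
class SpreadCheckpoint:
    dirty_pages: list[int]
    immediate: bool = False   # set when an immediate checkpoint is requested
    naps: int = 0             # pauses taken to spread the I/O out
    written: list[int] = field(default_factory=list)

    def request_immediate(self) -> None:
        # Upgrade the checkpoint in progress instead of queueing another one.
        self.immediate = True

    def run(self, after_write=None) -> None:
        for page in self.dirty_pages:
            self.written.append(page)      # stand-in for writing the buffer
            if after_write is not None:
                after_write(self)          # lets a caller interject a request
            if not self.immediate:
                self.naps += 1             # a real implementation would sleep here


def upgrade_after_three(ckpt: SpreadCheckpoint) -> None:
    # Simulates an immediate request arriving after the third write.
    if len(ckpt.written) == 3:
        ckpt.request_immediate()


ckpt = SpreadCheckpoint(dirty_pages=list(range(10)))
ckpt.run(upgrade_after_three)
print(ckpt.naps, len(ckpt.written))  # 2 10
```

Once the request arrives, the remaining pages go out without pauses; the checkpoint still writes every dirty page, it just stops throttling.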

>> and recovery would need to process on average 1.5 times as much WAL as before.
>> Though with LDC, you should get away with shorter checkpoint intervals
>> than before, because the checkpoints aren't as invasive.
>
> No, you still want a pretty long checkpoint interval, because of the
> increase in WAL traffic due to more page images being dumped when the
> interval is short.
>
>> If we do that, we should remove bgwriter_all_* settings. They wouldn't
>> do much because we would have checkpoint running all the time, writing
>> out dirty pages.
>
> Yeah, I'm not sure that we've thought through the interactions with the
> existing bgwriter behavior.
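Tom's point about page images can be illustrated with a simplified count. With full_page_writes enabled, the first modification of a page after a checkpoint puts a full image of that page into WAL, so the number of images grows with the number of checkpoint intervals that a page's updates span. The sketch assumes instantaneous checkpoints at multiples of the interval (ignoring the redo-pointer lag discussed in this thread), and the function name is hypothetical.

```python
# Simplified count of full-page images in WAL, assuming full_page_writes = on
# and instantaneous checkpoints at t = 0, interval, 2*interval, ...
def count_full_page_images(modifications, interval):
    """modifications: iterable of (time, page_id) pairs.
    Returns one image per page per checkpoint interval in which that page
    is modified at least once."""
    first_touch = set()
    for t, page in modifications:
        first_touch.add((int(t // interval), page))  # (interval index, page)
    return len(first_touch)


# One hot page, updated once per time unit for 100 time units.
hot_page = [(t, 0) for t in range(100)]
print(count_full_page_images(hot_page, interval=50))  # 2
print(count_full_page_images(hot_page, interval=10))  # 10
```

Cutting the interval from 50 to 10 time units multiplies the page images for this page by five, even though the workload is unchanged.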

I searched the archives a bit for the discussions from when the current
bgwriter settings were introduced, and found this thread:

http://archives.postgresql.org/pgsql-hackers/2004-12/msg00784.php

The idea of Load Distributed Checkpoints certainly isn't new :).

OK, suppose we approach this on the assumption that there will be *no* GUC
variables at all to control this, and that we remove the bgwriter_all_*
settings as well. Does anyone see a reason why that would be bad? Here
are the ones mentioned so far:

1. We need to keep twice as many WAL segments around as before.

2. pg_start_backup will need to wait for a long time.

3. Recovery will take longer, because the last committed redo pointer
will lag further behind.

Points 1 and 3 can be alleviated by using a smaller
checkpoint_timeout/checkpoint_segments, though, as you pointed out, that
leads to higher WAL traffic. Point 2 is not a big deal; we can add an
'immediate' parameter to pg_start_backup if necessary.
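The trade-off in points 1 and 3 can be checked with a simplified model. This is an assumption for illustration, not how the patch measures anything: WAL is generated at a constant rate, checkpoints start every T, each spends a fraction f of the interval writing before it completes, and recovery starts from the redo pointer of the last completed checkpoint. Under those assumptions the average replay distance for a crash at a random time is (0.5 + f)·T and the worst case approaches (1 + f)·T. A checkpoint that runs all the time (f = 1) therefore doubles the WAL that must be kept around, and shrinking T shrinks both figures linearly. The function names below are hypothetical.

```python
# Simplified model of WAL that recovery must replay, measured in time units
# (assumes WAL is generated at a constant rate).
def wal_to_replay(crash_time, interval, spread):
    """Checkpoint k starts at k*interval (its redo pointer) and completes at
    k*interval + spread*interval, with 0 <= spread <= 1. Assumes steady state,
    i.e. earlier checkpoints (even k = -1) exist."""
    k = int(crash_time // interval)             # checkpoint most recently started
    if crash_time - k * interval < spread * interval:
        k -= 1                                  # still in progress: use the previous one
    return crash_time - k * interval


def average_replay(interval, spread, samples=20_000):
    # Midpoint sampling of crash times over one steady-state interval.
    total = 0.0
    for i in range(samples):
        t = 10 * interval + (i + 0.5) * interval / samples
        total += wal_to_replay(t, interval, spread)
    return total / samples


for f in (0.0, 0.5, 1.0):
    print(f, round(average_replay(1.0, f), 6))  # 0.5, 1.0, 1.5 intervals
```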

--
   Heikki Linnakangas
   EnterpriseDB   http://www.enterprisedb.com
