Re: Checkpointer on hot standby runs without looking checkpoint_segments - Mailing list pgsql-hackers

From Robert Haas
Subject Re: Checkpointer on hot standby runs without looking checkpoint_segments
Date
Msg-id CA+Tgmoa8kTT0JLs1FQ7C43VbkboEFJnOsJRTBbgdm5XRLiFZkA@mail.gmail.com
In response to Re: Checkpointer on hot standby runs without looking checkpoint_segments  (Florian Pflug <fgp@phlo.org>)
List pgsql-hackers
On Fri, Jun 8, 2012 at 1:01 PM, Florian Pflug <fgp@phlo.org> wrote:
> On Jun 8, 2012, at 15:47, Robert Haas wrote:
>> On Fri, Jun 8, 2012 at 5:02 AM, Simon Riggs <simon@2ndquadrant.com> wrote:
>>> On 8 June 2012 09:14, Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp> wrote:
>>>
>>>> The requirement for this patch is as follows.
>>>>
>>>> - What I want is for checkpoint progression to behave similarly
>>>>  on the master and on a (hot) standby.  Specifically, checkpoints
>>>>  during streaming replication should run at the pace governed by
>>>>  checkpoint_segments.  What this patch does is prevent an
>>>>  unexpectedly large number of WAL segments from piling up on the
>>>>  standby side.  (Plus, it increases the chance of skipping the
>>>>  recovery-end checkpoint, per another patch of mine.)
>>>
>>> Since we want wal_keep_segments number of WAL files on the master
>>> (and, because of cascading, on the standby also), I don't see any
>>> purpose in triggering more frequent checkpoints just so we can hit
>>> a magic number that is most often set wrong.
>>
>> This is a good point.  Right now, if you set checkpoint_segments to a
>> large value, we retain lots of old WAL segments even when the system
>> is idle (cf. XLOGfileslop).  I think we could be smarter about that.
>> I'm not sure what the exact algorithm should be, but right now users
>> are forced to choose between setting checkpoint_segments very large
>> to achieve optimum write performance and setting it small to conserve
>> disk space.
>> What would be much better, IMHO, is if the number of retained
>> segments could ratchet down when the system is idle, eventually
>> reaching a state where we keep only one segment beyond the one
>> currently in use.
>
> I'm a bit sceptical about this. It seems to me that you wouldn't actually
> be able to do anything useful with the conserved space, since postgres
> could reclaim it at any time. At which point it'd better be available,
> or your whole cluster comes to a screeching halt...
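
(For reference, the retention logic in question is hardwired in
xlog.c; in the 9.2-era source it looks roughly like this, with old
segments recycled as future segments up to the slop limit and deleted
beyond it:

    /*
     * XLOGfileslop is the maximum number of preallocated future XLOG
     * segments.  When we are done with an old XLOG segment file, we
     * will recycle it as a future segment as long as there aren't
     * already XLOGfileslop future segments; otherwise we delete it.
     */
    #define XLOGfileslop    (2 * CheckPointSegments + 1)

so the steady-state pg_xlog ceiling scales with checkpoint_segments
regardless of how busy the system actually is.)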

Well, the issue for me is elasticity.  Right now we ship with
checkpoint_segments = 3.  That causes terrible performance on many
real-world workloads.  But say we ship with checkpoint_segments = 100,
which is a far better setting from a performance point of view.  Then
pg_xlog space utilization will eventually grow to more than 3 GB, even
on a low-velocity system where the extra segments don't improve
performance.  I'm
not sure whether it's useful for the number of checkpoint segments to
vary dramatically on a single system, but I do think it would be very
nice if we could ship with a less conservative default without eating
up so much disk space.  Maybe there's a better way of going about
that, but I agree with Simon's point that the setting is often wrong.
Frequently it's too low; sometimes it's too high; occasionally it's
got both problems simultaneously.  If you have another idea on how to
improve this, I'm all ears.
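
To put a rough number on that: with 16 MB segments and the slop limit
above, checkpoint_segments = 100 lets on the order of 2 * 100 + 1 =
201 recycled-or-future segments linger, i.e. roughly 201 * 16 MB ≈
3.2 GB of pg_xlog even once the system has gone idle, which is where
the "more than 3 GB" figure comes from.

One shape the "ratchet down when idle" idea could take is to size the
slop to recent WAL consumption instead of to the static cap.  Here is
a minimal sketch, assuming backend context (CheckPointSegments and the
Max macro); the function name, the smoothing constant, and the
caller-supplied counter are all invented for illustration:

    /*
     * Hypothetical sketch only, not proposed code.  Replace the fixed
     * XLOGfileslop with a limit that tracks recent WAL consumption
     * and decays toward a small floor when the system goes idle.
     */
    static int
    AdaptiveXLOGfileslop(int segs_since_last_ckpt)
    {
        static double avg_segs = 0.0;   /* smoothed segments per cycle */
        int         slop;

        /* exponential moving average of recent WAL consumption */
        avg_segs = 0.9 * avg_segs + 0.1 * segs_since_last_ckpt;

        /*
         * Retain about twice the recent rate, bounded above by the
         * existing static cap and below by one spare segment.
         */
        slop = (int) (2.0 * avg_segs) + 1;
        if (slop > 2 * CheckPointSegments + 1)
            slop = 2 * CheckPointSegments + 1;
        return Max(slop, 1);
    }

An idle system would then drift down to keeping a single spare
segment, while a busy one would converge on today's behaviour; the
concern Florian raises still applies, though, since the reclaimed
space must be available again the moment load returns.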

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

