Home > mailing lists

Re: BUG #7801: Streaming failover checkpoints much slower than master, pg_xlog space problems during db load - Mailing list pgsql-bugs

From	Simon Riggs
Subject	Re: BUG #7801: Streaming failover checkpoints much slower than master, pg_xlog space problems during db load
Date	January 8, 2013 22:49:01
Msg-id	CA+U5nMKdf7odZzYNnoRkkCZmJpGEy=OQbU9Nan_zva_Rtzi2vw@mail.gmail.com Whole thread Raw
In response to	BUG #7801: Streaming failover checkpoints much slower than master, pg_xlog space problems during db load (briank@openroadtech.com)
Responses	Re: BUG #7801: Streaming failover checkpoints much slower than master, pg_xlog space problems during db load
List	pgsql-bugs

Tree view

On 8 January 2013 19:24,  <briank@openroadtech.com> wrote:

> Simply stated, pg_xlog grows out of control on a streaming-replication
> backup server with a high volume of writes on the master server. This occurs
> only with checkpoint_completion_target>0 and very large (eg. 8GB)
> shared_buffers. pg_xlog on the master stays a fixed size (1.2G for me).

All of this appears to be working as designed.

It will issue a restartpoint every checkpoint_timeout seconds on the standby.

checkpoint_segments is ignored on standby.

> Detail: I have two new/fast servers using streaming replication, with 9.2.2
> from the PostgreSQL yum repository. The servers are connected via 10G
> network, so the failover receives data as fast as it is created.
>
> It appears that to trigger this problem you need large shared_buffers (mine
> is set to 8GB) and checkpoint_completion_target > 0. (.8 or .9 will do). I
> have checkpoint_timeout=5min. (though setting this to 30s *does not*
> alleviate the problem).  The failover machine doesn't seem to be concerned
> with checkpoint_timeout or checkpoint_segments.
>
> When doing large writes (like pg_restore) the failover machine's pg_xlog
> does not clear files. The reason appears to be that it is not
> checkpointing(verified by log_checkpoints=on and examining the log).  If I
> run 'checkpoint;' manually on the failover in psql(hot_standby=on), it
> immediately cleans up pg_xlog(after a pause to actually perform the
> checkpoint).
>
> To generate the data,you can just run
> pgbench -i -s1000
> on the master server, then watch pg_xlog grow on the failover.
>
> This produces a ~10GB of data, then creates indices.  If I run du on the
> failover after the data is loaded, I see:
>
> # du -s base pg_xlog | sort -n
> 10928276        pg_xlog
> 13144536        base
>
> I have a 10GB pg_xlog.  The load completes in less than 5 minutes, which may
> be relevant here, since eventually the failover will perform a checkpoint.
> On the primary server, pg_xlog caps at 1.2GB.
>
> If checkpoint_completion_target=0, or shared_buffers=256MB, pg_xlog grows to
> around 1GB and stays there.
>
> This guess may be off target, but it appears that the database only
> checkpoints every approximately checkpoint_completion_target*
> checkpoint_timeout seconds.  So, for a .9 completion target and the default
> 5min timeout, the failover will go for 4+ minutes without executing a single
> checkpoint, regardless of how much data arrives (verified with
> log_checkpoints=on).
>
> All the while the master server is spitting out 'checkpoints are occurring
> too frequently' messages, which I assume is to be expected during a reload.
>
> This means that the failover machine needs 2x the database size to not crash
> during a database reload.  We run VMs with limited disk allocation where
> this is not the case.  We periodically need to dump/restore, and I'm
> concerned this simple operation will crash our failover machines.
>
> I can avoid the problem by setting checkpoint_completion_target=0, which I
> will do for now.
>
> Here are settings that differ from the default initdb install:
>
> shared_buffers = 8GB
> work_mem = 64MB
> maintenance_work_mem = 256MB
> max_stack_depth = 8MB
> wal_level = hot_standby
> synchronous_commit = off
> wal_buffers = 64MB
> checkpoint_segments = 10
> checkpoint_completion_target = 0
> max_wal_senders = 2
> hot_standby = on
> effective_cache_size = 8GB
> log_checkpoints = on


--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-bugs by date:

From: Scott Mead
Date: 08 January 2013, 22:47:45
Subject: Re: BUG #7800: Welcome email with login ifnormation NOT received

From: Scott Mead
Date: 08 January 2013, 22:50:31
Subject: Re: BUG #7800: Welcome email with login ifnormation NOT received

Re: BUG #7801: Streaming failover checkpoints much slower than master, pg_xlog space problems during db load - Mailing list pgsql-bugs

Previous

Next