Thread: BUG #7801: Streaming failover checkpoints much slower than master, pg_xlog space problems during db load
BUG #7801: Streaming failover checkpoints much slower than master, pg_xlog space problems during db load
From
briank@openroadtech.com
Date:
The following bug has been logged on the website:

Bug reference:      7801
Logged by:          Brian Krznarich
Email address:      briank@openroadtech.com
PostgreSQL version: 9.2.2
Operating system:   RHEL6
Description:

I was told to resubmit this - the original submission (7777) may have been lost.

Simply stated, pg_xlog grows out of control on a streaming-replication backup server with a high volume of writes on the master server. This occurs only with checkpoint_completion_target > 0 and very large (e.g. 8GB) shared_buffers. pg_xlog on the master stays a fixed size (1.2GB for me).

Detail: I have two new/fast servers using streaming replication, with 9.2.2 from the PostgreSQL yum repository. The servers are connected via a 10G network, so the failover receives data as fast as it is created.

It appears that to trigger this problem you need large shared_buffers (mine is set to 8GB) and checkpoint_completion_target > 0 (0.8 or 0.9 will do). I have checkpoint_timeout = 5min (though setting this to 30s *does not* alleviate the problem). The failover machine doesn't seem to be concerned with checkpoint_timeout or checkpoint_segments.

When doing large writes (like pg_restore) the failover machine's pg_xlog does not clear files. The reason appears to be that it is not checkpointing (verified by log_checkpoints = on and examining the log). If I run 'checkpoint;' manually on the failover in psql (hot_standby = on), it immediately cleans up pg_xlog (after a pause to actually perform the checkpoint).

To generate the data, you can just run

    pgbench -i -s1000

on the master server, then watch pg_xlog grow on the failover. This produces ~10GB of data, then creates indices. If I run du on the failover after the data is loaded, I see:

    # du -s base pg_xlog | sort -n
    10928276 pg_xlog
    13144536 base

I have a 10GB pg_xlog. The load completes in less than 5 minutes, which may be relevant here, since eventually the failover will perform a checkpoint.

On the primary server, pg_xlog caps at 1.2GB.

If checkpoint_completion_target = 0, or shared_buffers = 256MB, pg_xlog grows to around 1GB and stays there.

This guess may be off target, but it appears that the database only checkpoints approximately every checkpoint_completion_target * checkpoint_timeout seconds. So, for a 0.9 completion target and the default 5min timeout, the failover will go for 4+ minutes without executing a single checkpoint, regardless of how much data arrives (verified with log_checkpoints = on).

All the while, the master server is spitting out 'checkpoints are occurring too frequently' messages, which I assume is to be expected during a reload.

This means that the failover machine needs 2x the database size to avoid crashing during a database reload. We run VMs with limited disk allocation where this is not the case. We periodically need to dump/restore, and I'm concerned this simple operation will crash our failover machines.

I can avoid the problem by setting checkpoint_completion_target = 0, which I will do for now.

Here are the settings that differ from the default initdb install:

    shared_buffers = 8GB
    work_mem = 64MB
    maintenance_work_mem = 256MB
    max_stack_depth = 8MB
    wal_level = hot_standby
    synchronous_commit = off
    wal_buffers = 64MB
    checkpoint_segments = 10
    checkpoint_completion_target = 0
    max_wal_senders = 2
    hot_standby = on
    effective_cache_size = 8GB
    log_checkpoints = on
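[Editor's note: a minimal sketch of the manual-checkpoint workaround described above - watch pg_xlog on the standby and issue CHECKPOINT through psql once the directory passes a size threshold. The data directory path, the 2GB threshold, and the connection options are illustrative assumptions, not part of the original report; adjust them for your installation.]

    #!/bin/bash
    # Force a restartpoint on the standby whenever pg_xlog grows too large.
    # PGDATA and the threshold are assumptions -- adjust for your install.
    PGDATA=/var/lib/pgsql/9.2/data        # assumed RHEL6 PGDG layout
    THRESHOLD_KB=$((2 * 1024 * 1024))     # 2GB; du -s reports 1K blocks

    while sleep 30; do
        used_kb=$(du -s "$PGDATA/pg_xlog" | awk '{print $1}')
        if [ "$used_kb" -gt "$THRESHOLD_KB" ]; then
            # CHECKPOINT on a hot standby triggers an immediate restartpoint,
            # allowing old WAL segments in pg_xlog to be recycled.
            psql -U postgres -c 'CHECKPOINT;'
        fi
    done

This is a stopgap only; it papers over the restartpoint scheduling behavior discussed in the replies below.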
Re: BUG #7801: Streaming failover checkpoints much slower than master, pg_xlog space problems during db load
From
Simon Riggs
Date:
On 8 January 2013 19:24, <briank@openroadtech.com> wrote:

> Simply stated, pg_xlog grows out of control on a streaming-replication
> backup server with a high volume of writes on the master server. This occurs
> only with checkpoint_completion_target>0 and very large (eg. 8GB)
> shared_buffers. pg_xlog on the master stays a fixed size (1.2G for me).

All of this appears to be working as designed.

It will issue a restartpoint every checkpoint_timeout seconds on the standby.

checkpoint_segments is ignored on standby.

> [...]

--
Simon Riggs                   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services
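[Editor's note: since the report runs with log_checkpoints = on, the standby's restartpoint cadence can be confirmed directly from its log. A sketch; the log path is an assumption based on the RHEL6 PGDG packages mentioned above.]

    # With log_checkpoints = on, each restartpoint is logged on the standby.
    # The log path is an assumption (RHEL6 PGDG default); adjust as needed.
    grep -E 'restartpoint (starting|complete)' \
        /var/lib/pgsql/9.2/data/pg_log/postgresql-*.log

If the behavior is as Simon describes, the 'restartpoint complete' entries should appear roughly checkpoint_timeout apart, regardless of WAL volume.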
Re: BUG #7801: Streaming failover checkpoints much slower than master, pg_xlog space problems during db load
From
"Krznarich, Brian"
Date:
On 1/8/2013 2:48 PM, Simon Riggs wrote:

> On 8 January 2013 19:24, <briank@openroadtech.com> wrote:
>
>> Simply stated, pg_xlog grows out of control on a streaming-replication
>> backup server with a high volume of writes on the master server. This occurs
>> only with checkpoint_completion_target>0 and very large (eg. 8GB)
>> shared_buffers. pg_xlog on the master stays a fixed size (1.2G for me).
>
> All of this appears to be working as designed.
>
> It will issue a restartpoint every checkpoint_timeout seconds on the standby.
>
> checkpoint_segments is ignored on standby.

The documentation does not seem to agree with the last point:

"In standby mode, a restartpoint is also triggered if checkpoint_segments
log segments have been replayed since last restartpoint and at least one
checkpoint record has been replayed."

This is precisely the problem. The failover should not go
checkpoint_timeout * checkpoint_completion_target seconds without executing
a restartpoint while thousands of WAL segments are stacking up in pg_xlog.

With checkpoint_completion_target = 0, the standby server will happily
execute restartpoints much faster than checkpoint_timeout if necessary.
Once checkpoint_completion_target > 0, no attention is paid to the backlog
of WAL data.

I honestly do not understand PostgreSQL well enough to see why large vs.
small shared_buffers changes this behavior, but it does. If shared_buffers
is not extremely large, is PostgreSQL forced to execute restartpoints more
frequently?

In general it seems like it should be safe to use the same postgresql.conf
on the master and the standby server, but this would clearly be an
exception. One wouldn't expect a 10GB pg_xlog on a standby when the master
has no such problem.

Thank you for your assistance.

Brian
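[Editor's note: another way to observe how far apart restartpoints actually land on the standby is pg_controldata, since a standby's control-file checkpoint fields are advanced by restartpoints. A sketch; the binary and data directory paths are assumptions for the RHEL6 PGDG packages.]

    # Shows when the standby's last restartpoint completed (reported as the
    # "latest checkpoint"). Paths are assumptions; adjust for your install.
    /usr/pgsql-9.2/bin/pg_controldata /var/lib/pgsql/9.2/data \
        | grep -i checkpoint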