Re: BUG #7801: Streaming failover checkpoints much slower than master, pg_xlog space problems during db load - Mailing list pgsql-bugs

From Krznarich, Brian
Subject Re: BUG #7801: Streaming failover checkpoints much slower than master, pg_xlog space problems during db load
Date
Msg-id 4B3A2632C3BFC249BACF6888FE6D25A60144FA@EXCMBX02PAKR.bfusa.com
In response to Re: BUG #7801: Streaming failover checkpoints much slower than master, pg_xlog space problems during db load  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-bugs
On 1/8/2013 2:48 PM, Simon Riggs wrote:
> On 8 January 2013 19:24,  <briank@openroadtech.com> wrote:
>
>> Simply stated, pg_xlog grows out of control on a streaming-replication
>> backup server with a high volume of writes on the master server. This occurs
>> only with checkpoint_completion_target>0 and very large (eg. 8GB)
>> shared_buffers. pg_xlog on the master stays a fixed size (1.2G for me).
> All of this appears to be working as designed.
>
> It will issue a restartpoint every checkpoint_timeout seconds on the standby.
>
> checkpoint_segments is ignored on standby.
The documentation does not seem to agree with the last point.
"In standby mode, a restartpoint is also triggered if
checkpoint_segments log segments have been replayed since last
restartpoint and at least one checkpoint record has been replayed."

This is precisely the problem.  The failover should not go
checkpoint_timeout*checkpoint_completion_target seconds without
executing a restartpoint, in spite of the fact that thousands of WAL
segments are stacking up in pg_xlog.
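
As a rough illustration of the scale involved, the sketch below estimates
how much WAL could pile up on a standby that waits
checkpoint_timeout*checkpoint_completion_target seconds between
restartpoints. The 16 MB segment size is the PostgreSQL default; the
function name and the input values (WAL rate, timeout, target) are
hypothetical and chosen only for illustration, not measured on the servers
in this report.

```python
WAL_SEGMENT_MB = 16  # default WAL segment size in PostgreSQL


def wal_backlog_segments(wal_rate_mb_per_s, checkpoint_timeout_s, completion_target):
    """Estimate the WAL segments received while the standby waits
    checkpoint_timeout * checkpoint_completion_target seconds
    before finishing a restartpoint (the behaviour described above)."""
    interval_s = checkpoint_timeout_s * completion_target
    backlog_mb = wal_rate_mb_per_s * interval_s
    return backlog_mb / WAL_SEGMENT_MB


# Hypothetical inputs: 40 MB/s of WAL, checkpoint_timeout = 300 s, target = 0.9
segments = wal_backlog_segments(40, 300, 0.9)
print(f"{segments:.0f} segments, about {segments * WAL_SEGMENT_MB / 1024:.1f} GB")
# prints "675 segments, about 10.5 GB"
```

With a completion target of 0 the same estimate drops to zero, which is
only a statement about this simple model, not about what the server
actually does internally.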

With checkpoint_completion_target=0, the standby server will happily
execute restartpoints much faster than checkpoint_timeout if it is
necessary.  Once checkpoint_completion_target>0, no attention is paid
to the backlog of WAL data.

I honestly do not understand PostgreSQL well enough to explain why
large vs. small shared_buffers changes this behavior, but it does.  If
shared_buffers is not extremely large, it seems PostgreSQL is forced to
execute restartpoints more frequently?

In general it seems like it should be safe to use the same
postgresql.conf on the master and the standby server, but this would
clearly be an exception.  One wouldn't expect a 10GB pg_xlog on a
standby where the master has no such problem.
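
For anyone wanting to compare the two servers, a minimal sketch along
these lines could report how many WAL segment files sit in pg_xlog and how
much space they take. The function name is hypothetical, and the only
assumption about PostgreSQL itself is that WAL segment file names are 24
uppercase hexadecimal characters; run it against the pg_xlog directory on
the master and on the standby.

```python
import os
import re

# WAL segment file names consist of 24 hexadecimal characters.
SEGMENT_NAME = re.compile(r"^[0-9A-F]{24}$")


def pg_xlog_usage(pg_xlog_dir):
    """Return (segment_count, total_bytes) for WAL segment files in pg_xlog_dir."""
    count = 0
    total = 0
    for entry in os.scandir(pg_xlog_dir):
        # Skip subdirectories (e.g. archive_status) and non-segment files.
        if entry.is_file() and SEGMENT_NAME.match(entry.name):
            count += 1
            total += entry.stat().st_size
    return count, total


# Example call (the path is a placeholder, adjust to the local data directory):
# count, total = pg_xlog_usage("/path/to/data/pg_xlog")
# print(f"{count} segments, {total / 1024**3:.1f} GB")
```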

Thank you for your assistance.

Brian
