On 1/8/2013 2:48 PM, Simon Riggs wrote:
> On 8 January 2013 19:24, <briank@openroadtech.com> wrote:
>
>> Simply stated, pg_xlog grows out of control on a streaming-replication
>> backup server with a high volume of writes on the master server. This occurs
>> only with checkpoint_completion_target>0 and very large (eg. 8GB)
>> shared_buffers. pg_xlog on the master stays a fixed size (1.2G for me).
> All of this appears to be working as designed.
>
> It will issue a restartpoint every checkpoint_timeout seconds on the standby.
>
> checkpoint_segments is ignored on standby.

The documentation does not seem to agree with the last point.

"In standby mode, a restartpoint is also triggered if
checkpoint_segments log segments have been replayed since last
restartpoint and at least one checkpoint record has been replayed."

This is precisely the problem. The standby should not go
checkpoint_timeout * checkpoint_completion_target seconds without
executing a restartpoint while thousands of WAL segments are stacking
up in pg_xlog.

With checkpoint_completion_target=0, the standby server will happily
execute restartpoints much faster than checkpoint_timeout if it is
necessary. Once checkpoint_completion_target>0, no attention is paid
to the backlog of WAL data.
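
For reference, here is roughly the configuration involved. The 8GB
shared_buffers figure is from my earlier mail; every other value below
(including the 0.9) is only an example to make the sketch concrete,
not copied from my actual postgresql.conf:

    # same postgresql.conf on the master and the standby
    shared_buffers = 8GB                   # large; the problem only appears with very large values
    checkpoint_completion_target = 0.9     # any value > 0 shows the problem; 0 does not
    checkpoint_timeout = 5min              # example value
    checkpoint_segments = 32               # example value

If I read the documentation correctly, the master's pg_xlog should stay
around (2 + checkpoint_completion_target) * checkpoint_segments + 1
segments, and indeed the master stays bounded; the standby does not.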

I honestly do not understand PostgreSQL well enough to know why large
vs. small shared_buffers changes this behavior, but it does. If
shared_buffers is not extremely large, it seems PostgreSQL is forced to
execute restartpoints more frequently?
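
If it would help, I can enable checkpoint logging on the standby to
record exactly when restartpoints run; nothing more than the standard
log_checkpoints parameter, for example:

    # in the standby's postgresql.conf
    log_checkpoints = on    # logs restartpoints as well as checkpoints

After a reload (pg_ctl reload, or SELECT pg_reload_conf()), the server
log should show how far apart restartpoints actually are, and how that
changes between checkpoint_completion_target = 0 and > 0.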

In general it seems like it should be safe to use the same
postgresql.conf on the master and the standby server, but this would
clearly be an exception. One wouldn't expect a 10GB pg_xlog on a
standby where the master has no such problem.

Thank you for your assistance.

Brian