Frank Wittig <fw@weisshuhn.de> writes:
> The problem is that the slave server stops checkpointing after some
> hours of working (about 24 to 48 hours of continued log replay).
Hm ... look at RecoveryRestartPoint() in xlog.c. Could there be
something wrong with this logic?

    /*
     * Do nothing if the elapsed time since the last restartpoint is less than
     * half of checkpoint_timeout. (We use a value less than
     * checkpoint_timeout so that variations in the timing of checkpoints on
     * the master, or speed of transmission of WAL segments to a slave, won't
     * make the slave skip a restartpoint once it's synced with the master.)
     * Checking true elapsed time keeps us from doing restartpoints too often
     * while rapidly scanning large amounts of WAL.
     */
    elapsed_secs = time(NULL) - ControlFile->time;
    if (elapsed_secs < CheckPointTimeout / 2)
        return;

The idea is that the slave (once in sync with the master) ought to
checkpoint every time it sees a checkpoint record in the master's
output. I'm not seeing a flaw but maybe there is one here, or somewhere
nearby. Are you sure the master is checkpointing?
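
To make the timing rule concrete, here is a minimal standalone sketch of
the same comparison. The helper name and its parameters are made up for
illustration and are not part of xlog.c; it only mirrors the arithmetic
in the excerpt above. One property of that arithmetic worth keeping in
mind: if the recorded time is ever later than the local clock, the
elapsed value is negative and the restartpoint is skipped every time.
Whether that can actually happen in this setup is not something
established here.

```c
#include <assert.h>
#include <stdbool.h>
#include <time.h>

/*
 * Hypothetical helper (not a function in xlog.c): returns true when a
 * restartpoint would be skipped under the elapsed-time rule, i.e. when
 * less than half of checkpoint_timeout has passed since the recorded time.
 */
static bool
restartpoint_is_skipped(time_t now, time_t last_recorded,
                        int checkpoint_timeout_secs)
{
    assert(checkpoint_timeout_secs > 0);

    /* Same subtraction as in the excerpt; may be negative. */
    long elapsed_secs = (long) (now - last_recorded);

    return elapsed_secs < checkpoint_timeout_secs / 2;
}
```

With checkpoint_timeout at 300 seconds, a checkpoint record replayed 100
seconds after the recorded time is skipped, while one replayed 200
seconds after it is not.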
regards, tom lane