On 01/21/2014 07:31 PM, Fujii Masao wrote:
> On Fri, Dec 20, 2013 at 9:21 PM, MauMau <maumau307@gmail.com> wrote:
>> From: "Fujii Masao" <masao.fujii@gmail.com>
>>
>>> ! if (source == XLOG_FROM_ARCHIVE && StandbyModeRequested)
>>>
>>> Even when standby_mode is not enabled, we can use cascade replication and
>>> it needs the accumulated WAL files. So I think that
>>> AllowCascadeReplication()
>>> should be added into this condition.
>>>
>>> ! snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
>>> ! XLogFilePath(xlogpath, ThisTimeLineID, endLogSegNo);
>>> !
>>> ! if (restoredFromArchive)
>>>
>>> Don't we need to check !StandbyModeRequested and
>>> !AllowCascadeReplication()
>>> here?
>>
>> Oh, you are correct. Okay, done.
>
> Thanks! The patch looks good to me. Attached is the updated version of
> the patch. I added the comments.

Sorry for reacting so slowly, but I'm not sure I like this patch. It's
quite a useful property that all the WAL files that are needed for
recovery are copied into pg_xlog, even when restoring from archive, even
when not doing cascading replication. It guarantees that you can restart
the standby, even if the connection to the archive is lost for some
reason. I intentionally changed the behavior for archive recovery too,
when it was introduced for cascading replication. Also, I think it's
good that the behavior does not depend on whether cascading replication
is enabled - that would be quite a subtle difference.

So, IMHO this is not a bug, it's a feature.

To solve the original problem of running out of disk space in archive
recovery, I wonder if we should perform restartpoints more aggressively.
We intentionally don't trigger restartpoints by checkpoint_segments, only
by checkpoint_timeout, but I wonder if there should be an option for that.

MauMau, did you try simply reducing checkpoint_timeout while doing
recovery?
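
For instance, something along these lines in postgresql.conf on the
server doing the recovery (the value is only an illustration, not a
recommendation; the default is 5min):

    # illustrative value only: attempt restartpoints more often
    checkpoint_timeout = 1min

Keep in mind that a restartpoint can only be performed at a checkpoint
record found in the WAL being replayed, so this helps only as far as the
master checkpointed often enough.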

- Heikki