Re: [9.3 bug] disk space in pg_xlog increases during archive recovery - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: [9.3 bug] disk space in pg_xlog increases during archive recovery
Date
Msg-id 52DEE8A7.9020404@vmware.com
In response to Re: [9.3 bug] disk space in pg_xlog increases during archive recovery  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: [9.3 bug] disk space in pg_xlog increases during archive recovery  (Fujii Masao <masao.fujii@gmail.com>)
Re: [9.3 bug] disk space in pg_xlog increases during archive recovery  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On 01/21/2014 07:31 PM, Fujii Masao wrote:
> On Fri, Dec 20, 2013 at 9:21 PM, MauMau <maumau307@gmail.com> wrote:
>> From: "Fujii Masao" <masao.fujii@gmail.com>
>>
>>> !     if (source == XLOG_FROM_ARCHIVE && StandbyModeRequested)
>>>
>>> Even when standby_mode is not enabled, we can use cascade replication and
>>> it needs the accumulated WAL files. So I think that
>>> AllowCascadeReplication()
>>> should be added into this condition.
>>>
>>> !       snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
>>> !       XLogFilePath(xlogpath, ThisTimeLineID, endLogSegNo);
>>> !
>>> !       if (restoredFromArchive)
>>>
>>> Don't we need to check !StandbyModeRequested and
>>> !AllowCascadeReplication()
>>> here?
>>
>> Oh, you are correct.  Okay, done.
>
> Thanks! The patch looks good to me. Attached is the updated version of
> the patch. I added the comments.

Sorry for reacting so slowly, but I'm not sure I like this patch. It's 
quite a useful property that all the WAL files that are needed for 
recovery are copied into pg_xlog, even when restoring from archive, even 
when not doing cascading replication. It guarantees that you can restart 
the standby, even if the connection to the archive is lost for some 
reason. I intentionally changed the behavior for archive recovery too, 
when it was introduced for cascading replication. Also, I think it's 
good that the behavior does not depend on whether cascading replication 
is enabled - it's quite a subtle difference.

So, IMHO this is not a bug, it's a feature.

To solve the original problem of running out of disk space in archive 
recovery, I wonder if we should perform restartpoints more aggressively. 
We intentionally don't trigger restartpoints by checkpoint_segments, only 
checkpoint_timeout, but I wonder if there should be an option for that. 
MauMau, did you try simply reducing checkpoint_timeout, while doing 
recovery?
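
For reference, a minimal sketch of that experiment, assuming a 9.3 server whose postgresql.conf is edited before archive recovery is started. The 30s value is only an illustration (it is the lowest value the setting accepts; the default is 5min), not a figure taken from this thread:

```
# postgresql.conf on the server performing archive recovery
# Illustrative value only: 30s is the minimum allowed, the default is 5min.
checkpoint_timeout = 30s
```

A shorter timeout makes restartpoints eligible more often, so restored segments in pg_xlog can be recycled sooner. A restartpoint can still only be taken at a checkpoint record found in the WAL being replayed, so how much this helps depends on how often the primary checkpointed.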

- Heikki


