Re: [9.3 bug] disk space in pg_xlog increases during archive recovery - Mailing list pgsql-hackers

From Andres Freund
Subject Re: [9.3 bug] disk space in pg_xlog increases during archive recovery
Date
Msg-id 20140201204414.GA5930@awork2.anarazel.de
In response to Re: [9.3 bug] disk space in pg_xlog increases during archive recovery  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
On 2014-01-21 23:37:43 +0200, Heikki Linnakangas wrote:
> On 01/21/2014 07:31 PM, Fujii Masao wrote:
> >On Fri, Dec 20, 2013 at 9:21 PM, MauMau <maumau307@gmail.com> wrote:
> >>From: "Fujii Masao" <masao.fujii@gmail.com>
> >>
> >>>!     if (source == XLOG_FROM_ARCHIVE && StandbyModeRequested)
> >>>
> >>>Even when standby_mode is not enabled, we can use cascade replication and
> >>>it needs the accumulated WAL files. So I think that
> >>>AllowCascadeReplication()
> >>>should be added into this condition.
> >>>
> >>>!       snprintf(recoveryPath, MAXPGPATH, XLOGDIR "/RECOVERYXLOG");
> >>>!       XLogFilePath(xlogpath, ThisTimeLineID, endLogSegNo);
> >>>!
> >>>!       if (restoredFromArchive)
> >>>
> >>>Don't we need to check !StandbyModeRequested and
> >>>!AllowCascadeReplication()
> >>>here?
> >>
> >>Oh, you are correct.  Okay, done.
> >
> >Thanks! The patch looks good to me. Attached is the updated version of
> >the patch. I added the comments.
> 
> Sorry for reacting so slowly, but I'm not sure I like this patch. It's a
> quite useful property that all the WAL files that are needed for recovery
> are copied into pg_xlog, even when restoring from archive, even when not
> doing cascading replication. It guarantees that you can restart the standby,
> even if the connection to the archive is lost for some reason. I
> intentionally changed the behavior for archive recovery too, when it was
> introduced for cascading replication. Also, I think it's good that the
> behavior does not depend on whether cascading replication is enabled - it's
> a quite subtle difference.
> 
> So, IMHO this is not a bug, it's a feature.

Very much seconded. With the old behaviour it's possible to get into the
situation that you cannot get your standby productive anymore if the
archive server died. Since the archive server is often the primary
that's imo unacceptable.

I'd even go as far as saying the previous behaviour is a bug.

> To solve the original problem of running out of disk space in archive
> recovery, I wonder if we should perform restartpoints more aggressively. We
> intentionally don't trigger restartpoints by checkpoint_segments, only
> checkpoint_timeout, but I wonder if there should be an option for that.
> MauMau, did you try simply reducing checkpoint_timeout, while doing
> recovery?

Hm, don't we actually trigger restartpoints based on checkpoint
segments?

static int
XLogPageRead(XLogReaderState *xlogreader, XLogRecPtr targetPagePtr, int reqLen,
             XLogRecPtr targetRecPtr, char *readBuf, TimeLineID *readTLI)
{
...
    if (readFile >= 0 && !XLByteInSeg(targetPagePtr, readSegNo))
    {
        /*
         * Request a restartpoint if we've replayed too much xlog since the
         * last one.
         */
        if (StandbyModeRequested && bgwriterLaunched)
        {
            if (XLogCheckpointNeeded(readSegNo))
            {
                (void) GetRedoRecPtr();
                if (XLogCheckpointNeeded(readSegNo))
                    RequestCheckpoint(CHECKPOINT_CAUSE_XLOG);
            }
        }
...
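
To make the check-refresh-recheck pattern in that excerpt easier to follow, here is a minimal standalone sketch. It is not the xlog.c code: every name in it (shared_redo_ptr, cached_redo_ptr, checkpoint_needed, refresh_redo_ptr, maybe_request_restartpoint) is a hypothetical stand-in, the 16 MB segment size and the "checkpoint_segments - 1" threshold are assumptions of this model, and the StandbyModeRequested && bgwriterLaunched gate is left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Assumed segment size for this model (16 MB). */
#define SEG_SIZE (16u * 1024 * 1024)

/* Hypothetical stand-ins: the redo pointer as stored in shared memory,
 * and a possibly stale process-local copy of it. */
static uint64_t shared_redo_ptr = 0;
static uint64_t cached_redo_ptr = 0;
static int      checkpoint_segments = 3;

/* True if new_segno is at least (checkpoint_segments - 1) segments past
 * the segment containing the cached redo pointer. */
static bool
checkpoint_needed(uint64_t new_segno)
{
    uint64_t old_segno = cached_redo_ptr / SEG_SIZE;

    return new_segno >= old_segno + (uint64_t) (checkpoint_segments - 1);
}

/* Models the role of the GetRedoRecPtr() call: refresh the local copy. */
static void
refresh_redo_ptr(void)
{
    cached_redo_ptr = shared_redo_ptr;
}

/* Cheap check against the cached value first; only if that fires,
 * refresh the copy and check again before asking for a restartpoint. */
static bool
maybe_request_restartpoint(uint64_t segno)
{
    if (checkpoint_needed(segno))
    {
        refresh_redo_ptr();
        if (checkpoint_needed(segno))
            return true;    /* caller would request the restartpoint */
    }
    return false;
}
```

In this model the second check is what avoids a redundant request: if a restartpoint has already advanced shared_redo_ptr, the stale cached copy makes the first check fire, the refresh picks up the newer value, and the recheck returns false.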

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


