Thread: [HACKERS] Phantom segment upon promotion causing troubles.

[HACKERS] Phantom segment upon promotion causing troubles.

From
Andres Freund
Date:
Hi,

Greg Burek from Heroku (CCed) reported a weird issue on IM, that was
weird enough to be interesting.  What he'd observed was that he promoted
some PITR standby, and early clones of that node work, but later clones
did not, failing to read some segment.

The problems turns out to be the following:  When a node is promoted at
a segment boundary, just after an XLOG_SWITCH record we'll haveEndOfLog = EndRecPtr;
pointing to the *beginning* of the next segment, as XLOG_SWITCH records
are treated as using the whole segment.  After creating the
END_OF_RECOVERY record (or checkpoint), we'll do:
if (ArchiveRecoveryRequested){    /*     * We switched to a new timeline. Clean up segments on the old     * timeline.
  *     * If there are any higher-numbered segments on the old timeline,     * remove them. They might contain valid
WAL,but they might also be     * pre-allocated files containing garbage. In any case, they are not     * part of the
newtimeline's history so we don't need them.     */    RemoveNonParentXlogFiles(EndOfLog, ThisTimeLineID);
 

note that this uses EndOfLog, pointing to ab/cd000000 (i.e. the
beginning of a record). RemoveNonParentXlogFiles calls
RemoveNonParentXlogFiles() which in turn uses RemoveXlogFile() to remove
superflous files.  That's where the fun begins.

static void
RemoveXlogFile(const char *segname, XLogRecPtr PriorRedoPtr, XLogRecPtr endptr)
{XLogSegNo    endlogSegNo;XLogSegNo    recycleSegNo;
...
#define XLByteToPrevSeg(xlrp, logSegNo) \logSegNo = ((xlrp) - 1) / XLogSegSize
...XLByteToPrevSeg(endptr, endlogSegNo);if (PriorRedoPtr == InvalidXLogRecPtr)    recycleSegNo = endlogSegNo + 10;else
 recycleSegNo = XLOGfileslop(PriorRedoPtr);
 
...    InstallXLogFileSegment(&endlogSegNo, path,                           true, recycleSegNo, true))
...

So what happens here is that we're calling InstallXLogFileSegment() to
remove superflous xlog files (e.g. because they're before the recovery
target, because restore command ran before the trigger file was detected
or because walsender received them).  But because endptr = ab/cd000000,
the use of XLByteToPrevSeg() means InstallXLogFileSegment() will be
called with the *previous* segment's segment number.

That in turn will lead to InstallXLogFileSegment() installing the
to-be-removed segment into the current timeline, but into a segment from
one *before* the creation of new timeline, for the purpose of recycling
the segment.  I'll call this the "phantom" segment, which has no
meaningful content and lives on a timeline which does not yet exist.

As there's no .ready file created for that segment, and we'll never
actually write to it, it'll initially just sit around. Not visible for
archiving, and normally unused by wal streaming.  But that changes at
later checkpoints, because, via RemoveOldXlogFiles()'s
XLogArchiveCheckDone() checks we:
/** XLogArchiveCheckDone*
...* If <XLOG>.done exists, then return true; else if <XLOG>.ready exists,* then return false; else create <XLOG>.ready
andreturn false.** The reason we do things this way is so that if the original attempt to* create <XLOG>.ready fails,
we'llretry during subsequent checkpoints.
 

So we'll at some later point create a .ready for the above created
phantom segment. Which then will get archived.


At that point we're in trouble.  If any standbys of that promoted node
catch up after that fact (or new ones are created from older base
backups), after the phantom segment has been archived, and
restore_command is set, recovery will fail.  The reason for that is that
one commonly will have recovery_target_timeline = latest (or the new
timeline) set.  And XLogFileReadAnyTLI() is pretty simplistic. When
restoring a segment it'll simply probe all timelines, starting from the
newest. Which means that, once archived, our phantom segment will "hide"
the actual segment from the source timeline.  Because it's not parseable
(it's at a different segment, thus parsing decide it's unusable),
recovery will hang at that point.

Which means quick standbys catch up, slow ones are "dead". It's
"fixable" by creating a restore_command which filters that phantom
segment, or deleting the segment from the archive.


The minimal fix here is presumably not to use XLByteToPrevSeg() in
RemoveXlogFile(), but XLByteToSeg(). I don't quite see what purpose it
serves here - I don't think it's ever needed.  Normally it's harmless
because InstallXLogFileSegment() checks where it could install the file
to, but that doesn't work around timeline bumps, triggering the problem
at hand.  This seems to be very longstanding behaviour, I'm not sure
where it's originating from (hard to track due to code movement).


There seems to be a larger question ehre though: Why does
XLogFileReadAnyTLI() probe all timelines even if they weren't a parent
at that period?  That seems like a bad idea, especially in more
complicated scenarios where some precursor timeline might live for
longer than it was a parent?  ISTM XLogFileReadAnyTLI() should check
which timeline a segment ought to come from, based on the historY?

Comments?

Greetings,

Andres Freund



Re: [HACKERS] Phantom segment upon promotion causing troubles.

From
Andres Freund
Date:
Hi,

On 2017-06-19 00:30:26 -0700, Andres Freund wrote:
> There seems to be a larger question ehre though: Why does
> XLogFileReadAnyTLI() probe all timelines even if they weren't a parent
> at that period?  That seems like a bad idea, especially in more
> complicated scenarios where some precursor timeline might live for
> longer than it was a parent?  ISTM XLogFileReadAnyTLI() should check
> which timeline a segment ought to come from, based on the historY?

One thing that I blamed first, before debunking it, is that after
promotion we do:
/* * Preallocate additional log files, if wanted. */PreallocXlogFiles(EndOfLog);

where EndOfLog points to the last replayed record, rather than last
record(s).  I think that's currently harmless, but it's certainly
fragile.  Given the uselessness of PreallocXlogFiles() calls, I'm
inclined to just remove it here...

- Andres



Re: [HACKERS] Phantom segment upon promotion causing troubles.

From
Heikki Linnakangas
Date:
On 06/19/2017 10:30 AM, Andres Freund wrote:
> Greg Burek from Heroku (CCed) reported a weird issue on IM, that was
> weird enough to be interesting.  What he'd observed was that he promoted
> some PITR standby, and early clones of that node work, but later clones
> did not, failing to read some segment.
>
> The problems turns out to be the following:  [explanation]

Good detective work!

> The minimal fix here is presumably not to use XLByteToPrevSeg() in
> RemoveXlogFile(), but XLByteToSeg(). I don't quite see what purpose it
> serves here - I don't think it's ever needed.

Agreed, I don't see a reason for it either.

> There seems to be a larger question ehre though: Why does
> XLogFileReadAnyTLI() probe all timelines even if they weren't a parent
> at that period?  That seems like a bad idea, especially in more
> complicated scenarios where some precursor timeline might live for
> longer than it was a parent?  ISTM XLogFileReadAnyTLI() should check
> which timeline a segment ought to come from, based on the historY?

Yeah. I've had that thought for years as well, but there has never been 
any pressing reason to bite the bullet and rewrite it, so I haven't 
gotten around to it.

- Heikki




Re: [HACKERS] Phantom segment upon promotion causing troubles.

From
Andres Freund
Date:
Hi,

On 2017-06-20 16:11:32 +0300, Heikki Linnakangas wrote:
> On 06/19/2017 10:30 AM, Andres Freund wrote:
> > Greg Burek from Heroku (CCed) reported a weird issue on IM, that was
> > weird enough to be interesting.  What he'd observed was that he promoted
> > some PITR standby, and early clones of that node work, but later clones
> > did not, failing to read some segment.
> > 
> > The problems turns out to be the following:  [explanation]
> 
> Good detective work!

Thanks.


> > The minimal fix here is presumably not to use XLByteToPrevSeg() in
> > RemoveXlogFile(), but XLByteToSeg(). I don't quite see what purpose it
> > serves here - I don't think it's ever needed.
> 
> Agreed, I don't see a reason for it either.

Pushed. And found like three other things while investigating :/


> > There seems to be a larger question ehre though: Why does
> > XLogFileReadAnyTLI() probe all timelines even if they weren't a parent
> > at that period?  That seems like a bad idea, especially in more
> > complicated scenarios where some precursor timeline might live for
> > longer than it was a parent?  ISTM XLogFileReadAnyTLI() should check
> > which timeline a segment ought to come from, based on the historY?
> 
> Yeah. I've had that thought for years as well, but there has never been any
> pressing reason to bite the bullet and rewrite it, so I haven't gotten
> around to it.

Heh.  Still seems like something we should tackle - but it'd not be
urgent enough to backpatch, so it doesn't quite seem like something to
tackle *just now* :/

- Andres