Re: Fast promotion failure - Mailing list pgsql-hackers

From Simon Riggs
Subject Re: Fast promotion failure
Date
Msg-id CA+U5nM+zJjyA7aKXYT4eMD5npdEfbiqx3nBaAFvmJKLEeytGrQ@mail.gmail.com
Whole thread Raw
In response to Re: Fast promotion failure  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: Fast promotion failure  (Simon Riggs <simon@2ndQuadrant.com>)
List pgsql-hackers
On 21 May 2013 07:46, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
> On 21.05.2013 00:00, Simon Riggs wrote:
>>
>> When we set the new timeline we should be updating all values that
>> might be used elsewhere. If we do that, then no matter when or how we
>> run GetXLogReplayRecPtr, it can't ever get it wrong in any backend.
>>
>> --- a/src/backend/access/transam/xlog.c
>> +++ b/src/backend/access/transam/xlog.c
>> @@ -5838,8 +5838,16 @@ StartupXLOG(void)
>>         }
>>
>>         /* Save the selected TimeLineID in shared memory, too */
>> -       XLogCtl->ThisTimeLineID = ThisTimeLineID;
>> -       XLogCtl->PrevTimeLineID = PrevTimeLineID;
>> +       {
>> +               /* use volatile pointer to prevent code rearrangement */
>> +               volatile XLogCtlData *xlogctl = XLogCtl;
>> +
>> +               SpinLockAcquire(&xlogctl->info_lck);
>> +               XLogCtl->ThisTimeLineID = ThisTimeLineID;
>> +               XLogCtl->lastReplayedTLI = ThisTimeLineID;
>> +               XLogCtl->PrevTimeLineID = PrevTimeLineID;
>> +               SpinLockRelease(&xlogctl->info_lck);
>> +       }
>
>
> Hmm. lastReplayedTLI is supposed to be the timeline of the last record that
> was replayed, ie. the timeline corresponding lastReplayedEndRecPtr. I think
> it would work, but it feels like it could be an easy source of bugs in the
> future.

I'm OK with that principle, as long as we don't touch ThisTimeLineID,
which has been the source of multiple bugs.

So we should set the TLI explicitly before installing, like attached patch.

Otherwise we'd need multiple permanent TLIs which would be overkill.

I feel there are problems because we set the newly selected TLI from
startup process into shared memory, then some time later we set
SharedRecoveryInProgress = false. That timing window isn't good, but I
don't see a different way.

> It might be a good idea to change walsender to not store that in
> ThisTimeLineID, though. It used to make sense for ThisTimeLineID to track
> the current recovery timeline in 9.2 and below, but now that walsender can
> be sending any older timeline, using ThisTimeLineID to store the latest one
> seems confusing.

Agreed, but looks like too much code to touch that lightly.

--
 Simon Riggs                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

Attachment

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Move unused buffers to freelist
Next
From: Amit Kapila
Date:
Subject: Re: Move unused buffers to freelist