Re: [BUG?] lag of minRecoveryPont in archive recovery - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: [BUG?] lag of minRecoveryPont in archive recovery
Date
Msg-id 50C6F96A.7060305@vmware.com
Whole thread Raw
In response to Re: [BUG?] lag of minRecoveryPont in archive recovery  (Kyotaro HORIGUCHI <horiguchi.kyotaro@lab.ntt.co.jp>)
Responses Re: [BUG?] lag of minRecoveryPont in archive recovery
List pgsql-hackers
On 11.12.2012 08:07, Kyotaro HORIGUCHI wrote:
>> The cause is that CheckRecoveryConsistency() is called before rm_redo(),
>> as Horiguchi-san suggested upthead. Imagine the case where
>> minRecoveryPoint is set to the location of the XLOG_SMGR_TRUNCATE
>> record. When restarting the server with that minRecoveryPoint,
>> the followings would happen, and then PANIC occurs.
>>
>> 1. XLOG_SMGR_TRUNCATE record is read.
>> 2. CheckRecoveryConsistency() is called, and database is marked as
>>      consistent since we've reached minRecoveryPoint (i.e., the location
>>      of XLOG_SMGR_TRUNCATE).
>> 3. XLOG_SMGR_TRUNCATE record is replayed, and invalid page is
>>      found.
>> 4. Since the database has already been marked as consistent, an invalid
>>      page leads to PANIC.
>
> Exactly.
>
> In smgr_redo, EndRecPtr which is pointing the record next to
> SMGR_TRUNCATE, is used as the new minRecoveryPoint. On the other
> hand, during the second startup of the standby,
> CheckRecoveryConsistency checks for consistency by
> XLByteLE(minRecoveryPoint, EndRecPtr) which should be true at
> just BEFORE the SMGR_TRUNCATE record is applied. So
> reachedConsistency becomes true just before the SMGR_TRUNCATE
> record will be applied. Bang!

Ah, I see. I thought moving CheckRecoveryConsistency was just a
nice-to-have thing, so that we'd notice that we're consistent earlier,
so didn't pay attention to that part.

I think the real issue here is that CheckRecoveryConsistency should not
be comparing minRecoveryPoint against recoveryLastRecPtr, not EndRecPtr
as it currently does. EndRecPtr points to the end of the last record
*read*, while recoveryLastRecPtr points to the end of the last record
*replayed*. The latter is what CheckRecoveryConsistency should be
concerned with.

BTW, it occurs to me that we have two variables in the shared struct
that are almost the same thing: recoveryLastRecPtr and replayEndRecPtr.
The former points to the end of the last record successfully replayed,
while replayEndRecPtr points to the end of the last record successfully
replayed, or begin replayed at the moment. I find that confusing, so I
suggest that we rename recoveryLastRecPtr to make that more clear.
Included in the attached patch.

- Heikki

Attachment

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: allowing multiple PQclear() calls
Next
From: Jeevan Chalke
Date:
Subject: Re: too much pgbench init output