Re: 9.2.3 crashes during archive recovery - Mailing list pgsql-hackers

From Heikki Linnakangas
Subject Re: 9.2.3 crashes during archive recovery
Date
Msg-id 511BE725.8040207@vmware.com
In response to Re: 9.2.3 crashes during archive recovery  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: 9.2.3 crashes during archive recovery
List pgsql-hackers
On 13.02.2013 21:03, Tom Lane wrote:
> Simon Riggs <simon@2ndQuadrant.com> writes:
>> On 13 February 2013 09:04, Heikki Linnakangas <hlinnakangas@vmware.com> wrote:
>>> To be precise, we'd need to update the control file on every XLogFlush(),
>>> like we do during archive recovery. That would indeed be unacceptable from a
>>> performance point of view. Updating the control file that often would also
>>> be bad for robustness.
>
>> If those arguments make sense, then why don't they apply to recovery as well?
>
> In plain old crash recovery, don't the checks on whether to apply WAL
> records based on LSN take care of this?

The problem we're trying to solve is determining how much WAL needs to 
be replayed until the database is consistent again. In crash recovery, 
the answer is "all of it". That's why the CRC in the WAL is essential; 
it's required to determine where the WAL ends. But if we had some other 
mechanism, like if we updated minRecoveryPoint after every XLogFlush() 
like Simon suggested, we wouldn't necessarily need the CRC to detect end 
of WAL (not that I'd suggest removing it anyway), and we could throw an 
error if there is a corrupt bit somewhere in the WAL before the true end 
of WAL.
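
To illustrate the difference, here is a toy sketch in Python. It is not PostgreSQL's actual code: the `Record` type, the function names, and the `known_end_lsn` parameter (standing in for an independently tracked end-of-WAL position) are all made up for the example. The first function treats the first bad CRC as the end of WAL, the second can tell a corrupt record apart from the true end.

```python
import zlib
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical WAL record: a position, a payload, and a stored checksum."""
    lsn: int
    payload: bytes
    crc: int


def make_record(lsn: int, payload: bytes) -> Record:
    # Store the CRC-32 of the payload alongside the record.
    return Record(lsn, payload, zlib.crc32(payload))


def is_valid(rec: Record) -> bool:
    # A record is valid when its stored CRC matches its payload.
    return zlib.crc32(rec.payload) == rec.crc


def crash_recovery_replay(wal: list) -> int:
    """Replay "all of it": the first record failing its CRC marks the end of WAL.

    Returns the LSN of the last record replayed (0 if none).
    """
    last_replayed = 0
    for rec in wal:
        if not is_valid(rec):
            break  # assumed to be the end of WAL, not corruption
        last_replayed = rec.lsn
    return last_replayed


def replay_with_known_end(wal: list, known_end_lsn: int) -> int:
    """Same loop, but with an independently known end position (assumption).

    A bad CRC at or before known_end_lsn must be corruption, so raise an error.
    """
    last_replayed = 0
    for rec in wal:
        if not is_valid(rec):
            if rec.lsn <= known_end_lsn:
                raise ValueError(f"corrupt WAL record at LSN {rec.lsn}")
            break
        last_replayed = rec.lsn
    return last_replayed
```

With only the CRC, a damaged record in the middle of the WAL is indistinguishable from the end of WAL; with a separately recorded end position, it can be reported as an error instead.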

In archive recovery, we can't just say "replay all the WAL", because the 
whole idea of PITR is to not recover all the WAL. So we use 
minRecoveryPoint to keep track of how far the WAL needs to be replayed 
at a minimum, for the database to be consistent.
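
As a rough sketch of that rule (again a toy model in Python, not the real implementation; the function name, the integer LSNs, and the `target_lsn` parameter are assumptions made for the example), replay stops at the PITR target, and the database counts as consistent only if replay got at least as far as minRecoveryPoint:

```python
def archive_recovery(wal_lsns, min_recovery_point, target_lsn):
    """Replay WAL up to a PITR target and report whether the result is consistent.

    wal_lsns: iterable of record LSNs in ascending order (toy integers).
    min_recovery_point: the minimum LSN that must be replayed for consistency.
    target_lsn: hypothetical recovery target; records past it are not replayed.

    Returns (last_replayed_lsn, consistent).
    """
    last_replayed = 0
    for lsn in wal_lsns:
        if lsn > target_lsn:
            break  # PITR: deliberately stop before the end of WAL
        last_replayed = lsn
    # Consistency is reached only once replay has passed the minimum point.
    consistent = last_replayed >= min_recovery_point
    return last_replayed, consistent
```

For example, with records 1 through 10 and a target of 7, a minimum recovery point of 4 yields a consistent database, while a minimum recovery point of 8 does not, because replay stopped too early.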

- Heikki