Tom Lane wrote:
> Christopher Kings-Lynne <chriskl@familyhealth.com.au> writes:
> > Maybe we could avoid removing it until the next checkpoint? Or is that
> > not enough. Maybe it could stay there forever :/
>
> Part of the problem here is that this code has to serve several
> purposes. We have different scenarios to worry about:
>
> * crash recovery from the most recent checkpoint
>
> * PITR replay over a long interval (many checkpoints)
>
> * recovery in the face of a partially corrupt filesystem
>
> It's the last one that is mostly bothering me at the moment. I don't
> want us to throw away data simply because the filesystem forgot an
> inode. Yeah, we might not have enough data in the WAL log to completely
> reconstruct a table, but we should push out what we do have, *not* toss
> it into the bit bucket.

I like the idea tossed out by one of the others the most: create a
"recovery" system tablespace, and use it to resolve issues like this.
The question is: what do you do with the tables in that tablespace once
recovery is complete? Leave them there? That's certainly a possibility
(in fact, it seems the best option, especially now that we're doing
PITR), but it means that the DBA would have to periodically clean up that
tablespace so that it doesn't run out of space during a later recovery.
Actually, it seems to me to be the only option that isn't the equivalent
of throwing away the data...
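
To make the idea concrete, here is a rough sketch in Python of the policy
I have in mind. It is not how the backend does anything: the record
format (file name, offset, payload), the replay() function, and the
directory names are all made up for illustration. The only point is that
a write aimed at a missing file gets parked in a recovery area instead of
being dropped.

```python
import os
import tempfile


def replay(records, data_dir, recovery_dir):
    """Apply (relfile, offset, payload) records without discarding any.

    Hypothetical policy: if the target file is missing from data_dir,
    the write goes to a same-named file under recovery_dir instead.
    Returns the set of file names that had to be redirected.
    """
    redirected = set()
    for relfile, offset, payload in records:
        path = os.path.join(data_dir, relfile)
        if not os.path.exists(path):
            # The filesystem "forgot" this file: park the data elsewhere.
            os.makedirs(recovery_dir, exist_ok=True)
            path = os.path.join(recovery_dir, relfile)
            redirected.add(relfile)
        # Open without truncating so earlier replayed writes survive.
        mode = "r+b" if os.path.exists(path) else "w+b"
        with open(path, mode) as f:
            f.seek(offset)
            f.write(payload)
    return redirected


def demo():
    with tempfile.TemporaryDirectory() as root:
        data_dir = os.path.join(root, "base")
        recovery_dir = os.path.join(root, "recovery")
        os.makedirs(data_dir)
        # File 16384 exists; file 16385 has gone missing.
        with open(os.path.join(data_dir, "16384"), "wb") as f:
            f.write(b"\0" * 8)
        records = [
            ("16384", 0, b"AAAA"),
            ("16385", 0, b"BBBB"),
            ("16385", 4, b"CCCC"),
        ]
        redirected = replay(records, data_dir, recovery_dir)
        with open(os.path.join(recovery_dir, "16385"), "rb") as f:
            salvaged = f.read()
        with open(os.path.join(data_dir, "16384"), "rb") as f:
            intact = f.read()
        return redirected, salvaged, intact
```

Calling demo() replays two writes against the missing file and leaves
both of them readable under the recovery directory, which is exactly the
stuff the DBA would later have to inspect and clean up by hand.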

> In the first case (straight crash recovery) I think it is true that any
> reference to a missing file is a reference to a file that will get
> deleted before recovery finishes. But I don't think that holds for PITR
> (we might be asked to stop short of where the table gets deleted) nor
> for the case where there's been filesystem damage.

But doesn't PITR assume that a full filesystem-level restore has already
been done, putting the database back as it was before the events in the
first log being replayed? In that case, wouldn't the PITR process Just
Work?
--
Kevin Brown kevin@sysexperts.com