Re: Allow WAL information to recover corrupted pg_controldata - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Allow WAL information to recover corrupted pg_controldata
Date
Msg-id 000001cd4f8c$054a65f0$0fdf31d0$@kapila@huawei.com
In response to Re: Allow WAL information to recover corrupted pg_controldata  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
>>> The reason I'm concerned about selecting a next-LSN that's certainly
>>> beyond every LSN in the database is that not doing so could result in
>>> introducing further corruption, which would be entirely avoidable with
>>> more care in choosing the next-LSN.

>> The further corruption can only be possible when we replay some wrong
>> WAL by selecting wrong LSN.

> No, this is mistaken.  Pages in the database that have LSN ahead of
> where the server thinks the end of WAL is cause lots of problems
> unrelated to replay; for example, inability to complete a checkpoint.
> That might not directly lead to additional corruption, but consider
> the case where such a page gets further modified, and the server decides
> it doesn't need to create a full-page image because the LSN is ahead of
> where the last checkpoint was.  A crash or two later, you have new
> problems.
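
To make the full-page-image scenario quoted above concrete, here is a minimal sketch of the decision as I understand it. It is a simplified model, not the server's actual code: the function name, the parameter names and the LSN values are all hypothetical, and the real check in the WAL insertion path has more conditions.

```python
def needs_full_page_image(page_lsn: int, redo_ptr: int,
                          full_page_writes: bool = True) -> bool:
    """Simplified model: a full-page image is taken only when the page
    has not been WAL-logged since the last checkpoint's redo pointer,
    i.e. when its LSN is at or before that pointer."""
    return full_page_writes and page_lsn <= redo_ptr


# Hypothetical values, chosen only for illustration.
REDO_PTR = 0x1000   # redo pointer of the last checkpoint
WAL_END = 0x2000    # where the server thinks WAL ends after the reset

healthy_page_lsn = 0x0800   # last touched before the checkpoint
bogus_page_lsn = 0x9000     # left over from before the reset, ahead of WAL_END

# Healthy page: first modification after the checkpoint gets a full-page image.
print(needs_full_page_image(healthy_page_lsn, REDO_PTR))

# Page with an LSN ahead of the end of WAL: it looks "already logged",
# so no full-page image is taken and a torn write is not repairable.
print(needs_full_page_image(bogus_page_lsn, REDO_PTR))
```

Under these assumptions the second page is modified without a full-page image, which is how a later crash can turn into new corruption.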

If any modification happens to the database after it is started, even if the next-LSN is the max LSN of the pages,
the modification can create a problem because the database will be in an inconsistent state.

Please correct me if I am wrong in assuming that setting the next-LSN to the max LSN of the pages
1. has nothing to do with replay; we should still reset the WAL so that no replay happens.
2. is meant to avoid some further disasters.
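
As a rough sketch of what "max LSN of pages" means here, the following scans a buffer of pages and returns the largest page LSN. It assumes the default 8192-byte page size and that the LSN is stored in the first 8 bytes of each page header as two native-endian 32-bit values (xlogid, xrecoff). The helper names are hypothetical and this is not the proposed patch.

```python
import struct

PAGE_SIZE = 8192  # assumption: default BLCKSZ


def page_lsn(page: bytes) -> int:
    """Read the page LSN from the first 8 bytes of the page header
    (assumed layout: uint32 xlogid, uint32 xrecoff, native byte order)."""
    xlogid, xrecoff = struct.unpack_from("=II", page, 0)
    return (xlogid << 32) | xrecoff


def max_page_lsn(data: bytes) -> int:
    """Return the largest page LSN among the complete pages in data."""
    best = 0
    for off in range(0, len(data) - PAGE_SIZE + 1, PAGE_SIZE):
        best = max(best, page_lsn(data[off:off + PAGE_SIZE]))
    return best


def make_page(xlogid: int, xrecoff: int) -> bytes:
    """Build a dummy page carrying only an LSN, for illustration."""
    return struct.pack("=II", xlogid, xrecoff) + bytes(PAGE_SIZE - 8)


# Three dummy pages; the second has the highest LSN.
relation = make_page(0, 0x100) + make_page(2, 0x10) + make_page(1, 0xFFFF)
print(hex(max_page_lsn(relation)))
```

A next-LSN chosen at or beyond this value would be ahead of every page LSN in the scanned data, which is the property discussed above.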

With Regards,
Amit Kapila.
