On Fri, Sep 30, 2011 at 2:09 AM, Fujii Masao <masao.fujii@gmail.com> wrote:
> On Thu, Sep 29, 2011 at 11:12 PM, Florian Pflug <fgp@phlo.org> wrote:
>> On Sep29, 2011, at 13:49 , Simon Riggs wrote:
>>> This worries me slightly now though because the patch makes us PANIC
>>> in a place we didn't used to and once we do that we cannot restart the
>>> server at all. Are we sure we want that? It's certainly a great way to
>>> shake down errors in other code...
>>
>> The patch only introduces a new PANIC condition during archive recovery,
>> though. Crash recovery is unaffected, except that we no longer create
>> restart points before we reach consistency.
>>
>> Also, if we hit an invalid page reference after reaching consistency,
>> the cause is probably either a bug in our recovery code, or (quite unlikely)
>> a corrupted WAL that passed the CRC check. In both cases, the likelyhood
>> of data-corruption seems high, so PANICing seems like the right thing to do.
>
> Fair enough.
>
> We might be able to use FATAL or ERROR instead of PANIC because they
> also cause all processes to exit when the startup process emits them.
> For example, we now use FATAL to stop the server in recovery mode
> when recovery is about to end before we've reached a consistent state.
I think we should issue PANIC if the source is a critical rmgr, or
just WARNING if from a non-critical rmgr, such as indexes.
Ideally, I think we should have a mechanism to allow indexes to be
marked corrupt. For example, a file that if present shows that the
index is corrupt and would be marked not valid. We can then create the
file and send a sinval message to force the index relcache to be
rebuilt showing valid set to false.
--
Simon Riggs http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Training & Services