Re: FSM corruption leading to errors - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Re: FSM corruption leading to errors
Date
Msg-id CABOikdNXbhebE5JHAN8k4ZCJ6_DR6oOUfz-MavYCZfbP9tdpwQ@mail.gmail.com
In response to Re: FSM corruption leading to errors  (Michael Paquier <michael.paquier@gmail.com>)
Responses Re: FSM corruption leading to errors  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers


On Tue, Oct 11, 2016 at 5:20 AM, Michael Paquier <michael.paquier@gmail.com> wrote:

> > Once the underlying bug is fixed, I don't see why it should break again. I
> > added the above code mostly to deal with already corrupt FSMs. Maybe we can
> > just document it and leave it to the user to run some correctness checks (see
> > below), especially given that the code is not cheap and adds overhead for
> > everybody, irrespective of whether they have or will ever have a corrupt FSM.
>
> Yep. I'd leave it for the release notes to hold a diagnostic method.
> That's annoying, but this has been done in the past, for example for the
> multixact issues.

I'm okay with that. It's annoying, especially because the bug may show up when your primary is down and you have just failed over for HA, only to find that the standby won't work correctly. But I don't have any ideas on how to fix existing corruption without adding a significant penalty to the normal path.

> What if you restart the standby, and then do a diagnostic query?
> Wouldn't that be enough? (Something just based on
> pg_freespace(pg_relation_size(oid) / block_size) != 0)


Yes, that will be enough once the fix is in place.
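A minimal sketch of the diagnostic query discussed above, assuming the pg_freespacemap contrib module is available (it provides pg_freespace(regclass, bigint)). For each relation it asks the FSM how much free space is recorded for the first block past the end of the main fork; that block does not exist, so a healthy FSM is expected to report 0. Like the suggestion in the thread, it inspects only that one block, so it is a heuristic check rather than a full FSM verification, meant to be run on the standby after a restart and once the fix is in place. The relkind and relpersistence filters are assumptions added for this sketch. Note that CREATE EXTENSION has to run on the primary, because a hot standby is read-only.

```sql
-- Run once on the primary: a hot standby is read-only, and the extension
-- reaches the standby through replication.
CREATE EXTENSION IF NOT EXISTS pg_freespacemap;

-- Run on the (restarted) standby. Lists relations whose FSM records free
-- space for the first block past the end of the main fork. That block does
-- not exist, so a non-zero value suggests the FSM still describes blocks
-- that were truncated away.
SELECT c.oid::regclass AS relation,
       pg_relation_size(c.oid) / current_setting('block_size')::bigint
           AS first_block_past_end
FROM pg_class AS c
WHERE c.relkind IN ('r', 't', 'm')   -- tables, TOAST tables, materialized views
  AND c.relpersistence = 'p'         -- permanent relations only (assumption)
  AND pg_freespace(c.oid,
                   pg_relation_size(c.oid) / current_setting('block_size')::bigint) <> 0;
```

An empty result means no relation failed this particular check; it does not prove that every FSM page is correct.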

I think this is a major bug, and I would appreciate any ideas for getting the patch into committable shape before the next minor release goes out. We probably need a committer to take an interest in this for it to make progress.

Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
