Re: FSM corruption leading to errors - Mailing list pgsql-hackers

From Pavan Deolasee
Subject Re: FSM corruption leading to errors
Date
Msg-id CABOikdNKB706ASToj5jowqzQaVK6OvUsqPfw7i6VpKjGuRBYmQ@mail.gmail.com
Whole thread Raw
In response to Re: FSM corruption leading to errors  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: FSM corruption leading to errors  (Michael Paquier <michael.paquier@gmail.com>)
List pgsql-hackers

On Wed, Oct 19, 2016 at 6:54 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:


Can we document an existing procedure for repairing FSM corruption?
(VACUUM, maybe?) 

I'm afraid it may not be easy to repair the corruption with existing facilities. Most often the corruption will be on the standby and a VACUUM may not actually touch affected pages on the master (because they may not even exists on the master or skipped because of visibility maps). It may not even trigger relation truncation. So AFAICS it may not generate any WAL activity that can fix the corruption on the standby.

One possible way would be to delete the FSM (and VM) information on the master and standby and then run VACUUM so these maps are rebuilt. We obviously don't need to do this for all tables, but we need a way to find the tables with corrupt FSM [1].

Suggested procedure could be:

1. Upgrade master and standby to the latest minor release (which involves restart)
2. Install pg_freespace extension and run the query [1] on master to find possible corruption cases. The query checks if FSM reports free space in a block outside the size of the relation. Unfortunately, we might have false positives if the relation is extended while the query is running.
3. Repeat the same query on standby (if it's running in Hot standby mode, otherwise the corruption can only be detected once it's promoted to be a master)
4. Remove FSM and VM files for the affected tables (I don't think if it's safe to do this on a running server)
5. VACUUM affected tables so that FSM and VM is rebuilt.

Another idea is to implement a pg_freespace_repair() function in pg_freespace which takes an AccessExclusiveLock on the table and truncates it to it's current size, thus generating a WAL record that the standby will replay to fix the corruption. This probably looks more promising, easy to explain and less error prone.


[1]  SELECT *
  FROM (
        SELECT oid::regclass as relname, EXISTS (
                SELECT *
                  FROM (
                        SELECT blkno, pg_freespace(oid::regclass, blkno)
                          FROM generate_series(pg_relation_size(oid::regclass)/ current_setting('block_size')::bigint, pg_relation_size(oid::regclass, 'fsm') / 2) as blkno
                       ) as avail
                 WHERE pg_freespace > 0
               ) as corrupt_fsm
          FROM pg_class
         WHERE relkind = 'r'
       ) b
 WHERE b.corrupt_fsm = true;


Thanks,
Pavan

--
 Pavan Deolasee                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

pgsql-hackers by date:

Previous
From: Amit Kapila
Date:
Subject: Re: Parallel bitmap heap scan
Next
From: Michael Paquier
Date:
Subject: Re: Password identifiers, protocol aging and SCRAM protocol