pg_page_repair: a tool/extension to repair corrupted pages in postgres with streaming/physical replication - Mailing list pgsql-hackers

From RKN Sai Krishna
Subject pg_page_repair: a tool/extension to repair corrupted pages in postgres with streaming/physical replication
Date
Msg-id CAMVpbFNqjXWMt3A4YnA0YggsJRikctBRS9Bo_kJNdZON1VTujw@mail.gmail.com
Whole thread Raw
Responses Re: pg_page_repair: a tool/extension to repair corrupted pages in postgres with streaming/physical replication
List pgsql-hackers

Hi,

Problem: Today when a data page is corrupted in the primary postgres with physical replication (sync or async standbys), there seems to be no way to repair it easily and we rely on PITR to recreate the postgres server or drop the corrupted table (of course this is not an option for important customer tables, but may be okay for some maintenance or temporary tables). PITR is costly to do in a production environment oftentimes as it involves creation of the full-blown postgres from the base backup and causing downtime for the customers.

Solution: Make use of the uncorrupted page present in sync or async standby. The proposed tool/extension pg_page_repair (as we call it) can fetch the uncorrupted page from sync or async standby and overwrite the corrupted one on the primary. Yes, there will be a challenge in making sure that the WAL is replayed completely and standby is up-to-date so that we are sure that stale pages are not copied across. A simpler idea could be that the pg_page_repair can wait until the standby replays/catches up with the primary's flush LSN before fetching the uncorrupted page. A downside of this approach is that the pg_page_repair waits for long or rather infinitely if the replication lag is huge. As we all know that the replication lag is something a good postgres solution will always monitor to keep it low, if true, the pg_page_repair is guaranteed to not wait for longer. Another idea could be that the pg_page_repair gets the base page from the standby and applies all the WAL records pertaining to the corrupted page using the base page to get the uncorrupted page. This requires us to pull the replay logic from the core to pg_page_repair which isn't easy. Hence we propose to go with approach 1, but open to discuss on approach 2 as well. We suppose that the solution proposed in this thread holds good even for pages corresponding to indexes.

Implementation Choices: pg_page_repair can either take the corrupted page info (db id, rel id, block number etc.) or just a relation name and automatically figure out the corrupted page using pg_checksums for instance or just database name and automatically figure out all the corrupted pages. It can either repair the corrupted pages online (only the corrupted table is inaccessible, the server continues to run) or take downtime if there are many corrupted pages.

Future Scope: pg_page_repair can be integrated to the core so that the postgres will repair the pages automatically without manual intervention.

Other Solutions: We did consider an approach where the tool could obtain the FPI from WAL and replay till the latest WAL record to repair the page. But there could be limitations such as FPI and related WAL not being available in primary/archive location.

Thoughts?

Credits (cc-ed): thanks to SATYANARAYANA NARLAPURAM for initial thoughts and thanks to Bharath Rupireddy, Chen Liang, mahendrakar s and Rohan Kumar for internal discussions.

Thanks, RKN

pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Support logical replication of DDLs
Next
From: "Drouvot, Bertrand"
Date:
Subject: Missing reference to pgstat_replslot.c in pgstat.c