Re: Detecting corrupted pages earlier - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: Detecting corrupted pages earlier |
Date | |
Msg-id | 24613.1048883254@sss.pgh.pa.us Whole thread Raw |
In response to | Re: Detecting corrupted pages earlier (Hiroshi Inoue <Inoue@tpf.co.jp>) |
Responses |
Re: Detecting corrupted pages earlier
|
List | pgsql-hackers |
Hiroshi Inoue <Inoue@tpf.co.jp> writes: > How about adding a new option to skip corrupted pages ? I have committed changes to implement checking for damaged page headers, along the lines of last month's discussion. It includes a GUC variable to control the response as suggested by Hiroshi. Given the number of data-corruption reports we've seen recently, I am more than half tempted to commit the change into the 7.3.* branch too. However the GUC variable makes it seem a little like a new feature. I could backpatch the whole change, or backpatch it without the GUC variable (so the only possible response is elog(ERROR)), or leave well enough alone. Any votes? The patch is pretty small; I include the non-boilerplate parts below. regards, tom lane *** /home/tgl/pgsql/src/backend/storage/buffer/bufmgr.c.orig Tue Mar 25 09:06:11 2003 --- /home/tgl/pgsql/src/backend/storage/buffer/bufmgr.c Fri Mar 28 11:54:48 2003 *************** *** 59,64 **** --- 60,69 ---- (*((XLogRecPtr*) MAKE_PTR((bufHdr)->data))) + /* GUC variable */ + bool zero_damaged_pages = false; + + static void WaitIO(BufferDesc *buf); static void StartBufferIO(BufferDesc *buf, bool forInput); static void TerminateBufferIO(BufferDesc*buf); *************** *** 217,222 **** --- 222,241 ---- { status = smgrread(DEFAULT_SMGR, reln, blockNum, (char *) MAKE_PTR(bufHdr->data)); + /* check for garbage data */ + if (status == SM_SUCCESS && + !PageHeaderIsValid((PageHeader) MAKE_PTR(bufHdr->data))) + { + if (zero_damaged_pages) + { + elog(WARNING, "Invalid page header in block %u of %s; zeroing out page", + blockNum, RelationGetRelationName(reln)); + MemSet((char *) MAKE_PTR(bufHdr->data), 0, BLCKSZ); + } + else + elog(ERROR, "Invalid page header in block %u of %s", + blockNum, RelationGetRelationName(reln)); + } } if (isLocalBuf) *** /home/tgl/pgsql/src/backend/storage/page/bufpage.c.orig Tue Mar 25 09:06:12 2003 --- /home/tgl/pgsql/src/backend/storage/page/bufpage.c Fri Mar 28 11:38:43 2003 *************** *** 48,53 **** --- 46,96 ---- } + /* + * PageHeaderIsValid + * Check that the header fields of a page appear valid. + * + * This is called when a page has just been read in from disk. The idea is + * to cheaply detect trashed pages before we go nuts following bogus item + * pointers, testing invalid transaction identifiers, etc. + * + * It turns out to be necessary to allow zeroed pages here too. Even though + * this routine is *not* called when deliberately adding a page to a relation, + * there are scenarios in which a zeroed page might be found in a table. + * (Example: a backend extends a relation, then crashes before it can write + * any WAL entry about the new page. The kernel will already have the + * zeroed page in the file, and it will stay that way after restart.) So we + * allow zeroed pages here, and are careful that the page access macros + * treat such a page as empty and without free space. Eventually, VACUUM + * will clean up such a page and make it usable. + */ + bool + PageHeaderIsValid(PageHeader page) + { + char *pagebytes; + int i; + + /* Check normal case */ + if (PageGetPageSize(page) == BLCKSZ && + PageGetPageLayoutVersion(page) == PG_PAGE_LAYOUT_VERSION && + page->pd_lower >= SizeOfPageHeaderData && + page->pd_lower <= page->pd_upper && + page->pd_upper <= page->pd_special && + page->pd_special <= BLCKSZ && + page->pd_special == MAXALIGN(page->pd_special)) + return true; + + /* Check all-zeroes case */ + pagebytes = (char *) page; + for (i = 0; i < BLCKSZ; i++) + { + if (pagebytes[i] != 0) + return false; + } + return true; + } + + /* ---------------- * PageAddItem *
pgsql-hackers by date: