pgcheck - data integrity check - Mailing list pgsql-hackers

From Rober Mach
Subject pgcheck - data integrity check
Date
Msg-id 46BBDFF8.9080909@gmail.com
Whole thread Raw
Responses Re: pgcheck - data integrity check  (Decibel! <decibel@decibel.org>)
Re: pgcheck - data integrity check  (Josh Berkus <josh@agliodbs.com>)
List pgsql-hackers
Hi,

I am working on a data integrity check tool (pgcheck).
I would like to discuss the following issues:

Right now I am working on a function, which should check the validity of
pages in relation. The relation is passed to the function as its
argument (its oid). For the integrity check of a page, I am using an
AccessShare lock on a relation as you can see on
http://cvs.pgfoundry.org/cgi-bin/cvsweb.cgi/pgcheck/pokus/pgcheck_page/
. In near future, I would like to extend the functionality also with a
recovery tool which would be able to repair broken pages. Should the
function also repair the corrupted data on a page, the AccessShare lock
on a relation would not be sufficient. But on the other hand
AccessExclusive lock on the entire relation could significantly
influence the performance of a database.
So far I see these possibilities:
1- use AccessShare lock for the integrity check function and in case of
faulty page, pass this page to a special function, which would lock the
page using AccessExclusiveLock on the relation for correction.   + it would not influence the performance so much   -
highercomplexity
 
2- use one function which would lock the relation with AccessExclusive
lock, check the integrity of data and in case of faulty pages, it would
repair it at once.   + easier to implement   - could cause performance downturn because of relatively long
Exclusive lock on some relations.
3- use the AccessShare lock for the integrity check and use special
"single-user mode" for recovery of data   + safest option for recovery of data   - long down time of database


Furhter, I would like to know your opinions on what should be checked
next in order to check the integrity of data in database. I am thinking
of checking:
-in case of variable-length data, compare the formal and actual size of data
-check whether constrains applied on items are fulfilled
-compare data in indexes with indexed tables, whether they are correct


Robert

P.S. Any comments to the c-funtion I made so far a welcome



pgsql-hackers by date:

Previous
From: Gregory Stark
Date:
Subject: Re: Unexpected VACUUM FULL failure
Next
From: "Adrian Maier"
Date:
Subject: Re: Compilation of pg 7.4.17 fails on HP-UX