Wes wrote:
> On a nightly basis, we shut the database down and do a file system backup.
>
> A short chronology of our database problem:
[snip]
> Question:
>
> How can we tell that a database is intact? In the above example, pg_dumpall
> worked on the 8/21 database. Did it become corrupt between 8/21 and 8/23,
> or was it already corrupt and got worse? Pg_dumpall tells you nothing about
> the condition of indexes. Could a corrupt index corrupt data blocks?
>
> I'm looking at doing a pg_dumpall on a weekly basis so that we have a point
> in time where we know we have a recoverable database. When the database
> reaches several hundred GB and over over a billion rows, this isn't a great
> solution, and doesn't address the overall database integrity.
>
> Back to the original question... How can I verify the complete integrity of
> a database - especially a very large one where a reload or full index
> rebuild could take on the order of days?
You shouldn't have to verify anything. PG's job is to never corrupt your
data, and providing your hardware is good it should do so. If you are
getting problems almost daily that would suggest a RAM/disk problem to
me (sig 11 usually implies RAM). Can't guarantee it's not PG but it's
record of reliability is pretty good.
Steps I'd take:
1. Check your version number against the release notes and see if you
should upgrade. You don't mention your version, but it's always worth
having the last dot-release (7.2.5, 7.3.7, 7.4.5)
2. Schedule time to run memory/disk tests against your hardware. Finding
48 hours might not be easy, but you need to know where you stand.
3. Setup slony or some other replication so I can schedule my downtime.
--
Richard Huxton
Archonet Ltd