Re: Backend Crash - Mailing list pgsql-hackers

From Gregory Stark
Subject Re: Backend Crash
Date
Msg-id 87lkgppod5.fsf@oxford.xeocode.com
In response to Re: Backend Crash  (Harvell F <fharvell@file13.info>)
List pgsql-hackers
"Harvell F" <fharvell@file13.info> writes:

>   Just as a follow up, it turns out that our fiberchannel RAID was power cycled
> while the systems were up and running.  There are  several write errors in the
> postgresql log.
>
>   Now I'm off to try to recover the data...

That's still a problem: it indicates either a bug in Postgres or -- sadly more
likely -- a problem with your hardware or system software setup. In a working
system Postgres guarantees that a situation like that will result in
transactions failing to commit (either with errors or by hanging), not in
corrupted data. Data once committed should never be lost.

For this to happen, something in your software and hardware setup must be
caching writes and then hiding the errors from Postgres. For instance, systems
where fsync lies and reports success before it has written the data to disk
can end up with silently corrupted data after any power outage or system crash.
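
To make the fsync point concrete, here is a rough sketch in Python of the
contract a database relies on. It is not Postgres code; the names
durable_write and commit are made up for illustration. The caller only
treats the record as committed if both the write and the fsync return
without error, so a layer that reports success early defeats the check.

```python
import os


def durable_write(path, data):
    """Write data to path and return only after fsync reports success.

    Hypothetical helper for illustration; raises OSError on any failure.
    """
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        view = memoryview(data)
        while view:
            # os.write may write fewer bytes than requested, so loop.
            written = os.write(fd, view)
            view = view[written:]
        # fsync asks the OS to push the data to stable storage. If the
        # drive or controller acknowledges before the data is really on
        # disk, this returns normally and the caller cannot tell.
        os.fsync(fd)
    finally:
        os.close(fd)


def commit(path, record):
    """Report a commit only if the durable write succeeded."""
    try:
        durable_write(path, record)
    except OSError:
        # An honest error: the transaction fails instead of being lost later.
        return False
    return True
```

If the storage stack reports the error, commit returns False and nothing is
promised to the client. The corruption case is the one where every call
above succeeds and the data still never reaches the platters.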

Could you send the write errors? Or at least the first page or so of them?
And check the system logs at that time for any lower-level errors as well.

What kind of drives are in the Fibre Channel RAID? Are they SCSI, PATA, or
SATA? Can you check their configuration at all or does the RAID hide all that
from you? Does the RAID have a battery-backed cache?

--  Gregory Stark EnterpriseDB          http://www.enterprisedb.com


