Re: Losing records when server hang - Mailing list pgsql-general

From Marco Colombo
Subject Re: Losing records when server hang
Msg-id 4117B1CD.2020500@esi.it
In response to Re: Losing records when server hang  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Tom Lane wrote:
> lec <limec@streamyx.com> writes:
>
>>It's a SCSI, RAID-5 on a Dell server.
>
>
>>The hardware actually "hang". The Dell engineers came and replaced the
>>motherboard but couldn't tell what the actual fault was.
>
>
>>Commit as in 'COMMIT'. 'Records' 1,2,3,4,5,6,7,8,9,10 are actually
>>transactions. I'm as puzzled as to why I lost the transactions in the
>>middle but got the last transaction.
>
>
> I'm puzzled too.  I don't suppose you have the postmaster log from when
> it was trying to recover from the crash?  Or even better, copies of the
> WAL files?
>
> A possible theory has to do with corruption of the WAL log.  For
> instance, transactions 1-10 are all down to disk in WAL (or at least the
> kernel told postgres the writes were done) and for one reason or another
> the buffer manager chances to flush the page containing record 10 out
> to its data file before the other records' pages.  Now the system hangs.
> After reboot, if the WAL log is unreadable beyond transaction 1 then the
> database would come up with transaction 1 replayed, 2-10 not replayed,
> but 10's data is out there anyway.
>
> However this would seem to imply disk drive misfeasance above and beyond
> your motherboard problem.

Well, no. How about this theory:

1) everything is ok:
    the backend executes  write()/fsync() for transactions 1-5

2) hardware fails somehow at MB level (imagine CPU/RAM overheating):
    RAM gets corrupted - kernel starts oopsing (but goes on)
    meanwhile, the backend executes write()/fsync() for transactions 6-10,
    but randomly corrupted data gets written to disk.

3) unrecoverable kernel error occurs, the show stops.

On recovery, transactions 6-9 don't even look like valid log entries, while
10, for some reason, does (maybe only data is corrupted).

I'm not familiar with the details of WAL files and post-crash recovery,
but is that possible? Or does the process stop at the first failure?
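
To make the question concrete, here is a toy model of log replay. It is
only a sketch under stated assumptions: each record is a length, a CRC32
and a payload, and replay stops at the first record that fails its check.
The record layout and the names encode_record and replay are hypothetical,
not the actual WAL format or recovery code.

```python
import struct
import zlib

HEADER = struct.Struct("<II")  # hypothetical layout: payload length, CRC32


def encode_record(payload: bytes) -> bytes:
    """Prefix the payload with its length and CRC32."""
    return HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def replay(log: bytes) -> list:
    """Return the payloads replayed before the first invalid record."""
    replayed = []
    offset = 0
    while offset + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, offset)
        start = offset + HEADER.size
        payload = log[start:start + length]
        # A truncated or corrupted record ends the replay in this model.
        if len(payload) != length or zlib.crc32(payload) != crc:
            break
        replayed.append(payload)
        offset = start + length
    return replayed


# Ten "transactions", written back to back.
records = [b"txn-%d" % i for i in range(1, 11)]
clean_log = b"".join(encode_record(r) for r in records)

# Simulate bad RAM: flip one payload byte in record 6 before it hits disk.
damaged = bytearray(clean_log)
offset_of_6 = sum(len(encode_record(r)) for r in records[:5])
damaged[offset_of_6 + HEADER.size] ^= 0xFF

survivors = replay(bytes(damaged))
```

In this model record 10 is perfectly valid on disk, yet it is never
replayed, because the scan never gets past record 6. If recovery works
like this, "10 survived" cannot come from the log alone; 10's data would
have to reach the data files by another path, as in Tom's theory.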

Anyway, if your CPU/RAM is failing, no DB technology can save you. You
need redundant CPU/RAM units to perform the same operations concurrently,
and the hardware to validate the results on a 2-vs-1 basis at least.
Ask NASA, I think they know what "mission critical" actually means. :)
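
The 2-vs-1 validation can be sketched in software as a majority vote over
three independently computed results. This is only an illustration of the
idea (real systems do it in hardware); the function name vote is made up
for the example.

```python
from collections import Counter


def vote(results):
    """Return the value at least two of the three units agree on."""
    if len(results) != 3:
        raise ValueError("expected exactly three results")
    value, count = Counter(results).most_common(1)[0]
    if count < 2:
        # All three units disagree: no result can be trusted.
        raise RuntimeError("no two units agree")
    return value


# One unit returns a corrupted value; the other two outvote it.
agreed = vote([42, 42, 7])
```

A single faulty unit is outvoted; if all three disagree, the only safe
answer is to stop rather than pick one.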

Really, when the hardware starts flipping random bits in RAM, you can't
even know how long it has been going on; it can go on for hours without the
kernel panicking or hanging at all. No one knows how good your data is.
There's no point in recovering a transaction if the data inside is corrupted.

.TM.
--
       ____/  ____/   /
      /      /       /            Marco Colombo
     ___/  ___  /   /              Technical Manager
    /          /   /             ESI s.r.l.
  _____/ _____/  _/               Colombo@ESI.it

