[Tom Lane]:
>
> Kjetil Torgrim Homme <kjetilho@ifi.uio.no> writes:
> > indeed interesting. IMHO 0x0fff sounds more like a write to -1
> > (relative to the next page) than a random memory error.
>
> Note though that that offset is only special with respect to
> locations on disk; it was not a special address in memory. I'm
> wondering about transient flakiness in your disk drive controller
> causing it to sometimes return bad data. Have you run any
> read/write disk tests?

Well, the hardware RAID1 controller claims the disks are healthy, but
of course it would. I was unable to find any software for such
testing. The closest I got was badblocks(8), but it only wants to work
on a block device.

I hacked good old bonnie to write blocks filled with a random byte,
then read them back and check that they match. I'll run a few such
processes concurrently, and we'll see whether that suffices to trigger
(and discover) any hardware bugs.
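
For illustration, the check amounts to something like the Python
sketch below. This is not the actual bonnie patch (bonnie is written
in C); the function names write_pattern and verify_pattern are made up
for this example. It also assumes the read-back really hits the disk,
which is only likely if the test file is much larger than RAM, since
otherwise the page cache may serve the reads.

```python
import os


def write_pattern(path, block_size, num_blocks):
    """Write num_blocks blocks, each filled with a single random byte.

    Returns the list of fill byte values (ints), one per block, so the
    file can be verified later.
    """
    fills = []
    with open(path, "wb") as f:
        for _ in range(num_blocks):
            fill = os.urandom(1)          # one random byte per block
            f.write(fill * block_size)    # the block is that byte repeated
            fills.append(fill[0])
        f.flush()
        os.fsync(f.fileno())              # ask the OS to push data to disk
    return fills


def verify_pattern(path, block_size, fills):
    """Read the file back and return the indexes of mismatching blocks."""
    bad = []
    with open(path, "rb") as f:
        for index, fill in enumerate(fills):
            block = f.read(block_size)
            if block != bytes([fill]) * block_size:
                bad.append(index)
    return bad
```

Calling write_pattern() and then verify_pattern() on the same file
should return an empty list on healthy hardware. Several such
processes can be run at once, each on its own file.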
--
Kjetil T.