On 06/19/2012 05:17 PM, Achilleas Mantzios wrote:
> We had another corruption incident on the very same machine, this time in the jboss subsystem (a "jar cvf" produced
> corrupted.jar).
> IMHO this means faulty RAM/disk.
> If that is true, then i guess HW sanity checks are even more important than SW upgrades.
... and a lot more difficult :S
Log monitoring is often the most important part - monitoring for NMIs and
other hardware notifications, checking the kernel log for odd issues or
reports of unexpected segfaults from userspace programs, etc.
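
As a rough illustration of that kind of check, here is a minimal Python
sketch that flags suspicious lines in kernel log text. The pattern list
and the name scan_kernel_log are my own assumptions for the example, not
a standard tool; the exact wording of kernel messages varies by kernel
version and hardware, so the patterns would need tuning for a real machine.

```python
import re

# Hypothetical patterns for illustration only; real kernel messages
# differ between kernel versions, drivers and hardware.
SUSPICIOUS = {
    "nmi": re.compile(r"\bNMI\b"),
    "machine_check": re.compile(r"machine check|\bmce\b", re.IGNORECASE),
    "segfault": re.compile(r"\bsegfault\b", re.IGNORECASE),
    "io_error": re.compile(r"I/O error", re.IGNORECASE),
}


def scan_kernel_log(lines):
    """Return (label, line) pairs for log lines matching a suspicious pattern.

    Each line is reported at most once, under the first pattern it matches.
    """
    hits = []
    for line in lines:
        for label, pattern in SUSPICIOUS.items():
            if pattern.search(line):
                hits.append((label, line.rstrip("\n")))
                break
    return hits
```

The function takes any iterable of lines, so it can be fed an open log
file or the output of dmesg split into lines, and the result can be
mailed out from cron or handed to whatever alerting is already in place.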
--
Craig Ringer