On Wed 20 Jun 2012 07:08:09 Craig Ringer wrote:
> On 06/19/2012 05:17 PM, Achilleas Mantzios wrote:
> > We had another corruption incident on the very same machine, this time in
> > the jboss subsystem (a "jar cvf" produced corrupted .jar). IMHO this
> > means faulty RAM/disk.
> > If that is true, then I guess HW sanity checks are even more important
> > than SW upgrades.
>
> ... and a lot more difficult :S
>
> Log monitoring is often the most important part - monitoring for NMIs and
> other hardware notifications, checking the kernel log for odd issues or
> reports of unexpected segfaults from userspace programs, etc.
>
That's right, we have written a whole framework for this, but there are always cases
which escape our attention.
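
For illustration only, here is a minimal sketch of the kind of kernel-log check discussed above. It is not the framework mentioned in this thread; the function name, the category labels and the regular expressions are all assumptions, and the patterns only cover a few commonly seen message shapes (NMI notices, userspace segfault reports, machine-check / hardware-error lines).

```python
import re

# Hypothetical patterns; real kernel messages vary by version and hardware,
# so treat these as a starting point rather than a complete list.
PATTERNS = {
    "nmi": re.compile(r"\bNMI\b"),
    "segfault": re.compile(r"\bsegfault at\b"),
    "hardware_error": re.compile(r"Hardware Error|Machine check", re.IGNORECASE),
}


def scan_kernel_log(lines):
    """Return (category, line) pairs for log lines that look suspicious.

    `lines` is any iterable of strings, e.g. an open log file or the
    captured output of dmesg. Each line is reported at most once, under
    the first category whose pattern matches.
    """
    hits = []
    for line in lines:
        for category, pattern in PATTERNS.items():
            if pattern.search(line):
                hits.append((category, line.rstrip("\n")))
                break
    return hits
```

The function could be fed lines from a saved kernel log and its result passed to whatever alerting is already in place. As noted above, a fixed pattern list like this will always miss some cases.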
> --
> Craig Ringer
-
Achilleas Mantzios
IT DEPT