
Thread: constant crashing hardware issue and thank you TAKE AWAY

constant crashing hardware issue and thank you TAKE AWAY

From

jack

Date:

17 April 2024, 13:06:31

I discovered that one of the memory sticks in the machine was damaged.

Running memtest86 on the machine generated many RAM errors.

This was causing the strange bi-polar errors in PostgreSQL.

The hardware technician explained that he sees this often and that there is no one cause for such problems.

As I am not a hardware specialist, I never thought that RAM could cause such problems.

I always assumed that the OS (Ubuntu or Windows) would advise me if there was ever an issue with memory.

TAKE AWAY:
As a result of this I will be checking the RAM on all my machines once a month or the moment a machine starts to act strange.

Thanks again to all who helped with this issue.

Re: constant crashing hardware issue and thank you TAKE AWAY

From

Madalin Ignisca

Date:

17 April 2024, 13:08:33

You get that kind of support for “damaged RAM” with ECC memory, on CPUs that support it.

Xeon CPUs, for example.

On 17 Apr 2024, at 15:06, jack <jack4pg@a7q.com> wrote:

> I always assumed that the OS (Ubuntu or Windows) would advise me if there was ever an issue with memory.

Re: constant crashing hardware issue and thank you TAKE AWAY

From

Justin Clift

Date:

17 April 2024, 16:19:17

On 2024-04-17 23:06, jack wrote:
<snip>
> As a result of this I will be checking the RAM on all my machines once
> a month or the moment a machine starts to act strange.

Once a month is overkill, and unlikely to be useful. :)

Server or enterprise grade hardware will support "ECC" memory.

That has extra memory chips + supporting circuitry on the memory board
so it can detect + correct most errors as they happen, without them
causing problems.

For the errors that it can't *correct*, it'll still generate warnings
to your system software to let you know (if you've configured it).

If you do get such a warning - or if the system starts acting funny like
you saw - that's when you'd want to run memtest on the system.
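
As a rough illustration of what "warnings to your system software" can
look like on Linux: when an EDAC driver is loaded for an ECC-capable
memory controller, the kernel exposes per-controller error counters
under /sys/devices/system/edac/mc/. The sketch below just reads those
counters. The function names and the "any non-zero count means run
memtest" rule are my own assumptions for illustration, not something
from this thread, and machines without ECC (or without the driver
loaded) will simply show no controllers.

```python
from pathlib import Path

# Standard Linux EDAC sysfs location; only present when an EDAC driver is loaded.
EDAC_ROOT = Path("/sys/devices/system/edac/mc")


def read_edac_counts(root=EDAC_ROOT):
    """Return {controller: (corrected, uncorrected)} read from EDAC sysfs.

    Returns an empty dict if the directory does not exist.
    """
    counts = {}
    root = Path(root)
    if not root.is_dir():
        return counts
    for mc in sorted(root.glob("mc[0-9]*")):
        try:
            ce = int((mc / "ce_count").read_text().strip())
            ue = int((mc / "ue_count").read_text().strip())
        except (OSError, ValueError):
            # Counter missing or unreadable: skip this controller.
            continue
        counts[mc.name] = (ce, ue)
    return counts


def needs_memtest(counts):
    """Hypothetical policy: flag the machine if any counter is non-zero."""
    return any(ce > 0 or ue > 0 for ce, ue in counts.values())


if __name__ == "__main__":
    counts = read_edac_counts()
    if not counts:
        print("No EDAC memory controllers found (no ECC, or driver not loaded).")
    for name, (ce, ue) in counts.items():
        print(f"{name}: corrected={ce} uncorrected={ue}")
    if needs_memtest(counts):
        print("Memory errors reported; consider running memtest.")
```

Something like this could be run from cron, so the check happens
continuously in the background instead of taking the machine down for a
monthly memtest.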

---

The other time to run memtest on the system is when you first buy or
receive a new server.  You'd generally do a "burn in" test of all the
things (memory, hard disks/SSDs, CPU, GPU, etc.) just to make sure
everything is ok before you start using it for important stuff.

Regards and best wishes,

Justin Clift