Re: Total crash of my db-server - Mailing list pgsql-general

From Jan Weerts
Subject Re: Total crash of my db-server
Msg-id B349BABAF9A92F4D9FBFCADF8D5FEDD50810D9@ivsrv03.i-views.de
In response to Re: Total crash of my db-server  ("Henrik Steffen" <steffen@city-map.de>)
List pgsql-general

Hi Steffen!

>the whole computer crashes.
>
>it's mostly during dumpalls (backup) and/or vacuuming
>or reindexing...

From my experience: we saw this behaviour on two different servers running Linux. Both were running Postgres, but the crashes did not necessarily coincide with database load.

After some troubleshooting we found that both cases could be solved by changing the RAM configuration. The first machine had been in use for two years when it suddenly started to reboot during the day (not after hours). We suspected an attack or broken hard drives. In the end we replaced the RAM, and since then it has been happily humming in its rack.

The second machine was brand new, and we wanted to put one gigabyte of RAM into two DIMM sockets. The machine was set up and Postgres installed. When we started to test the database and load the system, we got kernel panics or a totally unresponsive machine. In the end, after a lot of testing, we removed one of the RAM modules, and since then it has been running with just half a gigabyte (which suffices for the application we will be using it for). Different software-based RAM tests showed varying results on each run, nothing reproducible. We suspect the chipset is broken in this respect, despite its claimed ability to use these modules.

So my guess is that, since Postgres is not running as root, it cannot really "destroy" the kernel or anything vital. For this kind of breakdown I usually blame Windows, but since this is Linux, I really do suspect the hardware, even though, as you said in another post, this is not the first time you have experienced it. Are the other machines loaded (CPU and RAM) by other applications, or only by Postgres? If only by Postgres, try some other RAM- and CPU-consuming application and load the machine heavily.
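
If you have nothing else at hand to load the box with, a small script can do it. What follows is only a rough sketch of my own, not a real memory tester: the function name stress_pass and its parameters are made up for illustration. It fills a few large buffers with a repeated random pattern, then re-reads and checksums them several times, which keeps both RAM and CPU busy. A userspace program only sees the memory the kernel hands it, so a clean run proves nothing; a crash or a changed checksum while Postgres is stopped would point away from the database.

```python
import hashlib
import os


def stress_pass(block_mb=16, blocks=4, rounds=3):
    """Fill memory with a pattern, then re-read it while keeping the CPU busy.

    Returns the number of checksum mismatches seen across all re-reads.
    All names and default sizes here are illustrative assumptions.
    """
    size = block_mb * 1024 * 1024

    # Each buffer is a random 4 KiB seed repeated to fill the block.
    buffers = []
    for _ in range(blocks):
        seed = os.urandom(4096)
        buffers.append(bytearray(seed * (size // 4096)))

    # Reference checksums taken right after filling the buffers.
    expected = [hashlib.sha256(buf).digest() for buf in buffers]

    # Re-read every buffer several times and compare against the reference.
    mismatches = 0
    for _ in range(rounds):
        for buf, digest in zip(buffers, expected):
            if hashlib.sha256(buf).digest() != digest:
                mismatches += 1
    return mismatches


if __name__ == "__main__":
    print("checksum mismatches:", stress_pass())
```

Raise block_mb and blocks until most of the installed RAM is in use, and run several copies in parallel to occupy every CPU.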

HTH
  Jan
p.s.: we once had a temp whom we suspect of having zapped two RAM
modules and two mainboards in just one month. And since the
first case shows that RAM can age, I am prepared to blame hardware
in some cases.
