Serious Crash last Friday - Mailing list pgsql-general

From: Henrik Steffen
Subject: Serious Crash last Friday
Message-ID: 025001c215ca$4fe9e1a0$7100a8c0@topconcepts.net
List: pgsql-general
Responses: Re: Serious Crash last Friday (Martijn van Oosterhout <kleptog@svana.org>)

Hello all,

On Friday we experienced a very worrying crash of our PostgreSQL server.

Our system runs on an Intel Pentium IV, 1.6 GHz, with 1 GB RAM and the latest
PostgreSQL server, 7.2.1 (Red Hat RPM), under rather heavy load.

There are 109 user tables in one database; the largest tables contain 60
columns and approx. 400,000 rows. It's a dedicated database-only machine.

Well, the crash showed up as follows: one of my employees complained that she
couldn't work anymore (via the web interface). The error message was due to an
error in the employee table. This particular table has a unique column for
employee numbers. Suddenly there were 11 entries for the same employee. Even
my name was included twice, and another employee who was still working on
Friday afternoon was included 3 times. Note: this was a table with a
UNIQUE KEY, so this shouldn't be possible IMHO.

Taking a closer look, I found additional tables with non-unique values in
UNIQUE columns.
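
A check for such duplicates can be sketched as follows. This is only a minimal
sketch, not the queries actually run on the server: the table name employee and
the column employee_no are assumed for illustration, and Python's sqlite3
module stands in for PostgreSQL so that the example is self-contained. The
GROUP BY ... HAVING query itself is plain SQL.

```python
import sqlite3

# Hypothetical table and column names. sqlite3 stands in for PostgreSQL so the
# sketch runs anywhere. The column is deliberately declared WITHOUT a UNIQUE
# constraint, so the duplicates a healthy constraint would reject can be
# simulated.
DUPLICATE_QUERY = """
    SELECT employee_no, COUNT(*) AS copies
    FROM employee
    GROUP BY employee_no
    HAVING COUNT(*) > 1
    ORDER BY employee_no
"""


def find_duplicates(conn):
    """Return (employee_no, copies) for every value stored more than once."""
    return conn.execute(DUPLICATE_QUERY).fetchall()


def make_demo_db():
    """Build an in-memory table containing a few duplicated employee numbers."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employee (employee_no INTEGER, name TEXT)")
    rows = [
        (1, "alice"),
        (2, "bob"), (2, "bob"),
        (3, "carol"), (3, "carol"), (3, "carol"),
    ]
    conn.executemany("INSERT INTO employee VALUES (?, ?)", rows)
    return conn


if __name__ == "__main__":
    for employee_no, copies in find_duplicates(make_demo_db()):
        print(employee_no, copies)
```

An empty result would mean that every value in the column occurs exactly once.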

When trying to delete the duplicate rows by using the OIDs, I found out that
even the OIDs were the same! Taking a yet closer look, I found out by querying
pg_tables that there were duplicates of some tables. Then there was the
message: "Backend message type 0x44 arrived while idle"

I ran VACUUM and VACUUM FULL a hundred times, but they failed to repair these
errors. VACUUM didn't even succeed on all tables: it complained something
about "UNIQUE" (I didn't write down the exact error message, though).

Then I tried to dump as much as I could, stopped the database, moved the
db folder to a different location, did a new initdb and restored the whole
system. Unfortunately there was one table I couldn't dump at all, and I had to
use the 15-hour-old backup copy.
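
The "dump as much as possible" step could be sketched like this. It is an
illustration under assumptions, not the exact commands used: the database
name, table names and output directory are hypothetical, and the helper
functions are made up for this example. Only pg_dump's -t (single table) and
-f (output file) options are relied upon. Dumping each table into its own file
means that one undumpable table does not stop the rest from being saved.

```python
import subprocess


def dump_command(database, table, outdir):
    """Build a pg_dump invocation that writes one table to its own file."""
    return ["pg_dump", "-t", table, "-f", f"{outdir}/{table}.sql", database]


def dump_tables(database, tables, outdir, run=subprocess.run):
    """Dump each table separately and return the names of tables that failed.

    `run` is injectable so the loop can be exercised without a real server;
    by default it is subprocess.run, which needs pg_dump on the PATH.
    """
    failed = []
    for table in tables:
        result = run(dump_command(database, table, outdir))
        if result.returncode != 0:
            # pg_dump exited with an error; remember the table and carry on.
            failed.append(table)
    return failed
```

Any table returned in the failed list would then have to be recovered from an
older backup, as happened here with the one table that could not be dumped.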

But, please correct me if I am wrong, this should never actually happen,
should it?

Has anyone had any of these problems before? I will see if this happens again,
and if it does I will have to think about using a different backend server. I
don't have to explain to you that a database server that corrupts data is
completely useless.


Kind regards,

Henrik Steffen
Managing Director

top concepts Internetmarketing GmbH
Am Steinkamp 7 - D-21684 Stade - Germany
--------------------------------------------------------
http://www.topconcepts.com          Tel. +49 4141 991230
mail: steffen@topconcepts.com       Fax. +49 4141 991233
--------------------------------------------------------
24h-Support Hotline:  +49 1908 34697 (EUR 1.86/Min,topc)
--------------------------------------------------------
System partners wanted: http://www.franchise.city-map.de
--------------------------------------------------------
Commercial register: AG Stade HRB 5811 - VAT ID: DE 213645563
--------------------------------------------------------


