Re: Serious Crash last Friday - Mailing list pgsql-general

From Martijn van Oosterhout
Subject Re: Serious Crash last Friday
Date
Msg-id 20020617174311.A31157@svana.org
In response to Serious Crash last Friday  ("Henrik Steffen" <steffen@city-map.de>)
List pgsql-general
On Mon, Jun 17, 2002 at 08:43:37AM +0200, Henrik Steffen wrote:
>
> Hello all,
>
> on Friday we experienced a very very worrying crash of our postgresql
> server.

Sounds like the CTIDs are out of whack or something. If you're really
desperate you can try the program here; it may be able to dump something.
http://svana.org/kleptog/pgsql/pgfsck.html

> Well, the crash was indicated as follows: One of my employees complained
> that she couldn't work anymore (via webinterface). The error-message was
> due to an error in the employee-table. This particular table has a unique
> row for employee-numbers. Suddenly there were 11 entries for the same
> employee. Even my name was included twice, and another employee still
> working on friday afternoon was also included 3 times. Note: This was a
> table with a UNIQUE KEY - this shouldn't be possible IMHO.

What DB version is this? Could it be XID wraparound?
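
To make the wraparound question concrete: transaction IDs (XIDs) are 32-bit
counters, so "older" and "newer" can only be decided modulo 2**32. The sketch
below illustrates that kind of circular comparison; it is not a diagnosis of
this crash. The function name xid_precedes and the sample XID values are made
up for illustration, and reserved/special XIDs are ignored.

```python
XID_MODULUS = 2 ** 32          # XIDs are 32-bit unsigned counters
HALF_RANGE = 2 ** 31           # "about two billion" transactions


def xid_precedes(a: int, b: int) -> bool:
    """Return True if XID a counts as older than XID b in circular order.

    Equivalent to interpreting (a - b) as a signed 32-bit integer and
    checking whether it is negative.
    """
    diff = (a - b) % XID_MODULUS
    return diff >= HALF_RANGE


if __name__ == "__main__":
    row_xid = 100                              # hypothetical XID that wrote a row
    near_future = row_xid + 1_000              # a transaction shortly afterwards
    far_future = (row_xid + HALF_RANGE + 1) % XID_MODULUS  # > 2**31 later

    # Shortly afterwards, the row's XID is still "in the past".
    print(xid_precedes(row_xid, near_future))
    # More than 2**31 transactions later, the same XID no longer compares
    # as older: it now looks like it lies in the future.
    print(xid_precedes(row_xid, far_future))
```

Once a stored XID falls more than about two billion transactions behind the
current one, it stops comparing as "in the past". A visibility flip of that
kind could, in principle, make stale row versions show up again, which is why
the version and the transaction volume matter here.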

> Taking a closer look, I found additional tables, with non-unique values in
> UNIQUE columns.
>
> When trying to delete unique values by using the OIDs, I found out, that
> even the OIDs were the same!!!! Taking a yet closer look, I found out by
> querying pg_tables that there were duplicates of some tables. Then there
> was the message: "Backend message type 0x44 arrived while idle"

Try the CTIDs instead of the OIDs; they will be unique.
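
A minimal sketch of what using the CTIDs could look like, assuming a table
named employee with a key column employee_no (both names are hypothetical, as
are the helper functions). It only builds the SQL text and picks which
physical row versions to drop; it does not connect to a database, and anything
it produces should be checked by hand before running it against damaged data.

```python
import re

_IDENT = re.compile(r"^[A-Za-z_][A-Za-z0-9_]*$")   # plain SQL identifiers only
_CTID = re.compile(r"^\(\d+,\d+\)$")               # e.g. "(0,5)" = (block, item)


def list_rows_sql(table: str, key_column: str) -> str:
    """Build a SELECT that shows the ctid next to the supposedly unique key."""
    for name in (table, key_column):
        if not _IDENT.match(name):
            raise ValueError(f"unsafe identifier: {name!r}")
    return (f"SELECT ctid, {key_column} FROM {table} "
            f"ORDER BY {key_column}, ctid;")


def ctids_to_delete(rows):
    """Given (ctid, key) pairs, keep the first ctid per key, return the rest."""
    seen = set()
    doomed = []
    for ctid, key in rows:
        if key in seen:
            doomed.append(ctid)
        else:
            seen.add(key)
    return doomed


def delete_by_ctid_sql(table: str, ctid: str) -> str:
    """Build a DELETE that targets exactly one physical row version."""
    if not _IDENT.match(table):
        raise ValueError(f"unsafe identifier: {table!r}")
    if not _CTID.match(ctid):
        raise ValueError(f"not a ctid literal: {ctid!r}")
    return f"DELETE FROM {table} WHERE ctid = '{ctid}';"


if __name__ == "__main__":
    print(list_rows_sql("employee", "employee_no"))
    # Hypothetical output of that query: employee 7 appears twice.
    rows = [("(0,1)", 7), ("(0,2)", 7), ("(0,3)", 8)]
    for ctid in ctids_to_delete(rows):
        print(delete_by_ctid_sql("employee", ctid))
```

Which copy of a duplicated row is the right one to keep is a judgment call;
the sketch simply keeps the first one it sees.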

> I was running VACUUM and VACUUM FULL a hundred times - but it failed to
> repair these errors. It didn't even succeed in running VACUUM on all
> tables: VACUUM complained something about "UNIQUE" (I didn't write down
> the exact error message though).

Please post the message exactly as printed out.

> Then I tried to DUMP as much as I could, then I stopped the database, moved
> the db-folder to a different location, did a new initdb and restored the
> whole system. Unfortunately there was one table I couldn't dump at all and
> I had to use the 15 hours old backup copy.
>
> But, please correct me if I am wrong, this should never actually happen,
> shouldn't it?

Never. That's why it would be helpful to know what went wrong.

> Anyone had any of these problems before? I will see if this happens again -
> and if it does I will have to think about using a different backend-server.
> I'll don't have to explain to you, that a database server that corrupts
> data, is completely useless.

HTH,
--
Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> There are 10 kinds of people in the world, those that can do binary
> arithmetic and those that can't.
