Re: Serious Crash last Friday - Mailing list pgsql-general

From Henrik Steffen
Subject Re: Serious Crash last Friday
Date
Msg-id 035001c215d7$8a2dbb40$7100a8c0@topconcepts.net
In response to Serious Crash last Friday  ("Henrik Steffen" <steffen@city-map.de>)
List pgsql-general
Hi Martijn,

cute little program you pointed me to, thank you. So I am not the only
one experiencing problems with certain SELECTs sometimes. That's another
very annoying thing about postgresql. I have had it several times by now
and always tried to find the corrupted tuples by hand...

OK, but back to the crash of last Friday:

> What DB version is this. Could it be XID wraparound?

it's postgres 7.2.1

what actually is XID wraparound, and how can I find out if I have it?
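
(A rough sketch for anyone else wondering the same thing, not an
authoritative answer: as far as I understand it, transaction IDs are
32-bit counters that are compared in circular, modulo-2^32 fashion, so
every XID sees roughly two billion IDs as "older" and two billion as
"newer". Once the counter has advanced far enough, a very old XID can
end up on the "newer" side of the comparison. The Python below only
illustrates that arithmetic. The function names are made up for this
example, and it ignores the few special XID values that PostgreSQL
reserves.)

```python
# Illustrative model of circular 32-bit transaction ID comparison.
# The names here are hypothetical; this is not PostgreSQL source code.
XID_MODULUS = 2 ** 32
HALF_RANGE = 2 ** 31


def xid_precedes(a: int, b: int) -> bool:
    """Return True if XID a counts as older than XID b.

    The difference is taken modulo 2^32 and read as a signed 32-bit
    number: a "negative" difference means a is older than b.
    """
    diff = (a - b) % XID_MODULUS
    return diff >= HALF_RANGE


def xid_age(current: int, reference: int) -> int:
    """Number of transactions between reference and current, modulo 2^32."""
    return (current - reference) % XID_MODULUS


# Ordinary case: XID 100 is older than XID 200.
print(xid_precedes(100, 200))

# The counter wrapped: an XID just below 2^32 is still older than XID 10.
print(xid_precedes(2 ** 32 - 5, 10))

# Wraparound trouble: more than 2^31 transactions after XID 10,
# XID 10 no longer compares as older, so its rows look "in the future".
print(xid_precedes(10, 2 ** 31 + 20))
```

If I read the routine-maintenance chapter of the documentation
correctly, the practical check is
SELECT datname, age(datfrozenxid) FROM pg_database;
where an age approaching two billion would be the warning sign.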

> Try the CTIDs, they will be unique.

ah, this was also new to me - I have worked with oids sometimes,
but had never heard of ctids before, thanks again.

> Please post the message exactly as printed out.

This is what I can see from /var/log/messages:
(these messages were often repeated:)

XLogFlush: request D/39CC9F8 is not satisfied - flushed only to D/39A4354
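
(To put a number on that message: the two positions look like
hexadecimal pairs of the form log-file-id/byte-offset. That reading is
my assumption from the format, it is not stated in the log itself. If
it is right, the flushed position trails the requested one by a fixed
number of bytes within log file D. A small Python sketch of the
arithmetic, with hypothetical helper names:)

```python
import re

LOG_LINE = ("XLogFlush: request D/39CC9F8 is not satisfied - "
            "flushed only to D/39A4354")


def parse_position(text):
    """Split 'D/39CC9F8' into (log_file_id, byte_offset), both read as hex."""
    log_id, offset = text.split("/")
    return int(log_id, 16), int(offset, 16)


def flush_shortfall(line):
    """Return how many bytes the flushed position is behind the request.

    Returns None when the two positions are in different log files,
    because this sketch does not assume how large one log file is.
    """
    match = re.search(
        r"request ([0-9A-Fa-f]+/[0-9A-Fa-f]+) .*"
        r"flushed only to ([0-9A-Fa-f]+/[0-9A-Fa-f]+)",
        line,
    )
    if match is None:
        raise ValueError("not an XLogFlush 'not satisfied' message")
    requested = parse_position(match.group(1))
    flushed = parse_position(match.group(2))
    if requested[0] != flushed[0]:
        return None
    return requested[1] - flushed[1]


print(flush_shortfall(LOG_LINE))  # prints 165540
```

So the request points about 162 kB past what was actually flushed,
assuming the offsets really are plain byte counts.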

(some messages are in German, I'll try to translate them:)

Can't create "Unique"-Index, because table contains duplicated values

This happened while vacuuming:
Duplicated value cannot be inserted in "Unique"-Index pg_class_relname_index
Duplicated value cannot be inserted in "Unique"-Index pg_statistic_relid_att_index
Duplicated value cannot be inserted in "Unique"-Index pg_class_oid_index

Looks like these system tables have been corrupted, too.

As I mentioned before, I copied the complete data directory to a
different location, so someone could have a look at the complete
corrupted data.


Kind regards

Henrik Steffen
Managing Director

top concepts Internetmarketing GmbH
Am Steinkamp 7 - D-21684 Stade - Germany
--------------------------------------------------------
http://www.topconcepts.com          Tel. +49 4141 991230
mail: steffen@topconcepts.com       Fax. +49 4141 991233
--------------------------------------------------------
24h-Support Hotline:  +49 1908 34697 (EUR 1.86/Min,topc)
--------------------------------------------------------
System partners wanted: http://www.franchise.city-map.de
--------------------------------------------------------
Commercial register: AG Stade HRB 5811 - VAT ID: DE 213645563
--------------------------------------------------------

----- Original Message -----
From: "Martijn van Oosterhout" <kleptog@svana.org>
To: "Henrik Steffen" <steffen@city-map.de>
Cc: <pgsql-general@postgresql.org>
Sent: Monday, June 17, 2002 9:43 AM
Subject: Re: [GENERAL] Serious Crash last Friday


> On Mon, Jun 17, 2002 at 08:43:37AM +0200, Henrik Steffen wrote:
> >
> > Hello all,
> >
> > on Friday we experienced a very very worrying crash of our postgresql
> > server.
>
> Sounds like the CTIDs are out of whack or something. If you're really
> desperate you can try the program here, it may be able to dump something.
> http://svana.org/kleptog/pgsql/pgfsck.html
>
> > Well, the crash was indicated as follows: One of my employees complained
> > that she couldn't work anymore (via webinterface). The error-message was
> > due to an error in the employee-table. This particular table has a unique
> > row for employee-numbers. Suddenly there were 11 entries for the same
> > employee. Even my name was included twice, and another employee still
> > working on friday afternoon was also included 3 times. Note: This was a
> > table with a UNIQUE KEY - this shouldn't be possible IMHO.
>
> What DB version is this. Could it be XID wraparound?
>
> > Taking a closer look, I found additional tables, with non-unique values
> > in UNIQUE columns.
> >
> > When trying to delete unique values by using the OIDs, I found out, that
> > even the OIDs were the same!!!! Taking a yet closer look, I found out by
> > querying pg_tables that there were duplicates of some tables. Then there
> > was the message: "Backend message type 0x44 arrived while idle"
>
> Try the CTIDs, they will be unique.
>
> > I was running VACUUM and VACUUM FULL a hundred times - but it failed to
> > repair these errors. It didn't even succeed in running VACUUM on all
> > tables: VACUUM complained something about "UNIQUE" (I didn't write down
> > the exact error message though).
>
> Please post the message exactly as printed out.
>
> > Then I tried to DUMP as much as I could, then I stopped the database,
> > moved the db-folder to a different location, did a new initdb and
> > restored the whole system. Unfortunately there was one table I couldn't
> > dump at all and I had to use the 15 hours old backup copy.
> >
> > But, please correct me if I am wrong, this should never actually happen,
> > shouldn't it?
>
> Never, that's why it would be helpful to know what went wrong.
>
> > Anyone had any of these problems before? I will see if this happens
> > again - and if it does I will have to think about using a different
> > backend-server. I don't have to explain to you that a database server
> > that corrupts data is completely useless.
>
> HTH,
> --
> Martijn van Oosterhout   <kleptog@svana.org>   http://svana.org/kleptog/
> > There are 10 kinds of people in the world, those that can do binary
> > arithmetic and those that can't.

