Suggestions for post-mortem... - Mailing list pgsql-hackers

From Philip Warner
Subject Suggestions for post-mortem...
Msg-id 43D77029.4070209@rhyme.com.au
Responses Re: Suggestions for post-mortem...  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
Re: Suggestions for post-mortem...  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
We just had a DB die quite nastily, and have no clear idea why.

Looking in the system logs shows nothing out of the ordinary, and
looking in the db logs shows a few odd records:

2006-01-25 12:25:31 EST [mail,5017]: ERROR:  failed to fetch new tuple for AFTER trigger
2006-01-25 12:26:01 EST [mail,93689]: WARNING:  index "XXXX_pkey" contains 1416 row versions, but table contains 1410 row versions
2006-01-25 12:26:01 EST [mail,93689]: HINT:  Rebuild the index with REINDEX.
2006-01-25 12:26:01 EST [mail,93689]: WARNING:  index "YYYY" contains 1416 row versions, but table contains 1410 row versions

...repeated several times for several indexes of the same table.
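
For anyone wanting to pull these warnings out of a larger log, here is a minimal sketch in Python. It assumes the log prefix looks like the excerpt above (timestamp, time zone, [database,pid], level); the function names `join_wrapped` and `index_mismatches` are made up for illustration and are not part of PostgreSQL. It also rejoins lines that a mail client or pager has wrapped, so it works on the excerpt whether or not each record sits on one line.

```python
import re

# Assumed prefix, modelled on the excerpt:
#   "YYYY-MM-DD HH:MM:SS TZ [db,pid]: LEVEL:  message"
PREFIX = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \S+ \[(\w+),(\d+)\]: (\w+):\s+(.*)$"
)
MISMATCH = re.compile(
    r'index "([^"]+)" contains (\d+) row versions, '
    r"but table contains (\d+) row versions"
)


def join_wrapped(lines):
    """Merge continuation lines (no timestamp prefix) into the previous record."""
    records = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip():
            continue  # skip blank lines
        if PREFIX.match(line) or not records:
            records.append(line)
        else:
            records[-1] += " " + line.strip()
    return records


def index_mismatches(lines):
    """Return (timestamp, pid, index, index_rows, table_rows) per mismatch warning."""
    found = []
    for record in join_wrapped(lines):
        m = PREFIX.match(record)
        if not m:
            continue
        ts, _db, pid, level, msg = m.groups()
        hit = MISMATCH.search(msg)
        if level == "WARNING" and hit:
            found.append(
                (ts, int(pid), hit.group(1), int(hit.group(2)), int(hit.group(3)))
            )
    return found
```

Calling `index_mismatches(open(path))` on a log file (the path is whatever your installation uses) gives one tuple per warning, which makes it easy to see which indexes disagree with the table and by how many row versions.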

These messages occurred almost immediately before we noticed the dead
state of the DB. Over an hour before these messages there was a
deadlock, but that's not too worrying -- the DB was still OK.

After the above messages, about 80 rows were missing from the table, and
a REINDEX did not restore them (not really surprising). The table in
question has only a small number of rows (1400-ish), but gets updated
as often as 5 to 10 times per second.

Thankfully, we had replication in place and just failed over, but we'd
like to try to understand what happened to the old DB.

Any suggestions where to start? Or what the first error might signify?
Or what to put in place to catch more details next time?

It's been running fine for several months (until now) using PG 8.0.3.








