Thread: Mysterious server crashes

Mysterious server crashes

From: Žiga Kranjec
Date: Fri, Jul 15, 2011 11:37:54 PM +0200

Hello!

Recently we upgraded our Debian system (sid), and
it has since started crashing mysteriously.
We are still looking into that. It runs on 3ware RAID.
The PostgreSQL package is 8.4.8-2.

The database came back up apparently OK, except
for the indexes. Running REINDEX produces this error on
one of the tables:

ERROR:  unexpected chunk number 1 (expected 0) for toast value 17539760 in pg_toast_16992

A SELECT on that table fails with the same error.

I tried running REINDEX on the TOAST table, but it didn't help. Running:

select * from pg_toast.pg_toast_16992 where chunk_id = 17539760;

crashed the postgres backend (and apparently the whole server).

Is there anything we can/should do to fix the problem, besides
restoring the whole database from backup?

Thanks!

Ziga



Re: Mysterious server crashes

From: "ktm@rice.edu"

On Fri, Jul 15, 2011 at 11:37:54PM +0200, Žiga Kranjec wrote:
> I tried running REINDEX on the TOAST table, but it didn't help. Running:
> 
> select * from pg_toast.pg_toast_16992 where chunk_id = 17539760;
> 
> crashed the postgres backend (and apparently the whole server).

Hi Ziga,

I do not want to be negative, but it sounds like your server is
having serious problems completely outside of PostgreSQL. Reading a
file, even a corrupted one, should not crash the whole system. That
points to a driver or hardware problem, and you need to fix that. I
would make sure you have a good backup of your DB before you do
anything else.

Good luck,
Ken


Re: Mysterious server crashes

From: Robert Haas

On Fri, Jul 15, 2011 at 5:37 PM, Žiga Kranjec <ziga@ljudmila.org> wrote:
> Is there anything we can/should do to fix the problem, besides
> restoring the whole database from backup?

Well, in theory, an operating system crash shouldn't corrupt your
database.  Maybe you've configured fsync=off, or have some other
problem that keeps writes from reaching the disk reliably.  There are
some useful resources here:

http://wiki.postgresql.org/wiki/Reliable_Writes
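
One way to rule out the fsync=off case is to ask the running server for its settings from psql. The query below is a sketch: besides fsync it lists full_page_writes, synchronous_commit and wal_sync_method, standard PostgreSQL 8.4 settings that also bear on crash safety; treating these four as the relevant ones for this machine is an assumption.

```sql
-- List the durability-related settings of the running server (PostgreSQL 8.4).
-- fsync = off or full_page_writes = off can leave the database corrupted
-- after an OS crash; synchronous_commit = off can lose recent commits but
-- should not corrupt data. The source column shows where each value came from.
SELECT name, setting, source
FROM pg_settings
WHERE name IN ('fsync', 'full_page_writes', 'synchronous_commit', 'wal_sync_method');
```

If fsync or full_page_writes reports off, the source column shows whether it was set in the configuration file; on Debian that file normally lives under /etc/postgresql/8.4/main/.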

At this point, it sounds like things are pretty badly messed up.  A
restore from backup seems like a good idea, but first you might want
to try to track down what else is wrong with this machine (bad memory?
corrupted OS?), else you might find yourself back in the same
situation all over again pretty quickly.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company