Re: missing chunk number 0 for toast value - Mailing list pgsql-admin

From BJ Taylor
Subject Re: missing chunk number 0 for toast value
Msg-id 3d78fcfd0809250909h148daba8k2ad29e176d9a06fd@mail.gmail.com
In response to Re: missing chunk number 0 for toast value  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: missing chunk number 0 for toast value  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: missing chunk number 0 for toast value  ("Scott Marlowe" <scott.marlowe@gmail.com>)
List pgsql-admin
Hey Tom,

Here are some recent logs from our system.  Unfortunately, I didn't think to grab the logs at the time I killed those processes, and those logs are now gone.  I found the processes with ps and then killed each one with a plain kill processid.  Here are samples of our current log files:

FATAL:  the database system is in recovery mode
FATAL:  the database system is in recovery mode
LOG:  autovacuum launcher started
LOG:  database system is ready to accept connections
PANIC:  right sibling's left-link doesn't match: block 175337 links to 243096 instead of expected 29675 in index "dbmail_headervalue_3"
STATEMENT:  INSERT INTO dbmail_headervalue (headername_id, physmessage_id, headervalue) VALUES (4,12335778,'from [76.13.13.25] by n6.bullet.mail.ac4.yahoo.com with NNFMP; 25 Sep 2008 04:01:36 -0000')
LOG:  server process (PID 13888) was terminated by signal 6: Aborted
LOG:  terminating any other active server processes
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
FATAL:  the database system is in recovery mode
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
FATAL:  the database system is in recovery mode
FATAL:  the database system is in recovery mode
FATAL:  the database system is in recovery mode
FATAL:  the database system is in recovery mode
WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.
FATAL:  the database system is in recovery mode
FATAL:  the database system is in recovery mode
LOG:  all server processes terminated; reinitializing
LOG:  database system was interrupted; last known up at 2008-09-25 09:12:41 MDT
LOG:  database system was not properly shut down; automatic recovery in progress
FATAL:  the database system is in recovery mode
FATAL:  the database system is in recovery mode

...

FATAL:  the database system is in recovery mode
FATAL:  the database system is in recovery mode
LOG:  redo starts at 3A/2D0DEA78
LOG:  record with zero length at 3A/2D1B8D68
LOG:  redo done at 3A/2D1B8D3C
LOG:  last completed transaction was at log time 2008-09-25 09:12:45.204162-06
FATAL:  the database system is in recovery mode
FATAL:  the database system is in recovery mode

...

FATAL:  the database system is in recovery mode
FATAL:  the database system is in recovery mode
LOG:  redo starts at 3A/2D1B8DA8
LOG:  unexpected pageaddr 3A/2520A000 in log file 58, segment 45, offset 2138112
LOG:  redo done at 3A/2D208660
LOG:  last completed transaction was at log time 2008-09-25 09:12:47.971207-06
FATAL:  the database system is in recovery mode
FATAL:  the database system is in recovery mode

...

LOG:  unexpected EOF on client connection
LOG:  unexpected EOF on client connection
ERROR:  missing chunk number 0 for toast value 554365
STATEMENT:  SELECT messageblk, is_header FROM dbmail_messageblks WHERE physmessage_id = 12111760 ORDER BY messageblk_idnr
LOG:  unexpected EOF on client connection
LOG:  unexpected EOF on client connection


To be honest, I don't know whether all of these logs are relevant.  I half suspect that Nagios causes the "unexpected EOF on client connection" notices, but I can't be certain.
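
One way to separate the noise from the lines that matter is to tally the excerpts by severity and pull out the distinct PANIC and ERROR messages. What follows is only a rough sketch, not anything from the PostgreSQL tooling: it assumes each log line starts directly with the severity keyword, as in the excerpts above (no log_line_prefix), and the names SEVERITY_RE and summarize are made up for illustration.

```python
import re
from collections import Counter

# Assumption: lines begin with the severity keyword, as in the pasted
# excerpts (i.e. an empty log_line_prefix). Adjust the pattern otherwise.
SEVERITY_RE = re.compile(
    r"^(PANIC|FATAL|ERROR|WARNING|LOG|DETAIL|HINT|STATEMENT):\s+(.*)$"
)

def summarize(lines):
    """Count lines per severity and collect distinct PANIC/ERROR messages."""
    counts = Counter()
    serious = []
    for line in lines:
        match = SEVERITY_RE.match(line.strip())
        if not match:
            continue  # skip blank lines, "..." separators, and so on
        level, message = match.groups()
        counts[level] += 1
        if level in ("PANIC", "ERROR") and message not in serious:
            serious.append(message)
    return counts, serious

# A few lines copied from the log excerpts in this message.
sample = [
    "FATAL:  the database system is in recovery mode",
    "FATAL:  the database system is in recovery mode",
    "PANIC:  right sibling's left-link doesn't match: block 175337 links to "
    "243096 instead of expected 29675 in index \"dbmail_headervalue_3\"",
    "LOG:  server process (PID 13888) was terminated by signal 6: Aborted",
    "ERROR:  missing chunk number 0 for toast value 554365",
    "LOG:  unexpected EOF on client connection",
]

counts, serious = summarize(sample)
for level, n in counts.most_common():
    print(level, n)
for message in serious:
    print("serious:", message)
```

Run against the full log file (for example, by passing an open file object to summarize), this would show how often the PANIC on "dbmail_headervalue_3" and the missing-chunk ERROR recur relative to the "recovery mode" and "unexpected EOF" lines.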

You also asked how the server is unstable.  It drops connections seemingly at random.  When a connection is dropped, the client receives the following error:

WARNING:  terminating connection because of crash of another server process
DETAIL:  The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT:  In a moment you should be able to reconnect to the database and repeat your command.


Please let me know if there are any other questions I can answer for you.

Thanks,
BJ

On Thu, Sep 25, 2008 at 7:24 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
"BJ Taylor" <btaylor@propertysolutions.com> writes:
> We are using version 8.3.1.  And to be precise, when I started the vacuum
> (analyze), I started it as a cron job to run daily around midnight.  The
> next day I came in and checked on it and it was still running.  Not thinking
> that it would take more than a full 24 hours to run, I let it be, and the
> next day I came in and the server started acting weird.  I believe the
> vacuum process continued to run, and a second vacuum process was started.
> The server became unstable, and refused incoming connections.

Unstable how?  What error did you get on the refused connections?  What
was showing up in the postmaster log?

> At which
> point, I killed all vacuum processes, and restarted postgresql.

How did you do that killing exactly?

> I believe
> it was somewhere during this process that the database became corrupted.  I
> am not certain what happens when two vacuum processes run at the same time.

Nothing of interest, it's done all the time.

> That may have been the problem, or it may not have.  Or it may have been
> that I killed the vacuum process in the middle of what it was doing.  One
> way or another, the problem that we have now, is that we are unable to get a
> dump of the database for backups, and the database seems less stable than it
> was previously (dropping connections, and refusing connections seemingly at
> random).

Again, what errors are you getting exactly, and what shows up in the
postmaster log?

                       regards, tom lane

--
Sent via pgsql-admin mailing list (pgsql-admin@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-admin
