Thread: corruption on data table disconnects other clients silently when page is read

I have a table that when I run a sequential scan on it results in this error:

ERROR:  compressed data is corrupt

I tried to reindex the table, and I got this error on only one of the indexes:

ERROR:  index row requires 509139048 bytes, maximum size is 8191

This particular index references the primary key and one other column.
 The reindex of the PK itself succeeded.  The reindex of the index
from two other columns also succeeded.

When I run pg_dump on that table, I get this error:

ERROR:  timestamp out of range

Clearly, there is something bad with the data and I need to recreate
it from backup.

However, the issue I'm a bit more concerned with is that the select
query that results in the sequential scan not only issues that
warning, but seemingly causes all other clients connected to the DB to
be disconnected with the error "server closed the connection
unexpectedly  This probably means the server terminated abnormally
before or while processing the request."

Not only are the other clients disconnected, but there is NO
indication whatsoever in the server log that the DB disconnected any
clients and is "recovering".  An immediate reconnect to the DB by a
disconnected app shows the error "FATAL:  the database system is in
recovery mode".  Strangely, the psql session in which I ran the select
did not itself disconnect!

I am suspecting the cause of this initially was hardware (this is the
second corruption I've found on this server *after* I had two nearly
simultaneous disk failures on a RAID6 volume) so I will be rebuilding
the whole filesystem and PG directory, but I wanted to post these
details out here.

It would also be extremely helpful if the data read problems would at
least spit out the relation ID and CID of the tuple.  Narrowing it
down to a specific table was just lucky guesswork.

Vick Khera <vivek@khera.org> writes:
> Not only are the other clients disconnected, but there is NO
> indication whatsoever in the server log that the DB disconnected any
> clients and is "recovering".

You sure you're looking at the right server log?  This certainly
sounds pretty much like a standard crash-and-recovery sequence.

            regards, tom lane

On Wed, May 20, 2009 at 11:25 AM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Vick Khera <vivek@khera.org> writes:
>> Not only are the other clients disconnected, but there is NO
>> indication whatsoever in the server log that the DB disconnected any
>> clients and is "recovering".
>
> You sure you're looking at the right server log?  This certainly
> sounds pretty much like a standard crash-and-recovery sequence.

Yes; I see the ERROR from the select, including the select statement
itself, then nothing after that.  I use syslog to log and everything
goes to that one log file.  I normally see the DB recovery notices in
there when the DB is in recovery.

Does psql silently reconnect to the DB?  Because my interactive shell
that caused the error did not disconnect.

On 2009-05-20, Vick Khera <vivek@khera.org> wrote:

> Does psql silently reconnect to the DB?

I have noticed that behaviour recently.