corruption on data table disconnects other clients silently when page is read - Mailing list pgsql-general

From Vick Khera
Subject corruption on data table disconnects other clients silently when page is read
Date
Msg-id 2968dfd60905200806le852cadv6fd59aa00a03ba3b@mail.gmail.com
Whole thread Raw
Responses Re: corruption on data table disconnects other clients silently when page is read  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
I have a table that when I run a sequential scan on it results in this error:

ERROR:  compressed data is corrupt

I tried to reindex the table, and I got this error on only one of the indexes:

ERROR:  index row requires 509139048 bytes, maximum size is 8191

This particular index references the primary key and one other column.
 The reindex of the PK itself succeeded.  The reindex of the index
from two other columns also succeeded.

When I run pg_dump on that table, I get this error:

ERROR:  timestamp out of range

Clearly, there is something bad with the data and I need to recreate
it from backup.

However, the issue I'm a bit more concerned with is that the select
query that results in the sequential scan not only issues that
warning, but seemingly causes all other clients connected to the DB to
be disconnected with the error "server closed the connection
unexpectedly  This probably means the server terminated abnormally
before or while processing the request."

Not only are the other clients disconnected, but there is NO
indication whatsoever in the server log that the DB disconnected any
clients and is "recovering".  An immediate reconnect to the DB by a
disconnected app shows the error "FATAL:  the database system is in
recovery mode".  Strangely, the psql session in which I ran the select
did not itself disconnect!

I am suspecting the cause of this initially was hardware (this is the
second corruption I've found on this server *after* I had two nearly
simultaneous disk failures on a RAID6 volume) so I will be rebuilding
the whole filesystem and PG directory, but I wanted to post these
details out here.

It would also be extremely helpful if the data read problems would at
least spit out the relation ID and CID of the tuple.  Narrowing it
down to a specific table was just lucky guesswork.

pgsql-general by date:

Previous
From: Sam Mason
Date:
Subject: Re: Can I pause the pg_dump or pg_restore
Next
From: Tom Lane
Date:
Subject: Re: corruption on data table disconnects other clients silently when page is read