Thread: a question about data corruption

a question about data corruption

From
"Jacek Rembisz"
Date:
Hello,

My customer has experienced serious data corruption.
He was using PostgreSQL version 8.0.3.
I know that it is an old version.

PostgreSQL started to log "could not access status of transaction"
messages. Since the transaction IDs were far from what the
server was using, I took a look at the data files in pgsql/base/ and found
total garbage there.

In five tables (of about 100) I found one to four blocks of random data.
In two places it was data from another table instead of random data.
All these blocks have sizes that are multiples of 512 bytes (for example
1, 9, or 26 times 512) and start at offsets that are also multiples of 512.

The database is small (<1 GB), and there are no deletes or updates on the
affected tables, just inserts and selects. The database was vacuumed about
3 months before the corruption was noticed. The system is Linux 2.4.31 with
an XFS filesystem on RAID5, and there are no messages suggesting an I/O error
or anything similar.

My question is: is there any known bug in PostgreSQL 8.0.3 that
could lead to such data corruption, or is it rather a hardware problem?

Best regards,
Jacek Rembisz
PS. Please cc me as I'm not a subscriber.

Re: a question about data corruption

From
Tom Lane
Date:
"Jacek Rembisz" <jacek.rembisz@gmail.com> writes:
> PostgreSQL started to log "could not access status of transaction"
> messages. Since the transaction IDs were far from what the
> server was using, I took a look at the data files in pgsql/base/ and found
> total garbage there.

> In five tables (of about 100) I found one to four blocks of random data.
> In two places it was data from another table instead of random data.
> All these blocks have sizes that are multiples of 512 bytes (for example
> 1, 9, or 26 times 512) and start at offsets that are also multiples of 512.

Substituting sector-size blocks of one file for another could easily
be a filesystem (kernel) bug ...

> The system is Linux 2.4.31 with an XFS filesystem on RAID5

... and XFS on such an old kernel version doesn't seem like a very
good bet for stability.

> My question is: is there any known bug in PostgreSQL 8.0.3 that
> could lead to such data corruption, or is it rather a hardware problem?

No, nothing like that has ever been reported in any released PG
version.  If the substituted blocks were from non-Postgres files
then I think you could write off the idea of a PG bug entirely.
It could still be a software issue though.

            regards, tom lane

Re: a question about data corruption

From
"Jacek Rembisz"
Date:
2008/6/26 Tom Lane <tgl@sss.pgh.pa.us>:

> No, nothing like that has ever been reported in any released PG
> version.  If the substituted blocks were from non-Postgres files
> then I think you could write off the idea of a PG bug entirely.
> It could still be a software issue though.

Thank you for a quick response. The issue is clear now.
The machine didn't pass a hardware test.

Best regards,
Jacek Rembisz
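
For readers who want to repeat the kind of check described in the first
message, here is a minimal Python sketch. It takes the byte offset and length
of a damaged range and reports whether the range lines up with 512-byte
sectors or with database pages. The function name describe_range and the
sample offsets are hypothetical, and the 8192-byte page size is an assumption
(PostgreSQL's default block size); the thread does not say which block size
this installation used. Damage that is sector-aligned but not page-aligned is
only a hint that the problem sits below the database, not proof.

```python
SECTOR_SIZE = 512   # sector size mentioned in the thread
PAGE_SIZE = 8192    # assumption: PostgreSQL's default block size


def describe_range(offset, length, sector=SECTOR_SIZE, page=PAGE_SIZE):
    """Summarize how a damaged byte range lines up with sectors and pages."""
    if offset < 0 or length <= 0:
        raise ValueError("offset must be >= 0 and length must be > 0")
    end = offset + length  # exclusive end of the damaged range
    return {
        # True when both the start and the size are multiples of the sector size
        "sector_aligned": offset % sector == 0 and length % sector == 0,
        # True when the range covers only whole database pages
        "page_aligned": offset % page == 0 and length % page == 0,
        # Number of complete sectors that fit in the range
        "whole_sectors": length // sector,
        # Zero-based page numbers touched by the range
        "first_page": offset // page,
        "last_page": (end - 1) // page,
    }


if __name__ == "__main__":
    # Hypothetical offsets; the lengths reuse the 1, 9 and 26 sector
    # counts given as examples in the thread.
    samples = [(512 * 40, 512 * 1), (512 * 100, 512 * 9), (512 * 7, 512 * 26)]
    for offset, length in samples:
        print(offset, length, describe_range(offset, length))
```

The offsets and lengths themselves still have to be found by inspecting the
data files, for example with a hex dump, as was done in the first message.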