Thread: Corrupt data pages...

Corrupt data pages...

From
Kevin Brown
Date:
Apologies if my previous attempts to post this to the mailing list
have actually succeeded, but I've seen no evidence of that...


While doing some bugzilla testing, I ran into a data page corruption
issue.

The symptom was the usual "could not access status of transaction
<bignum>".  I tracked it down via the usual pg_filedump method and
found what appears to my untrained eye to be data tuples from a
different table residing in the (now corrupt) table in question.

First, the relevant environmental details:

System:   - ASUS A8V motherboard   - Athlon 64 3200   - 2G ECC memory (which is enabled.  The kernel will panic in the
  face of an uncorrectable ECC error).   - 2 250G Seagate ST3250823AS rev 3.03 in a mirrored configuration     via
Linuxsoftware RAID.
 

OS: Debian Linux x86_64, 'unstable' branch.  'uname -a' yields:   Linux kbrown-desktop 2.6.15-1-amd64-generic #2 Wed
Jan4 06:22:42   CET 2006 x86_64 GNU/Linux
 

Filesystem: XFS with the default mount options.

PostgreSQL:   - version: 8.1.3.   - The package is postgresql-8.1 version 8.1.3-2.   - Autovacuum is enabled.  I'm
attachingmy postgresql.conf.
 



The context:

I'm performing testing in preparation for migrating from a different
bug tracking system to bugzilla.  Most of this testing revolves around
the migration tool, which reads directly from the other bug tracking
system's database and inserts into the bugzilla database.  The
bugzilla database is hosted on PostgreSQL and has a lot of additional
foreign keys that bugzilla does not have by default.  All of the
constraints are DEFERRABLE and INITIALLY DEFERRED.

A test run consists of dropping all the constraints and all the
tables, recreating all the tables and constraints, and importing a
base set of data.  The commands for doing all that were created via
pg_dump.  They are all executed in a single transaction and then
committed.

The migration run itself also executes in a single transaction and
involves a savepoint prior to every insertion group (the name of the
savepoint is based on the name of the operation, so the total number
of outstanding savepoints at any one time is perhaps 4 or 5).  Each
migration run ends up executing perhaps 300,000 inserts.



The meat of the problem:

After the last run, a later select from one of the tables yielded the
error.  I narrowed it down to a single table (series_data) and tracked
down the offending row via a binary search.  Here's the end results:


stagezilla=# select * from series_data;
ERROR:  could not access status of transaction 1685201011
DETAIL:  could not open file "pg_clog/0647": No such file or directory
stagezilla=# select * from series_data offset 9927 limit 1;series_id |     series_date     | series_value
-----------+---------------------+--------------       46 | 2006-01-06 00:00:00 |            0
(1 row)

stagezilla=# select * from series_data offset 9928 limit 1;
ERROR:  could not access status of transaction 1685201011
DETAIL:  could not open file "pg_clog/0647": No such file or directory
stagezilla=# select ctid from series_data offset 9927 limit 1;  ctid
----------(72,136)
(1 row)

stagezilla=# select oid from pg_database where datname = 'stagezilla'; oid
-------20518
(1 row)

stagezilla=# select oid from pg_class where relname = 'series_data'; oid
-------53294
(1 row)

stagezilla=# \d series_data               Table "public.series_data"   Column    |              Type              |
Modifiers
 
--------------+--------------------------------+-----------series_id    | integer                        | not
nullseries_date | timestamp(0) without time zone | not nullseries_value | integer                        | not null
 
Indexes:   "series_data_series_id_idx" UNIQUE, btree (series_id, series_date)
Foreign-key constraints:   "series_data_series_id_series_series_id_fk" FOREIGN KEY               (series_id) REFERENCES
series(series_id)ON UPDATE               CASCADE ON DELETE CASCADE DEFERRABLE INITIALLY               DEFERRED
 



The only file in pg_clog is 0000.


After examining the output of pg_filedump, it became obvious this was
a corrupt page issue.  What bothers me is the way in which it's
corrupt.  The corrupt data looks supiciously like the data from
different table, or perhaps from an index.  In this case, from the
short_desc field of the bugs table, which has a datatype of 'text' (I
presume this means it's TOASTed?).  The text data in the corrupt page
exactly matches entries in that field of the bugs table.


So: because the hardware is relatively new and is using components
that are as reliable as I can make them given my somewhat limited
budget (ECC memory, mirrored SATA disks), I have to consider the
possibility that there's a bug here.


Note that on the amd64 architecture, gcc (or some versions thereof)
seems to have a code generation bug that can be eliminated by using
the -fno-strict-aliasing switch.  The version of gcc that seems to be
in current use is 4.0.3.  It's possible that this is causing problems,
but to be honest this is the first time I've ever seen this happen.

If you guys have any suggestions on what I might do to help track this
one down, please let me know.  I tarred up the entire 'stagezilla'
database after shutting down PostgreSQL so that I can do additional
examination of the datafiles if that proves useful.


Additionally, if you'd like to see the output of pg_filedump, I'll be
happy to post it here (or email it separately).  I hesitate to attach
it here because it's not clear the mailing list's spam filtering would
let it pass.



-- 
Kevin Brown                          kevin@sysexperts.com


Re: Corrupt data pages...

From
Tom Lane
Date:
Kevin Brown <kevin@sysexperts.com> writes:
> After examining the output of pg_filedump, it became obvious this was
> a corrupt page issue.  What bothers me is the way in which it's
> corrupt.  The corrupt data looks supiciously like the data from
> different table, or perhaps from an index.  In this case, from the
> short_desc field of the bugs table, which has a datatype of 'text' (I
> presume this means it's TOASTed?).  The text data in the corrupt page
> exactly matches entries in that field of the bugs table.

Substitution of a page from a different table seems unlikely to result
in this error, as the tuple header layout is the same for all tables.
A page from an index, though, maybe.  Does the suspicious looking page
have any pd_special space?

We have in the past seen data-corruption errors that were clearly due to
substitution of a chunk of entirely non-Postgres data for a table page.
I suppose that those are symptoms of either kernel or device misfeasance
... it's hard to see how Postgres could cause that to happen.  We've not
seen enough of them to detect any pattern though.  I rather wonder if
what you've seen is the same thing but the substituted data happens to
be from another Postgres file.

> Additionally, if you'd like to see the output of pg_filedump, I'll be
> happy to post it here (or email it separately).  I hesitate to attach
> it here because it's not clear the mailing list's spam filtering would
> let it pass.

You should at least show the page you think is corrupt.
        regards, tom lane


Re: Corrupt data pages...

From
Kevin Brown
Date:
Tom Lane wrote:
> You should at least show the page you think is corrupt.

I attempted to send this additional info to the list but I think the
message got dropped on the floor by the mailing list software or by
the spam filter.

I'll put the files on a web server and post links to them here.



-- 
Kevin Brown                          kevin@sysexperts.com


Re: Corrupt data pages...

From
Kevin Brown
Date:
I wrote:
> I attempted to send this additional info to the list but I think the
> message got dropped on the floor by the mailing list software or by
> the spam filter.
> 
> I'll put the files on a web server and post links to them here.

You can find them here:
   https://gazebo.sysexperts.com/~kevin/postgresql

The files are bad-page-info.txt and bad-page.txt.



-- 
Kevin Brown                          kevin@sysexperts.com


Re: Corrupt data pages...

From
Tom Lane
Date:
Kevin Brown <kevin@sysexperts.com> writes:
>> I'll put the files on a web server and post links to them here.
> You can find them here:
>     https://gazebo.sysexperts.com/~kevin/postgresql

AFAICT, the first half of page 73 is OK, but the second half clearly is
trashed.  In the raw-format dump it does look a whole lot like the
second half has been replaced by the first half of a page of some index.

It's fairly hard to see how that could happen inside Postgres.  One can
readily imagine bugs that might replace one whole page with another,
but there aren't any operations that manipulate half-a-page.  On the
other hand, if your kernel uses 4K blocksize, this could be explained
as substitution of one buffer for another at the kernel level.  So my
money is on a kernel bug.  As I mentioned, we've once or twice before
seen reports that looked like similar substitutions of single pages by
the kernel.
        regards, tom lane


Re: Corrupt data pages...

From
Kevin Brown
Date:
Tom Lane wrote:
> It's fairly hard to see how that could happen inside Postgres.  One can
> readily imagine bugs that might replace one whole page with another,
> but there aren't any operations that manipulate half-a-page.  On the
> other hand, if your kernel uses 4K blocksize, this could be explained
> as substitution of one buffer for another at the kernel level.  So my
> money is on a kernel bug.  As I mentioned, we've once or twice before
> seen reports that looked like similar substitutions of single pages by
> the kernel.

Yeah, I agree that's what's likely to have happened here.  The
kernel's page size is 4k, as is the default block size used by XFS.

Tracking this one down any further is going to be nigh unto
impossible, I think.

There have been a pile of fixes to XFS in 2.6.16, so I'll start using
that, at least...


Thanks for looking at this.


-- 
Kevin Brown                          kevin@sysexperts.com