Thread: Bug (#3484) - Invalid page header again
Hi folks,

we reported various database problems some weeks ago. Since then we have updated the database to release 8.2.4 and the Linux kernel to 2.6.22.6-smp. Now we got an error again.

IN SHORT:
- data is inserted
- the same data is read and exported successfully
- after a nightly vacuum analyze the data is corrupted and cannot be read any more. Error message(s): see below.

CONCLUSION:
Apparently the data is corrupted by "vacuum analyze".

IN DETAIL:
We got an error during the nightly dump again:

pg_dump: SQL command failed
pg_dump: Error message from server: ERROR: invalid memory alloc request size 18446744073709551610

Some records in table "transaktion" (which contains about 45 million records) are corrupted. But the data was not corrupted right after insertion; it must have been corrupted later. Let us explain the track of the error:

1. 2007/12/07 ~3:30h: The (now corrupted) data was inserted successfully.
2. 2007/12/07 7h: The (now corrupted) data was read and exported successfully! (We run an export of the data every morning at 7h, which exports the data we retrieved/inserted during the last 24 hours.)
3. 2007/12/07 22h: The database was dumped successfully.
4. 2007/12/07 23:15h: "vacuum analyze" ran successfully.
5. 2007/12/08 22h: The database dump hit the error described above:
   pg_dump: SQL command failed
   pg_dump: Error message from server: ERROR: invalid memory alloc request size 18446744073709551610
6. 2007/12/08 23h: "vacuum analyze" threw an error:
   INFO: vacuuming "public.transaktion"
   WARNING: relation "transaktion" TID 1240631/12: OID is invalid
   ERROR: invalid page header in block 1240632 of relation "transaktion"
7. 2007/12/10: We started the export of the data (which runs every morning) for the last days again. These exports use the same SQL commands as the automatic run. But now we got an error when exporting the data for 2007/12/07:
   ERROR: invalid memory alloc request size 18446744073709551610
   The process exporting the same set of data had run successfully on the morning of 2007/12/07. We are very sure that the data has not been manipulated since the time of insertion, because the error occurs on the testing system and at the moment no tests other than inserting and exporting the data are being done.
8. 2007/12/14: When we now run a select over the corrupted data, we get the error message:
   ERROR: could not access status of transaction 313765632
   DETAIL: Could not open file "pg_clog/012B": No such file or directory.

We are using Linux version 2.6.22.6-smp.
Hardware system: 2 dual-core processors (Intel(R) Xeon(TM) CPU 2.80GHz)
postgres version: 8.2.4

-------- Original Message --------
Subject: Missing pg_clog file / corrupt index / invalid page header
Date: Wed, 05 Sep 2007 08:18:31 +0200
From: alex <an@clickware.de>
Organisation: click:ware GmbH
To: pgsql-bugs@postgresql.org

My colleague Marc Schablewski reported this bug (#3484) for the first time at the end of July. The described problem has occurred twice on our database and now it has happened again.
Summary
==========
Various errors like:
- "invalid page header in block 8658 of relation"
- "could not open segment 2 of relation 1663/77142409/266753945 (target block 809775152)"
- "ERROR: could not access status of transaction 2134240 DETAIL: could not open file "pg_clog/0002": File not found"
- "CEST PANIC: right sibling's left-link doesn't match"

on the following system:
- Postgres 8.1.8
- SuSE Linux, kernel 2.6.13-15.8-smp
- 2 Intel Xeon processors with 2 cores each
- ECC RAM
- hardware RAID (mirror set)

Detailed description
=======================
The message was thrown by the nightly pg_dump:

pg_dump: ERROR: invalid page header in block 8658 of relation "import_data_zeilen"
pg_dump: SQL command to dump the contents of table "import_data_zeilen" failed: PQendcopy() failed.
pg_dump: Error message from server: ERROR: invalid page header in block 8658 of relation "import_data_zeilen"
pg_dump: The command was: COPY public.import_data_zeilen (id, eda_id, zeile, man_id, sta_id) TO stdout;

A manually executed dump of just the affected table completed successfully (at daytime!). We were really surprised! Select queries (using indexes) on the table also succeeded. (In the past, when the error occurred, select queries failed.) So no repair seemed to be needed for the table.

The following night, the pg_dump succeeded, but the "vacuum analyze" (executed after the pg_dump) threw the same error:

INFO: vacuuming "public.import_data_zeilen"
ERROR: invalid page header in block 8658 of relation "import_data_zeilen"

Now any select on this table that used an index failed (if the result set contained the corrupted data). This behaviour is very confusing. Re-creating the table solved the problem; however, the damaged rows were lost.

We have two systems, one active and one for tests. They are nearly identical: they have similar hardware, use the same software and run under the same load. The errors always occurred on the active server; the test server did not run into errors after we upgraded both servers from 8.1.3 to 8.1.8. So even though no hardware errors were detected (neither ECC RAM errors nor disk errors), we decided to swap the servers' roles to find out whether it is a hardware or a software problem. That was 12 days ago.

Now we got another error, again on the active system (which now uses the hardware from the other system, except for one of the hard disks in the RAID). It was thrown by an insert statement issued by the software:

org.postgresql.util.PSQLException: ERROR: could not open segment 2 of relation 1663/77142409/266753945 (target block 809775152): No such file or directory.

Obviously we have a problem with the active server. But it is unlikely to be a hardware problem, because we changed the hard disks and the error occurred on the same (software) system. We are also using ECC RAM and a RAID system (mirror set) with a hardware RAID controller, which has not reported any errors.

We read the last post/thread concerning this bug. In that thread the problem was connected to a kernel bug in 2.6.11. We are using a newer Linux version: 2.6.13-15.8-smp.

Hardware system: 2 dual-core processors (Intel(R) Xeon(TM) CPU 2.80GHz)
postgres version: 8.1.8

Four days ago we did a lot of database maintenance, which among other things dropped about 10 indexes on one big table (35,000,000 records) and created some 10 other indexes (for better performance).

Given that the problem occurred on two different machines we are very sure that it is *not* a hardware problem.
We would really appreciate any help with our problems. Thanks in advance,

A. Nitzschke
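For readers who hit the same "invalid memory alloc request size" symptom: the tuples called out by the VACUUM warning above can be probed one by one via their ctid. A minimal sketch, assuming the block and item numbers from the error messages (a damaged tuple should reproduce the error, while intact neighbours read fine, which narrows down what was hit):

    -- Probe tuples around the damaged area via TID scans; an unreadable tuple
    -- typically fails with "invalid memory alloc request size ...", while its
    -- intact neighbours return normally.
    SELECT * FROM transaktion WHERE ctid = '(1240631,12)';
    SELECT * FROM transaktion WHERE ctid = '(1240632,1)';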
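The "re-creating the table" step mentioned in the original report (where the damaged rows were lost) is commonly done with the zero_damaged_pages setting. A rough sketch, not taken from the report itself; the _rescue table name is made up, the column list is the one from the pg_dump output above, and the whole bad page, not just the damaged rows, is sacrificed:

    -- Superuser only: pages with invalid headers are zeroed in memory as they
    -- are read, so a full-table read can complete.  Data on those pages is lost.
    SET zero_damaged_pages = on;
    CREATE TABLE import_data_zeilen_rescue AS
        SELECT id, eda_id, zeile, man_id, sta_id
        FROM import_data_zeilen;
    RESET zero_damaged_pages;
    -- After checking the copy, drop/rename the old table and re-create the indexes.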
alex <an@clickware.de> writes:
> Given that the problem occurred on two different machines we are very
> sure that it is *not* a hardware problem.

I wouldn't be so sure, especially not when the behavior looks like a hardware problem in every other respect. You've heard of common-mode failures, no? Do both machines have RAM chips from the same batch, for instance?

regards, tom lane
alex wrote:

<snip>

> WARNING: relation "transaktion" TID 1240631/12: OID is invalid
> ERROR: invalid page header in block 1240632 of relation "transaktion"
> 7. 2007/12/10: We started the export of the data (which runs every
> morning) for the last days again. These exports use the same
> SQL commands as the automatic run.

Alex,

could you please provide a binary dump of these two pages, or, if they contain sensitive data, use pg_filedump to get only the page and tuple headers?

Zdenek
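For anyone who has to produce such a page dump: the file to read can be located from the catalogs, and the extraction itself is then done with dd or pg_filedump at the shell. A hedged sketch for 8.2, assuming the table sits in the default tablespace with the default 8 kB block size and 1 GB (131072-block) segments, so block 1240632 is block 1240632 - 9*131072 = 60984 inside segment file <relfilenode>.9:

    -- Where does table "transaktion" live on disk?  The heap's first segment is
    -- $PGDATA/base/<db_oid>/<relfilenode>; later segments get a .1, .2, ... suffix.
    SELECT d.oid AS db_oid, c.relfilenode
    FROM pg_class c, pg_database d
    WHERE c.relname = 'transaktion'
      AND d.datname = current_database();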
Zdenek Kotala wrote:
> alex wrote:
>
> <snip>
>
>> WARNING: relation "transaktion" TID 1240631/12: OID is invalid
>> ERROR: invalid page header in block 1240632 of relation "transaktion"
>> 7. 2007/12/10: We started the export of the data (which runs every
>> morning) for the last days again. These exports use the same
>> SQL commands as the automatic run.
>
> Alex,
>
> could you please provide a binary dump of these two pages, or, if they
> contain sensitive data, use pg_filedump to get only the page and tuple
> headers?

I got a dump of the two affected blocks from Alex. It seems that both blocks were overwritten with some structure of 128-byte length (there is a pattern), and the total damaged size is 9728 bytes (the first block is overwritten completely, the second only at the beginning); another buffer from another relation could have been overwritten too.

I think it is more likely a software bug than a hardware problem, because the bad data contains some logic: a 0x54 byte is repeated after every 128 bytes, and so on, and most of the data is zeros.

My suggestion is to apply the following patch to catch whether the data is corrupted by PostgreSQL or elsewhere. It should fail before writing damaged data to disk. It is for HEAD, but a similar patch could be backported.

Index: backend/storage/buffer/bufmgr.c
===================================================================
RCS file: /zfs_data/cvs_pgsql/cvsroot/pgsql/src/backend/storage/buffer/bufmgr.c,v
retrieving revision 1.227
diff -c -r1.227 bufmgr.c
*** backend/storage/buffer/bufmgr.c	15 Nov 2007 21:14:37 -0000	1.227
--- backend/storage/buffer/bufmgr.c	18 Dec 2007 15:50:06 -0000
***************
*** 1734,1739 ****
--- 1734,1741 ----
  	buf->flags &= ~BM_JUST_DIRTIED;
  	UnlockBufHdr(buf);
  
+ 	if (!PageHeaderIsValid((PageHeader) BufHdrGetBlock(buf)))
+ 		elog(FATAL, "Buffer cache is damaged!");
  	smgrwrite(reln,
  			  buf->tag.blockNum,
  			  (char *) BufHdrGetBlock(buf),
***************
*** 1966,1971 ****
--- 1968,1976 ----
  				errcontext.previous = error_context_stack;
  				error_context_stack = &errcontext;
  
+ 				if (!PageHeaderIsValid((PageHeader) LocalBufHdrGetBlock(bufHdr)))
+ 					elog(FATAL, "Buffer cache is damaged!");
+ 
  				smgrwrite(rel->rd_smgr,
  						  bufHdr->tag.blockNum,
  						  (char *) LocalBufHdrGetBlock(bufHdr),
Index: backend/storage/buffer/localbuf.c
===================================================================
RCS file: /zfs_data/cvs_pgsql/cvsroot/pgsql/src/backend/storage/buffer/localbuf.c,v
retrieving revision 1.78
diff -c -r1.78 localbuf.c
*** backend/storage/buffer/localbuf.c	15 Nov 2007 21:14:38 -0000	1.78
--- backend/storage/buffer/localbuf.c	18 Dec 2007 16:05:49 -0000
***************
*** 16,21 ****
--- 16,22 ----
  #include "postgres.h"
  
  #include "storage/buf_internals.h"
+ #include "storage/bufpage.h"
  #include "storage/bufmgr.h"
  #include "storage/smgr.h"
  #include "utils/guc.h"
***************
*** 161,166 ****
--- 162,169 ----
  		oreln = smgropen(bufHdr->tag.rnode);
  
  		/* And write... */
+ 		if (!PageHeaderIsValid((PageHeader) LocalBufHdrGetBlock(bufHdr)))
+ 			elog(FATAL, "Local buffer cache is damaged!");
  		smgrwrite(oreln,
  				  bufHdr->tag.blockNum,
  				  (char *) LocalBufHdrGetBlock(bufHdr),
Zdenek Kotala wrote:
> I got a dump of the two affected blocks from Alex. It seems that both
> blocks were overwritten with some structure of 128-byte length (there
> is a pattern), and the total damaged size is 9728 bytes (the first
> block is overwritten completely, the second only at the beginning);
> another buffer from another relation could have been overwritten too.

I'm sorry, the structure is 129 bytes long (do not trust ghex2 :( ), and the size of the damaged data is probably about 9728 bytes.

Zdenek
"Zdenek Kotala" <Zdenek.Kotala@Sun.COM> writes: > I got dump of affected two blocks from Alex and it seems that both blocks were > overwritten together with some 128bytes length structure (there some pattern) > and complete damaged size is 9728bytes (first block is overwritten completely > and second one only at the beginning), but another buffer from another relation > could be overwritten too. I don't understand this 9728 bytes. Postgres blocks are 8192 bytes. Are you saying one whole block is trashed and then part, but not all, of the next block? What's the block size of the ZFS filesystem? And what exactly does the trash data look like? -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning
Gregory Stark <stark@enterprisedb.com> writes:
> I don't understand this 9728 bytes. Postgres blocks are 8192 bytes. Are
> you saying one whole block is trashed and then part, but not all, of the
> next block?
>
> What's the block size of the ZFS filesystem? And what exactly does the
> trash data look like?

It seems relevant that 9728 is a multiple of 512 (and not of any larger power of 2). That makes it sound like the misfeasance happened at the disk block level. Postgres doesn't internally deal in 512-byte chunks anywhere I can think of, so to posit a DB bug you have to come up with some other explanation why exactly that much of the block got trashed.

A filesystem-level software bug is not out of the question, though this still smells to me like it's a hardware issue of some kind.

regards, tom lane
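For what it's worth, the arithmetic behind that observation (a quick check, not part of the original mail):

    -- 9728 bytes = 19 sectors of 512 bytes: the whole first 8192-byte page
    -- (16 sectors) plus 1536 bytes (3 sectors) at the start of the next page,
    -- matching "first block overwritten completely, second only at the beginning".
    SELECT 9728 / 512  AS damaged_sectors,
           8192 / 512  AS sectors_per_page,
           9728 - 8192 AS bytes_into_second_page;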
Gregory Stark wrote:
> "Zdenek Kotala" <Zdenek.Kotala@Sun.COM> writes:
>
>> I got a dump of the two affected blocks from Alex. It seems that both
>> blocks were overwritten with some structure of 128-byte length (there
>> is a pattern), and the total damaged size is 9728 bytes (the first
>> block is overwritten completely, the second only at the beginning);
>> another buffer from another relation could have been overwritten too.
>
> I don't understand this 9728 bytes. Postgres blocks are 8192 bytes. Are
> you saying one whole block is trashed and then part, but not all, of the
> next block?
>
> What's the block size of the ZFS filesystem? And what exactly does the
> trash data look like?

ZFS has a dynamic block size (max 128 kB), but this system runs on Linux, probably on an ext3 filesystem. Alex, can you please confirm which filesystem is used?

Zdenek
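For completeness, the page size the server itself uses can be confirmed from SQL; the filesystem type and its block size have to be checked at the OS level (for example with mount and tune2fs -l on ext3). A trivial sketch:

    -- Read-only setting; 8192 unless the server was built with a non-default BLCKSZ.
    SHOW block_size;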