BUG #1208: Invalid page header - Mailing list pgsql-bugs
From | PostgreSQL Bugs List |
---|---|
Subject | BUG #1208: Invalid page header |
Date | |
Msg-id | 20040810134538.02E995A1105@www.postgresql.com |
Responses | Re: BUG #1208: Invalid page header |
List | pgsql-bugs |
The following bug has been logged online:

Bug reference:      1208
Logged by:          Robert E Bruccoleri
Email address:      bruc@stone.congenomics.com
PostgreSQL version: 7.4
Operating system:   Linux Advanced Server 2.1 and SGI ProPack 2.4
Description:        Invalid page header
Details:

============================================================================
                        POSTGRESQL BUG REPORT TEMPLATE
============================================================================

Your name               : Robert Bruccoleri
Your email address      : bruc@acm.org

System Configuration
---------------------
  Architecture (example: Intel Pentium)         : Intel Itanium 2
  Operating System (example: Linux 2.4.18)      : Linux 2.4.21 (SGI Propack 2.4 patch 10074)
  PostgreSQL version (example: PostgreSQL-7.4.3): PostgreSQL-7.4.3
  Compiler used (example: gcc 2.95.2)           : Intel C compiler version 8.0

Please enter a FULL description of your problem:
------------------------------------------------

I am getting sporadic invalid page header errors when loading or
vacuuming databases in parallel. We are in the process of migrating
from an SGI Origin 3000 running PostgreSQL 7.4 to an SGI Altix running
PostgreSQL 7.4.3. The Altix system has 64 processors with 256 gigabytes
of RAM. PostgreSQL was built using a 32K blocksize, and we start the
system with a buffer cache of 130000 pages. Fdatasync is used for
synchronization.

We use an LSI Logic storage system to store the PostgreSQL databases as
well as much of our department's data, and we have about 5 terabytes
in active use. The filesystem is XFS as delivered by SGI, which wrote it.

I do not believe that we have any problems with unreliable disk
storage. First, no other users have complained about problems, and we
have a lot more in use than what PostgreSQL is using. Second, the
storage system is an enterprise-class Fibre Channel dual-controller
RAID system designed for high redundancy and reliability. It has no
single points of failure. We've been using it for over a year with no
problems.
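For scale, the buffer cache implied by the configuration described above can be worked out directly. This is only a back-of-the-envelope sketch: it assumes "32K" means 32 * 1024 bytes and "256 gigabytes" means 256 GiB, and the variable names are chosen for this example.

```python
# Figures taken from the report; the binary-unit interpretation is an assumption.
BLOCK_SIZE = 32 * 1024        # "32K blocksize"
BUFFER_PAGES = 130_000        # "buffer cache of 130000 pages"
RAM_BYTES = 256 * 2**30       # "256 gigabytes of RAM", read as GiB

buffer_bytes = BLOCK_SIZE * BUFFER_PAGES
buffer_gib = buffer_bytes / 2**30
ram_fraction = buffer_bytes / RAM_BYTES

print(f"buffer cache: {buffer_bytes} bytes ({buffer_gib:.2f} GiB)")
print(f"share of RAM: {ram_fraction:.2%}")
```

Under those assumptions the buffer cache is just under 4 GiB, a small fraction of the machine's memory.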
We have about 14 active databases, and I loaded all 14 simultaneously.
No errors were noted during the load, but upon vacuuming all the
databases, one of the databases encountered the following message:

INFO: vacuuming "public.relationships"
vacuumdb: vacuuming of database "human_genome_042003" failed: ERROR: invalid page header in block 4763 of relation "relationships"

There may be others with problems, but vacuumdb quit after this error.

I downloaded pg_filedump and I ran it on the file containing this
relation specifying a range covering a block around the erroneous
block. The two blocks around the bad block have data as I would have
expected for the "relationships" table, but the bad block has data from
a table in another database. Here is part of the pg_filedump output:

*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 3.0
*
* File: 367457
* Options used: -f -R 4763 4763
*
* Dump created on: Wed Aug 4 19:47:46 2004
*******************************************************************

Block 4763 ********************************************************
<Header> -----
 Block Offset: 0x094d8000         Offsets: Lower       0 (0x0000)
 Block: Size    0  Version    0            Upper   61440 (0xf000)
 LSN:  logid 118874 recoff 0x0000000d      Special 25476 (0x6384)
 Items:    0                   Free Space: 61440
 Length (including item array): 24

 Error: Invalid header information.

  0000: 5ad00100 0d000000 22000000 000000f0  Z.......".......
  0010: 84630000 00000000                    .c......

<Data> ------
 Empty block - no items listed

<Special Section> -----
 Error: Invalid special section encountered.

  6384: 32343433 38320000 a9270000 ab270000  244382...'...'..
  6394: 00000000 01000000 00000000 1edbab73  ...............s
  63a4: 0e8f3ba6 22000000 40e3ffef 22000000  ..;."...@..."...
  63b4: 68e2ffef 020a0000 b400000a fdb70500  h...............
  63c4: bbc30500 08008f6e ae001200 02081800  .......n........
  63d4: 0e000000 52313031 5f343438 38340000  ....R101_44884..
  63e4: 15000000 15000000 4e545f30 31303839  ........NT_01089
  63f4: 335f6735 352e7365 63000000 91000000  3_g55.sec.......
  6404: 0f000000 70646231 63686b2e 412e2d00  ....pdb1chk.A.-.
  6414: ee000000 00000000 48e17a14 ae470340  ........H.z..G.@
  6424: 295c8fc2 f5280640 c3f5285c 8fc20b40  )\...(.@..(\...@
  6434: 3d0ad7a3 703d1340 0d000000 7f000000  =...p=.@........
  6444: 06819543 8b6c0640 d7a3703d 0a571040  ...C.l.@..p=.W.@
  6454: 91b8c7d2 87e62640 00000000 0078ca40  ......&@.....x.@
  6464: 00000000 00000000 00000000 002062c0  ............. b.
  6474: 00000000 e5fd877a 720918a8 22000000  .......zr..."...
  6484: a06300f0 22000000 a06300f0 020a0000  .c.."....c......
  6494: b400800a fdb70500 bbc30500 0800906e  ...............n
  64a4: 01001200 02081800 0e000000 52313031  ............R101
  64b4: 5f343438 38340000 15000000 15000000  _44884..........
  64c4: 4e545f30 31303839 335f6735 352e7365  NT_010893_g55.se
  64d4: 63000000 91000000 0f000000 70646231  c...........pdb1
  64e4: 63686d2e 422e2d00 91010000 00000000  chm.B.-.........
  64f4: ec51b81e 85eb0940 3d0ad7a3 703d0a40  .Q.....@=...p=.@
  6504: 52b81e85 eb511140 b81e85eb 51b81a40  R....Q.@....Q..@
  6514: 13000000 6c000000 e7fba9f1 d24d0d40  ....l........M.@
  6524: 52b81e85 ebd11740 7940d994 2bd03540  R......@y@..+.5@
  6534: 00000000 0043bd40 00000000 00000000  .....C.@........
  6544: 00000000 00c068c0 00000000 f7d17b03  ......h.......{.
  6554: 08edd30d 22000000 786400f0 22000000  ...."...xd.."...
  6564: 786400f0 020a0000 b400000a fdb70500  xd..............
  6574: bbc30500 0800906e 02001200 02081800  .......n........
  6584: 0e000000 52313031 5f343438 38340000  ....R101_44884..
  6594: 15000000 15000000 4e545f30 31303839  ........NT_01089
  65a4: 335f6735 352e7365 63000000 91000000  3_g55.sec.......
  65b4: 0f000000 70646231 6369342e 412e2d00  ....pdb1ci4.A.-.
  65c4: 59000000 00000000 3d0ad7a3 703df63f  Y.......=...p=.?
  65d4: 1f85eb51 b81e0f40 c3f5285c 8fc20b40  ...Q...@..(\...@
  65e4: 33333333 33331840 12000000 54000000  333333.@....T...
  65f4: 06819543 8b6c0640 cdcccccc cc4c1540  ...C.l.@.....L.@
  6604: c3d32b65 19da2d40 00000000 0033be40  ..+e..-@.....3.@
  6614: 00000000 00000000 00000000 00406e40  .............@n@
  6624: 00000000 c61e23a3 820d4664 22000000  ......#...Fd"...
  6634: 506500f0 22000000 506500f0 020a0000  Pe.."...Pe......
  6644: b400000a fdb70500 bbc30500 0800906e  ...............n
  6654: 03001200 02081800 0e000000 52313031  ............R101
  6664: 5f343438 38340000 15000000 15000000  _44884..........
  6674: 4e545f30 31303839 335f6735 352e7365  NT_010893_g55.se
  6684: 63000000 91000000 0f000000 70646231  c...........pdb1
  6694: 6369642e 2d2e2d00 b1000000 00000000  cid.-.-.........
  <truncated>

In block 4763, there is data from another database, named proceryon,
which is one of the 14 that I loaded simultaneously. If this were a
disk I/O error, then I would not have expected to see tuples from
another database. I'd expect gibberish or nulls.

I ran a vacuumdb on the table in proceryon that had the data above, and
there is no error. However, other tables in the proceryon database have
invalid page headers.
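The header fields that pg_filedump prints for block 4763 can be re-derived from the 24 raw header bytes in the dump above. The following is a minimal Python sketch, not PostgreSQL code. It assumes a little-endian layout of two 32-bit LSN words, one further 32-bit word, and four 16-bit fields (lower, upper, special, and a combined page size/version field). The checks are illustrative and are not the server's actual validation routine, and `BLCKSZ`, `BLOCK_NO`, and `header_problems` are names chosen for this example.

```python
import struct

BLCKSZ = 32768   # 32K block size, as stated in the report
BLOCK_NO = 4763  # block named in the vacuumdb error

# Byte offset of the block within the file.
block_offset = BLOCK_NO * BLCKSZ

# The 24 header bytes as shown at offsets 0000 and 0010 in the dump.
raw = bytes.fromhex("5ad00100 0d000000 22000000 000000f0 84630000 00000000")

# Assumed layout: LSN (2 x uint32), one uint32, then lower, upper, special
# and page size/version (4 x uint16), all little-endian. Only the first
# 20 bytes are decoded here.
logid, recoff, _word3, lower, upper, special, size_version = struct.unpack_from(
    "<IIIHHHH", raw
)


def header_problems(lower, upper, special, size_version, blcksz=BLCKSZ):
    """Return a list of inconsistencies (illustrative checks only)."""
    problems = []
    page_size = size_version & 0xFF00  # assumption: size is kept in the high byte
    if page_size != blcksz:
        problems.append(f"page size field is {page_size}, expected {blcksz}")
    if not (lower <= upper <= special <= blcksz):
        problems.append(
            f"offsets out of order: lower={lower} upper={upper} special={special}"
        )
    return problems


issues = header_problems(lower, upper, special, size_version)

print(f"block offset: {block_offset:#010x}")
print(f"LSN: logid {logid} recoff {recoff:#010x}")
print(f"lower={lower} upper={upper} special={special}")
for issue in issues:
    print("problem:", issue)
```

With these bytes the decoded values match the pg_filedump output (block offset 0x094d8000, logid 118874, upper 61440, special 25476). The upper offset exceeds both the special offset and the 32768-byte block size, and the page size field is 0, which is consistent with the "Invalid header information" verdict.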
Here is another example:

> pg_filedump -d -R 18311 18311 379598.3

*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 3.0
*
* File: 379598.3
* Options used: -d -R 18311 18311
*
* Dump created on: Thu Aug 5 16:18:39 2004
*******************************************************************

Block 18311 ********************************************************
  0000: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
  0010: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
  0020: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
  0030: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
  0040: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
  0050: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
  0060: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
  0070: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
  <truncated -- all the same>

*** End of Requested Range Encountered. Last Block Read: 18311 ***

Please describe a way to repeat the problem. Please try to provide a
concise reproducible example, if at all possible:
----------------------------------------------------------------------

I have been trying to use the test case of Hubert Froehlich,
http://archives.postgresql.org/pgsql-general/2004-07/msg00670.php,
but it does not generate any errors on our system. Only these big loads
cause the problem.

If you know how this problem might be fixed, list the solution below:
---------------------------------------------------------------------

I am willing to be the hands of any PostgreSQL developer to explore
this problem. The system is not in production, so I can make changes at
will.

+-----------------------------+------------------------------------+
| Robert E. Bruccoleri, Ph.D. | email: bruc@acm.org                |
| President, Congenair LLC    | URL:   http://www.congen.com/~bruc |
| P.O. Box 314                | Phone: 609 818 7251                |