BUG #1208: Invalid page header - Mailing list pgsql-bugs

From PostgreSQL Bugs List
Subject BUG #1208: Invalid page header
Date
Msg-id 20040810134538.02E995A1105@www.postgresql.com
Responses Re: BUG #1208: Invalid page header
List pgsql-bugs
The following bug has been logged online:

Bug reference:      1208
Logged by:          Robert E Bruccoleri

Email address:      bruc@stone.congenomics.com

PostgreSQL version: 7.4

Operating system:   Linux Advanced Server 2.1 and SGI ProPack 2.4

Description:        Invalid page header

Details:

============================================================================
                        POSTGRESQL BUG REPORT TEMPLATE
============================================================================


Your name               :       Robert Bruccoleri
Your email address      :       bruc@acm.org


System Configuration
---------------------
  Architecture (example: Intel Pentium)         :   Intel Itanium 2

  Operating System (example: Linux 2.4.18)      :   Linux 2.4.21 (SGI Propack 2.4 patch 10074)

  PostgreSQL version (example: PostgreSQL-7.4.3):   PostgreSQL-7.4.3

  Compiler used (example:  gcc 2.95.2)          :   Intel C compiler version 8.0


Please enter a FULL description of your problem:
------------------------------------------------

I am getting sporadic invalid page header errors when loading or
vacuuming databases in parallel. We are in the process of migrating
from an SGI Origin 3000 running PostgreSQL 7.4 to an SGI Altix running
PostgreSQL 7.4.3.  The Altix system has 64 processors with 256
gigabytes of RAM. PostgreSQL was built using a 32K blocksize, and we
start the system with a buffer cache of 130000 pages.  Fdatasync is
used for synchronization. We use an LSI Logic storage system to store
the PostgreSQL databases as well as for much of our department's data,
and we have about 5 terabytes used actively. The filesystem is XFS as
delivered by SGI, which wrote it.
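
As a rough check of the memory involved, the buffer cache size follows
from the two figures above (32K block size, 130000 pages). A small
Python sketch; the variable names are only for illustration:

```python
BLOCK_SIZE = 32 * 1024   # bytes per page, from the 32K blocksize build
BUFFER_PAGES = 130000    # pages in the buffer cache at startup

cache_bytes = BLOCK_SIZE * BUFFER_PAGES
cache_gib = cache_bytes / 2**30

# Report the cache size in bytes and in binary gigabytes.
print(f"{cache_bytes} bytes, about {cache_gib:.2f} GiB")
```

So the buffer cache is a little under 4 GiB, a small fraction of the
256 gigabytes of RAM.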

I do not believe that we have any problems with unreliable disk
storage.  First, no other users have complained about problems and we
have a lot more in use than what PostgreSQL is using. Second, the
storage system is an enterprise class Fibre Channel dual controller
RAID system designed for high redundancy and reliability.  It has no
single points of failure. We've been using it for over a year with no
problems.

We have about 14 active databases, and I loaded all 14 simultaneously. No
errors were noted during the load, but upon vacuuming all the databases,
one of the databases encountered the following message:

INFO:  vacuuming "public.relationships"
vacuumdb: vacuuming of database "human_genome_042003" failed: ERROR:
invalid page header in block 4763 of relation "relationships"

There may be others with problems, but vacuumdb quit after this error.
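
To find out which other databases are affected, each database could be
vacuumed in its own vacuumdb run so that one failure does not stop the
rest. The sketch below assumes vacuumdb is on the PATH and that
connection settings come from the environment; the function name and
the injectable `run` parameter are hypothetical, added only so the
logic can be exercised without a server.

```python
import subprocess


def vacuum_each(databases, run=subprocess.run):
    """Run vacuumdb once per database and collect the failures.

    Returns a dict mapping database name to the error text of each
    run that exited with a nonzero status.
    """
    failures = {}
    for db in databases:
        result = run(["vacuumdb", db], capture_output=True, text=True)
        if result.returncode != 0:
            failures[db] = result.stderr.strip()
    return failures


# Example call (needs a running server):
# print(vacuum_each(["human_genome_042003", "proceryon"]))
```

This only shows which databases fail. Within one database the vacuum
still stops at the first bad block it meets.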

I downloaded pg_filedump and ran it on the file containing this
relation, specifying a range that covers the erroneous block and the
blocks on either side of it. The two neighboring blocks contain the
data I would expect for the "relationships" table, but the bad block
contains data from a table in another database.

Here is part of the pg_filedump output:

*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 3.0
*
* File: 367457
* Options used: -f -R 4763 4763
*
* Dump created on: Wed Aug  4 19:47:46 2004
*******************************************************************

Block 4763 ********************************************************
<Header> -----
 Block Offset: 0x094d8000         Offsets: Lower       0 (0x0000)
 Block: Size    0  Version    0            Upper    61440 (0xf000)
 LSN:  logid 118874 recoff 0x0000000d      Special  25476 (0x6384)
 Items:    0                   Free Space: 61440
 Length (including item array): 24

 Error: Invalid header information.

  0000: 5ad00100 0d000000 22000000 000000f0  Z.......".......
  0010: 84630000 00000000                    .c......

<Data> ------
 Empty block - no items listed

<Special Section> -----
 Error: Invalid special section encountered.
  6384: 32343433 38320000 a9270000 ab270000  244382...'...'..
  6394: 00000000 01000000 00000000 1edbab73  ...............s
  63a4: 0e8f3ba6 22000000 40e3ffef 22000000  ..;."...@..."...
  63b4: 68e2ffef 020a0000 b400000a fdb70500  h...............
  63c4: bbc30500 08008f6e ae001200 02081800  .......n........
  63d4: 0e000000 52313031 5f343438 38340000  ....R101_44884..
  63e4: 15000000 15000000 4e545f30 31303839  ........NT_01089
  63f4: 335f6735 352e7365 63000000 91000000  3_g55.sec.......
  6404: 0f000000 70646231 63686b2e 412e2d00  ....pdb1chk.A.-.
  6414: ee000000 00000000 48e17a14 ae470340  ........H.z..G.@
  6424: 295c8fc2 f5280640 c3f5285c 8fc20b40  )\...(.@..(\...@
  6434: 3d0ad7a3 703d1340 0d000000 7f000000  =...p=.@........
  6444: 06819543 8b6c0640 d7a3703d 0a571040  ...C.l.@..p=.W.@
  6454: 91b8c7d2 87e62640 00000000 0078ca40  ......&@.....x.@
  6464: 00000000 00000000 00000000 002062c0  ............. b.
  6474: 00000000 e5fd877a 720918a8 22000000  .......zr..."...
  6484: a06300f0 22000000 a06300f0 020a0000  .c.."....c......
  6494: b400800a fdb70500 bbc30500 0800906e  ...............n
  64a4: 01001200 02081800 0e000000 52313031  ............R101
  64b4: 5f343438 38340000 15000000 15000000  _44884..........
  64c4: 4e545f30 31303839 335f6735 352e7365  NT_010893_g55.se
  64d4: 63000000 91000000 0f000000 70646231  c...........pdb1
  64e4: 63686d2e 422e2d00 91010000 00000000  chm.B.-.........
  64f4: ec51b81e 85eb0940 3d0ad7a3 703d0a40  .Q.....@=...p=.@
  6504: 52b81e85 eb511140 b81e85eb 51b81a40  R....Q.@....Q..@
  6514: 13000000 6c000000 e7fba9f1 d24d0d40  ....l........M.@
  6524: 52b81e85 ebd11740 7940d994 2bd03540  R......@y@..+.5@
  6534: 00000000 0043bd40 00000000 00000000  .....C.@........
  6544: 00000000 00c068c0 00000000 f7d17b03  ......h.......{.
  6554: 08edd30d 22000000 786400f0 22000000  ...."...xd.."...
  6564: 786400f0 020a0000 b400000a fdb70500  xd..............
  6574: bbc30500 0800906e 02001200 02081800  .......n........
  6584: 0e000000 52313031 5f343438 38340000  ....R101_44884..
  6594: 15000000 15000000 4e545f30 31303839  ........NT_01089
  65a4: 335f6735 352e7365 63000000 91000000  3_g55.sec.......
  65b4: 0f000000 70646231 6369342e 412e2d00  ....pdb1ci4.A.-.
  65c4: 59000000 00000000 3d0ad7a3 703df63f  Y.......=...p=.?
  65d4: 1f85eb51 b81e0f40 c3f5285c 8fc20b40  ...Q...@..(\...@
  65e4: 33333333 33331840 12000000 54000000  333333.@....T...
  65f4: 06819543 8b6c0640 cdcccccc cc4c1540  ...C.l.@.....L.@
  6604: c3d32b65 19da2d40 00000000 0033be40  ..+e..-@.....3.@
  6614: 00000000 00000000 00000000 00406e40  .............@n@
  6624: 00000000 c61e23a3 820d4664 22000000  ......#...Fd"...
  6634: 506500f0 22000000 506500f0 020a0000  Pe.."...Pe......
  6644: b400000a fdb70500 bbc30500 0800906e  ...............n
  6654: 03001200 02081800 0e000000 52313031  ............R101
  6664: 5f343438 38340000 15000000 15000000  _44884..........
  6674: 4e545f30 31303839 335f6735 352e7365  NT_010893_g55.se
  6684: 63000000 91000000 0f000000 70646231  c...........pdb1
  6694: 6369642e 2d2e2d00 b1000000 00000000  cid.-.-.........

<truncated>
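
The header fields that pg_filedump reports can be reproduced from the
raw header bytes in the dump above. The sketch below assumes the 7.4
page header layout is a two-word LSN, a 32-bit startup ID, and then
four 16-bit fields (lower, upper, special, page size/version), all
little-endian. The plausibility test is a simplification for
illustration, not the server's actual check.

```python
import struct

BLOCK_SIZE = 32 * 1024  # this installation was built with 32K blocks

# First 20 bytes of block 4763, copied from the dump (item array omitted).
raw = bytes.fromhex("5ad00100 0d000000 22000000 000000f0 84630000")

# Assumed layout: LSN logid, LSN recoff, startup ID, lower, upper,
# special, page size/version.
logid, recoff, sui, lower, upper, special, size_version = struct.unpack(
    "<IIIHHHH", raw
)


def header_problems(lower, upper, special, size_version, block_size=BLOCK_SIZE):
    """Return a list of reasons why these header fields look wrong."""
    problems = []
    if size_version == 0:
        problems.append("page size/version word is zero")
    if not (lower <= upper <= special <= block_size):
        problems.append("lower <= upper <= special <= block size does not hold")
    return problems


print("logid", logid, "recoff", hex(recoff))
print("lower", lower, "upper", upper, "special", special)
print("block offset", hex(4763 * BLOCK_SIZE))
for problem in header_problems(lower, upper, special, size_version):
    print("problem:", problem)
```

The decoded values agree with the dump (logid 118874, upper 61440,
special 25476, block offset 0x094d8000). An upper offset of 61440
cannot be valid in a 32768-byte page.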

Block 4763 contains data from another database, named proceryon, which
is one of the 14 that I loaded simultaneously. If this were a disk I/O
error, I would not expect to see tuples from another database. I would
expect gibberish or nulls.
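
The bytes in the dump do decode as well-formed column values rather
than noise. The sketch below takes three fragments from the dump and
reads them as two text values (a 4-byte length word that counts itself,
followed by the characters) and one float8. Which columns these belong
to is not stated in the dump, and the helper name is hypothetical.

```python
import struct


def read_varlena_text(buf, offset=0):
    """Read a text value whose 4-byte length word includes itself."""
    (total_len,) = struct.unpack_from("<I", buf, offset)
    return buf[offset + 4 : offset + total_len].decode("ascii")


# Fragments copied from the dump at offsets 63d4, 63e8 and 641c.
name_bytes = bytes.fromhex("0e000000 52313031 5f343438 38340000")
file_bytes = bytes.fromhex("15000000 4e545f30 31303839 335f6735 352e7365 63000000")
float_bytes = bytes.fromhex("48e17a14 ae470340")

name = read_varlena_text(name_bytes)
file_name = read_varlena_text(file_bytes)
(value,) = struct.unpack("<d", float_bytes)

print(name, file_name, value)
```

The length words (14 and 21) match the strings that follow them, and
the eight float bytes decode to a short decimal number. That is what
intact tuples look like, not what a garbled write would produce.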

I ran vacuumdb on the proceryon table whose data appears above, and it
reported no error. However, other tables in the proceryon database
have invalid page headers. Here is another example:

> pg_filedump -d -R 18311 18311 379598.3

*******************************************************************
* PostgreSQL File/Block Formatted Dump Utility - Version 3.0
*
* File: 379598.3
* Options used: -d -R 18311 18311
*
* Dump created on: Thu Aug  5 16:18:39 2004
*******************************************************************

Block 18311 ********************************************************
  0000: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
  0010: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
  0020: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
  0030: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
  0040: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
  0050: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
  0060: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
  0070: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll

<truncated -- all the same>

*** End of Requested Range Encountered. Last Block Read: 18311 ***
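
Blocks like this one, filled with a single repeated byte, are easy to
search for in a relation file. A minimal sketch, assuming 32K blocks
and a file opened in binary mode; the function name is hypothetical,
and the demonstration uses an in-memory stand-in with a tiny block
size instead of a real segment file.

```python
import io


def uniform_fill_blocks(stream, block_size=32 * 1024):
    """Yield (block_number, byte_value) for blocks of one repeated byte."""
    block_number = 0
    while True:
        block = stream.read(block_size)
        if len(block) < block_size:
            break  # end of file, or a trailing partial block
        if block.count(block[0]) == block_size:
            yield block_number, block[0]
        block_number += 1


# Three 16-byte "blocks"; the middle one is all 0x6c, like block 18311.
demo = io.BytesIO(bytes(range(16)) + b"l" * 16 + bytes(range(16)))
found = list(uniform_fill_blocks(demo, block_size=16))
print(found)
```

On a real file this would be called as
uniform_fill_blocks(open(path, "rb")), with block numbers counted from
the start of that file, as in the pg_filedump -R option.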


Please describe a way to repeat the problem.   Please try to provide a
concise reproducible example, if at all possible:
----------------------------------------------------------------------

I have been trying the test case from Hubert Froehlich,
http://archives.postgresql.org/pgsql-general/2004-07/msg00670.php,
but it does not generate any errors on our system. Only these big
loads trigger the problem.

If you know how this problem might be fixed, list the solution below:
---------------------------------------------------------------------

I am willing to be the hands of any PostgreSQL developer to explore
this problem. The system is not in production, so I can make changes
at will.
+-----------------------------+------------------------------------+
| Robert E. Bruccoleri, Ph.D. | email: bruc@acm.org                |
| President, Congenair LLC    | URL:   http://www.congen.com/~bruc |
| P.O. Box 314                | Phone: 609 818 7251                |
From bruc Sun Aug  8 20:18:49 2004
Subject: Invalid page header errors in PostgreSQL 7.4.3
To: pgsql-bugs@postgresql.org
Date: Sun, 8 Aug 2004 20:18:49 -0400 (EDT)
Cc: hubert.froehlich@bvv.bayern.de, tgl@sss.pgh.pa.us
