Re: BUG #1208: Invalid page header - Mailing list pgsql-bugs

From Bruce Momjian
Subject Re: BUG #1208: Invalid page header
Msg-id 200408161556.i7GFuUd18101@candle.pha.pa.us
In response to BUG #1208: Invalid page header  ("PostgreSQL Bugs List" <pgsql-bugs@postgresql.org>)
Responses Re: BUG #1208: Invalid page header
List pgsql-bugs
If you are sure your storage and memory are good, I can think of only
two other ideas.  One is a gcc bug.  You are running Itanium so it is
possible.  The only other possibility I can think of is that our
ia64 assembler code is wrong.  It is:

    static __inline__ int
    tas(volatile slock_t *lock)
    {
        long int    ret;

        __asm__ __volatile__(
            "   xchg4   %0=%1,%2    \n"
            : "=r"(ret), "+m"(*lock)
            : "r"(1)
            : "memory");
        return (int) ret;
    }

It is possible we don't have this working properly on ia64 SMP machines.
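
For comparison, here is a minimal portable sketch of what tas() is meant
to do: atomically store 1 into the lock word and return the old value, so
a return of 0 means the caller got the lock.  It uses C11 <stdatomic.h>
rather than our assembler, and the sketch_* names are made up for
illustration; this is not the PostgreSQL code.

```c
#include <stdatomic.h>

/* Hypothetical stand-in for slock_t; not the PostgreSQL definition. */
typedef atomic_int sketch_slock_t;

/*
 * Test-and-set: atomically write 1 and return the previous value.
 * 0 means the lock was free and is now ours; nonzero means it was held.
 */
static inline int
sketch_tas(sketch_slock_t *lock)
{
    return atomic_exchange_explicit(lock, 1, memory_order_acquire);
}

/* Spin until the test-and-set reports that the lock was free. */
static inline void
sketch_lock(sketch_slock_t *lock)
{
    while (sketch_tas(lock) != 0)
        ;                       /* busy-wait */
}

/* Release the lock so the next test-and-set returns 0. */
static inline void
sketch_unlock(sketch_slock_t *lock)
{
    atomic_store_explicit(lock, 0, memory_order_release);
}
```

If the exchange were not atomic, or stores made under the lock could
become visible after the release, two backends could modify the same
shared buffer state at once, which is the kind of failure I am guessing
at here.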

Again, these are only guesses but this is all I can think of.  We have
no other reports of such failures _except_ for hardware problems.

You can try 8.0 beta1 and see if that helps.  I do see the assembly code
is slightly modified from the 7.4.X release.  It might be significant,
but I doubt it.

---------------------------------------------------------------------------

PostgreSQL Bugs List wrote:
>
> The following bug has been logged online:
>
> Bug reference:      1208
> Logged by:          Robert E Bruccoleri
>
> Email address:      bruc@stone.congenomics.com
>
> PostgreSQL version: 7.4
>
> Operating system:   Linux Advanced Server 2.1 and SGI ProPack 2.4
>
> Description:        Invalid page header
>
> Details:
>
> ============================================================================
>                         POSTGRESQL BUG REPORT TEMPLATE
> ============================================================================
>
>
> Your name               :       Robert Bruccoleri
> Your email address      :       bruc@acm.org
>
>
> System Configuration
> ---------------------
>   Architecture (example: Intel Pentium)         :   Intel Itanium 2
>
>   Operating System (example: Linux 2.4.18)      :   Linux 2.4.21 (SGI
> Propack 2.4 patch 10074)
>
>   PostgreSQL version (example: PostgreSQL-7.4.3):   PostgreSQL-7.4.3
>
>   Compiler used (example:  gcc 2.95.2)          :   Intel C compiler version
> 8.0
>
>
> Please enter a FULL description of your problem:
> ------------------------------------------------
>
> I am getting sporadic invalid page header errors when loading or
> vacuuming databases in parallel. We are in the process of migrating
> from an SGI Origin 3000 running PostgreSQL 7.4 to an SGI Altix running
> PostgreSQL 7.4.3.  The Altix system has 64 processors with 256
> gigabytes of RAM. PostgreSQL was built using a 32K blocksize, and we
> start the system with a buffer cache of 130000 pages.  Fdatasync is
> used for synchronization. We use an LSI Logic storage system to store
> the PostgreSQL databases as well as for much of our department's data,
> and we have about 5 terabytes used actively. The filesystem is XFS as
> delivered by SGI, which wrote it.
>
> I do not believe that we have any problems with unreliable disk
> storage.  First, no other users have complained about problems and we
> have a lot more in use than what PostgreSQL is using. Second, the
> storage system is an enterprise class Fibre Channel dual controller
> RAID system designed for high redundancy and reliability.  It has no
> single points of failure. We've been using it for over a year with no
> problems.
>
> We have about 14 active databases, and I loaded all 14 simultaneously. No
> errors were noted during the load, but upon vacuuming all the databases,
> one of the databases encountered the following message:
>
> INFO:  vacuuming "public.relationships"
> vacuumdb: vacuuming of database "human_genome_042003" failed: ERROR:
> invalid page header in block 4763 of relation "relationships"
>
> There may be others with problems, but vacuumdb quit after this error.
>
> I downloaded pg_filedump and I ran it on the file containing this
> relation specifying a range covering a block around the erroneous
> block. The two blocks around the bad block have data as I would have
> expected for the "relationships" table, but the bad block has data from
> a table in another database.
>
> Here is part of the pg_filedump output:
>
> *******************************************************************
> * PostgreSQL File/Block Formatted Dump Utility - Version 3.0
> *
> * File: 367457
> * Options used: -f -R 4763 4763
> *
> * Dump created on: Wed Aug  4 19:47:46 2004
> *******************************************************************
>
> Block 4763 ********************************************************
> <Header> -----
>  Block Offset: 0x094d8000         Offsets: Lower       0 (0x0000)
>  Block: Size    0  Version    0            Upper    61440 (0xf000)
>  LSN:  logid 118874 recoff 0x0000000d      Special  25476 (0x6384)
>  Items:    0                   Free Space: 61440
>  Length (including item array): 24
>
>  Error: Invalid header information.
>
>   0000: 5ad00100 0d000000 22000000 000000f0  Z.......".......
>   0010: 84630000 00000000                    .c......
>
> <Data> ------
>  Empty block - no items listed
>
> <Special Section> -----
>  Error: Invalid special section encountered.
>   6384: 32343433 38320000 a9270000 ab270000  244382...'...'..
>   6394: 00000000 01000000 00000000 1edbab73  ...............s
>   63a4: 0e8f3ba6 22000000 40e3ffef 22000000  ..;."...@..."...
>   63b4: 68e2ffef 020a0000 b400000a fdb70500  h...............
>   63c4: bbc30500 08008f6e ae001200 02081800  .......n........
>   63d4: 0e000000 52313031 5f343438 38340000  ....R101_44884..
>   63e4: 15000000 15000000 4e545f30 31303839  ........NT_01089
>   63f4: 335f6735 352e7365 63000000 91000000  3_g55.sec.......
>   6404: 0f000000 70646231 63686b2e 412e2d00  ....pdb1chk.A.-.
>   6414: ee000000 00000000 48e17a14 ae470340  ........H.z..G.@
>   6424: 295c8fc2 f5280640 c3f5285c 8fc20b40  )\...(.@..(\...@
>   6434: 3d0ad7a3 703d1340 0d000000 7f000000  =...p=.@........
>   6444: 06819543 8b6c0640 d7a3703d 0a571040  ...C.l.@..p=.W.@
>   6454: 91b8c7d2 87e62640 00000000 0078ca40  ......&@.....x.@
>   6464: 00000000 00000000 00000000 002062c0  ............. b.
>   6474: 00000000 e5fd877a 720918a8 22000000  .......zr..."...
>   6484: a06300f0 22000000 a06300f0 020a0000  .c.."....c......
>   6494: b400800a fdb70500 bbc30500 0800906e  ...............n
>   64a4: 01001200 02081800 0e000000 52313031  ............R101
>   64b4: 5f343438 38340000 15000000 15000000  _44884..........
>   64c4: 4e545f30 31303839 335f6735 352e7365  NT_010893_g55.se
>   64d4: 63000000 91000000 0f000000 70646231  c...........pdb1
>   64e4: 63686d2e 422e2d00 91010000 00000000  chm.B.-.........
>   64f4: ec51b81e 85eb0940 3d0ad7a3 703d0a40  .Q.....@=...p=.@
>   6504: 52b81e85 eb511140 b81e85eb 51b81a40  R....Q.@....Q..@
>   6514: 13000000 6c000000 e7fba9f1 d24d0d40  ....l........M.@
>   6524: 52b81e85 ebd11740 7940d994 2bd03540  R......@y@..+.5@
>   6534: 00000000 0043bd40 00000000 00000000  .....C.@........
>   6544: 00000000 00c068c0 00000000 f7d17b03  ......h.......{.
>   6554: 08edd30d 22000000 786400f0 22000000  ...."...xd.."...
>   6564: 786400f0 020a0000 b400000a fdb70500  xd..............
>   6574: bbc30500 0800906e 02001200 02081800  .......n........
>   6584: 0e000000 52313031 5f343438 38340000  ....R101_44884..
>   6594: 15000000 15000000 4e545f30 31303839  ........NT_01089
>   65a4: 335f6735 352e7365 63000000 91000000  3_g55.sec.......
>   65b4: 0f000000 70646231 6369342e 412e2d00  ....pdb1ci4.A.-.
>   65c4: 59000000 00000000 3d0ad7a3 703df63f  Y.......=...p=.?
>   65d4: 1f85eb51 b81e0f40 c3f5285c 8fc20b40  ...Q...@..(\...@
>   65e4: 33333333 33331840 12000000 54000000  333333.@....T...
>   65f4: 06819543 8b6c0640 cdcccccc cc4c1540  ...C.l.@.....L.@
>   6604: c3d32b65 19da2d40 00000000 0033be40  ..+e..-@.....3.@
>   6614: 00000000 00000000 00000000 00406e40  .............@n@
>   6624: 00000000 c61e23a3 820d4664 22000000  ......#...Fd"...
>   6634: 506500f0 22000000 506500f0 020a0000  Pe.."...Pe......
>   6644: b400000a fdb70500 bbc30500 0800906e  ...............n
>   6654: 03001200 02081800 0e000000 52313031  ............R101
>   6664: 5f343438 38340000 15000000 15000000  _44884..........
>   6674: 4e545f30 31303839 335f6735 352e7365  NT_010893_g55.se
>   6684: 63000000 91000000 0f000000 70646231  c...........pdb1
>   6694: 6369642e 2d2e2d00 b1000000 00000000  cid.-.-.........
>
> <truncated>
>
> In block 4763, there is data from another database named proceryon in
> the 14 that I loaded simultaneously. If this were a disk I/O error,
> then I would not have expected to see tuples from another
> database. I'd expect gibberish or nulls.
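
To make the header dump above easier to read, here is a rough sketch that
decodes the pointer fields from the first 24 bytes of block 4763 and
applies the ordering one would expect on a healthy page (lower <= upper
<= special <= block size).  The byte offsets and little-endian layout are
assumptions inferred from the hex dump and the values pg_filedump printed,
the sketch_* names are invented, and this is not the server's actual
check.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_BLCKSZ 32768u    /* 32K block size, as stated in the report */

/* First 24 bytes of block 4763, copied from the pg_filedump output. */
static const uint8_t sketch_block_4763[24] = {
    0x5a, 0xd0, 0x01, 0x00, 0x0d, 0x00, 0x00, 0x00,
    0x22, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0xf0,
    0x84, 0x63, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00
};

typedef struct
{
    uint16_t    lower;          /* "Lower" in the dump */
    uint16_t    upper;          /* "Upper" in the dump */
    uint16_t    special;        /* "Special" in the dump */
} sketch_header;

/* Read a 16-bit little-endian value. */
static uint16_t
sketch_le16(const uint8_t *p)
{
    return (uint16_t) (p[0] | (p[1] << 8));
}

/* Assumed offsets: lower at byte 12, upper at 14, special at 16. */
static sketch_header
sketch_decode(const uint8_t *page)
{
    sketch_header h;

    h.lower = sketch_le16(page + 12);
    h.upper = sketch_le16(page + 14);
    h.special = sketch_le16(page + 16);
    return h;
}

/* True only if the three pointers are ordered and fit within the block. */
static bool
sketch_header_plausible(const sketch_header *h)
{
    return h->lower <= h->upper &&
           h->upper <= h->special &&
           h->special <= SKETCH_BLCKSZ;
}
```

Decoding sketch_block_4763 gives lower 0, upper 61440 and special 25476,
matching the dump.  The upper pointer is beyond both the special pointer
and the 32K block size, so the ordering test fails.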
>
> I ran a vacuumdb on the table in proceryon that had data above, and
> there is no error. However, other tables in the proceryon database
> have invalid page headers. Here is another example:
>
> > pg_filedump -d -R 18311 18311 379598.3
>
> *******************************************************************
> * PostgreSQL File/Block Formatted Dump Utility - Version 3.0
> *
> * File: 379598.3
> * Options used: -d -R 18311 18311
> *
> * Dump created on: Thu Aug  5 16:18:39 2004
> *******************************************************************
>
> Block 18311 ********************************************************
>   0000: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
>   0010: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
>   0020: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
>   0030: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
>   0040: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
>   0050: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
>   0060: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
>   0070: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
>
> <truncated -- all the same>
>
> *** End of Requested Range Encountered. Last Block Read: 18311 ***
>
>
> Please describe a way to repeat the problem.   Please try to provide a
> concise reproducible example, if at all possible:
> ----------------------------------------------------------------------
>
> I have been trying to use the test case of Hubert Froehlich,
> http://archives.postgresql.org/pgsql-general/2004-07/msg00670.php,
> but it does not generate any errors on our system. Only these big
> loads cause it.
>
> If you know how this problem might be fixed, list the solution below:
> ---------------------------------------------------------------------
>
> I am willing to be the hands of any PostgreSQL developer to explore
> this problem. The system is not in production, so I can make changes
> at will.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
