Re: BUG #1208: Invalid page header - Mailing list pgsql-bugs
From | Bruce Momjian |
---|---|
Subject | Re: BUG #1208: Invalid page header |
Date | |
Msg-id | 200408161556.i7GFuUd18101@candle.pha.pa.us |
In response to | BUG #1208: Invalid page header ("PostgreSQL Bugs List" <pgsql-bugs@postgresql.org>) |
Responses | Re: BUG #1208: Invalid page header |
List | pgsql-bugs |
If you are sure your storage and memory are good, I can think of only
two other ideas. One is a gcc bug. You are running Itanium so it is
possible.

The only other possibility I can think of is that our ia64 assembler
code is wrong. It is:

    static __inline__ int
    tas(volatile slock_t *lock)
    {
        long int    ret;

        __asm__ __volatile__(
            "   xchg4   %0=%1,%2    \n"
    :       "=r"(ret), "+m"(*lock)
    :       "r"(1)
    :       "memory");
        return (int) ret;
    }

It is possible we don't have this working properly on ia64 SMP
machines.

Again, these are only guesses but this is all I can think of. We have
no other reports of such failures _except_ for hardware problems.

You can try 8.0 beta1 and see if that helps. I do see the assembly code
is slightly modified from the 7.4.X release. It might be significant,
but I doubt it.

---------------------------------------------------------------------------

PostgreSQL Bugs List wrote:
>
> The following bug has been logged online:
>
> Bug reference:      1208
> Logged by:          Robert E Bruccoleri
>
> Email address:      bruc@stone.congenomics.com
>
> PostgreSQL version: 7.4
>
> Operating system:   Linux Advanced Server 2.1 and SGI ProPack 2.4
>
> Description:        Invalid page header
>
> Details:
>
> ============================================================================
>                         POSTGRESQL BUG REPORT TEMPLATE
> ============================================================================
>
>
> Your name               : Robert Bruccoleri
> Your email address      : bruc@acm.org
>
>
> System Configuration
> ---------------------
>   Architecture (example: Intel Pentium)         : Intel Itanium 2
>
>   Operating System (example: Linux 2.4.18)      : Linux 2.4.21 (SGI Propack 2.4 patch 10074)
>
>   PostgreSQL version (example: PostgreSQL-7.4.3): PostgreSQL-7.4.3
>
>   Compiler used (example: gcc 2.95.2)           : Intel C compiler version 8.0
>
>
> Please enter a FULL description of your problem:
> ------------------------------------------------
>
> I am getting sporadic invalid page header errors when loading or
> vacuuming databases in parallel. We are in the process of migrating
> from an SGI Origin 3000 running PostgreSQL 7.4 to an SGI Altix running
> PostgreSQL 7.4.3. The Altix system has 64 processors with 256
> gigabytes of RAM. PostgreSQL was built using a 32K blocksize, and we
> start the system with a buffer cache of 130000 pages. Fdatasync is
> used for synchronization. We use an LSI Logic storage system to store
> the PostgreSQL databases as well as for much of our department's data,
> and we have about 5 terabytes used actively. The filesystem is XFS as
> delivered by SGI, which wrote it.
>
> I do not believe that we have any problems with unreliable disk
> storage. First, no other users have complained about problems and we
> have a lot more in use than what PostgreSQL is using. Second, the
> storage system is an enterprise class Fibre Channel dual controller
> RAID system designed for high redundancy and reliability. It has no
> single points of failure. We've been using it for over a year with no
> problems.
>
> We have about 14 active databases, and I loaded all 14 simultaneously. No
> errors were noted during the load, but upon vacuuming all the databases,
> one of the databases encountered the following message:
>
> INFO:  vacuuming "public.relationships"
> vacuumdb: vacuuming of database "human_genome_042003" failed: ERROR:
> invalid page header in block 4763 of relation "relationships"
>
> There may be others with problems, but vacuumdb quit after this error.
>
> I downloaded pg_filedump and I ran it on the file containing this
> relation specifying a range covering a block around the erroneous
> block. The two blocks around the bad block have data as I would have
> expected for the "relationships" table, but the bad block has data from
> a table in another database.
>
> Here is part of the pg_filedump output:
>
> *******************************************************************
> * PostgreSQL File/Block Formatted Dump Utility - Version 3.0
> *
> * File: 367457
> * Options used: -f -R 4763 4763
> *
> * Dump created on: Wed Aug  4 19:47:46 2004
> *******************************************************************
>
> Block 4763 ********************************************************
> <Header> -----
>  Block Offset: 0x094d8000         Offsets: Lower       0 (0x0000)
>  Block: Size    0  Version    0            Upper   61440 (0xf000)
>  LSN:  logid 118874 recoff 0x0000000d      Special 25476 (0x6384)
>  Items:    0                   Free Space: 61440
>  Length (including item array): 24
>
>  Error: Invalid header information.
>
>   0000: 5ad00100 0d000000 22000000 000000f0  Z.......".......
>   0010: 84630000 00000000                    .c......
>
> <Data> ------
>  Empty block - no items listed
>
> <Special Section> -----
>  Error: Invalid special section encountered.
>   6384: 32343433 38320000 a9270000 ab270000  244382...'...'..
>   6394: 00000000 01000000 00000000 1edbab73  ...............s
>   63a4: 0e8f3ba6 22000000 40e3ffef 22000000  ..;."...@..."...
>   63b4: 68e2ffef 020a0000 b400000a fdb70500  h...............
>   63c4: bbc30500 08008f6e ae001200 02081800  .......n........
>   63d4: 0e000000 52313031 5f343438 38340000  ....R101_44884..
>   63e4: 15000000 15000000 4e545f30 31303839  ........NT_01089
>   63f4: 335f6735 352e7365 63000000 91000000  3_g55.sec.......
>   6404: 0f000000 70646231 63686b2e 412e2d00  ....pdb1chk.A.-.
>   6414: ee000000 00000000 48e17a14 ae470340  ........H.z..G.@
>   6424: 295c8fc2 f5280640 c3f5285c 8fc20b40  )\...(.@..(\...@
>   6434: 3d0ad7a3 703d1340 0d000000 7f000000  =...p=.@........
>   6444: 06819543 8b6c0640 d7a3703d 0a571040  ...C.l.@..p=.W.@
>   6454: 91b8c7d2 87e62640 00000000 0078ca40  ......&@.....x.@
>   6464: 00000000 00000000 00000000 002062c0  ............. b.
>   6474: 00000000 e5fd877a 720918a8 22000000  .......zr..."...
>   6484: a06300f0 22000000 a06300f0 020a0000  .c.."....c......
>   6494: b400800a fdb70500 bbc30500 0800906e  ...............n
>   64a4: 01001200 02081800 0e000000 52313031  ............R101
>   64b4: 5f343438 38340000 15000000 15000000  _44884..........
>   64c4: 4e545f30 31303839 335f6735 352e7365  NT_010893_g55.se
>   64d4: 63000000 91000000 0f000000 70646231  c...........pdb1
>   64e4: 63686d2e 422e2d00 91010000 00000000  chm.B.-.........
>   64f4: ec51b81e 85eb0940 3d0ad7a3 703d0a40  .Q.....@=...p=.@
>   6504: 52b81e85 eb511140 b81e85eb 51b81a40  R....Q.@....Q..@
>   6514: 13000000 6c000000 e7fba9f1 d24d0d40  ....l........M.@
>   6524: 52b81e85 ebd11740 7940d994 2bd03540  R......@y@..+.5@
>   6534: 00000000 0043bd40 00000000 00000000  .....C.@........
>   6544: 00000000 00c068c0 00000000 f7d17b03  ......h.......{.
>   6554: 08edd30d 22000000 786400f0 22000000  ...."...xd.."...
>   6564: 786400f0 020a0000 b400000a fdb70500  xd..............
>   6574: bbc30500 0800906e 02001200 02081800  .......n........
>   6584: 0e000000 52313031 5f343438 38340000  ....R101_44884..
>   6594: 15000000 15000000 4e545f30 31303839  ........NT_01089
>   65a4: 335f6735 352e7365 63000000 91000000  3_g55.sec.......
>   65b4: 0f000000 70646231 6369342e 412e2d00  ....pdb1ci4.A.-.
>   65c4: 59000000 00000000 3d0ad7a3 703df63f  Y.......=...p=.?
>   65d4: 1f85eb51 b81e0f40 c3f5285c 8fc20b40  ...Q...@..(\...@
>   65e4: 33333333 33331840 12000000 54000000  333333.@....T...
>   65f4: 06819543 8b6c0640 cdcccccc cc4c1540  ...C.l.@.....L.@
>   6604: c3d32b65 19da2d40 00000000 0033be40  ..+e..-@.....3.@
>   6614: 00000000 00000000 00000000 00406e40  .............@n@
>   6624: 00000000 c61e23a3 820d4664 22000000  ......#...Fd"...
>   6634: 506500f0 22000000 506500f0 020a0000  Pe.."...Pe......
>   6644: b400000a fdb70500 bbc30500 0800906e  ...............n
>   6654: 03001200 02081800 0e000000 52313031  ............R101
>   6664: 5f343438 38340000 15000000 15000000  _44884..........
>   6674: 4e545f30 31303839 335f6735 352e7365  NT_010893_g55.se
>   6684: 63000000 91000000 0f000000 70646231  c...........pdb1
>   6694: 6369642e 2d2e2d00 b1000000 00000000  cid.-.-.........
>
> <truncated>
>
> In block 4763, there is data from another database named proceryon in
> the 14 that I loaded simultaneously. If this were a disk I/O error,
> then I would not have expected to see tuples from another
> database. I'd expect gibberish or nulls.
>
> I ran a vacuumdb on the table in proceryon that had data above, and
> there is no error. However, other tables in the proceryon database
> have invalid page headers. Here is another example:
>
>
> pg_filedump -d -R 18311 18311 379598.3
>
> *******************************************************************
> * PostgreSQL File/Block Formatted Dump Utility - Version 3.0
> *
> * File: 379598.3
> * Options used: -d -R 18311 18311
> *
> * Dump created on: Thu Aug  5 16:18:39 2004
> *******************************************************************
>
> Block 18311 ********************************************************
>   0000: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
>   0010: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
>   0020: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
>   0030: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
>   0040: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
>   0050: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
>   0060: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
>   0070: 6c6c6c6c 6c6c6c6c 6c6c6c6c 6c6c6c6c  llllllllllllllll
>
> <truncated -- all the same>
>
> *** End of Requested Range Encountered. Last Block Read: 18311 ***
>
>
> Please describe a way to repeat the problem. Please try to provide a
> concise reproducible example, if at all possible:
> ----------------------------------------------------------------------
>
> I have been trying to use the test case of Hubert Froehlich,
> http://archives.postgresql.org/pgsql-general/2004-07/msg00670.php,
> but it does not generate any errors on our system. Only these big
> loads cause it.
>
> If you know how this problem might be fixed, list the solution below:
> ---------------------------------------------------------------------
>
> I am willing to be the hands of any PostgreSQL developer to explore
> this problem. The system is not in production, so I can make changes
> at will.

-- 
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
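
The header values that pg_filedump prints for block 4763 can be cross-checked by hand against the 24 raw bytes shown at offsets 0000 and 0010 of that dump. What follows is only a minimal C sketch of that check. It assumes a little-endian machine layout and a 7.4-era page header laid out as an 8-byte LSN, a 4-byte startup ID, then 16-bit lower, upper, special and pagesize/version fields. The struct `PageHeaderSketch`, the helper names, and the simplified `header_looks_sane` test are hypothetical illustrations, not code from the PostgreSQL source or from pg_filedump.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical mirror of the fixed part of the page header (assumed layout). */
typedef struct {
    uint32_t lsn_logid;         /* "logid" in the pg_filedump output   */
    uint32_t lsn_recoff;        /* "recoff" in the pg_filedump output  */
    uint32_t sui;               /* startup ID (assumed field)          */
    uint16_t lower;             /* offset to start of free space       */
    uint16_t upper;             /* offset to end of free space         */
    uint16_t special;           /* offset to start of special section  */
    uint16_t pagesize_version;  /* page size and layout version packed */
} PageHeaderSketch;

enum { HEADER_FIXED_BYTES = 20 };  /* bytes before the item array (assumed) */

/* Read little-endian integers without relying on host byte order. */
static uint16_t le16(const uint8_t *p)
{
    return (uint16_t)(p[0] | (p[1] << 8));
}

static uint32_t le32(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}

/* Decode the first 20 bytes of a block into the sketch struct. */
static PageHeaderSketch decode_header(const uint8_t *raw)
{
    PageHeaderSketch h;

    assert(raw != NULL);
    h.lsn_logid        = le32(raw + 0);
    h.lsn_recoff       = le32(raw + 4);
    h.sui              = le32(raw + 8);
    h.lower            = le16(raw + 12);
    h.upper            = le16(raw + 14);
    h.special          = le16(raw + 16);
    h.pagesize_version = le16(raw + 18);
    return h;
}

/*
 * Simplified plausibility test (an assumption, not the server's own check):
 * the recorded page size must match the block size and the three offsets
 * must be ordered inside the block.
 */
static int header_looks_sane(const PageHeaderSketch *h, uint32_t blcksz)
{
    uint32_t size = h->pagesize_version & 0xFF00u;

    return size == blcksz &&
           h->lower >= HEADER_FIXED_BYTES &&
           h->lower <= h->upper &&
           h->upper <= h->special &&
           h->special <= blcksz;
}

/* The 24 header bytes of block 4763, copied from the dump in the report. */
static const uint8_t block_4763_header[24] = {
    0x5a, 0xd0, 0x01, 0x00,  0x0d, 0x00, 0x00, 0x00,
    0x22, 0x00, 0x00, 0x00,  0x00, 0x00, 0x00, 0xf0,
    0x84, 0x63, 0x00, 0x00,  0x00, 0x00, 0x00, 0x00
};
```

Calling `decode_header(block_4763_header)` and then `header_looks_sane(&h, 32768)` with the 32K block size from the report gives the same numbers pg_filedump shows (logid 118874, upper 61440, special 25476) and rejects the header, because both the size/version field and the lower offset are zero.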