mailadministrator@canada.com (Anthony) writes:
> Our Database is having errors. We are currently using PostgreSQL to
> store 2.5 Million records per day. The average addition to our primary
> table is 4.5 Gigs of data.
>
> We are doing this on a dual Opteron 244 system with 1 TeraByte of HDD
> space. The drives are 250 Gig Western Digital. The Raid Controller is
> LSI Logic MegaRaid 150-6.
>
> We are getting an error after about 4-5 days worth of data being put
> into the system.
>
> *******************************************************
> ERROR: invalid page header in block 59305 of relation
> "item_info_2004_04_leaf_category_1"
> *******************************************************
>
> Our Base Server Configuration is as follows.
> PostgreSQL Version= 7.4.2
> x86_64-PC-Linux-GNU
> Compiled with GCC 3.3.3
> XFS File System
> Running on Gentoo Linux 3.3.3 Propolice-3.3-7
>
> Any help on how to solve this problem would be extremely appreciated.
>
> Even the potential that Tom Lane might respond to this is worth it.
May I point you to the pg_filedump utility?
<http://sources.redhat.com/rhdb/utilities.html>
It can give you a fair idea of just where the system is blowing up.
I experienced what sounds like the same problem on a system with
fairly similar hardware, albeit with a few conspicuous
differences...
1. PostgreSQL 7.4.1
2. FreeBSD 4.9
3. Berkeley FFS with soft updates
4. Quad-Xeon, 8GB RAM (only using 4GB of it :-()
5. AMI MegaRaid controller...
6. Slightly less disk; 12x74GB SCSI drives
[root@hathi scsi]# cat /proc/scsi/megaraid/1
LSI Logic MegaRAID 1.74 254 commands 16 targs 7 chans 7 luns
What I found in looking at the page with the "invalid page header" was
that it was full of ASCII NUL values.
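If pg_filedump isn't handy, a crude version of the same check can be
sketched in Python. This is only a sketch under assumptions, not what I
actually ran: it assumes the default 8192-byte block size, and the
function names (nul_fraction, scan_relation_file) are made up for
illustration. It reads a relation's data file block by block and
reports the blocks that consist of NUL bytes.

```python
BLOCK_SIZE = 8192  # assumption: PostgreSQL built with the default block size


def nul_fraction(block: bytes) -> float:
    """Return the fraction of bytes in `block` that are NUL (0x00)."""
    if not block:
        return 0.0
    return block.count(b"\x00") / len(block)


def scan_relation_file(path, block_size=BLOCK_SIZE, threshold=1.0):
    """Yield (block_number, nul_fraction) for suspicious blocks.

    A block is reported when its NUL fraction is >= threshold.
    The default threshold of 1.0 reports only blocks that are
    entirely NUL; lower it to catch mostly-zeroed blocks too.
    """
    with open(path, "rb") as f:
        block_no = 0
        while True:
            block = f.read(block_size)
            if not block:
                break  # end of file
            frac = nul_fraction(block)
            if frac >= threshold:
                yield block_no, frac
            block_no += 1


def block_offset(block_no, block_size=BLOCK_SIZE):
    """Byte offset of a block within the relation's data."""
    return block_no * block_size
```

Under that block-size assumption, block 59305 from the error message
starts at byte 59305 * 8192 = 485826560, which is below 1 GB and so
should sit in the relation's first segment file if the default 1 GB
segment size is in use. Run the scan against a copy of the file, with
the postmaster stopped, rather than against the live data directory.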
We had previously had quite a bit of trouble with a different box with
the same hardware configuration running Red Hat 7.3, although when I
replaced a 2.4.18 Linux kernel with 2.6.2, those problems evaporated.
The only thing that we have been able to point to on the box in
question is a hardware problem. Given that the disks are RAIDed, the
most likely culprits seem to come down to three:
1. Perhaps the controller is "glitched;"
2. Perhaps the controller driver is "glitched;"
3. Perhaps there is a RAM problem.
Notice that the list of suspects doesn't include any that actually
relate to database software.
Your best bet is to look for hardware problems.
--
(reverse (concatenate 'string "gro.gultn" "@" "enworbbc"))
http://cbbrowne.com/info/linuxxian.html
Never take life seriously. Nobody gets out alive anyway.