Thread: BUG #5055: Invalid page header error
The following bug has been logged online:

Bug reference:      5055
Logged by:          john martin
Email address:      postgres_bee@live.com
PostgreSQL version: 8.3.6
Operating system:   CentOS 5.2 32 bit
Description:        Invalid page header error
Details:

All of a sudden we started seeing page header errors in certain queries. The messages are of the form "ERROR: invalid page header in block xxxx of relation xxxx", and the query fails.

I found many previous messages in the archives. Most, if not all, replies seemed to indicate hardware errors. I have run all the disk/memory tests like fsck and memtest86, but nothing was found. I have also rebooted the machine multiple times.

I found an unsatisfactory workaround that causes, ahem, data loss. We went ahead with it anyway; fortunately the error happened in our dev environment, so we could tolerate the data loss. The workaround consists of adding the following parameter to postgresql.conf and restarting postgres:

    zero_damaged_pages = TRUE

We no longer see the error messages with the above workaround. Needless to say, the workaround cannot be used in production. But the database is running on the SAME HARDWARE. Is it possible that it is a postgres bug?

To my surprise, I found the issue reported 5 years back:
http://archives.postgresql.org/pgsql-hackers/2004-09/msg00869.php

I am urging the community to investigate the possibility that it may not be hardware related, especially since it was first reported at least 5 years back. Or maybe you have decided not to fix it because the number of people reporting it is very small. I have a very good opinion of postgres quality. While I am not 100% sure it is a bug (only circumstantial evidence), I do think it improves the product quality to fix an annoying old bug.
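(For concreteness, a rough sketch of the workaround described above; the data directory path is an assumption based on the CentOS packaging default, so adjust it for the actual install:)

    # run as the postgres user; /var/lib/pgsql/data is assumed, not confirmed
    echo "zero_damaged_pages = true" >> /var/lib/pgsql/data/postgresql.conf
    pg_ctl -D /var/lib/pgsql/data restart
    # zero_damaged_pages silently zeroes damaged pages as they are read,
    # i.e. it causes data loss - remove the setting and restart again
    # once the failing queries succeed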
On Mon, 2009-09-14 at 23:17 +0000, john martin wrote:
> All of a sudden we started seeing page header errors in certain queries.

Was there any particular event that marked the onset of these issues? Anything in the system logs (dmesg / syslog etc) around that time?

[For SATA disks]: does smartctl from the smartmontools package indicate anything interesting about the disk(s)? (Ignore the "health status", it's a foul lie, and rely on the error log plus the vendor attributes: reallocated sector count, pending sector count, uncorrectable sector count, etc.)

Was Pg forcibly killed and restarted, or the machine hard-reset? (This _shouldn't_ cause data corruption, but might give some starting point for looking for a bug.)

> I am urging the community to investigate the possibility that it may not be
> hardware related, especially since it was first reported at least 5 years
> back.

If anything, the fact that it was first reported 5 years back makes it _more_ likely to be hardware related. Bad hardware eats/scrambles some of your data; Pg goes "oh crap, that page is garbage". People aren't constantly getting their data eaten, though, despite the age of the initial reports. It's not turning up often. It's not turning up in cases where hardware issues can be ruled out. There doesn't seem to be a strong pattern associating the reports with a particular CPU / disk controller / drive etc. that would suggest Pg triggering a hardware bug, or a bug in Pg triggered by a hardware quirk. It doesn't seem to be reproducible, and people generally don't seem to be able to trigger the issue repeatedly.

Either it's a *really* rare and quirky bug that's really hard to trigger, or it's a variety of hardware / disk issues. If it's a really rare, quirky, hard-to-trigger bug, where do you even start looking without *some* idea what happened to trigger the issue? Do you have any idea what might've started it in your case?

*** DID YOU TAKE COPIES OF YOUR DATA FILES BEFORE "FIXING" THEM *** ?

--
Craig Ringer
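(For reference, the smartctl checks suggested here might look roughly like the following; /dev/sda is just an example device name:)

    # full SMART report: health status, vendor attributes and error log
    smartctl -a /dev/sda
    # or just the vendor attributes (reallocated / pending / uncorrectable sectors)
    smartctl -A /dev/sda
    # the drive's own error log
    smartctl -l error /dev/sda
    # optionally kick off a long self-test and check the result later
    smartctl -t long /dev/sda
    smartctl -l selftest /dev/sda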
Craig Ringer wrote:
> [For SATA disks]: does smartctl from the smartmontools package indicate
> anything interesting about the disk(s)? (Ignore the "health status",
> it's a foul lie, and rely on the error log plus the vendor attributes:
> reallocated sector count, pending sector count, uncorrectable sector
> count, etc.)

and, if you're doing RAID with desktop-grade disks, it's quite possible for the drive to spontaneously decide a sector error requires a data relocation but not have the 'good' data to relocate, and not return an error code in time for the RAID controller or host md-raid to do anything about it. this results in a very sneaky sort of data corruption which goes undetected until some time later.

this is the primary reason to use the premium "ES" grade SATA drives rather than the cheaper desktop stuff in a RAID: they return sector errors in a timely fashion rather than retrying for many minutes in the background.
On Mon, 2009-09-14 at 22:58 -0700, John R Pierce wrote:
> and, if you're doing RAID with desktop-grade disks, it's quite possible
> for the drive to spontaneously decide a sector error requires a data
> relocation but not have the 'good' data to relocate, and not return an
> error code in time for the RAID controller or host md-raid to do
> anything about it. this results in a very sneaky sort of data
> corruption which goes undetected until some time later.
>
> this is the primary reason to use the premium "ES" grade SATA drives
> rather than the cheaper desktop stuff in a RAID: they return sector
> errors in a timely fashion rather than retrying for many minutes in the
> background.

Ugh, really? What do the desktop drives return in the meantime, when they haven't been able to read a sector properly? Make something up and hope it gets written to soon? That seems too hacky even for desktop HDD firmware, which is saying something.

I've generally seen fairly prompt failure responses from desktop-grade drives (and I see a lot of them fail!). While there are usually many layers of OS-driven retries above the drive that delay reporting of errors, the RAID volume the drive is a member of will generally block until a retry succeeds or the OS layers between the software RAID implementation and the disk give up and pass on the disk's error report.

That said, I've mostly used Linux's `md' software RAID, which while imperfect seems to be pretty sane in terms of data preservation.

--
Craig Ringer
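(If Linux md RAID is in the picture, a quick way to sanity-check the array and look for drive errors; the device names here are placeholders:)

    # overall state of all md arrays
    cat /proc/mdstat
    # per-array detail: degraded/rebuilding status, failed members
    mdadm --detail /dev/md0
    # check the kernel log for ATA/SCSI errors around the time of the corruption
    dmesg | grep -i -E 'ata|sd[a-z]|md0'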
> Was there any particular event that marked the onset of these issues?
> Anything in the system logs (dmesg / syslog etc) around that time?

Unfortunately, I cannot recall anything abnormal.

> Was Pg forcibly killed and restarted, or the machine hard-reset? (This
> _shouldn't_ cause data corruption, but might give some starting point
> for looking for a bug.)

It was a normal pg shutdown followed by a reboot.

> *** DID YOU TAKE COPIES OF YOUR DATA FILES BEFORE "FIXING" THEM *** ?

No. In hindsight, I should have taken a backup. At the time, the primary motivation was to get over the hump and have the query run successfully.

Correct me if I am wrong, but I thought one of the primary tasks, if not the most important, of a relational database is to ensure that no data loss ever occurs. That is why I was initially surprised that the issue did not get enough importance. But now it seems more like the community not knowing what triggered the issue, i.e. not knowing which component to fix.

But I do have one overriding question: since postgres is still running on the same hardware, wouldn't that rule out hardware as the primary suspect?
postgres bee <postgres_bee@live.com> writes:
> But I do have one overriding question: since postgres is still running
> on the same hardware, wouldn't that rule out hardware as the primary
> suspect?

Uh, no. One of the principal characteristics of hardware problems is that they're intermittent. (When they're not, you have a totally dead machine...)

There may in fact be a software bug here, but you've not only provided no evidence for that, you've gone out of your way to destroy whatever evidence might have existed. We are not going to sit around wringing our hands over the remote possibility of a bug when there is nothing we can do to investigate.

If you see it happen again, make a full filesystem-level copy of the data directory (after shutting down the postmaster and before trying to recover), and then maybe there's a chance to find something out.

			regards, tom lane
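(A minimal sketch of taking such a copy, assuming the stock CentOS data directory path; adjust paths to the actual install:)

    # stop the postmaster cleanly first
    pg_ctl -D /var/lib/pgsql/data stop -m fast
    # then take a byte-for-byte copy of the whole data directory
    cp -a /var/lib/pgsql/data /var/lib/pgsql/data.corrupt-$(date +%Y%m%d)
    # or archive it instead:
    # tar -czf /backup/data-corrupt.tar.gz -C /var/lib/pgsql data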
postgres bee wrote:
> Correct me if I am wrong, but I thought one of the primary tasks, if not
> the most important, of a relational database is to ensure that no data
> loss ever occurs. That is why I was initially surprised that the issue
> did not get enough importance. But now it seems more like the community
> not knowing what triggered the issue, i.e. not knowing which component
> to fix.

... or whether there is anything to fix.

PostgreSQL has to trust the hardware and the OS to do their jobs. If the OS is, unbeknownst to PostgreSQL, flipping the high bit in any byte written at exactly midnight on a Tuesday, there's nothing PostgreSQL can do to prevent it.

If Pg checksummed blocks and read each block back after writing, it could possibly detect *immediate* write problems - but then, OS caching would probably hide the issue unless Pg bypassed the OS's caching and forced direct disk reads. To say this would perform poorly is a spectacular understatement. Even with such a scheme, there's no guarantee that data isn't being mangled after it hits disk. The RAID controller might be "helpfully" "fixing" parity errors in a RAID 5 volume using garbage being returned by a failing disk during periodic RAID scrubbing. An SSD might have a buggy wear-levelling algorithm that results in blocks being misplaced. And so on. Now, in those cases you might see some benefit from an OS-level checksumming file system, but that won't save you from OS bugs.

It's the OS's and the hardware's job to get the data Pg writes to disk onto disk successfully and accurately, keep it there, and return it unchanged on request. If they can't do that, there's nothing Pg can do about it.

> But I do have one overriding question: since postgres is still running on
> the same hardware, wouldn't that rule out hardware as the primary suspect?

Absolutely not. As Tom Lane noted, such faults are generally intermittent.

For example: I had lots of "fun" years ago tracking down an issue caused by RAID scrubbing on a defective 3Ware 8500-8 card. The card ran fine in all my tests, and the system would remain in good condition for a week or two, but then random file system corruption would start arising. Files would be filled with garbage or with the contents of other files, the file system structure would get damaged and need fsck, files would vanish or turn up in lost+found, etc. It turned out that by default the controller ran a weekly parity check, which was always failing due to a defect in the controller, triggering a rebuild. The rebuild, due to the same defect, would proceed to merrily mangle the data on the array in the name of restoring parity. 3Ware replaced the controller and all was well. Now, what's PostgreSQL going to do when it's run on hardware like that? How can it protect itself? It can't.

Common causes of intermittent corruption include:

- OS / file system bugs
- Buggy RAID drivers and cards, especially "fake RAID" cards
- Physically defective or failing hardware RAID cards
- Defective or overheating memory / CPU, resulting in intermittent memory corruption that can affect data written to or read from disk. This doesn't always show up as crashing processes etc. as well; such things can be REALLY quirky.
- Some desktop hard disks, which are so desperate to ensure you don't return them as defective that they'll do scary things to remap blocks. "Meh, it was unreadable anyway, I'll just re-allocate it and return zeroes instead of reporting an error."

Sometimes bugs will only arise in certain circumstances.
A RAID controller bug might only be triggered by a Western Digital "Green" hard disk with a 1.0 firmware*. An issue with a 2.5" laptop SSD might only arise when a write is committed to it immediately before it's powered off as a laptop goes into sleep. A buggy disk might perform an incomplete write of a block if power from the PSU momentarily drops below optimal levels because someone turned on the microwave on the same phase as the server. The list is endless.

What it comes down to, though, is that this issue manifests itself first as some corrupt blocks in one of the database segments. There's absolutely no information available about when they got corrupted or by what part of the system. It could even have been anti-virus software on the system "disinfecting" them from a suspected virus, i.e. something totally outside the normal parts of the system Pg is concerned with.

So, unless an event is noticed that is associated with the corruption, or some way to reproduce it is found, there's no way to tell whether any given incident could be a rarely triggered Pg bug (ie: Pg writes wrong data, writes garbage to files, etc) or whether it's something external like hardware or interfering 3rd party software.

Make sense?

* For example, WD Caviar disks a few years ago used to spin down without request from the OS as a power-saving measure. This was OK with most OSes, but RAID cards tended to treat them as failed and drop them from the array. Multiple-disk-failure array death quickly resulted. Yes, I had some of those, too - I haven't been lucky with disks.

--
Craig Ringer
Craig Ringer wrote:
> PostgreSQL has to trust the hardware and the OS to do their jobs. If the
> OS is, unbeknownst to PostgreSQL, flipping the high bit in any byte

Might not even be the OS - it could be the stars (through cosmic rays).

http://www.eetimes.com/news/98/1012news/ibm.html

'"This clearly indicates that because of cosmic rays, for every 256 Mbytes of memory, you'll get one soft error a month," said Tim Dell, senior design engineer for IBM Microelectronics.'

> The RAID controller might be "helpfully" "fixing" parity errors
> in a RAID 5 volume using garbage being returned by a failing disk
> during periodic RAID scrubbing.

If your RAID controller doesn't have ECC memory, and if IBM's right about those soft-error stats, it might be doing more harm than good.
Craig Ringer <craig@postnewspapers.com.au> writes:
> [ long summary of all the weird and wonderful ways things can break ]
> ... unless an event is noticed that is associated with the corruption, or
> some way to reproduce it is found, there's no way to tell whether any
> given incident could be a rarely triggered Pg bug (ie: Pg writes wrong
> data, writes garbage to files, etc) or whether it's something external
> like hardware or interfering 3rd party software.

Sometimes you can get a good clue by examining the putatively damaged blocks. For instance, we've seen cases where a block in a Postgres data file was reported corrupt, and turned out to contain text out of the system's mail spool. That's a pretty strong hint that either the filesystem or the disk drive messed up and wrote a chunk of a file at the wrong place. Isolated flipped bits would suggest memory problems (per the cosmic-rays issue). And so on. Postgres bugs tend to have fairly recognizable signatures too. But with no evidence to look at, there's nothing we can do except speculate.

			regards, tom lane
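(For anyone wanting to do that kind of inspection, a rough way to pull a suspect block out for a look; the file path and block number below are placeholders, and the 8 kB block size is the default BLCKSZ of 8192 bytes:)

    # dump one 8 kB block (here block 1234) of a relation's data file as hex
    dd if=/var/lib/pgsql/data/base/16384/12345 bs=8192 skip=1234 count=1 2>/dev/null | hexdump -C | less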