Thread: database corruption
Hi,

We recently switched over from Solaris to Linux and we've been experiencing a couple of database corruption problems. We're using the 2.6 kernel and an XFS file system, in case there are any known problems with this setup.

It's my understanding that these errors are typically b/c of hardware problems, so we're in the process of checking the disk. However, are there other things that I should look at to find out what's going on? The table that was corrupted earlier this week was dropped and recreated, but it was corrupted again sometime last night. The table is pretty active; about 10GB are inserted and deleted every day.

-Michael

Here's what vacuum comes up with:

INFO: vacuuming "public.tbltimeseries"
WARNING: relation "tbltimeseries" TID 696892/1: OID is invalid
WARNING: relation "tbltimeseries" TID 696892/2: OID is invalid
WARNING: relation "tbltimeseries" TID 696892/3: OID is invalid
WARNING: relation "tbltimeseries" TID 696892/4: OID is invalid
WARNING: relation "tbltimeseries" TID 696892/5: OID is invalid
WARNING: relation "tbltimeseries" TID 696892/6: OID is invalid
WARNING: relation "tbltimeseries" TID 696893/1: OID is invalid
WARNING: relation "tbltimeseries" TID 696893/2: OID is invalid
WARNING: relation "tbltimeseries" TID 696893/3: OID is invalid
WARNING: relation "tbltimeseries" TID 696893/4: OID is invalid
WARNING: relation "tbltimeseries" TID 696893/5: OID is invalid
WARNING: relation "tbltimeseries" TID 696893/6: OID is invalid
WARNING: relation "tbltimeseries" TID 696894/1: OID is invalid
WARNING: relation "tbltimeseries" TID 696894/2: OID is invalid
WARNING: relation "tbltimeseries" TID 696894/3: OID is invalid
WARNING: relation "tbltimeseries" TID 696894/4: OID is invalid
WARNING: relation "tbltimeseries" TID 696895/1: OID is invalid
WARNING: relation "tbltimeseries" TID 696895/2: OID is invalid
WARNING: relation "tbltimeseries" TID 696895/3: OID is invalid
WARNING: relation "tbltimeseries" TID 696895/4: OID is invalid
WARNING: relation "tbltimeseries" TID 696895/5: OID is invalid
WARNING: relation "tbltimeseries" TID 696895/6: OID is invalid
WARNING: relation "tbltimeseries" TID 1124824/1: OID is invalid
WARNING: relation "tbltimeseries" TID 1124824/2: OID is invalid
WARNING: relation "tbltimeseries" TID 1124824/3: OID is invalid
WARNING: relation "tbltimeseries" TID 1124824/4: OID is invalid
WARNING: relation "tbltimeseries" TID 1124824/5: OID is invalid
INFO: index "idx_timeseries2" now contains 27066056 row versions in 108521 pages
DETAIL: 0 index pages have been deleted, 0 are currently reusable.
CPU 3.30s/1.06u sec elapsed 17.28 sec.
INFO: "tbltimeseries": found 0 removable, 27065818 nonremovable row versions in 1643991 pages
DETAIL: 0 dead row versions cannot be removed yet.
There were 86 unused item pointers.
0 pages are entirely empty.
CPU 25.41s/4.88u sec elapsed 272.64 sec.
INFO: vacuuming "pg_toast.pg_toast_1436963967"
ERROR: invalid page header in block 1890530 of relation "pg_toast_1436963967"
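When checking the disk, it can help to know which blocks the warnings cluster in and where a bad block sits on disk. The following is only a rough sketch, not part of the original post: it assumes the PostgreSQL defaults of 8 kB pages and 1 GB segment files (a build with different settings would give different numbers), and the helper names `locate_block` and `summarize` are made up for illustration.

```python
import re
from collections import Counter

# Assumptions (not stated in the thread): default PostgreSQL build settings.
BLOCK_SIZE = 8192                 # bytes per page (default BLCKSZ)
SEGMENT_BYTES = 1 << 30           # relation files are split into 1 GB segments
BLOCKS_PER_SEGMENT = SEGMENT_BYTES // BLOCK_SIZE

# Patterns for the two kinds of message shown in the VACUUM output above.
TID_RE = re.compile(r'TID (\d+)/(\d+): OID is invalid')
HEADER_RE = re.compile(r'invalid page header in block (\d+)')


def locate_block(block):
    """Return (segment number, byte offset inside that segment file)."""
    segment, index = divmod(block, BLOCKS_PER_SEGMENT)
    return segment, index * BLOCK_SIZE


def summarize(log_text):
    """Count flagged tuples per block and list blocks with bad page headers."""
    tuples_per_block = Counter(int(blk) for blk, _item in TID_RE.findall(log_text))
    bad_headers = [int(blk) for blk in HEADER_RE.findall(log_text)]
    return tuples_per_block, bad_headers


if __name__ == "__main__":
    sample = (
        'WARNING: relation "tbltimeseries" TID 696892/1: OID is invalid\n'
        'WARNING: relation "tbltimeseries" TID 696892/2: OID is invalid\n'
        'ERROR: invalid page header in block 1890530 of relation '
        '"pg_toast_1436963967"\n'
    )
    counts, bad = summarize(sample)
    for block, n in sorted(counts.items()):
        print(block, n, locate_block(block))
    for block in bad:
        print(block, locate_block(block))
```

Under those assumptions, `locate_block(1890530)` places the bad page in segment 14 (the relation file with the `.14` suffix) at byte offset 454836224. In the log above the warnings fall into runs of adjacent blocks (696892 through 696895), which is the kind of pattern worth passing along to whoever is checking the disk, controller, or file system.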
On Fri, 2004-06-11 at 14:16, Michael Guerin wrote:
> It's my understanding that these errors are typically b/c of hardware
> problems, so we're in the process of checking the disk.

Be sure and test your memory too. And if you have a hardware RAID controller, sometimes they can be slightly broken and cause random corruption.
Scott Marlowe wrote:
> Be sure and test your memory too. And if you have a hardware RAID
> controller, sometimes they can be slightly broken and cause random
> corruption.

Switching from XFS to Reiserfs seems to have resolved our problems.
----- Original Message -----
From: "Michael Guerin" <guerin@rentec.com>
To: <pgsql-novice@postgresql.org>
Sent: Thursday, June 24, 2004 12:33 PM
Subject: Re: [NOVICE] database corruption

> Switching from XFS to Reiserfs seems to have resolved our problems.

That sounds like a problem we had. We were running off an XFS partition and had table corruption trouble (invalid page headers) on a large table. We were running kernel 2.6 on the AMD64 architecture. After the problem occurred a couple of times, we upgraded our RAID controller firmware and driver, checked our RAM, and switched over to the JFS file system. We haven't had trouble since.

We tried a couple of things when fixing our problem, but I'm wondering whether anyone else is having trouble with XFS, and whether that might be the root cause of the trouble.

-Adam