Thread: READ ONLY & I/O ERROR
Hi Folks,

I am frequently getting a read-only file system error on my server. We are using PostgreSQL with GridSQL. The database is very large.

Architecture details:
CentOS 5.3 64 bit
Areca high point rocket raid 3520 8 port
32 GB RAM
Assembled hardware

We process millions of rows every day and load them into the database. We have noticed that a newly created database works fine for 20 to 25 days. After that we get errors like "read only file system" and the data is corrupted. We then run fsck to remove bad blocks from the disk, but we still get the same error after running fsck.

I would appreciate it if somebody could help me get rid of this issue.

--
Thanks
Sam Jas
On Thu, Nov 26, 2009 at 1:40 PM, Sam Jas <samjas33@yahoo.com> wrote:
> Hi Folks,
> I am frequently getting read-only file system error on my server.
> We are using postgreSQL, GridSQL database. The size of database is very huge.
> Architecture Details:
> CentOS 5.3 64 bit Areca high point rocket raid 3520 8 port
> 32 GB RAM
> assemble hardware
> We are daily processing millions of rows and loading into database. We have marked that when we create a new database it worked fine upto 20 or 25 days. After that we are getting errors like "read only file system" , data is corrupted. Therefore we are running fsck to remove bad blocks from the disk. However, after running fsck also we are getting the same error.
> I will appreciate you if somebody help me to get rid out of this issue.

This looks more like filesystem corruption.
What FS is the database running on? Presumably ext3 (since it is CentOS 5).
If possible, consider checking the root cause of the FS corruption, and possibly test on another FS (xfs?).
Maybe you should also try to enable journaling, if you run in ext2/3 mode.
--
GJ
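
[Editor's sketch] The question above (which filesystem is the database on, and is it journaled?) can be answered from /proc/mounts on Linux. The following is a minimal Python sketch, not something posted in the thread. The function names are made up for illustration, and the data directory path in the usage section is an assumption, so substitute the real one. A type of ext3 means a journal is present; ext2 has none. A "ro" flag in the options means the filesystem is currently mounted read-only.

```python
import os


def parse_mounts(text):
    """Parse /proc/mounts-style text into (device, mount_point, fs_type, options) tuples."""
    entries = []
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type, options = fields[:4]
        entries.append((device, mount_point, fs_type, options.split(",")))
    return entries


def mount_for_path(path, entries):
    """Return the entry whose mount point is the longest prefix of path, or None."""
    path = os.path.abspath(path)
    best = None
    for entry in entries:
        mount_point = entry[1]
        prefix = mount_point.rstrip("/") + "/"
        if path == mount_point or path.startswith(prefix):
            # ">=" lets a later mount on the same mount point win over an earlier one.
            if best is None or len(mount_point) >= len(best[1]):
                best = entry
    return best


if __name__ == "__main__":
    # Hypothetical data directory; replace with the actual PostgreSQL data path.
    data_dir = "/var/lib/pgsql/data"
    with open("/proc/mounts") as f:
        entry = mount_for_path(data_dir, parse_mounts(f.read()))
    if entry is not None:
        device, mount_point, fs_type, options = entry
        state = "read-only" if "ro" in options else "read-write"
        print(device, mount_point, fs_type, state)
```

If this reports ext3, journaling is already in use and the read-only remounts need another explanation.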
How can I enable journaling? I am not so good at the OS and H/W level. Can you give me a detailed description?

Thanks
Sam Jas

--- On Thu, 26/11/09, Grzegorz Jaśkiewicz <gryzman@gmail.com> wrote:
2009/11/26 Sam Jas <samjas33@yahoo.com>
> How can i enable journaling as i am not so good at OS & H/W level. Can you give me some detail description.

a) Don't top post.
b) Don't send emails in HTML.
c) man e2fsck. I am sure it is described all around the net a million times. It is something I haven't done in a while, so please search for instructions, for instance on Red Hat's website.
--
GJ
oh, and fourth - if you get filesystem errors, I would inspect drives, raid card, etc - because those usually mean that something's fishy.
On Thursday 26 November 2009, Sam Jas <samjas33@yahoo.com> wrote:
> We are daily processing millions of rows and loading into database. We
> have marked that when we create a new database it worked fine upto 20 or
> 25 days. After that we are getting errors like "read only file system" ,
> data is corrupted. Therefore we are running fsck to remove bad blocks
> from the disk. However, after running fsck also we are getting the same
> error.

You have a hardware problem. Get your system administrator to isolate and repair the bad hardware.

--
A hybrid Escalade is missing the point much in the same way that having a diet soda with your extra large pepperoni pizza is missing the point.
On Thu, Nov 26, 2009 at 6:40 AM, Sam Jas <samjas33@yahoo.com> wrote:
>
> Hi Folks,
>
> I am frequently getting read-only file system error on my server.
>
> We are using postgreSQL, GridSQL database. The size of database is very huge.
> Architecture Details:
> CentOS 5.3 64 bit Areca high point rocket raid 3520 8 port

Areca doesn't make the high point rocket raid cards (which are medium quality RAID cards).

> 32 GB RAM
> assemble hardware

Did you follow proper ESD precautions when building this machine?

> We are daily processing millions of rows and loading into database. We have marked that when we create a new database it worked fine upto 20 or 25 days. After that we
> are getting errors like "read only file system" , data is corrupted. Therefore we are running fsck to remove bad blocks from the disk. However, after running fsck also we are getting the same error.
>
> I will appreciate you if somebody help me to get rid out of this issue.

Sounds like your hardware is bad. Could be mobo / cpu / memory or RAID card. Does this machine "hang" every so often or anything? I'd run memtest86+ on it first to confirm good cpu / memory / mobo.

Quick factoid from my days as an electronics instructor in the USAF: 95% of all ESD induced failures are latent in nature, either resulting in catastrophic failure or thermal degradation some months or years down the road.
On Fri, Nov 27, 2009 at 4:53 AM, Sam Jas <samjas33@yahoo.com> wrote:
>
> I will check that one. Also i have read one forum which tells that whenever you face disk i/o run "dmesg" command it will give you detail information. Today again i face disk i/o and i have run "dmesg" it has given me below o/p. Can somebody help me to explain what is it telling ?
> sd 0:0:3:0: SCSI error: return code = 0x00040000
> end_request: I/O error, dev sdd, sector 16
> Buffer I/O error on device sdd, logical block 2
> Buffer I/O error on device sdd, logical block 3
> sd 0:0:3:0: SCSI error: return code = 0x00040000
> end_request: I/O error, dev sdd, sector 0

Looks like you've got a bad drive.
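
[Editor's sketch] To see which device the kernel is complaining about, the dmesg lines quoted above can be tallied per device. This is a minimal Python sketch, not part of the original thread. The two regular expressions are modeled only on the messages shown here, and other kernels and drivers word these errors differently, so treat the patterns as assumptions. The function name is made up for illustration.

```python
import re
from collections import Counter

# Patterns modeled on the kernel messages quoted in this thread (an assumption:
# other kernel versions and drivers use different wording).
END_REQUEST = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
BUFFER_ERROR = re.compile(r"Buffer I/O error on device (\w+), logical block (\d+)")


def summarize_io_errors(log_text):
    """Count I/O error lines per block device name (e.g. 'sdd')."""
    counts = Counter()
    for line in log_text.splitlines():
        for pattern in (END_REQUEST, BUFFER_ERROR):
            match = pattern.search(line)
            if match:
                counts[match.group(1)] += 1
                break  # count each line at most once
    return counts


# The dmesg excerpt posted above, used as sample input.
SAMPLE = """\
sd 0:0:3:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdd, sector 16
Buffer I/O error on device sdd, logical block 2
Buffer I/O error on device sdd, logical block 3
sd 0:0:3:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdd, sector 0
"""

if __name__ == "__main__":
    for device, count in summarize_io_errors(SAMPLE).most_common():
        print(device, count)
```

Errors that all point at a single device, as they do in the excerpt, are consistent with one failing drive. Errors spread across many devices would point more toward the controller, cabling, or driver.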
Scott Marlowe wrote:
> Areca doesn't make the high point rocket raid cards (which are medium
> quality RAID cards).

On a good day maybe. HighPoint is a pretty miserable RAID vendor, in the same league as Promise from what I've seen as far as their Linux driver support goes. In general, and for reasons I'm not completely sure of, everyone selling "fake RAID" cards seems to be completely incompetent. The page at http://linuxmafia.com/faq/Hardware/sata.html hasn't been updated in a while, but as of 2007 all the current HighPoint cards were still based on closed-source drivers only. Completely worthless hardware IMHO.

> Sounds like your hardware is bad. Could be mobo / cpu / memory or
> RAID card. Does this machine "hang" every so often or anything?

It's not out of the question for this sort of problem to be caused by a bad driver too. In this case it seems more likely it's a drive failure though.

--
Greg Smith 2ndQuadrant Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com www.2ndQuadrant.com
We are getting the errors below 20 or 25 days after database creation.

ERROR: could not open relation 1919829/1152694/1921473: Read-only file system
ERROR: could not read block 312320 of relation 1964206/1152694/1981329: Input/output error

If we create a new database, the problem repeats after 20 or 25 days. Until then we don't have any issues with the new database.

The database is very large. We load millions of records every day, and the read load on the database is also high. The disks are not full. We are not dropping the old database.

What is the reason for this issue? How can we ensure that it is not a database issue?

We are using
GridSQL: 1.1.0.9
PostgreSQL 8.3
Architecture details:
CentOS 5.3 64 bit Areca high point rocket raid 3520 8 port
32 GB RAM

--
Thanks
Sam Jas

--- On Mon, 30/11/09, Greg Smith <greg@2ndquadrant.com> wrote:
(please use text only email to the list)

On Wed, Dec 2, 2009 at 7:51 AM, Sam Jas <samjas33@yahoo.com> wrote:
>
> We are getting the below errors after 20 or 25 days of database creation.
>
> ERROR: could not open relation 1919829/1152694/1921473: Read-only file system
> ERROR: could not read block 312320 of relation 1964206/1152694/1981329: Input/output error

PostgreSQL cannot make a file system read only. The OS does that.

What do your system logs in /var/log have to say when this happens? There's got to be more context in there than we're getting evidence of here on the list.

> If we create a new database the problem is repeated after 20 or 25 days. Until then we don't have any issues with the new database.

My guess is that it's not a fixed number, just what you've seen so far. It could happen in a day or a month or a year.

> The size of database is very huge. We are loading millions of records every day and also fetching from the database is also high. Even the disks are not full. We are not dropping the old database.
>
> What is the reason for this issue?

Looks like bad hardware to me.

> How can we ensure that it is not a database issue?

It can't be a database issue, as the database isn't capable of actually locking a file system. It can trigger an OS bug maybe that causes this problem, but given that no one else is having this issue with CentOS 5.3, I'm gonna bet on bad hardware.

> We are using
> GridSQL: 1.1.0.9
> PostgreSQL 8.3
> Architecture Details:
> CentOS 5.3 64 bit Areca high point rocket raid 3520 8 port
> 32 GB RAM

I will repeat, Areca does NOT MAKE the high point rocket raid. I will also add that a Rocket Raid is not, IMHO, suitable for a production environment. If it's an actual Areca, then the model will be something like 11xx, 12xx, or 16xx numbers, not 3520.
On 2/12/2009 11:35 PM, Scott Marlowe wrote:
> (please use text only email to the list)
>
> On Wed, Dec 2, 2009 at 7:51 AM, Sam Jas <samjas33@yahoo.com> wrote:
>>
>> We are getting the below errors after 20 or 25 days of database creation.
>>
>> ERROR: could not open relation 1919829/1152694/1921473: Read-only file system
>> ERROR: could not read block 312320 of relation 1964206/1152694/1981329: Input/output error
>
> PostgreSQL cannot make a file system read only. The OS does that.
>
> What do your system logs in /var/log have to say when this happens?
> There's got to be more context in there than we're getting evidence of
> here on the list.

In particular, if you're on a Linux system check the output of the "dmesg" command. I expect to see warnings about file system errors and about the file system being re-mounted read-only. I won't be surprised to see disk/raid errors either.

>> If we create a new database the problem is repeated after 20 or 25 days. Until then we don't have any issues with the new database.
>
> My guess is that it's not a fixed number, just what you've seen so
> far, could happen in a day or a month or a year.

Do you do any RAID scrubbing? On what schedule?

Do you test the disks that are part of your RAID array using their internal SMART diagnostics?

Is your server ever hard-reset or rebooted due to loss of power? (PostgreSQL is fine with this on a proper setup, but if you have a buggy RAID controller or one that caches writes without a battery backup, it's going to have issues).

--
Craig Ringer
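
[Editor's sketch] The advice above is to look in the system logs for the moment the filesystem was remounted read-only and see what preceded it. The following minimal Python sketch (not part of the original thread) returns the lines just before the first such message. The marker string is an assumption based on typical ext3 wording and may not match every kernel. The log path in the usage section is also an assumption (the thread only mentions /var/log), and the function name is made up for illustration.

```python
from collections import deque

# Assumed marker text; the exact wording depends on kernel version and filesystem.
READ_ONLY_MARKERS = ("remounting filesystem read-only",)


def context_before_remount(lines, before=5):
    """Return up to `before` lines preceding the first read-only remount
    message, followed by that message. Returns [] if no marker is found."""
    recent = deque(maxlen=before)  # sliding window of the most recent lines
    for line in lines:
        lowered = line.lower()
        if any(marker in lowered for marker in READ_ONLY_MARKERS):
            return list(recent) + [line]
        recent.append(line)
    return []


if __name__ == "__main__":
    # Hypothetical log location; reading it usually requires root.
    with open("/var/log/messages", errors="replace") as f:
        for line in context_before_remount(f.read().splitlines(), before=20):
            print(line)
```

The lines immediately before the remount message are usually the interesting ones: disk or controller errors there would support the hardware diagnosis given in this thread.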