Thread: READ ONLY & I/O ERROR

From

Sam Jas

Date:

26 November 2009, 09:40:58

Hi Folks,

I am frequently getting read-only file system errors on my server.

We are using PostgreSQL and GridSQL. The database is very large.
Architecture Details:
CentOS 5.3 64 bit Areca high point rocket raid 3520 8 port
32 GB RAM
assembled hardware

We process millions of rows daily and load them into the database. We have noticed that when we create a new database, it works fine for 20 to 25 days. After that we get errors like "read only file system" and the data is corrupted. We therefore run fsck to remove bad blocks from the disk. However, even after running fsck we get the same error.

I would appreciate it if somebody could help me get rid of this issue.

--
Thanks
Sam Jas

Re: READ ONLY & I/O ERROR

From

Grzegorz Jaśkiewicz

Date:

26 November 2009, 09:44:37

This looks more like filesystem corruption.
What FS is the database running on? Presumably ext3 (since it is CentOS 5).

If possible, investigate the root cause of the FS corruption, and possibly test on another FS (XFS?).
Maybe you should also try to enable journaling, if you run in ext2/3 mode.

--
GJ

Re: READ ONLY & I/O ERROR

From

Sam Jas

Date:

26 November 2009, 09:54:21

How can I enable journaling? I am not very experienced at the OS and hardware level. Can you give me a detailed description?

Thanks
Sam Jas

Re: READ ONLY & I/O ERROR

From

Grzegorz Jaśkiewicz

Date:

26 November 2009, 10:07:16

2009/11/26 Sam Jas <samjas33@yahoo.com>

> How can I enable journaling? I am not very experienced at the OS and hardware level. Can you give me a detailed description?

a) don't top post,
b) don't send emails in html,
c) man e2fsck. I am sure it is described all over the net a million times. It is something I haven't done in a while, so please search for instructions, for instance on Red Hat's website.

--
GJ
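
On the journaling question above: ext3 is ext2 plus a journal, so a filesystem mounted as ext3 already has one. The usual way to inspect this is `tune2fs -l <device>` (run as root), whose output includes a "Filesystem features:" line, and the usual way to add a journal to an ext2 filesystem is `tune2fs -j <device>`. The following is only a minimal Python sketch that checks captured `tune2fs -l` output for the `has_journal` feature. The two sample outputs are abbreviated and made up for illustration; they are not taken from the poster's machine.

```python
def has_journal(tune2fs_output):
    """Return True if `tune2fs -l` output lists the has_journal feature."""
    for line in tune2fs_output.splitlines():
        if line.startswith("Filesystem features:"):
            # Everything after the colon is a whitespace-separated feature list.
            features = line.split(":", 1)[1].split()
            return "has_journal" in features
    # No features line found: treat as "no journal detected".
    return False


# Abbreviated, illustrative samples (assumptions, not real captures).
EXT3_SAMPLE = """\
Filesystem volume name:   <none>
Filesystem features:      has_journal ext_attr resize_inode dir_index filetype sparse_super large_file
"""

EXT2_SAMPLE = """\
Filesystem volume name:   <none>
Filesystem features:      ext_attr resize_inode dir_index filetype sparse_super large_file
"""

if __name__ == "__main__":
    print("ext3 sample has journal:", has_journal(EXT3_SAMPLE))
    print("ext2 sample has journal:", has_journal(EXT2_SAMPLE))
```

A journal protects filesystem metadata consistency across crashes. It does not repair failing disks or controllers, which is where the rest of the thread points.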

Re: READ ONLY & I/O ERROR

From

Grzegorz Jaśkiewicz

Date:

26 November 2009, 10:09:16

Oh, and fourth: if you get filesystem errors, I would inspect the drives, RAID card, etc., because those errors usually mean that something's fishy.

Re: READ ONLY & I/O ERROR

From

Alan Hodgson

Date:

26 November 2009, 11:39:14

On Thursday 26 November 2009, Sam Jas <samjas33@yahoo.com> wrote:

> We are daily processing millions of rows and loading into database. We
> have marked that when we create a new database it worked fine up to 20 or
> 25 days. After that we are getting errors like "read only file system" ,
> data is corrupted. Therefore we are running fsck to remove bad blocks
> from the disk. However, after running fsck also we are getting the same
> error.

You have a hardware problem. Get your system administrator to isolate and
repair the bad hardware.

--
A hybrid Escalade is missing the point much in the same way that having a
diet soda with your extra large pepperoni pizza is missing the point.

Re: READ ONLY & I/O ERROR

From

Scott Marlowe

Date:

26 November 2009, 12:57:57

On Thu, Nov 26, 2009 at 6:40 AM, Sam Jas <samjas33@yahoo.com> wrote:
>
> Hi Folks,
>
> I am frequently getting read-only file system error on my server.
>
> We are using postgreSQL, GridSQL database. The size of database is very huge.
> Architecture Details:
> CentOS 5.3 64 bit Areca high point rocket raid 3520 8 port

Areca doesn't make the high point rocket raid cards (which are medium
quality RAID cards).

> 32 GB RAM
> assemble hardware

Did you follow proper ESD precautions when building this machine??

> We are daily processing millions of rows and loading into database. We have marked that when we create a new
> database it worked fine up to 20 or 25 days. After that we
> are getting errors like "read only file system" , data is corrupted. Therefore we are running fsck to remove bad
> blocks from the disk. However, after running fsck also we are getting the same error.
>
> I will appreciate you if somebody help me to get rid out of this issue.

Sounds like your hardware is bad.  Could be mobo / cpu / memory or
RAID card.  Does this machine "hang" every so often or anything?

I'd run memtest86+ on it first to confirm good cpu / memory / mobo.

Quick factoid from my days as an electronics instructor in the USAF,
95% of all ESD induced failures are latent in nature, either resulting
in catastrophic failure or thermal degradation some months or years
down the road.

Re: READ ONLY & I/O ERROR

From

Scott Marlowe

Date:

27 November 2009, 12:02:05

On Fri, Nov 27, 2009 at 4:53 AM, Sam Jas <samjas33@yahoo.com> wrote:
>
> I will check that one. Also, I read on a forum that whenever you face disk
> I/O errors, running the "dmesg" command will give you detailed information.
> Today I again faced disk I/O errors, and running "dmesg" gave me the output
> below. Can somebody help explain what it is telling me?

> sd 0:0:3:0: SCSI error: return code = 0x00040000
> end_request: I/O error, dev sdd, sector 16
> Buffer I/O error on device sdd, logical block 2
> Buffer I/O error on device sdd, logical block 3
> sd 0:0:3:0: SCSI error: return code = 0x00040000
> end_request: I/O error, dev sdd, sector 0

Looks like you've got a bad drive.
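
For anyone reading similar output: the device name in the `end_request` and `Buffer I/O error` lines (here `sdd`) identifies which block device the kernel could not read or write. A minimal Python sketch, assuming kernel messages in the same format as the lines quoted above, that tallies such error lines per device so a recurring offender stands out:

```python
import re
from collections import Counter

# Matches kernel lines such as "end_request: I/O error, dev sdd, sector 16"
END_REQUEST = re.compile(r"end_request: I/O error, dev (\w+), sector (\d+)")
# Matches lines such as "Buffer I/O error on device sdd, logical block 2"
BUFFER_ERROR = re.compile(r"Buffer I/O error on device (\w+), logical block (\d+)")


def count_io_errors(log_text):
    """Return a Counter mapping device name to number of I/O error lines."""
    counts = Counter()
    for line in log_text.splitlines():
        match = END_REQUEST.search(line) or BUFFER_ERROR.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# The dmesg excerpt quoted in this thread.
SAMPLE = """\
sd 0:0:3:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdd, sector 16
Buffer I/O error on device sdd, logical block 2
Buffer I/O error on device sdd, logical block 3
sd 0:0:3:0: SCSI error: return code = 0x00040000
end_request: I/O error, dev sdd, sector 0
"""

if __name__ == "__main__":
    for device, n in count_io_errors(SAMPLE).most_common():
        print(f"{device}: {n} I/O error lines")
```

The message formats differ between kernel versions, so the two patterns are assumptions tied to the output shown here and may need adjusting on other systems.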

Re: READ ONLY & I/O ERROR

From

Greg Smith

Date:

30 November 2009, 16:29:56

Scott Marlowe wrote:
> Areca doesn't make the high point rocket raid cards (which are medium
> quality RAID cards).
>
On a good day maybe.  HighPoint is a pretty miserable RAID vendor--in
the same league as Promise from what I've seen as far as their Linux
driver support goes.  In general, and for reasons I'm not completely
sure of, everyone selling "fake RAID" cards seems to be completely
incompetent.  The page at http://linuxmafia.com/faq/Hardware/sata.html
hasn't been updated in a while, but as of 2007 all the current HighPoint
cards were still based on closed-source drivers only.  Completely
worthless hardware IMHO.

> Sounds like your hardware is bad.  Could be mobo / cpu / memory or
> RAID card.  Does this machine "hang" every so often or anything?
>
It's not out of the question for this sort of problem to be caused by a
bad driver too.  In this case it seems more likely it's a drive failure
though.

--
Greg Smith    2ndQuadrant   Baltimore, MD
PostgreSQL Training, Services and Support
greg@2ndQuadrant.com  www.2ndQuadrant.com

Re: READ ONLY & I/O ERROR

From

Sam Jas

Date:

02 December 2009, 10:51:42

We get the errors below 20 to 25 days after creating a database.

ERROR: could not open relation 1919829/1152694/1921473: Read-only file system
ERROR: could not read block 312320 of relation 1964206/1152694/1981329: Input/output error

If we create a new database the problem is repeated after 20 or 25 days. Until then we don't have any issues with the new database.

The database is very large. We load millions of records every day, and the read load on the database is also high. The disks are not full. We are not dropping the old database.

What is the reason for this issue?

How can we ensure that it is not a database issue?

We are using
GridSQL: 1.1.0.9
PostgreSQL 8.3
Architecture Details:
CentOS 5.3 64 bit Areca high point rocket raid 3520 8 port
32 GB RAM

--
Thanks
Sam Jas

Re: READ ONLY & I/O ERROR

From

Scott Marlowe

Date:

02 December 2009, 11:35:33

(please use text only email to the list)

On Wed, Dec 2, 2009 at 7:51 AM, Sam Jas <samjas33@yahoo.com> wrote:
>
> We are getting the below errors after 20 or 25 days of database creation.
>
> ERROR: could not open relation 1919829/1152694/1921473: Read-only file system
> ERROR: could not read block 312320 of relation 1964206/1152694/1981329: Input/output error

PostgreSQL cannot make a file system read only.  The OS does that.

What do your system logs in /var/log have to say when this happens?
There's got to be more context in there than we're getting evidence of
here on the list.

> If we create a new database the problem is repeated after 20 or 25 days. Until then we don't have any issues with the
> new database.

My guess is that it's not a fixed number, just what you've seen so
far, could happen in a day or a month or a year.

>
> The size of database is very huge. We are loading millions of records every day and also fetching from the database
> is also high. Even the disks are not full. We are not dropping the old database.
>
> What is the reason for this issue?

Looks like bad hardware to me.

> How can we ensure that it is not a database issue?

It can't be a database issue, as the database isn't capable of
actually locking a file system.  It can trigger an OS bug maybe that
causes this problem, but given that no one else is having this issue
with CentOS 5.3, I'm gonna bet on bad hardware.

> We are using
> GridSQL: 1.1.0.9
> PostgreSQL 8.3
> Architecture Details:
> CentOS 5.3 64 bit Areca high point rocket raid 3520 8 port
> 32 GB RAM

I will repeat, Areca does NOT MAKE the high point rocket raid.  I will
also add that a Rocket Raid is not, IMHO, suitable for a production
environment.  If it's an actual Areca, then the model will be
something like 11xx, 12xx, or 16xx numbers, not 3520.

Re: READ ONLY & I/O ERROR

From

Craig Ringer

Date:

02 December 2009, 12:17:17

On 2/12/2009 11:35 PM, Scott Marlowe wrote:
> (please use text only email to the list)
>
> On Wed, Dec 2, 2009 at 7:51 AM, Sam Jas <samjas33@yahoo.com> wrote:
>>
>> We are getting the below errors after 20 or 25 days of database creation.
>>
>> ERROR: could not open relation 1919829/1152694/1921473: Read-only file system
>> ERROR: could not read block 312320 of relation 1964206/1152694/1981329: Input/output error
>
> PostgreSQL cannot make a file system read only.  The OS does that.
>
> What do your system logs in /var/log have to say when this happens?
> There's got to be more context in there than we're getting evidence of
> here on the list.

In particular, if you're on a Linux system check the output of the
"dmesg" command. I expect to see warnings about file system errors and
about the file system being re-mounted read-only. I won't be surprised
to see disk/raid errors either.
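
Besides dmesg, the current mount state can be read from /proc/mounts, where each line lists device, mount point, filesystem type and comma-separated mount options. A minimal Python sketch that reports mounts carrying the `ro` option follows. The sample text is hypothetical (the devices and mount points are made up), and some read-only mounts are perfectly normal, so the result is only a starting point.

```python
import os


def read_only_mounts(mounts_text):
    """Return (device, mount_point, fs_type) for each read-only mount."""
    result = []
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) < 4:
            continue  # skip blank or malformed lines
        device, mount_point, fs_type, options = fields[:4]
        # Options are comma-separated; "ro" must be a whole option,
        # not a substring of something like "errors=remount-ro".
        if "ro" in options.split(","):
            result.append((device, mount_point, fs_type))
    return result


# Hypothetical sample in /proc/mounts format (names are assumptions).
SAMPLE_MOUNTS = """\
/dev/sda1 / ext3 rw,data=ordered 0 0
/dev/sdd1 /data ext3 ro,data=ordered 0 0
proc /proc proc rw 0 0
"""

if __name__ == "__main__":
    path = "/proc/mounts"
    if os.path.exists(path):
        with open(path) as handle:
            text = handle.read()
    else:
        text = SAMPLE_MOUNTS  # fall back to the sample on non-Linux systems
    for device, mount_point, fs_type in read_only_mounts(text):
        print(f"{mount_point} ({device}, {fs_type}) is mounted read-only")
```

If the data directory's filesystem shows up here while it is normally mounted read-write, that matches the remount the kernel performs after detecting filesystem errors.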

>> If we create a new database the problem is repeated after 20 or 25 days. Until then we don't have any issues with
>> the new database.
>
> My guess is that it's not a fixed number, just what you've seen so
> far, could happen in a day or a month or a year.

Do you do any RAID scrubbing? On what schedule? Do you test the disks
that are part of your RAID array using their internal SMART diagnostics?

Is your server ever hard-reset or rebooted due to loss of power?
(PostgreSQL is fine with this on a proper setup, but if you have a buggy
RAID controller or one that caches writes without a battery backup, it's
going to have issues).

--
Craig Ringer
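
On the SMART question above: the smartmontools command `smartctl -H <device>` prints an overall health verdict, typically "SMART overall-health self-assessment test result: PASSED" for ATA drives or "SMART Health Status: OK" for SCSI drives. Disks behind a RAID controller usually need smartctl's `-d` option to be reachable; the exact value depends on the controller, so check the man page. A minimal Python sketch that classifies captured `smartctl -H` output, with sample strings that are illustrative rather than captured from this server:

```python
import re

# Verdict line printed for ATA/SATA drives.
ATA_RESULT = re.compile(
    r"SMART overall-health self-assessment test result:\s*(\S+)"
)
# Verdict line printed for SCSI/SAS drives.
SCSI_RESULT = re.compile(r"SMART Health Status:\s*(\S+)")


def smart_health(smartctl_output):
    """Classify `smartctl -H` output as 'ok', 'failing', or 'unknown'."""
    match = ATA_RESULT.search(smartctl_output)
    if match:
        return "ok" if match.group(1) == "PASSED" else "failing"
    match = SCSI_RESULT.search(smartctl_output)
    if match:
        return "ok" if match.group(1) == "OK" else "failing"
    # No verdict line: SMART unsupported, disabled, or device unreachable.
    return "unknown"


if __name__ == "__main__":
    samples = [
        "SMART overall-health self-assessment test result: PASSED",
        "SMART overall-health self-assessment test result: FAILED!",
        "SMART Health Status: OK",
        "",
    ]
    for text in samples:
        print(repr(text), "->", smart_health(text))
```

A passing verdict is not proof of a healthy disk. Drives can return I/O errors while still reporting PASSED, so the kernel log and a long self-test (`smartctl -t long`) remain worth checking.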