Thread: ERROR: unexpected data beyond EOF in block XXXXX of relation "file"

ERROR: unexpected data beyond EOF in block XXXXX of relation "file"

From: austijc
Date: 25 September 2008, 23:33:32

Configuration:

Postgres 8.3.1
Solaris 10 Sparc System NFS mounting the database directory from a NetApp
2020 NAS device.

Mount options:

rw,bg,hard,rsize=32768,wsize=32768,vers=3,forcedirectio,nointr,proto=tcp,suid

Error:

ERROR:  unexpected data beyond EOF in block 315378 of relation "file"
HINT:  This has been seen to occur with buggy kernels; consider updating
your system.

Situation:

Occasionally under heavy insert load.

The error comes from line 225 of bufmgr.c.  The kernel bug mentioned in the
comments is an lseek bug in a Linux kernel so I don't believe that is the
case here.

The question is can anyone more familiar with this tell me what's going on
here?  I don't know if this is a Postgres, Sun, or NetApp issue.  Could it
be a work around for an old Linux bug causing an issue with acceptable
behavior of the NetApp device?

There have been some clock differences between the Solaris system and the
NetApp device.  Could Postgres be confused by file modify times being in the
future by a few seconds?

--
View this message in context:
http://www.nabble.com/ERROR%3A--unexpected-data-beyond-EOF-in-block-XXXXX-of-relation-%22file%22-tp19680438p19680438.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.

Re: ERROR: unexpected data beyond EOF in block XXXXX of relation "file"

From: Tom Lane
Date: 26 September 2008, 00:31:27

austijc <jaustin@jasononthe.net> writes:
> The question is can anyone more familiar with this tell me what's going on
> here?  I don't know if this is a Postgres, Sun, or NetApp issue.  Could it
> be a work around for an old Linux bug causing an issue with acceptable
> behavior of the NetApp device?

People who try to run databases over NFS usually regret it eventually ;-)

All I can say is that this error message has never before been reported
by anyone who wasn't exposed to that lseek-inconsistency kernel bug.
I am not finding it too hard to believe that NFS might be vulnerable to
similar misbehavior.

            regards, tom lane

Re: ERROR: unexpected data beyond EOF in block XXXXX of relation "file"

From: austijc
Date: 28 September 2008, 17:59:38

That's going to be a problem for the continued viability of Postgres.
Clustered systems using a NAS for data is a pretty common configuration
these days.  Oracle specifically supports it and even complains if your NFS
mount options are not correct.   Our Oracle DBs run great in this same
configuration and are a good 10-20 times faster than the local disk
performance along with the quick take-over capability if a system goes belly
up.

I'll try to isolate this problem with a simple C program to tell me what
software layer to look at.  Hopefully it's just a configuration issue.
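
A minimal sketch of what such a probe could look like, assuming the symptom is
lseek(SEEK_END) reporting a stale file size right after a block has been
written (the lseek inconsistency mentioned upthread).  The function name
check_extend, the 8192-byte block size (PostgreSQL's default), and the probe
file path are illustrative assumptions; none of this is taken from the
PostgreSQL source.

```c
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

#define BLOCK_SIZE 8192         /* assumed: PostgreSQL's default block size */

/*
 * Hypothetical probe: extend the file at "path" one block at a time and,
 * after every write, ask the kernel where EOF is.  Each block is written
 * at an explicit offset with pwrite(), so a wrong lseek() answer cannot
 * disturb the following writes.
 *
 * Returns the number of times lseek(SEEK_END) disagreed with the size we
 * just wrote, or -1 on an I/O error.
 */
static long
check_extend(const char *path, long nblocks)
{
    char    block[BLOCK_SIZE];
    long    mismatches = 0;
    int     fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);

    if (fd < 0)
        return -1;
    memset(block, 'x', sizeof(block));

    for (long i = 0; i < nblocks; i++)
    {
        off_t   offset = (off_t) i * BLOCK_SIZE;
        off_t   expected = offset + BLOCK_SIZE;
        off_t   end;

        if (pwrite(fd, block, sizeof(block), offset) != (ssize_t) sizeof(block))
        {
            close(fd);
            return -1;
        }

        end = lseek(fd, 0, SEEK_END);
        if (end != expected)
        {
            fprintf(stderr, "block %ld: lseek reports %lld, expected %lld\n",
                    i, (long long) end, (long long) expected);
            mismatches++;
        }
    }

    close(fd);
    return mismatches;
}
```

Calling check_extend() from a small main() against a file on the NFS mount,
and again against a file on local disk, would show whether the two ever
disagree.  A clean run does not prove the mount is safe, since the error here
only shows up occasionally under heavy insert load; several copies running
concurrently would be a closer approximation.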

Re: ERROR: unexpected data beyond EOF in block XXXXX of relation "file"

From: David Fetter
Date: 29 September 2008, 14:16:45

On Sun, Sep 28, 2008 at 11:51:49AM -0700, austijc wrote:
>
> That's going to be a problem for the continued viability of
> Postgres.

Funny, I thought running a DBMS over a known-unreliable storage system
was a problem for the continued viability of Oracle.  When, not if,
people lose enough data to this silliness, they'll be thinking hard
about how to get Oracle out and something reliable in.

> Clustered systems using a NAS for data is a pretty common
> configuration these days.  Oracle specifically supports it and even
> complains if your NFS mount options are not correct.   Our Oracle
> DBs run great in this same configuration and are a good 10-20 times
> faster than the local disk performance along with the quick
> take-over capability if a system goes belly up.

Oracle stores more state to the disk than PostgreSQL does, which has
significant down sides.  There are more effective ways of handling
uptime requirements than jamming NFS into the picture.  Maybe it's
just my failure of imagination, but I can't think of a *less*
effective one.

> I'll try to isolate this problem with a simple C program to tell me
> what software layer to look at.  Hopefully it's just a configuration
> issue.

It's not.  The issue is that NFS is broken garbage from a DBMS, and,
it's pretty easy to argue, just about any other perspective.

Cheers,
David.

Re: ERROR: unexpected data beyond EOF in block XXXXX of relation "file"

From: austijc
Date: 29 September 2008, 14:55:01

Okay, I see the maturity level is too low here.  I'll take this elsewhere.
If anyone has a similar problem and would like to know the status please
email me.



Re: ERROR: unexpected data beyond EOF in block XXXXX of relation "file"

From: Peter Eisentraut
Date: 29 September 2008, 18:47:47

David Fetter wrote:
> On Sun, Sep 28, 2008 at 11:51:49AM -0700, austijc wrote:
>> That's going to be a problem for the continued viability of
>> Postgres.
>
> Funny, I thought running a DBMS over a known-unreliable storage system
> was a problem for the continued viability of Oracle.  When, not if,
> people lose enough data to this silliness, they'll be thinking hard
> about how to get Oracle out and something reliable in.

NFS is not "unreliable", it is just different in some respects from
other file systems.  That paired with some poor NFS implementations in
certain operating systems and this evident general misunderstanding make
it a poor fit for PostgreSQL.