Thread: ERROR: unexpected data beyond EOF in block XXXXX of relation "file"
Configuration:
Postgres 8.3.1
Solaris 10 SPARC system
NFS mounting the database directory from a NetApp 2020 NAS device.
Mount options: rw,bg,hard,rsize=32768,wsize=32768,vers=3,forcedirectio,nointr,proto=tcp,suid
Error:
ERROR: unexpected data beyond EOF in block 315378 of relation "file"
HINT: This has been seen to occur with buggy kernels; consider updating your system.
Situation:
Occurs occasionally under heavy insert load. The error comes from line 225 of bufmgr.c. The kernel bug mentioned in the comments there is an lseek bug in a Linux kernel, so I don't believe that is the case here.
The question is: can anyone more familiar with this tell me what's going on here? I don't know if this is a Postgres, Sun, or NetApp issue. Could it be a workaround for an old Linux bug causing an issue with acceptable behavior of the NetApp device?
There have been some clock differences between the Solaris system and the NetApp device. Could Postgres be confused by file modify times being in the future by a few seconds?
--
View this message in context: http://www.nabble.com/ERROR%3A--unexpected-data-beyond-EOF-in-block-XXXXX-of-relation-%22file%22-tp19680438p19680438.html
Sent from the PostgreSQL - bugs mailing list archive at Nabble.com.
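For anyone wanting to put a number on the clock difference mentioned above, here is a minimal sketch in Python. It only measures the apparent skew between the local clock and the timestamps assigned to a new file on the mount; it says nothing about whether Postgres cares about file modify times. The function name `mtime_skew_seconds` and the scratch-file approach are illustrative assumptions, not anything taken from Postgres, Solaris, or NetApp tooling.

```python
import os
import tempfile
import time


def mtime_skew_seconds(directory):
    """Create a scratch file in `directory` and return (file mtime - local clock).

    Hypothetical helper for illustration. On an NFS mount the mtime of a
    freshly written file is normally assigned by the server, so a result that
    is clearly nonzero suggests client/server clock skew.
    """
    fd, path = tempfile.mkstemp(dir=directory)
    try:
        os.write(fd, b"x")
        os.fsync(fd)  # push the write out before asking for attributes
        local_now = time.time()
        return os.fstat(fd).st_mtime - local_now
    finally:
        os.close(fd)
        os.unlink(path)
```

Calling `mtime_skew_seconds` with a directory on the NFS mount and again with a local directory gives two values to compare; a positive result of a few seconds on the mount only would match the "modify times in the future" observation.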
austijc <jaustin@jasononthe.net> writes:
> The question is can anyone more familiar with this tell me what's going on
> here? I don't know if this is a Postgres, Sun, or NetApp issue. Could it
> be a work around for an old Linux bug causing an issue with acceptable
> behavior of the NetApp device?
People who try to run databases over NFS usually regret it eventually ;-)
All I can say is that this error message has never before been reported by anyone who wasn't exposed to that lseek-inconsistency kernel bug. I am not finding it too hard to believe that NFS might be vulnerable to similar misbehavior.
regards, tom lane
That's going to be a problem for the continued viability of Postgres. Clustered systems using a NAS for data are a pretty common configuration these days. Oracle specifically supports it and even complains if your NFS mount options are not correct. Our Oracle DBs run great in this same configuration and are a good 10-20 times faster than on local disk, with the added benefit of quick take-over if a system goes belly up. I'll try to isolate this problem with a simple C program to tell me what software layer to look at. Hopefully it's just a configuration issue.
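A minimal sketch of what such an isolation test could look like, written in Python rather than C for brevity (the `os` calls map directly onto the same system calls). It appends fixed-size blocks to a file and, after each write, asks `lseek(SEEK_END)` where the end of the file is, which is the kind of inconsistency the error hint refers to. The function name `probe_eof_consistency`, the block count, and the 8192-byte block size (the Postgres default) are assumptions for illustration; this is not the check Postgres itself performs in bufmgr.c.

```python
import os

BLOCK_SIZE = 8192  # assumed: Postgres default block size


def probe_eof_consistency(path, blocks=1000):
    """Append `blocks` blocks to `path`, checking lseek(SEEK_END) after each.

    Returns a list of (block_number, expected_size, reported_size) tuples,
    one for every append after which the reported end of file did not match
    the number of bytes written so far. An empty list means no mismatch
    was observed.
    """
    mismatches = []
    fd = os.open(path, os.O_RDWR | os.O_CREAT | os.O_TRUNC, 0o600)
    try:
        for block in range(blocks):
            # Non-zero payload, so a written block is never mistaken for a hole.
            payload = bytes([block % 255 + 1]) * BLOCK_SIZE
            written = os.pwrite(fd, payload, block * BLOCK_SIZE)
            if written != BLOCK_SIZE:
                raise OSError("short write at block %d" % block)
            expected = (block + 1) * BLOCK_SIZE
            reported = os.lseek(fd, 0, os.SEEK_END)
            if reported != expected:
                mismatches.append((block, expected, reported))
    finally:
        os.close(fd)
    return mismatches
```

Running it against a file on the NFS mount and against a file on local disk gives a first comparison. A clean run proves little, though: this probe uses a single process and a single file descriptor, while the reported error shows up only occasionally under heavy insert load, so a fuller test would probably need several concurrent writers.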
On Sun, Sep 28, 2008 at 11:51:49AM -0700, austijc wrote:
> That's going to be a problem for the continued viability of
> Postgres.
Funny, I thought running a DBMS over a known-unreliable storage system was a problem for the continued viability of Oracle. When, not if, people lose enough data to this silliness, they'll be thinking hard about how to get Oracle out and something reliable in.
> Clustered systems using a NAS for data is a pretty common
> configuration these days. Oracle specifically supports it and even
> complains if your NFS mount options are not correct. Our Oracle
> DBs run great in this same configuration and are a good 10-20 times
> faster than the local disk performance along with the quick
> take-over capability if a system goes belly up.
Oracle stores more state to the disk than PostgreSQL does, which has significant downsides. There are more effective ways of handling uptime requirements than jamming NFS into the picture. Maybe it's just my failure of imagination, but I can't think of a *less* effective one.
> I'll try to isolate this problem with a simple C program to tell me
> what software layer to look at. Hopefully it's just a configuration
> issue.
It's not. The issue is that NFS is broken garbage from a DBMS perspective and, it's pretty easy to argue, from just about any other perspective.
Cheers, David.
--
David Fetter <david@fetter.org> http://fetter.org/
Okay, I see the maturity level is too low here. I'll take this elsewhere. If anyone has a similar problem and would like to know the status please email me.
David Fetter wrote:
> On Sun, Sep 28, 2008 at 11:51:49AM -0700, austijc wrote:
>> That's going to be a problem for the continued viability of
>> Postgres.
>
> Funny, I thought running a DBMS over a known-unreliable storage system
> was a problem for the continued viability of Oracle. When, not if,
> people lose enough data to this silliness, they'll be thinking hard
> about how to get Oracle out and something reliable in.
NFS is not "unreliable"; it is just different in some respects from other file systems. That, paired with some poor NFS implementations in certain operating systems and this evident general misunderstanding, makes it a poor fit for PostgreSQL.