Thread: PostgreSQL, NetBSD and NFS

PostgreSQL, NetBSD and NFS

From
"D'Arcy J.M. Cain"
Date:
I have posted before about this but I am now posting to both NetBSD and 
PostgreSQL since it seems to be some sort of interaction between the two.  I 
have a NetAPP filer on which I am putting a PostgreSQL database.  I run 
PostgreSQL on a NetBSD box.  I used rsync to get the database onto the filer 
with no problem whatsoever but as soon as I try to open the database the NFS 
mount hangs and I can't do any operations on that mounted drive without 
hanging.  Other things continue to run but the minute I do a df or an ls on 
that drive that terminal is lost.

On the NetBSD side I get a "server not responding" error.  On the filer I see 
no problems at all.  A reboot of the filer doesn't correct anything.

Since NetBSD works just fine with this until I start PostgreSQL and 
PostgreSQL, from all reports, works well with the NetApp filer, I assume that 
there is something out of the ordinary about PostgreSQL's disk access that is 
triggering some subtle bug in NetBSD.  Does the shared memory stuff use disk 
at all?  Perhaps that's the difference between PostgreSQL and other 
applications.

The NetApp people are being very helpful and are willing to follow up any 
leads people might have and may even suggest fixes if necessary.  I have 
Bcc'd the engineer on this message and will send anything I get to them.

-- 
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: PostgreSQL, NetBSD and NFS

From
Tom Lane
Date:
"D'Arcy J.M. Cain" <darcy@druid.net> writes:
> I have posted before about this but I am now posting to both NetBSD and 
> PostgreSQL since it seems to be some sort of interaction between the two.  I 
> have a NetAPP filer on which I am putting a PostgreSQL database.  I run 
> PostgreSQL on a NetBSD box.  I used rsync to get the database onto the filer 
> with no problem whatsoever but as soon as I try to open the database the NFS 
> mount hangs and I can't do any operations on that mounted drive without 
> hanging.

That's darn odd.  But please be more specific: what's "open the
database"?  Start the postmaster?  Start a psql?  Issue a query?

> Does the shared memory stuff use disk at all?

No, I can't see that there would be any connection there.

Perhaps the next thing to do is to strace (ktrace, trace, truss,
whatever system-call tracing utility you got) the postmaster and
child processes.  If we could determine what system call is hanging up,
we might be a little closer to solving the mystery.
        regards, tom lane


Re: PostgreSQL, NetBSD and NFS

From
mlw
Date:
Forgive my stupidity, are you running PostgreSQL with the data on an NFS 
share?


D'Arcy J.M. Cain wrote:

>I have posted before about this but I am now posting to both NetBSD and 
>PostgreSQL since it seems to be some sort of interaction between the two.  I 
>have a NetAPP filer on which I am putting a PostgreSQL database.  I run 
>PostgreSQL on a NetBSD box.  I used rsync to get the database onto the filer 
>with no problem whatsoever but as soon as I try to open the database the NFS 
>mount hangs and I can't do any operations on that mounted drive without 
>hanging.  Other things continue to run but the minute I do a df or an ls on 
>that drive that terminal is lost.
>
>On the NetBSD side I get a "server not responding" error.  On the filer I see 
>no problems at all.  A reboot of the filer doesn't correct anything.
>
>Since NetBSD works just fine with this until I start PostgreSQL and 
>PostgreSQL, from all reports, works well with the NetApp filer, I assume that 
>there is something out of the ordinary about PostgreSQL's disk access that is 
>triggering some subtle bug in NetBSD.  Does the shared memory stuff use disk 
>at all?  Perhaps that's the difference between PostgreSQL and other 
>applications.
>
>The NetApp people are being very helpful and are willing to follow up any 
>leads people might have and may even suggest fixes if necessary.  I have 
>Bcc'd the engineer on this message and will send anything I get to them.
>
>  
>




Re: PostgreSQL, NetBSD and NFS

From
Greg Copeland
Date:
That was going to be my question too.

I thought NFS didn't have some of the requisite file system behaviors
(locking, flushing, etc. IIRC) for PostgreSQL to function correctly or
reliably.

Please correct as needed.

Regards,
Greg


On Thu, 2003-01-30 at 13:02, mlw wrote:
> Forgive my stupidity, are you running PostgreSQL with the data on an NFS 
> share?
> 
> 
> D'Arcy J.M. Cain wrote:
> 
> >I have posted before about this but I am now posting to both NetBSD and 
> >PostgreSQL since it seems to be some sort of interaction between the two.  I 
> >have a NetAPP filer on which I am putting a PostgreSQL database.  I run 
> >PostgreSQL on a NetBSD box.  I used rsync to get the database onto the filer 
> >with no problem whatsoever but as soon as I try to open the database the NFS 
> >mount hangs and I can't do any operations on that mounted drive without 
> >hanging.  Other things continue to run but the minute I do a df or an ls on 
> >that drive that terminal is lost.
> >
> >On the NetBSD side I get a "server not responding" error.  On the filer I see 
> >no problems at all.  A reboot of the filer doesn't correct anything.
> >
> >Since NetBSD works just fine with this until I start PostgreSQL and 
> >PostgreSQL, from all reports, works well with the NetApp filer, I assume that 
> >there is something out of the ordinary about PostgreSQL's disk access that is 
> >triggering some subtle bug in NetBSD.  Does the shared memory stuff use disk 
> >at all?  Perhaps that's the difference between PostgreSQL and other 
> >applications.
> >
> >The NetApp people are being very helpful and are willing to follow up any 
> >leads people might have and may even suggest fixes if necessary.  I have 
> >Bcc'd the engineer on this message and will send anything I get to them.
> >
> >  
> >
-- 
Greg Copeland <greg@copelandconsulting.net>
Copeland Computer Consulting



Re: PostgreSQL, NetBSD and NFS

From
Tom Lane
Date:
Greg Copeland <greg@CopelandConsulting.Net> writes:
> That was going to be my question too.
> I thought NFS didn't have some of the requisite file system behaviors
> (locking, flushing, etc. IIRC) for PostgreSQL to function correctly or
> reliably.

Whether the thing is trustworthy is a different issue ;-).  I was just
surprised that it didn't seem to work at all.

In practice, if the NFS server never goes down then you probably haven't
got a problem.  I'm not sure you could count on the database not getting
scrambled if the NFS server crashes.  But that wasn't the question...
        regards, tom lane


Re: PostgreSQL, NetBSD and NFS

From
Larry Rosenman
Date:

--On Thursday, January 30, 2003 16:02:17 -0500 Tom Lane <tgl@sss.pgh.pa.us> 
wrote:

> Greg Copeland <greg@CopelandConsulting.Net> writes:
>> That was going to be my question too.
>> I thought NFS didn't have some of the requisite file system behaviors
>> (locking, flushing, etc. IIRC) for PostgreSQL to function correctly or
>> reliably.
>
> Whether the thing is trustworthy is a different issue ;-).  I was just
> surprised that it didn't seem to work at all.
>
> In practice, if the NFS server never goes down then you probably haven't
> got a problem.  I'm not sure you could count on the database not getting
> scrambled if the NFS server crashes.  But that wasn't the question...

FWIW I use a NetApp filer for my databases here for traffic analysis and IP 
management.

The NetApp has battery-backed NVRAM and will replay the right stuff on its 
own.

Just another datapoint.

LER

>
>             regards, tom lane
>



-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 972-414-9812                 E-Mail: ler@lerctr.org
US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749





Re: PostgreSQL, NetBSD and NFS

From
Curt Sampson
Date:
On Thu, 30 Jan 2003, D'Arcy J.M. Cain wrote:

> Does the shared memory stuff use disk at all? Perhaps that's the
> difference between PostgreSQL and other applications.

Shared memory in NetBSD is just an interface to mmap'd pages, so it can
be swapped to disk. But I assume your swap is not on NFS....

A ktrace would be helpful. Also, it would be helpful if you tried doing
an initdb to a directory on the filer to see if you can even create a
database cluster, and tried doing that or rsyncing and accessing your
data over NFS with a NetBSD system as the NFS server.

cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC
 


Re: PostgreSQL, NetBSD and NFS

From
"D'Arcy J.M. Cain"
Date:
On Thursday 30 January 2003 18:32, Simon J. Gerraty wrote:
> Is postgreSQL trying to lock a file perhaps?  Would seem a sensible thing
> for it to be doing...

Is that a problem?  FWIW I am running statd and lockd on the NetBSD box.

--
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: PostgreSQL, NetBSD and NFS

From
Bill Studenmund
Date:
On Fri, 31 Jan 2003, D'Arcy J.M. Cain wrote:

> On Thursday 30 January 2003 12:07, Tom Lane wrote:
> > Perhaps the next thing to do is to strace (ktrace, trace, truss,
> > whatever system-call tracing utility you got) the postmaster and
> > child processes.  If we could determine what system call is hanging up,
> > we might be a little closer to solving the mystery.
>
> Ktrace.  Yes, am doing another test at the moment - using 100Mb to 100Mb and
> TCP option to the mount.  Before I was using the default UDP and going 100Mb
> to 1000 Mb.  If this works I will try my "guaranteed" fail next and will add
> ktrace.  In fact, I will do that regardless.

Look at the -t option to ktrace. It controls what ktrace looks at
(syscalls, NAMEI lookups, etc.). Most importantly, you might want to NOT
include the 'i' option in there, which is in there by default. It logs the
data of all i/o transfers, which balloons the logs. While you may need the
data in the end, tracing w/o 'i' could show you the syscalls around the
failure which might be enough.

Take care,

Bill



Re: PostgreSQL, NetBSD and NFS

From
mlw
Date:
D'Arcy J.M. Cain wrote:

>On Thursday 30 January 2003 14:02, mlw wrote:
>  
>
>>Forgive my stupidity, are you running PostgreSQL with the data on an NFS
>>share?
>>    
>>
>
>Yes, sorry.  PostgreSQL is running from the local disk but the data is on the 
>mounted drive.
>
I'm not sure, I guess it could work, but NFS is a pretty poor file 
system. There are always issues with file locking across various 
platforms. I recall reading about mmap issues across NFS a while ago 
(forget the platform, sorry). Depending on the NFS server, there may be 
problems there. The NFS client may also have issues with locking, fsync, 
and mmap.

If possible, look for a network block device protocol. The file level 
NFS is probably inadequate for PostgreSQL.



Re: PostgreSQL, NetBSD and NFS

From
"D'Arcy J.M. Cain"
Date:
On Thursday 30 January 2003 14:27, Greg Copeland wrote:
> That was going to be my question too.
>
> I thought NFS didn't have some of the requisite file system behaviors
> (locking, flushing, etc. IIRC) for PostgreSQL to function correctly or
> reliably.
>
> Please correct as needed.

Yes, doubly so here please.  I think I remember someone else saying that they
use PostgreSQL over NFS so hopefully this is not the situation.

--
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: PostgreSQL, NetBSD and NFS

From
"D'Arcy J.M. Cain"
Date:
On Thursday 30 January 2003 14:02, mlw wrote:
> Forgive my stupidity, are you running PostgreSQL with the data on an NFS
> share?

Yes, sorry.  PostgreSQL is running from the local disk but the data is on the
mounted drive.

--
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: PostgreSQL, NetBSD and NFS

From
"D'Arcy J.M. Cain"
Date:
On Thursday 30 January 2003 12:07, Tom Lane wrote:
> "D'Arcy J.M. Cain" <darcy@druid.net> writes:
> > I have posted before about this but I am now posting to both NetBSD and
> > PostgreSQL since it seems to be some sort of interaction between the two.
> >  I have a NetAPP filer on which I am putting a PostgreSQL database.  I
> > run PostgreSQL on a NetBSD box.  I used rsync to get the database onto
> > the filer with no problem whatsoever but as soon as I try to open the
> > database the NFS mount hangs and I can't do any operations on that
> > mounted drive without hanging.
>
> That's darn odd.  But please be more specific: what's "open the
> database"?  Start the postmaster?  Start a psql?  Issue a query?

Start the postmaster.  It is possible that I have a corrupted database but I
was using that as a debugging tool because I still don't think that the whole
NFS subsystem should lock up.  The other time I tested it took hours to fail
and I found it useful to have an immediate fail.

> Perhaps the next thing to do is to strace (ktrace, trace, truss,
> whatever system-call tracing utility you got) the postmaster and
> child processes.  If we could determine what system call is hanging up,
> we might be a little closer to solving the mystery.

Ktrace.  Yes, am doing another test at the moment - using 100Mb to 100Mb and
TCP option to the mount.  Before I was using the default UDP and going 100Mb
to 1000 Mb.  If this works I will try my "guaranteed" fail next and will add
ktrace.  In fact, I will do that regardless.

--
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: PostgreSQL, NetBSD and NFS

From
Tom Lane
Date:
"D'Arcy J.M. Cain" <darcy@druid.net> writes:
> 100Mb instead of 100Mb -->1000Mb.  I tried mounting with and without the TCP 
> option and it seemed to act the same but it was better than before.  Now it 
> doesn't crash but trying to load a large table hangs.  It gets to a point 
> where it is calling semop over and over getting a 0 return.  It does that 81 
> times in 0.989004 seconds and then hangs in the PostgreSQL code.  It must be 
> in some sort of busy loop because there are no further system calls after the
> last semop return and the CPU usage continues to climb.

Very bizarre.  Looks like the last page it read was block 104
(851968/8192) in file "/source/data/cert/base/16556/17063".  Could you
provide a formatted dump of that page?  I'm partial to pg_filedump which
you can get from http://sources.redhat.com/rhdb/tools.html.  Use
switches -f -i to get a reasonably complete dump.
        regards, tom lane
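
The block number quoted above is the byte offset seen in the trace divided by the page size. A minimal sketch of that arithmetic, assuming the default 8192-byte PostgreSQL block size; the function name is illustrative and not part of any PostgreSQL tool:

```python
BLCKSZ = 8192  # assumed default PostgreSQL page size, in bytes


def offset_to_block(offset, blcksz=BLCKSZ):
    """Return (block number, byte offset inside that block) for a file offset."""
    return divmod(offset, blcksz)


block, within = offset_to_block(851968)
print(block, within)  # 104 0
```

An offset that is not a multiple of the page size would give a non-zero second value, which would mean the read did not start on a page boundary.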


Re: PostgreSQL, NetBSD and NFS

From
"D'Arcy J.M. Cain"
Date:
On Saturday 01 February 2003 13:09, Tom Lane wrote:
> Very bizarre.  Looks like the last page it read was block 104
> (851968/8192) in file "/source/data/cert/base/16556/17063".  Could you
> provide a formatted dump of that page?  I'm partial to pg_filedump which
> you can get from http://sources.redhat.com/rhdb/tools.html.  Use
> switches -f -i to get a reasonably complete dump.

That's a 4.7 MB file.  The dump might be quite huge.  I can send you the file
itself (privately) if you want.  Wouldn't that be even better?

I can tell you what the file is.  It is the primary key file for the
certificate database which is the 8 million record table that I am trying to
load.

--
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: PostgreSQL, NetBSD and NFS

From
Tom Lane
Date:
"D'Arcy J.M. Cain" <darcy@druid.net> writes:
> That's a 4.7 MB file.  The dump might be quite huge.

I really just want to see the dump of that one page, and maybe the pages
before and after it for comparison's sake.
        regards, tom lane


Re: PostgreSQL, NetBSD and NFS

From
Tom Lane
Date:
What was the query it failed on, exactly?  That last page it read
seems to be an empty index page --- it should have moved on to the
next index page, I'd think, rather than doing anything that could
hang up.
        regards, tom lane


Re: PostgreSQL, NetBSD and NFS

From
"D'Arcy J.M. Cain"
Date:
On Saturday 01 February 2003 14:00, Tom Lane wrote:
> What was the query it failed on, exactly?  That last page it read
> seems to be an empty index page --- it should have moved on to the
> next index page, I'd think, rather than doing anything that could
> hang up.

Here's the log.  As you can see, nothing was logged after the COPY command.

It's possible that the file was corrupted.  I will do a new test from scratch
now that I am not switching speeds.

--
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.

Re: PostgreSQL, NetBSD and NFS

From
Tom Lane
Date:
"D'Arcy J.M. Cain" <darcy@druid.net> writes:
> Here's the log.  As you can see, nothing was logged after the COPY command.

What else was going on?  As far as I can see, the code never does a
semop unless it's waiting for some other backend process.
        regards, tom lane


Re: PostgreSQL, NetBSD and NFS

From
"D'Arcy J.M. Cain"
Date:
On Saturday 01 February 2003 14:43, Tom Lane wrote:
> "D'Arcy J.M. Cain" <darcy@druid.net> writes:
> > Here's the log.  As you can see, nothing was logged after the COPY
> > command.
>
> What else was going on?  As far as I can see, the code never does a
> semop unless it's waiting for some other backend process.

Nothing except the standard background processes are running.  The ktrace.out
I gave the ftp address for has everything that that instance of PostgreSQL
was doing.

--
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: PostgreSQL, NetBSD and NFS

From
Tom Lane
Date:
"D'Arcy J.M. Cain" <darcy@druid.net> writes:
> On Saturday 01 February 2003 14:43, Tom Lane wrote:
>> What else was going on?  As far as I can see, the code never does a
>> semop unless it's waiting for some other backend process.

> Nothing except the standard background processes are running.

More and more bizarre.  What is the hardware platform --- does it have TAS?
        regards, tom lane


Re: PostgreSQL, NetBSD and NFS

From
Curt Sampson
Date:
On Fri, 31 Jan 2003, mlw wrote:

> . There are always issues with file locking across various
> platforms. I recall reading about mmap issues across NFS a while ago...

Postgres uses neither of these, IIRC, so that should be fine. (Actually,
postgres does effectively use mmap for shared memory on NetBSD, but
that's not mapping data on the NFS filesystem, so it's not an issue.)

> The NFS client may also have issues with locking, fsync, and mmap.

Any fsync problems would affect data integrity during a crash, but
nothing otherwise.

(Of course, I'm happy to be corrected on any of these issues, if someone
can point out particular parts of postgres that would fail over NFS.)

cjs
-- 
Curt Sampson  <cjs@cynic.net>   +81 90 7737 2974   http://www.netbsd.org
    Don't you know, in this new Dark Age, we're all light.  --XTC
 


Re: PostgreSQL, NetBSD and NFS

From
"D'Arcy J.M. Cain"
Date:
On Saturday 01 February 2003 15:48, Tom Lane wrote:
> More and more bizarre.  What is the hardware platform --- does it have TAS?

NetBSD on a Pentium (i386 port) so yes, it does have TAS.  I assume you were
thinking about the spinlock emulation.

I have been looking through backend/storage/lmgr/lwlock.c and
backend/storage/lmgr/spin.c myself and can't find any place that it can get
into an infinite loop without making a system call within the loop.  It's
very odd.  Also odd, why would running over NFS have any bearing on it if we
could find such a place?

--
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: PostgreSQL, NetBSD and NFS

From
Tom Lane
Date:
"D'Arcy J.M. Cain" <darcy@druid.net> writes:
> Also odd, why would running over NFS have any bearing on it if we 
> could find such a place?

Yup, 'tis the question.  The only theory I have been able to come up
with is that there's something flaky about your network hardware,
such that Postgres sometimes reads bad data from the NFS server.
But the glaring problem with that theory is that bad data coming
from a regular disk drive generally results in error messages or
core dumps.  Silent hangs would be a new behavior AFAIR.

At this point I think you need to rebuild with --enable-debug and
--enable-cassert (if you didn't already) and then capture some
stack traces from the stuck backend.  We have to find out what the
backend thinks it's doing.

BTW: *are* we certain it's associated with NFS, and not a hardware
problem on your NetBSD box?  Can you perform the same tests running
the database off a local disk?
        regards, tom lane


Re: PostgreSQL, NetBSD and NFS

From
"D'Arcy J.M. Cain"
Date:
On Sunday 02 February 2003 12:26, Tom Lane wrote:
> "D'Arcy J.M. Cain" <darcy@druid.net> writes:
> > Also odd, why would running over NFS have any bearing on it if we
> > could find such a place?
>
> Yup, 'tis the question.  The only theory I have been able to come up
> with is that there's something flaky about your network hardware,

Possible but two separate networks?

> At this point I think you need to rebuild with --enable-debug and
> --enable-cassert (if you didn't already) and then capture some
> stack traces from the stuck backend.  We have to find out what the
> backend thinks it's doing.

That was going to be my next step.

> BTW: *are* we certain it's associated with NFS, and not a hardware
> problem on your NetBSD box?  Can you perform the same tests running
> the database off a local disk?

That box is running 5 production database engines on 5 different ports.  This
is the 6th one and the only difference is that it is running from the NFS
mounted drive.

--
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: PostgreSQL, NetBSD and NFS

From
"D'Arcy J.M. Cain"
Date:
On Sunday 02 February 2003 12:26, Tom Lane wrote:
> At this point I think you need to rebuild with --enable-debug and
> --enable-cassert (if you didn't already) and then capture some
> stack traces from the stuck backend.  We have to find out what the
> backend thinks it's doing.

Well, it does appear to be working but it never finishes.  Here are two
backtraces.  One was taken while it was running and the other after a kill
-9.  The primary key file should have had 322846720 bytes based on the
database that I was copying in but it only had 4603904 after running the
restore for 12 hours.  The file seems to get to a static size and just stays
there.  I am running another test to confirm that.


(gdb) bt
#0  LWLockAcquire (lockid=7272, mode=LW_SHARED) at lwlock.c:236
#1  0x8110417 in LockBuffer (buffer=3626, mode=1) at bufmgr.c:2004
#2  0x80828ec in _bt_getbuf (rel=0x83a86f0, blkno=6, access=1) at nbtpage.c:321
#3  0x808559d in _bt_moveright (rel=0x83a86f0, buf=3538, keysz=1, scankey=0x83b90c0, access=1) at nbtsearch.c:159
#4  0x8085412 in _bt_search (rel=0x83a86f0, keysz=1, scankey=0x83b90c0, bufP=0xbfbfcb04, access=2) at nbtsearch.c:105
#5  0x807da06 in _bt_doinsert (rel=0x83a86f0, btitem=0x83ba12c, index_is_unique=1 '\001', heapRel=0x83a6b78) at nbtinsert.c:101
#6  0x8082f84 in btinsert (fcinfo=0xbfbfcb58) at nbtree.c:283
#7  0x815e7cd in OidFunctionCall5 (functionId=331, arg1=138053360, arg2=3217017956, arg3=3217017940, arg4=138124076, arg5=138046328) at fmgr.c:1247
#8  0x807c8f4 in index_insert (relation=0x83a86f0, datum=0xbfbfcc64, nulls=0xbfbfcc54 " ", heap_t_ctid=0x83b9b2c, heapRel=0x83a6b78) at indexam.c:193
#9  0x80d3d47 in ExecInsertIndexTuples (slot=0x83b9068, tupleid=0x83b9b2c, estate=0x83b9a48, is_update=0) at execUtils.c:668
#10 0x80b8645 in CopyFrom (rel=0x83a6b78, binary=0 '\000', oids=0 '\000', fp=0x0, delim=0x8193d36 "\t", null_print=0x8193d38 "\\N") at copy.c:927
#11 0x80b75cb in DoCopy (relname=0x83b11d0 "certificate", binary=0 '\000', oids=0 '\000', from=1 '\001', pipe=1 '\001', filename=0x0, delim=0x8193d36 "\t", null_print=0x8193d38 "\\N") at copy.c:336
#12 0x811ea7d in ProcessUtility (parsetree=0x83b11ec, dest=Remote, completionTag=0xbfbfcdfc "") at utility.c:341
#13 0x811cc46 in pg_exec_query_string (query_string=0x83b1038 "COPY \"certificate\" FROM stdin;", dest=Remote, parse_context=0x83676a0) at postgres.c:766
#14 0x811dce8 in PostgresMain (argc=5, argv=0xbfbfd008, username=0x833c525 "darcy") at postgres.c:1926
#15 0x8102e9f in DoBackend (port=0x833c400) at postmaster.c:2243
#16 0x8102859 in BackendStartup (port=0x833c400) at postmaster.c:1874
#17 0x8101bbf in ServerLoop () at postmaster.c:995
#18 0x8101782 in PostmasterMain (argc=1, argv=0x832d030) at postmaster.c:771
#19 0x80e188f in main (argc=1, argv=0xbfbfd780) at main.c:206
#20 0x8067559 in ___start ()
(gdb) cont
Continuing.

Program received signal SIGKILL, Killed.
0x8119a5d in LWLockAcquire (lockid=3587, mode=LW_SHARED) at lwlock.c:199
lwlock.c:199: No such file or directory.
(gdb) bt
#0  0x8119a5d in LWLockAcquire (lockid=3587, mode=LW_SHARED) at lwlock.c:199
#1  0x80828ec in _bt_getbuf (rel=0x83a86f0, blkno=404, access=1) at nbtpage.c:321
#2  0x808559d in _bt_moveright (rel=0x83a86f0, buf=3538, keysz=1, scankey=0x83b90c0, access=1) at nbtsearch.c:159
#3  0x8085412 in _bt_search (rel=0x83a86f0, keysz=1, scankey=0x83b90c0, bufP=0xbfbfcb04, access=2) at nbtsearch.c:105
#4  0x807da06 in _bt_doinsert (rel=0x83a86f0, btitem=0x83ba12c, index_is_unique=1 '\001', heapRel=0x83a6b78) at nbtinsert.c:101
#5  0x8082f84 in btinsert (fcinfo=0xbfbfcb58) at nbtree.c:283
#6  0x815e7cd in OidFunctionCall5 (functionId=331, arg1=138053360, arg2=3217017956, arg3=3217017940, arg4=138124076, arg5=138046328) at fmgr.c:1247
#7  0x807c8f4 in index_insert (relation=0x83a86f0, datum=0xbfbfcc64, nulls=0xbfbfcc54 " ", heap_t_ctid=0x83b9b2c, heapRel=0x83a6b78) at indexam.c:193
#8  0x80d3d47 in ExecInsertIndexTuples (slot=0x83b9068, tupleid=0x83b9b2c, estate=0x83b9a48, is_update=0) at execUtils.c:668
#9  0x80b8645 in CopyFrom (rel=0x83a6b78, binary=0 '\000', oids=0 '\000', fp=0x0, delim=0x8193d36 "\t", null_print=0x8193d38 "\\N") at copy.c:927
#10 0x80b75cb in DoCopy (relname=0x83b11d0 "certificate", binary=0 '\000', oids=0 '\000', from=1 '\001', pipe=1 '\001', filename=0x0, delim=0x8193d36 "\t", null_print=0x8193d38 "\\N") at copy.c:336
#11 0x811ea7d in ProcessUtility (parsetree=0x83b11ec, dest=Remote, completionTag=0xbfbfcdfc "") at utility.c:341
#12 0x811cc46 in pg_exec_query_string (query_string=0x83b1038 "COPY \"certificate\" FROM stdin;", dest=Remote, parse_context=0x83676a0) at postgres.c:766
#13 0x811dce8 in PostgresMain (argc=5, argv=0xbfbfd008, username=0x833c525 "darcy") at postgres.c:1926
#14 0x8102e9f in DoBackend (port=0x833c400) at postmaster.c:2243
#15 0x8102859 in BackendStartup (port=0x833c400) at postmaster.c:1874
#16 0x8101bbf in ServerLoop () at postmaster.c:995
#17 0x8101782 in PostmasterMain (argc=1, argv=0x832d030) at postmaster.c:771
#18 0x80e188f in main (argc=1, argv=0xbfbfd780) at main.c:206
#19 0x8067559 in ___start ()

>
> BTW: *are* we certain it's associated with NFS, and not a hardware
> problem on your NetBSD box?  Can you perform the same tests running
> the database off a local disk?
>
>             regards, tom lane

--
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: PostgreSQL, NetBSD and NFS

From
Tom Lane
Date:
"D'Arcy J.M. Cain" <darcy@druid.net> writes:
> Well, it does appear to be working but it never finishes.  Here are two 
> backtraces.  One was taken while it was running and the other after a kill 
> -9.  The primary key file should have had 322846720 bytes based on the 
> database that I was copying in but it only had 4603904 after running the 
> restore for 12 hours.  The file seems to get to a static size and just stays 
> there.  I am running another test to confirm that.

Hmm --- seems like it must be getting into an infinite loop, but where
and why?  Here is a test plan:

1. Run it, let it reach the point where the file size stops growing.

2. Attach to process with gdb.  Repeatedly do 'fin' to finish out current
function call, until the prompt doesn't come back any more.  Whichever
level of function didn't finish reasonably quickly is the one that's
looping.

3. Control-C to get control back in gdb.  Do 'fin' enough times to get
back to the looping function, but not the extra time to let it run.
Now, use 'next' repeatedly to see just what lines it's circling around
in, and print out the values of its local variables as it does so.

That info should move the investigation forward ...

From looking at your existing dumps I will hazard a guess that
_bt_moveright is looping ... but why?  And why should that happen
only with NFS?
        regards, tom lane
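
To make the guess about a looping _bt_moveright concrete, here is a toy model, not PostgreSQL code. It assumes each index page carries a high key and a link to its right sibling, and that the walk keeps following right-links until it reaches a page that can hold the search key. The dictionary layout and the function name are hypothetical. The toy version tracks visited blocks so that it terminates; a walk without that check would spin forever on a damaged link and make no system calls, which would match the symptoms reported so far.

```python
def move_right(pages, start, key):
    """Follow right-links from `start` until a page whose high key covers `key`.

    `pages` maps block number -> (high_key, right_link); a right_link of
    None marks the rightmost page. Raises RuntimeError if a block is
    visited twice, which means the links form a cycle.
    """
    seen = set()
    blk = start
    while True:
        if blk in seen:
            raise RuntimeError(f"right-link cycle detected at block {blk}")
        seen.add(blk)
        high_key, right = pages[blk]
        if right is None or key <= high_key:
            return blk
        blk = right


healthy = {1: (10, 2), 2: (20, 3), 3: (30, None)}
corrupt = {1: (10, 2), 2: (20, 1), 3: (30, None)}  # block 2 wrongly points back to 1

print(move_right(healthy, 1, 25))  # 3
try:
    move_right(corrupt, 1, 25)
except RuntimeError as exc:
    print(exc)  # right-link cycle detected at block 1
```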


Re: PostgreSQL, NetBSD and NFS

From
"D'Arcy J.M. Cain"
Date:
On Wednesday 05 February 2003 10:12, Tom Lane wrote:
> "D'Arcy J.M. Cain" <darcy@druid.net> writes:
> > Well, it does appear to be working but it never finishes.  Here are two
> > backtraces.  One was taken while it was running and the other after a
> > kill -9.  The primary key file should have had 322846720 bytes based on
> > the database that I was copying in but it only had 4603904 after running
> > the restore for 12 hours.  The file seems to get to a static size and
> > just stays there.  I am running another test to confirm that.
>
> Hmm --- seems like it must be getting into an infinite loop, but where
> and why?  Here is a test plan:

Hmm.  This time it passed that point but this happened:

COPY "certificate" FROM stdin;
NOTICE:  copy: line 253677, bt_insertonpg[certificate_pkey]: parent page
unfound - fixing branch
ERROR:  copy: line 253677, bt_fixlevel[certificate_pkey]: invalid item
order(1) (need to recreate index)
lost synchronization with server, resetting connection

It then continued on.  It is currently stuck on the next largest table in our
system.  I will try this if it hangs on that other table.

--
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: PostgreSQL, NetBSD and NFS

From
Tom Lane
Date:
"D'Arcy J.M. Cain" <darcy@druid.net> writes:
> Hmm.  This time it passed that point but this happened:

> COPY "certificate" FROM stdin;
> NOTICE:  copy: line 253677, bt_insertonpg[certificate_pkey]: parent page 
> unfound - fixing branch
> ERROR:  copy: line 253677, bt_fixlevel[certificate_pkey]: invalid item 
> order(1) (need to recreate index)

Hoo boy.  I was already suspecting data corruption in the index, and
this looks like more of the same.  My thoughts are definitely straying
in the direction of "the NFS server is dropping bits, somehow".

Both this and the (admittedly unproven) bt_moveright loop suggest
corrupted values in the cross-page links that exist at the very end of
each btree index page.  I wonder if it is possible that, every so often,
you are losing just the last few bytes of an NFS transfer?
        regards, tom lane
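
To make the "losing the last few bytes" idea concrete, here is a toy sketch. It assumes a made-up page layout in which the final 8 bytes of an 8K page hold two 32-bit sibling links; this is NOT PostgreSQL's actual btree page format, and the names `make_page`, `read_links` and `lose_tail` are invented for illustration only.

```python
import struct

PAGE_SIZE = 8192
LINK_FORMAT = "<II"                       # toy trailer: (prev_page, next_page)
LINK_SIZE = struct.calcsize(LINK_FORMAT)  # 8 bytes at the very end of the page


def make_page(prev_page, next_page):
    """Build a zero-filled page whose last bytes carry the sibling links."""
    body = bytes(PAGE_SIZE - LINK_SIZE)
    return body + struct.pack(LINK_FORMAT, prev_page, next_page)


def read_links(page):
    """Return (prev_page, next_page) as stored in the page trailer."""
    return struct.unpack(LINK_FORMAT, page[-LINK_SIZE:])


def lose_tail(page, lost_bytes):
    """Simulate a transfer whose last lost_bytes bytes arrive as zeros."""
    keep = len(page) - lost_bytes
    return page[:keep] + bytes(lost_bytes)
```

With these toy values, `read_links(lose_tail(make_page(41, 43), 4))` comes back as `(41, 0)`: the page body is intact, but the right-hand link now points somewhere else entirely, which is the kind of damage that could send a rightward page walk in circles.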


Re: PostgreSQL, NetBSD and NFS

From
"D'Arcy J.M. Cain"
Date:
On Wednesday 05 February 2003 11:49, Tom Lane wrote:
> "D'Arcy J.M. Cain" <darcy@druid.net> writes:
> > Hmm.  This time it passed that point but this happened:
> >
> > COPY "certificate" FROM stdin;
> > NOTICE:  copy: line 253677, bt_insertonpg[certificate_pkey]: parent page
> > unfound - fixing branch
> > ERROR:  copy: line 253677, bt_fixlevel[certificate_pkey]: invalid item
> > order(1) (need to recreate index)
>
> Hoo boy.  I was already suspecting data corruption in the index, and
> this looks like more of the same.  My thoughts are definitely straying
> in the direction of "the NFS server is dropping bits, somehow".
>
> Both this and the (admittedly unproven) bt_moveright loop suggest
> corrupted values in the cross-page links that exist at the very end of
> each btree index page.  I wonder if it is possible that, every so often,
> you are losing just the last few bytes of an NFS transfer?

Yah, that's kind of what it looked like when I tried this before Christmas too
although the actual errors differed.  At that time I got a PostgreSQL error
that implied that something that was just written was not there when it went
back.  Almost like a flushing issue.

--
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: PostgreSQL, NetBSD and NFS

From
Tom Lane
Date:
"D'Arcy J.M. Cain" <darcy@druid.net> writes:
> On Wednesday 05 February 2003 11:49, Tom Lane wrote:
>> I wonder if it is possible that, every so often,
>> you are losing just the last few bytes of an NFS transfer?

> Yah, that's kind of what it looked like when I tried this before
> Christmas too although the actual errors differed.

The observed behavior could vary wildly depending on what data happened to
get "read".

Wild thought here: can you reduce the MTU on the LAN linking the NFS
server to the NetBSD box?  If so, does it help?
        regards, tom lane


Re: PostgreSQL, NetBSD and NFS

From
Justin Clift
Date:
Tom Lane wrote:
<snip>
> Hoo boy.  I was already suspecting data corruption in the index, and
> this looks like more of the same.  My thoughts are definitely straying
> in the direction of "the NFS server is dropping bits, somehow".
> 
> Both this and the (admittedly unproven) bt_moveright loop suggest
> corrupted values in the cross-page links that exist at the very end of
> each btree index page.  I wonder if it is possible that, every so often,
> you are losing just the last few bytes of an NFS transfer?

Hmmm... does anyone remember the name of that NFS testing tool the 
FreeBSD guys were using?  Think it came from Apple.  They used it to 
find and isolate bugs in the FreeBSD code a while ago.

Sounds like it might be useful here.

:-)

Regards and best wishes,

Justin Clift


>             regards, tom lane


-- 
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi



Re: PostgreSQL, NetBSD and NFS

From
Greg Copeland
Date:
On Wed, 2003-02-05 at 11:18, Tom Lane wrote:
> "D'Arcy J.M. Cain" <darcy@druid.net> writes:
> > On Wednesday 05 February 2003 11:49, Tom Lane wrote:
> >> I wonder if it is possible that, every so often,
> >> you are losing just the last few bytes of an NFS transfer?
> 
> > Yah, that's kind of what it looked like when I tried this before
> > Christmas too although the actual errors differed.
> 
> The observed behavior could vary wildly depending on what data happened to
> get "read".
> 
> Wild thought here: can you reduce the MTU on the LAN linking the NFS
> server to the NetBSD box?  If so, does it help?
> 

Tom,

I'm curious as to why you think adjusting the MTU may have an effect on
this.  Lowering the MTU may actually increase fragmentation, lower
efficiency, and even exacerbate the situation.

Is this purely a diagnostic suggestion?


Regards,

-- 
Greg Copeland <greg@copelandconsulting.net>
Copeland Computer Consulting



Re: PostgreSQL, NetBSD and NFS

From
James Hubbard
Date:
Justin Clift wrote:
> Hmmm... does anyone remember the name of that NFS testing tool the 
> FreeBSD guys were using?  Think it came from Apple.  They used it to 
> find and isolate bugs in the FreeBSD code a while ago.
> 
> Sounds like it might be useful here.
> 
> :-)
> 

You can find a write-up about it here:
http://kerneltrap.org/node.php?id=327

The actual link to the source
http://www.freebsd.org/cgi/cvsweb.cgi/src/tools/regression/fsx/

James




Re: PostgreSQL, NetBSD and NFS

From
Tom Lane
Date:
Greg Copeland <greg@copelandconsulting.net> writes:
> On Wed, 2003-02-05 at 11:18, Tom Lane wrote:
>> Wild thought here: can you reduce the MTU on the LAN linking the NFS
>> server to the NetBSD box?  If so, does it help?

> I'm curious as to why you think adjusting the MTU may have an effect on
> this.  Lowering the MTU may actually increase fragmentation, lower
> efficiency, and even exacerbate the situation.

I'm thinking maybe one or both LAN cards have a problem with packets
exceeding a certain size.

> Is this purely a diagnostic suggestion?

Well, if it changes anything then it would definitely show there's a
hardware problem to fix...
        regards, tom lane


Re: PostgreSQL, NetBSD and NFS

From
Justin Clift
Date:
James Hubbard wrote:
> Justin Clift wrote:
> 
>> Hmmm... does anyone remember the name of that NFS testing tool the 
>> FreeBSD guys were using?  Think it came from Apple.  They used it to 
>> find and isolate bugs in the FreeBSD code a while ago.
>>
>> Sounds like it might be useful here.
>>
>> :-)
> 
> You can find a write-up about it here:
> http://kerneltrap.org/node.php?id=327
> 
> The actual link to the source
> http://www.freebsd.org/cgi/cvsweb.cgi/src/tools/regression/fsx/

Thanks James.

That's definitely the one.

D'Arcy, if you want to test if your NFS layer is stable, this might 
really help.  It's a single C file that gets compiled, and you run it 
against a remote NFS file.

This is supposed to be one of those tools that will try to trip up the 
NFS layer in every possible way, without violating the spec, etc.
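
The general idea behind this kind of tester can be sketched briefly: do random reads and writes against one file while keeping an in-memory copy of what the file should contain, and stop at the first read that disagrees. The Python below is only an illustration of that idea, not fsx itself; `stress_file`, its parameters and its defaults are invented for this sketch.

```python
import os
import random


def stress_file(path, file_size=256 * 1024, max_io=32 * 1024, ops=200, seed=0):
    """Random reads/writes checked against an in-memory shadow copy.

    Returns None if every read matched, otherwise the index of the
    first operation whose data differed from the shadow copy.
    """
    rng = random.Random(seed)          # seeded so offsets/lengths are repeatable
    shadow = bytearray(file_size)      # what the file is supposed to contain
    with open(path, "w+b") as f:
        f.write(shadow)
        f.flush()
        for op in range(ops):
            offset = rng.randrange(file_size)
            length = rng.randint(1, min(max_io, file_size - offset))
            if rng.random() < 0.5:
                data = os.urandom(length)
                f.seek(offset)
                f.write(data)
                f.flush()
                os.fsync(f.fileno())   # push the write out to the server
                shadow[offset:offset + length] = data
            else:
                f.seek(offset)
                got = f.read(length)
                if got != bytes(shadow[offset:offset + length]):
                    return op
    return None
```

Pointing `path` at a file on the NFS mount and getting anything other than `None` back would be a strong hint that the storage path is mangling data. A clean run proves much less, since reads may be satisfied from the client's own cache.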

Hope this is useful.

:)

Regards and best wishes,

Justin Clift

> James


-- 
"My grandfather once told me that there are two kinds of people: those
who work and those who take the credit. He told me to try to be in the
first group; there was less competition there."
- Indira Gandhi



Re: PostgreSQL, NetBSD and NFS

From
Kevin Brown
Date:
Tom Lane wrote:
> Greg Copeland <greg@copelandconsulting.net> writes:
> > On Wed, 2003-02-05 at 11:18, Tom Lane wrote:
> >> Wild thought here: can you reduce the MTU on the LAN linking the NFS
> >> server to the NetBSD box?  If so, does it help?
> 
> > I'm curious as to why you think adjusting the MTU may have an effect on
> > this.  Lowering the MTU may actually increase fragmentation, lower
> > efficiency, and even exacerbate the situation.
> 
> I'm thinking maybe one or both LAN cards have a problem with packets
> exceeding a certain size.

But he's using NFS over TCP, so any traffic that gets truncated or
dropped should simply result in a TCP retransmit (since the packet's
data won't match its checksum anymore, and it'll get dropped on the
floor).

Of course, if the NFS layer is actually transferring data via UDP
despite explicitly being told to mount via TCP, that's something else.
It might be useful to verify via netstat that an actual TCP connection
to the NFS server is being established and used.
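
As a rough sketch of that check, the snippet below filters saved `netstat -an` output for sockets whose remote port is 2049 (the usual NFS port). The function name `nfs_sockets` is made up here, and the column positions assume the common BSD/Linux layout (protocol in the first column, foreign address in the fifth), so treat it as a starting point rather than a robust parser.

```python
NFS_PORT = "2049"


def nfs_sockets(netstat_output):
    """Return (proto, foreign_address, state) for sockets talking to port 2049."""
    found = []
    for line in netstat_output.splitlines():
        fields = line.split()
        # Skip headers and anything that is not a tcp*/udp* socket line.
        if len(fields) < 5 or not fields[0].startswith(("tcp", "udp")):
            continue
        foreign = fields[4]
        # BSD prints addr.port, Linux prints addr:port; normalise then split.
        port = foreign.replace(":", ".").rsplit(".", 1)[-1]
        if port == NFS_PORT:
            state = fields[5] if len(fields) > 5 else ""
            found.append((fields[0], foreign, state))
    return found
```

A `tcp` entry in state `ESTABLISHED` toward the filer would support the claim that the mount really runs over TCP. Seeing no such entry is only a hint, not proof, that UDP is in use.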


Makes me wonder if this might be a problem at the NFS protocol
layer...



-- 
Kevin Brown                          kevin@sysexperts.com


Re: PostgreSQL, NetBSD and NFS

From
"D'Arcy J.M. Cain"
Date:
On Wednesday 05 February 2003 13:04, Ian Fry wrote:
> > Wild thought here: can you reduce the MTU on the LAN linking the NFS
> > server to the NetBSD box?  If so, does it help?
>
> How about adjusting the read and write-size used by the NetBSD machine? I
> think the default is 32k for both read and write on i386 machines now.
> Perhaps try setting them back to 8k (it's the -r and -w flags to mount_nfs,
> IIRC)

Hey!  That did it.  I hadn't tried that before because I had tried using the
tcp option to mount and the docs suggested that that would do more than
reducing the block size.  Besides, the man page didn't give the defaults and
I was uncomfortable changing something when I didn't know from what.

So, why does this fix it?  It seems to me that it should have worked anyway.
This feels rather fragile.  I doubt that it is hardware related because I had
tried it on the other ethernet interface in the machine which was on a
completely different network than the one I am on now.

What is the implication of smaller read and write size?  Will I necessarily
take a performance hit?

--
D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
http://www.druid.net/darcy/                |  and a sheep voting on
+1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.


Re: PostgreSQL, NetBSD and NFS

From
Tom Lane
Date:
"D'Arcy J.M. Cain" <darcy@druid.net> writes:
> On Wednesday 05 February 2003 13:04, Ian Fry wrote:
>> How about adjusting the read and write-size used by the NetBSD machine? I
>> think the default is 32k for both read and write on i386 machines now.
>> Perhaps try setting them back to 8k (it's the -r and -w flags to mount_nfs,
>> IIRC)

> Hey!  That did it.

Hot diggety!

> So, why does this fix it?

I think now you file a bug report with the NetBSD kernel folk.  My
thoughts are running in the direction of a bug having to do with
scattering a 32K read into multiple kernel disk-cache buffers or
gathering together multiple cache buffer contents to form a 32K write.
Unless NetBSD has changed from its heritage, the kernel disk cache
buffers are 8K, and so an 8K NFS read or write would never cross a
cache buffer boundary.  But 32K would.
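
The arithmetic behind that guess can be written down directly. This is a sketch of the hypothesis only: the 8K buffer size is the assumption stated above, not a confirmed property of the NetBSD kernel, and `buffers_spanned` is a name invented for this illustration.

```python
def buffers_spanned(offset, length, buffer_size=8192):
    """Count how many fixed-size buffers the byte range [offset, offset+length) touches."""
    if length <= 0:
        raise ValueError("length must be positive")
    first = offset // buffer_size
    last = (offset + length - 1) // buffer_size
    return last - first + 1


# An aligned 8K transfer stays inside one 8K buffer ...
print(buffers_spanned(0, 8192))       # 1
# ... while an aligned 32K transfer has to be split across four of them.
print(buffers_spanned(0, 32 * 1024))  # 4
```

Passing a different `buffer_size` shows how much the conclusion depends on that assumed figure.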

Or it could be a similar bug on the NFS server's side?
        regards, tom lane


Re: PostgreSQL, NetBSD and NFS

From
Tom Lane
Date:
Thor Lancelot Simon <tls@rek.tjls.com> writes:
>> Unless NetBSD has changed from its heritage, the kernel disk cache
>> buffers are 8K, and so an 8K NFS read or write would never cross a
>> cache buffer boundary.  But 32K would.

> I don't know what "heritage" you're referring to, but it has never been
> the case that NetBSD's buffer cache has used fixed-size 8K disk buffers,
> and I don't believe that it was ever the case for any Net2 or 4.4-derived
> system.

Could be.  By "heritage" I meant BSD-without-any-adjective.  It is
perfectly clear from Leffler, McKusick et al. (_The Design and
Implementation of the 4.3BSD UNIX Operating System_) that back then,
8K was the standard filesystem block size.

However, I was just guessing that that might have anything to do with
the problem.  It does seem clear now that we are looking at a kernel
or network bug, though.
        regards, tom lane


Re: PostgreSQL, NetBSD and NFS

From
"Christopher Kings-Lynne"
Date:
> Hmmm... does anyone remember the name of that NFS testing tool the 
> FreeBSD guys were using?  Think it came from Apple.  They used it to 
> find and isolate bugs in the FreeBSD code a while ago.

fsx

Chris



Re: PostgreSQL, NetBSD and NFS

From
Andrew Gillham
Date:
On Wed, Feb 05, 2003 at 09:24:48PM +0000, David Laight wrote:
> > If he is using UDP rather than TCP
> > as the transport layer, another potential issue is that 32K requests will
> > end up as IP packets with a very large number of fragments, potentially
> > exposing some kind of network stack bug in which the last fragment is
> > dropped or corrupted.
> 
> Actually it is worse than that, and IMHO 32k UDP requests are asking for
> trouble.
> 
> A 32k UDP datagram is about 22 ethernet packets.  If ANY of them is
> lost on the network, then the entire datagram is lost.  NFS must
> regenerate the request on a timeout.  The receiving system won't
> report that it is missing a fragment.

As he stated several times, he has tested with TCP mounts and observed
the same issue.  So the above issue shouldn't be related.

> There are also a lot of ethernet cards out there which don't have
> enough buffer space for 32k of receive data.   Not to mention the
> fact that NFS can easily (at least on some systems) generate
> concurrent requests for different parts of the same file.
> 
> I would suggest reducing the size back to 8k, even that causes
> trouble with some cards.

If NetBSD as an NFS client is this fragile we have problems.  The default
read/write size shouldn't be 32kB if that is not going to work reliably.

> It should also be realised that transmitting 22 full sized, back
> to back frames on the ethernet doesn't do anything for sharing
> the bandwidth between different users.  The MAC layer has to be very
> aggressive in order to get a packet in edgeways (so to speak).

So what?  If it is a switched network, which I assume it is since he was
talking to the NetApp gigabit port earlier, then this is irrelevant.  Even
the $40 Fry's switches are more or less non-blocking. 

Even if he is saturating the local *hub*, it shouldn't cause NetBSD to fail,
it would just be rude. :-)

There could be some packet mangling on the network; checking the number
of retransmissions on either end of the TCP connection should give you an
idea about that.

-Andrew


Re: PostgreSQL, NetBSD and NFS

From
Byron Servies
Date:
On February 06, 2003 at 03:50, Justin Clift wrote:
> Tom Lane wrote:
> <snip>
> >Hoo boy.  I was already suspecting data corruption in the index, and
> >this looks like more of the same.  My thoughts are definitely straying
> >in the direction of "the NFS server is dropping bits, somehow".
> >
> >Both this and the (admittedly unproven) bt_moveright loop suggest
> >corrupted values in the cross-page links that exist at the very end of
> >each btree index page.  I wonder if it is possible that, every so often,
> >you are losing just the last few bytes of an NFS transfer?
> 
> Hmmm... does anyone remember the name of that NFS testing tool the 
> FreeBSD guys were using?  Think it came from Apple.  They used it to 
> find and isolate bugs in the FreeBSD code a while ago.
> 
> Sounds like it might be useful here.
> 
> :-)
> 

fsx.  See also <http://www.connectathon.org>

hth,

Byron


Re: PostgreSQL, NetBSD and NFS

From
Ian Fry
Date:
On Wed, Feb 05, 2003 at 12:18:29PM -0500, Tom Lane wrote:
> "D'Arcy J.M. Cain" <darcy@druid.net> writes:
> > On Wednesday 05 February 2003 11:49, Tom Lane wrote:
> >> I wonder if it is possible that, every so often,
> >> you are losing just the last few bytes of an NFS transfer?
> > Yah, that's kind of what it looked like when I tried this before
> > Christmas too although the actual errors differed.
> Wild thought here: can you reduce the MTU on the LAN linking the NFS
> server to the NetBSD box?  If so, does it help?

How about adjusting the read and write-size used by the NetBSD machine? I think
the default is 32k for both read and write on i386 machines now. Perhaps try
setting them back to 8k (it's the -r and -w flags to mount_nfs, IIRC)
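
For anyone scripting this, a small helper that assembles the command line might look like the sketch below. Only the `-r` and `-w` flags come from the suggestion above; the server name, export path and mount point are placeholders, and `mount_nfs_argv` is a hypothetical helper that merely builds the argument list without running anything.

```python
import shlex


def mount_nfs_argv(remote, mount_point, read_size=8192, write_size=8192):
    """Build a mount_nfs argument list with explicit read/write sizes."""
    return [
        "mount_nfs",
        "-r", str(read_size),    # read size in bytes
        "-w", str(write_size),   # write size in bytes
        remote,
        mount_point,
    ]


# Placeholder server, export and mount point; substitute real values.
argv = mount_nfs_argv("filer:/vol/pgdata", "/mnt/pgdata")
print(" ".join(shlex.quote(arg) for arg in argv))
```

Keeping the sizes as parameters makes it easy to repeat the test at 8k, 16k and 32k and see where the trouble starts.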

Ian.



Re: PostgreSQL, NetBSD and NFS

From
Thor Lancelot Simon
Date:
On Wed, Feb 05, 2003 at 03:45:11PM -0500, Tom Lane wrote:
> Thor Lancelot Simon <tls@rek.tjls.com> writes:
> >> Unless NetBSD has changed from its heritage, the kernel disk cache
> >> buffers are 8K, and so an 8K NFS read or write would never cross a
> >> cache buffer boundary.  But 32K would.
> 
> > I don't know what "heritage" you're referring to, but it has never been
> > the case that NetBSD's buffer cache has used fixed-size 8K disk buffers,
> > and I don't believe that it was ever the case for any Net2 or 4.4-derived
> > system.
> 
> Could be.  By "heritage" I meant BSD-without-any-adjective.  It is
> perfectly clear from Leffler, McKusick et al. (_The Design and
> Implementation of the 4.3BSD UNIX Operating System_) that back then,
> 8K was the standard filesystem block size.

FWIW, the fact that the default block size for one particular on-disk
filesystem happens to be 8K doesn't really imply anything about the
design or implementation of the buffer cache, certainly not that it
uses fixed-size buffers that are each 8K in size.  This is particularly
evident in this case, since there's not even any on-disk filesystem
involved, and NFS doesn't really have a "block size" in the sense in
which you seem to be using that term.

I don't have my copy of the 4.3 book here for comparison, but the 4.4
book makes the data structures associated with the old-style buffer
cache pretty clear: buffers of fixed virtual but variable physical size, 
each with memory pages attached as necessary, so a single buffer may
cache anywhere from a single page to MAXPHYS (usually 64K) of data.  This
code isn't used for ordinary file data in NetBSD any longer, but it is
in fact the way the Berkeley code works and it's how things worked in its
various descendants for a long time.

Of course, the way the 4.4BSD NFS code interfaces to the buffer cache is
pretty strange, twisty, and horrible and there have been and probably
still are bugs there.  The one you suggest seems pretty odd, however,
since there are no physical disk blocks involved as this system is an
NFS *client*.

-- 
 Thor Lancelot Simon                                      tls@rek.tjls.com
   But as he knew no bad language, he had called him all the names of common
   objects that he could think of, and had screamed: "You lamp!  You towel!  You
   plate!" and so on.              --Sigmund Freud
 


Re: PostgreSQL, NetBSD and NFS

From
Thor Lancelot Simon
Date:
On Wed, Feb 05, 2003 at 03:09:09PM -0500, Tom Lane wrote:
> "D'Arcy J.M. Cain" <darcy@druid.net> writes:
> > On Wednesday 05 February 2003 13:04, Ian Fry wrote:
> >> How about adjusting the read and write-size used by the NetBSD machine? I
> >> think the default is 32k for both read and write on i386 machines now.
> >> Perhaps try setting them back to 8k (it's the -r and -w flags to mount_nfs,
> >> IIRC)
> 
> > Hey!  That did it.
> 
> Hot diggety!
> 
> > So, why does this fix it?

Who knows.  One thing that I'd be interested to know is whether Darcy is
using NFSv2 or NFSv3 -- 32k requests are not, strictly speaking, within
the bounds of the v2 specification.  If he is using UDP rather than TCP
as the transport layer, another potential issue is that 32K requests will
end up as IP packets with a very large number of fragments, potentially
exposing some kind of network stack bug in which the last fragment is
dropped or corrupted (I would suspect that the likelihood of such a bug
in the NetApp stack is quite low, however).  If feasible, it is probably
better to use TCP as the transport and let it handle segmentation whether
the request size is 8K or 32K.

> I think now you file a bug report with the NetBSD kernel folk.  My
> thoughts are running in the direction of a bug having to do with
> scattering a 32K read into multiple kernel disk-cache buffers or
> gathering together multiple cache buffer contents to form a 32K write.

That doesn't make much sense to me.  Pages on i386 are 4K, so whether he
does 8K writes or 32K writes, it will always come from multiple pages in
the pagecache.

> Unless NetBSD has changed from its heritage, the kernel disk cache
> buffers are 8K, and so an 8K NFS read or write would never cross a
> cache buffer boundary.  But 32K would.

I don't know what "heritage" you're referring to, but it has never been
the case that NetBSD's buffer cache has used fixed-size 8K disk buffers,
and I don't believe that it was ever the case for any Net2 or 4.4-derived
system.

> Or it could be a similar bug on the NFS server's side?

That's conceivable.  Of course, a client bug is quite possible, as well,
but I don't think the mechanism you suggest is likely.

-- 
 Thor Lancelot Simon                                      tls@rek.tjls.com
   But as he knew no bad language, he had called him all the names of common
   objects that he could think of, and had screamed: "You lamp!  You towel!  You
   plate!" and so on.              --Sigmund Freud
 


Re: PostgreSQL, NetBSD and NFS

From
David Laight
Date:
> If he is using UDP rather than TCP
> as the transport layer, another potential issue is that 32K requests will
> end up as IP packets with a very large number of fragments, potentially
> exposing some kind of network stack bug in which the last fragment is
> dropped or corrupted.

Actually it is worse than that, and IMHO 32k UDP requests are asking for
trouble.

A 32k UDP datagram is about 22 ethernet packets.  If ANY of them is
lost on the network, then the entire datagram is lost.  NFS must
regenerate the request on a timeout.  The receiving system won't
report that it is missing a fragment.
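
A back-of-the-envelope version of this argument, as a sketch: it assumes a 1500-byte MTU, a 20-byte IP header with no options, an 8-byte UDP header, and independent per-frame loss. The extra RPC/NFS header bytes on a real request are left as a parameter (`rpc_overhead`, defaulting to 0) because their size is not given here. Both function names are invented for this illustration.

```python
def fragments_for(payload_bytes, mtu=1500, ip_header=20, udp_header=8, rpc_overhead=0):
    """Number of IP fragments needed to carry one UDP datagram."""
    per_fragment = mtu - ip_header                 # 1480 data bytes per frame
    datagram = udp_header + rpc_overhead + payload_bytes
    return (datagram + per_fragment - 1) // per_fragment  # ceiling division


def datagram_loss(frame_loss, fragments):
    """Chance the whole datagram is lost if each frame is lost independently."""
    return 1.0 - (1.0 - frame_loss) ** fragments


print(fragments_for(32 * 1024))   # 23
print(fragments_for(8 * 1024))    # 6
print(datagram_loss(0.01, fragments_for(32 * 1024)))
```

Under these assumptions a 32k request needs 23 frames against 6 for an 8k one, so a 1% frame loss rate becomes roughly a one-in-five chance of losing each 32k datagram.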

If fragments are being lost, the receiving system also starts to hit
a buffer crisis because of all the incomplete requests it is still
hoping it might receive the missing fragment for.  After all the IP
layer won't know the retransmission is anything special.

There are also a lot of ethernet cards out there which don't have
enough buffer space for 32k of receive data.   Not to mention the
fact that NFS can easily (at least on some systems) generate
concurrent requests for different parts of the same file.

I would suggest reducing the size back to 8k, even that causes
trouble with some cards.

It should also be realised that transmitting 22 full sized, back
to back frames on the ethernet doesn't do anything for sharing
the bandwidth between different users.  The MAC layer has to be very
aggressive in order to get a packet in edgeways (so to speak).
David

-- 
David Laight: david@l8s.co.uk


Re: PostgreSQL, NetBSD and NFS

From
Greywolf
Date:
On Wed, 5 Feb 2003, Tom Lane wrote:

[TL: Could be.  By "heritage" I meant BSD-without-any-adjective.  It is
[TL: perfectly clear from Leffler, McKusick et al. (_The Design and
[TL: Implementation of the 4.3BSD UNIX Operating System_) that back then,
[TL: 8K was the standard filesystem block size.

"FS block size" !=  "Disk Buffer Size".  Though 8k might have been the
standard FS block size, it was possible -- and occasionally practiced
-- to do 4k/512 filesystems, or 16k/2k filesystems, or M/N filesystems
where { 4k < M < 16k (maybe 32k), log2(M) == int(log2(M)),
log2(N) == int(log2(N)) and M/N <= 8 }.
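
Spelled out as a check, the rule above might look like the following sketch. It reads the bounds on M as inclusive, since the 4k/512 and 16k/2k examples sit exactly on them; that reading, the helper names, and the default limits are assumptions of this illustration and not taken from any filesystem source.

```python
def is_power_of_two(n):
    """True when log2(n) is a whole number."""
    return n > 0 and (n & (n - 1)) == 0


def plausible_geometry(block_size, frag_size, min_block=4096, max_block=16384):
    """Check an M/N (block/fragment) pair against the constraints above."""
    return (
        min_block <= block_size <= max_block
        and is_power_of_two(block_size)
        and is_power_of_two(frag_size)
        and frag_size <= block_size
        and block_size // frag_size <= 8
    )


print(plausible_geometry(4096, 512))    # True
print(plausible_geometry(16384, 2048))  # True
print(plausible_geometry(8192, 512))    # False: 16 fragments per block
```

Raising `max_block` to 32768 covers the "maybe 32k" case mentioned above.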

            --*greywolf;
--
NetBSD: making all computer hardware a commodity.



Re: PostgreSQL, NetBSD and NFS

From
Greywolf
Date:
On Wed, 5 Feb 2003, D'Arcy J.M. Cain wrote:

[DJC: This feels rather fragile.  I doubt that it is hardware related because I had
[DJC: tried it on the other ethernet interface in the machine which was on a
[DJC: completely different network than the one I am on now.

All I can offer up is that at one point I had to reduce to 16k NFSIO
when I replaced a switch (you didn't replace a switch, did you?) between
my i386 and my sparc (my le0 and the switch didn't play nicely together;
once I got the hme0 in, everything was happy as a clam).

[DJC: What is the implication of smaller read and write size?  Will I
[DJC: necessarily take a performance hit?

I didn't start noticing observable degradation across 100TX until I
dropped NFSIO to 4k (which I did purely for benchmarking statistics).

The differences between 8k, 16k and 32k have not been noticeable
to me.  32k IO would hang my system at one point; since that time,
something appears to have been fixed.

[DJC: --
[DJC: D'Arcy J.M. Cain <darcy@{druid|vex}.net>   |  Democracy is three wolves
[DJC: http://www.druid.net/darcy/                |  and a sheep voting on
[DJC: +1 416 425 1212     (DoD#0082)    (eNTP)   |  what's for dinner.
[DJC:

            --*greywolf;
--
NetBSD: Servers' choice!



Re: PostgreSQL, NetBSD and NFS

From
"Michael Hertrick"
Date:
I've been watching this thread since the beginning, and now that y'all
brought up networking, I believe I may have some useful suggestions in that
arena.
Tom Lane <tgl@sss.pgh.pa.us> writes:
> I'm thinking maybe one or both LAN cards have a problem with packets
> exceeding a certain size.
>

Are all the intermediate network devices at layer 2 (switches)?  If so, a
simple look at counters for those ports involved would rule out or in any
problems with those network devices.
I'm sure that if you have an MTU of 1500 bytes across the board (on the
hosts and the switch(es)) then you will not have a problem with
fragmentation at that layer on 100 Mbit Ethernet.  Make sure you're at
100baseTX-FDX.

If you are using hubs, DO NOT use full duplex on your hosts.  A hub can not
function at full duplex, only half.

If there are any intermediate layer 3 devices (routers), it's possible for
them to fragment your packets.  Verify the MTU on any of these devices as
well as the appropriate duplex setting.

Run netstat -s after passing a good bit of traffic between the hosts in
question.  Don't forget to do the math to determine error percentages.
tcpdump could also reveal much about the packets such as their size and
contents, whether they are fragments, if the DF bit is set, which host was
the last to communicate, etc...  A tcpdump along with your application trace
may show you just the insight you needed to see.
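
"Doing the math" here just means turning raw counters into a rate. A minimal sketch, where the counter values are invented for illustration and would in practice be read off the `netstat -s` output by hand:

```python
def error_percent(errors, total):
    """Express an error counter as a percentage of the matching total."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * errors / total


# Hypothetical counters, not taken from any real host.
packets_sent = 1_250_000
retransmitted = 3_125
print(f"{error_percent(retransmitted, packets_sent):.2f}% retransmitted")
```

Comparing the same percentage before and after a test run is more telling than the absolute counters, which accumulate from boot.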

Do you have any packet filters between the devices?  Make sure they're not
dropping anything you need.  I don't remember if NFS is one of these, but
some things like to talk from high-port to high-port for [certain] things
and high-port to low-port for other [certain] things.

One thing I'd try that is a surefire way to determine if your network
hardware is to blame, that is if you don't want to do all that crap above:
Run your scenario with your two devices connected via an ethernet crossover
cable and NICs hard-coded to 100baseTX-FDX.  It'll rule out everything
except that cable and your NICs.

Speaking of NICs, some [really old] NICs may report they are running at
full-duplex when they really are not and can not.  Incrementing port error
counters (specifically, frame-check-sequence and collisions) will give this
away, though.


> > Is this purely a diagnostic suggestion?
>
> Well, if it changes anything then it would definitely show there's a
> hardware problem to fix...
>


--peace,
~~Mike.



Re: PostgreSQL, NetBSD and NFS

From
Manuel Bouyer
Date:
On Thu, Jan 30, 2003 at 01:27:59PM -0600, Greg Copeland wrote:
> That was going to be my question too.
> 
> I thought NFS didn't have some of the requisite file system behaviors
> (locking, flushing, etc. IIRC) for PostgreSQL to function correctly or
> reliably.

I don't know what locking scheme PostgreSQL uses, but in theory it should
be possible to use it over NFS:
- a fflush()/msync() should work the same way on a NFS filesystem as on a
  local filesystem, provided the client and server implement the NFS
  protocol properly
- locking via temp files works over NFS, again provided the client and
  server implement the NFS protocol properly (this is why you can safely
  read your mailbox over NFS, for example). If PostgreSQL uses flock or
  fcntl, it's a problem.
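
For readers unfamiliar with temp-file locking, one classic form of it is sketched below: create a uniquely named file, hard-link it to the lock name, and trust the link count rather than the return value of the link call. This is a generic illustration of the technique, not what PostgreSQL or any particular mail program does; the function names are invented here.

```python
import os
import socket
import tempfile


def acquire_dotlock(lock_path):
    """Try once to take lock_path; return True if we now hold it."""
    directory = os.path.dirname(os.path.abspath(lock_path))
    prefix = ".lock.%s.%d." % (socket.gethostname(), os.getpid())
    # The temp file must live in the same directory so link() stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(prefix=prefix, dir=directory)
    os.close(fd)
    try:
        try:
            os.link(tmp_path, lock_path)
        except OSError:
            pass  # a reply can be lost over NFS; the link count below decides
        # Two links means our temp file is also reachable under the lock name.
        return os.stat(tmp_path).st_nlink == 2
    finally:
        os.unlink(tmp_path)


def release_dotlock(lock_path):
    """Drop a lock previously taken with acquire_dotlock."""
    os.unlink(lock_path)
```

A caller would retry `acquire_dotlock` with a delay until it returns `True`, do its work, then call `release_dotlock`. Stale locks left by crashed processes need separate handling, which this sketch does not attempt.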

-- 
Manuel Bouyer <bouyer@antioche.eu.org>    NetBSD: 24 ans d'experience feront toujours la difference
--


Re: PostgreSQL, NetBSD and NFS

From
"Greg A. Woods"
Date:
[ On Friday, January 31, 2003 at 11:54:27 (-0500), D'Arcy J.M. Cain wrote: ]
> Subject: Re: PostgreSQL, NetBSD and NFS
>
> On Thursday 30 January 2003 18:32, Simon J. Gerraty wrote:
> > Is postgreSQL trying to lock a file perhaps?  Would seem a sensible thing
> > for it to be doing...
> 
> Is that a problem?  FWIW I am running statd and lockd on the NetBSD box.

NetBSD's NFS implementation only supports locking as a _server_, not a
client.

http://www.unixcircle.com/features/nfs.php
  Optional for file locking (lockd+statd):
  lockd:
  Rpc.lockd is a daemon which provides file and record-locking services  in an NFS environment.
  FreeBSD, NetBSD and OpenBSD file locking is only supported on server  side.

NFS server support for locking was introduced in NetBSD-1.5:

http://www.netbsd.org/Releases/formal-1.5/NetBSD-1.5.html
    * Server part of NFS locking (implemented by rpc.lockd(8)) now works.      

and as you can also see from rpc.lockd/lockd.c:

----------------------------
revision 1.5
date: 2000/06/07 14:34:40;  author: bouyer;  state: Exp;  lines: +67 -25
Implement file locking in lockd. All the stuff is done in userland, using
fhopen() and flock(). This means that if you kill lockd, all locks will
be relased (but you're supposed to kill statd at the same time, so
remote hosts will know it and re-establish the lock).
Tested against solaris 2.7 and linux 2.2.14 clients.
Shared lock are not handled efficiently, they're serialised in lockd when they
could be granted.
----------------------------


Terry Lambert has some proposed fixes to add NFS client level locking to
the FreeBSD kernel:

http://www.freebsd.org/~terry/DIFF.LOCKS.txt
http://www.freebsd.org/~terry/DIFF.LOCKS.MAN
http://www.freebsd.org/~terry/DIFF.LOCKS

--                             Greg A. Woods

+1 416 218-0098;            <g.a.woods@ieee.org>;           <woods@robohack.ca>
Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>