Thread: PostgreSQL, NetBSD and NFS
I have posted before about this but I am now posting to both NetBSD and PostgreSQL since it seems to be some sort of interaction between the two. I have a NetAPP filer on which I am putting a PostgreSQL database. I run PostgreSQL on a NetBSD box. I used rsync to get the database onto the filer with no problem whatsoever but as soon as I try to open the database the NFS mount hangs and I can't do any operations on that mounted drive without hanging. Other things continue to run but the minute I do a df or an ls on that drive that terminal is lost. On the NetBSD side I get a "server not responding" error. On the filer I see no problems at all. A reboot of the filer doesn't correct anything. Since NetBSD works just fine with this until I start PostgreSQL and PostgreSQL, from all reports, works well with the NetApp filer, I assume that there is something out of the ordinary about PostgreSQL's disk access that is triggering some subtle bug in NetBSD. Does the shared memory stuff use disk at all? Perhaps that's the difference between PostgreSQL and other applications. The NetApp people are being very helpful and are willing to follow up any leads people might have and may even suggest fixes if necessary. I have Bcc'd the engineer on this message and will send anything I get to them. -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
"D'Arcy J.M. Cain" <darcy@druid.net> writes: > I have posted before about this but I am now posting to both NetBSD and > PostgreSQL since it seems to be some sort of interaction between the two. I > have a NetAPP filer on which I am putting a PostgreSQL database. I run > PostgreSQL on a NetBSD box. I used rsync to get the database onto the filer > with no problem whatsoever but as soon as I try to open the database the NFS > mount hangs and I can't do any operations on that mounted drive without > hanging. That's darn odd. But please be more specific: what's "open the database"? Start the postmaster? Start a psql? Issue a query? > Does the shared memory stuff use disk at all? No, I can't see that there would be any connection there. Perhaps the next thing to do is to strace (ktrace, trace, truss, whatever system-call tracing utility you got) the postmaster and child processes. If we could determine what system call is hanging up, we might be a little closer to solving the mystery. regards, tom lane
Forgive my stupidity, are you running PostgreSQL with the data on an NFS share? D'Arcy J.M. Cain wrote: >I have posted before about this but I am now posting to both NetBSD and >PostgreSQL since it seems to be some sort of interaction between the two. I >have a NetAPP filer on which I am putting a PostgreSQL database. I run >PostgreSQL on a NetBSD box. I used rsync to get the database onto the filer >with no problem whatsoever but as soon as I try to open the database the NFS >mount hangs and I can't do any operations on that mounted drive without >hanging. Other things continue to run but the minute I do a df or an ls on >that drive that terminal is lost. > >On the NetBSD side I get a "server not responding" error. On the filer I see >no problems at all. A reboot of the filer doesn't correct anything. > >Since NetBSD works just fine with this until I start PostgreSQL and >PostgreSQL, from all reports, works well with the NetApp filer, I assume that >there is something out of the ordinary about PostgreSQL's disk access that is >triggering some subtle bug in NetBSD. Does the shared memory stuff use disk >at all? Perhaps that's the difference between PostgreSQL and other >applications. > >The NetApp people are being very helpful and are willing to follow up any >leads people might have and may even suggest fixes if necessary. I have >Bcc'd the engineer on this message and will send anything I get to them. > > >
That was going to be my question too. I thought NFS didn't have some of the requisite file system behaviors (locking, flushing, etc. IIRC) for PostgreSQL to function correctly or reliably. Please correct as needed. Regards, Greg On Thu, 2003-01-30 at 13:02, mlw wrote: > Forgive my stupidity, are you running PostgreSQL with the data on an NFS > share? > > > D'Arcy J.M. Cain wrote: > > >I have posted before about this but I am now posting to both NetBSD and > >PostgreSQL since it seems to be some sort of interaction between the two. I > >have a NetAPP filer on which I am putting a PostgreSQL database. I run > >PostgreSQL on a NetBSD box. I used rsync to get the database onto the filer > >with no problem whatsoever but as soon as I try to open the database the NFS > >mount hangs and I can't do any operations on that mounted drive without > >hanging. Other things continue to run but the minute I do a df or an ls on > >that drive that terminal is lost. > > > >On the NetBSD side I get a "server not responding" error. On the filer I see > >no problems at all. A reboot of the filer doesn't correct anything. > > > >Since NetBSD works just fine with this until I start PostgreSQL and > >PostgreSQL, from all reports, works well with the NetApp filer, I assume that > >there is something out of the ordinary about PostgreSQL's disk access that is > >triggering some subtle bug in NetBSD. Does the shared memory stuff use disk > >at all? Perhaps that's the difference between PostgreSQL and other > >applications. > > > >The NetApp people are being very helpful and are willing to follow up any > >leads people might have and may even suggest fixes if necessary. I have > >Bcc'd the engineer on this message and will send anything I get to them. -- Greg Copeland <greg@copelandconsulting.net> Copeland Computer Consulting
Greg Copeland <greg@CopelandConsulting.Net> writes: > That was going to be my question too. > I thought NFS didn't have some of the requisite file system behaviors > (locking, flushing, etc. IIRC) for PostgreSQL to function correctly or > reliably. Whether the thing is trustworthy is a different issue ;-). I was just surprised that it didn't seem to work at all. In practice, if the NFS server never goes down then you probably haven't got a problem. I'm not sure you could count on the database not getting scrambled if the NFS server crashes. But that wasn't the question... regards, tom lane
--On Thursday, January 30, 2003 16:02:17 -0500 Tom Lane <tgl@sss.pgh.pa.us> wrote: > Greg Copeland <greg@CopelandConsulting.Net> writes: >> That was going to be my question too. >> I thought NFS didn't have some of the requisite file system behaviors >> (locking, flushing, etc. IIRC) for PostgreSQL to function correctly or >> reliably. > > Whether the thing is trustworthy is a different issue ;-). I was just > surprised that it didn't seem to work at all. > > In practice, if the NFS server never goes down then you probably haven't > got a problem. I'm not sure you could count on the database not getting > scrambled if the NFS server crashes. But that wasn't the question... FWIW I use a netapp filer for my databases here for traffic analysis and IP management. The NETAPP has battery backed NVRAM and will replay the right stuff on its own. Just another datapoint. LER > > regards, tom lane -- Larry Rosenman http://www.lerctr.org/~ler Phone: +1 972-414-9812 E-Mail: ler@lerctr.org US Mail: 1905 Steamboat Springs Drive, Garland, TX 75044-6749
On Thu, 30 Jan 2003, D'Arcy J.M. Cain wrote: > Does the shared memory stuff use disk at all? Perhaps that's the > difference between PostgreSQL and other applications. Shared memory in NetBSD is just an interface to mmap'd pages, so it can be swapped to disk. But I assume your swap is not on NFS.... A ktrace would be helpful. Also, it would be helpful if you tried doing an initdb to a directory on the filer to see if you can even create a database cluster, and tried doing that or rsyncing and accessing your data over NFS with a NetBSD system as the NFS server. cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're all light. --XTC
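[A minimal sketch of the test Curt suggests, assuming a hypothetical mount point:

    # Try creating a brand-new cluster directly on the NetApp mount;
    # if even initdb wedges the mount, the existing data files are off the hook.
    initdb -D /mnt/filer/pgtest

Repeating the same thing with a NetBSD box as the NFS server would then separate a client-side bug from an interaction with the filer.]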
On Thursday 30 January 2003 18:32, Simon J. Gerraty wrote: > Is postgreSQL trying to lock a file perhaps? Would seem a sensible thing > for it to be doing... Is that a problem? FWIW I am running statd and lockd on the NetBSD box. -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
On Fri, 31 Jan 2003, D'Arcy J.M. Cain wrote: > On Thursday 30 January 2003 12:07, Tom Lane wrote: > > Perhaps the next thing to do is to strace (ktrace, trace, truss, > > whatever system-call tracing utility you got) the postmaster and > > child processes. If we could determine what system call is hanging up, > > we might be a little closer to solving the mystery. > > Ktrace. Yes, am doing another test at the moment - using 100Mb to 100Mb and > TCP option to the mount. Before I was using the default UDP and going 100Mb > to 1000 Mb. If this works I will try my "guaranteed" fail next and will add > ktrace. In fact, I will do that regardless. Look at the -t option to ktrace. It controls what ktrace looks at (syscalls, NAMEI lookups, etc.). Most importantly, you might want to NOT include the 'i' option in there, which is in there by default. It logs the data of all i/o transfers, which balloons the logs. While you may need the data in the end, tracing w/o 'i' could show you the syscalls around the failure which might be enough. Take care, Bill
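[Concretely, that might look like the following; the PID is made up, and 'c' and 'n' are the syscall and namei trace points per ktrace(1):

    # Trace system calls and namei translations, leaving 'i' out
    # so the I/O payloads don't balloon the log.
    ktrace -t cn -p 12345
    # ...reproduce the hang, then clear tracing and read the tail:
    ktrace -c -p 12345
    kdump -f ktrace.out | tail -n 100
]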
D'Arcy J.M. Cain wrote: >On Thursday 30 January 2003 14:02, mlw wrote: > > >>Forgive my stupidity, are you running PostgreSQL with the data on an NFS >>share? >> >> > >Yes, sorry. PostgreSQL is running from the local disk but the data is on the >mounted drive. > I'm not sure, I guess it could work, but NFS is a pretty poor file system. There are always issues with file locking across various platforms. I recall reading about mmap issues across NFS a while ago (forget the platform, sorry). Depending on the NFS server, there may be problems there. The NFS client may also have issues with locking, fsync, and mmap. If possible, look for a network block device protocol. The file level NFS is probably inadequate for PostgreSQL.
On Thursday 30 January 2003 14:27, Greg Copeland wrote: > That was going to be my question too. > > I thought NFS didn't have some of the requisite file system behaviors > (locking, flushing, etc. IIRC) for PostgreSQL to function correctly or > reliably. > > Please correct as needed. Yes, doubly so here please. I think I remember someone else saying that they use PostgreSQL over NFS so hopefully this is not the situation. -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
On Thursday 30 January 2003 14:02, mlw wrote: > Forgive my stupidity, are you running PostgreSQL with the data on an NFS > share? Yes, sorry. PostgreSQL is running from the local disk but the data is on the mounted drive. -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
On Thursday 30 January 2003 12:07, Tom Lane wrote: > "D'Arcy J.M. Cain" <darcy@druid.net> writes: > > I have posted before about this but I am now posting to both NetBSD and > > PostgreSQL since it seems to be some sort of interaction between the two. > > I have a NetAPP filer on which I am putting a PostgreSQL database. I > > run PostgreSQL on a NetBSD box. I used rsync to get the database onto > > the filer with no problem whatsoever but as soon as I try to open the > > database the NFS mount hangs and I can't do any operations on that > > mounted drive without hanging. > > That's darn odd. But please be more specific: what's "open the > database"? Start the postmaster? Start a psql? Issue a query? Start the postmaster. It is possible that I have a corrupted database but I was using that as a debugging tool because I still don't think that the whole NFS subsystem should lock up. The other time I tested it took hours to fail and I found it useful to have an immediate fail. > Perhaps the next thing to do is to strace (ktrace, trace, truss, > whatever system-call tracing utility you got) the postmaster and > child processes. If we could determine what system call is hanging up, > we might be a little closer to solving the mystery. Ktrace. Yes, am doing another test at the moment - using 100Mb to 100Mb and TCP option to the mount. Before I was using the default UDP and going 100Mb to 1000 Mb. If this works I will try my "guaranteed" fail next and will add ktrace. In fact, I will do that regardless. -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
"D'Arcy J.M. Cain" <darcy@druid.net> writes: > 100Mb instead of 100Mb -->1000Mb. I tried mounting with and without the TCP > option and it seemed to act the same but it was better than before. Now it > doesn't crash but trying to load a large table hangs. It gets to a point > where it is calling semop over and over getting a 0 return. It does that 81 > times in 0.989004 seconds and then hangs in the PostgreSQL code. It must be > in some sort of busy loop because there are no further system calls after the > last semop return and the CPU usage continues to climb. Very bizarre. Looks like the last page it read was block 104 (851968/8192) in file "/source/data/cert/base/16556/17063". Could you provide a formatted dump of that page? I'm partial to pg_filedump which you can get from http://sources.redhat.com/rhdb/tools.html. Use switches -f -i to get a reasonably complete dump. regards, tom lane
On Saturday 01 February 2003 13:09, Tom Lane wrote: > Very bizarre. Looks like the last page it read was block 104 > (851968/8192) in file "/source/data/cert/base/16556/17063". Could you > provide a formatted dump of that page? I'm partial to pg_filedump which > you can get from http://sources.redhat.com/rhdb/tools.html. Use > switches -f -i to get a reasonably complete dump. That's a 4.7 MB file. The dump might be quite huge. I can send you the file itself (privately) if you want. Wouldn't that be even better? I can tell you what the file is. It is the primary key file for the certificate database which is the 8 million record table that I am trying to load. -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
"D'Arcy J.M. Cain" <darcy@druid.net> writes: > That's a 4.7 MB file. The dump might be quite huge. I really just want to see the dump of that one page, and maybe the pages before and after it for comparison's sake. regards, tom lane
What was the query it failed on, exactly? That last page it read seems to be an empty index page --- it should have moved on to the next index page, I'd think, rather than doing anything that could hang up. regards, tom lane
On Saturday 01 February 2003 14:00, Tom Lane wrote: > What was the query it failed on, exactly? That last page it read > seems to be an empty index page --- it should have moved on to the > next index page, I'd think, rather than doing anything that could > hang up. Here's the log. As you can see, nothing was logged after the COPY command. It's possible that the file was corrupted. I will do a new test from scratch now that I am not switching speeds. -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
"D'Arcy J.M. Cain" <darcy@druid.net> writes: > Here's the log. As you can see, nothing was logged after the COPY command. What else was going on? As far as I can see, the code never does a semop unless it's waiting for some other backend process. regards, tom lane
On Saturday 01 February 2003 14:43, Tom Lane wrote: > "D'Arcy J.M. Cain" <darcy@druid.net> writes: > > Here's the log. As you can see, nothing was logged after the COPY > > command. > > What else was going on? As far as I can see, the code never does a > semop unless it's waiting for some other backend process. Nothing except the standard background processes are running. The ktrace.out I gave the ftp address for has everything that that instance of PostgreSQL was doing. -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
"D'Arcy J.M. Cain" <darcy@druid.net> writes: > On Saturday 01 February 2003 14:43, Tom Lane wrote: >> What else was going on? As far as I can see, the code never does a >> semop unless it's waiting for some other backend process. > Nothing except the standard background processes are running. More and more bizarre. What is the hardware platform --- does it have TAS? regards, tom lane
On Fri, 31 Jan 2003, mlw wrote: > . There are always issues with file locking across various > platforms. I recall reading about mmap issues across NFS a while ago... Postgres uses neither of these, IIRC, so that should be fine. (Actually, postgres does effectively use mmap for shared memory on NetBSD, but that's not mapping data on the NFS filesystem, so it's not an issue.) > The NFS client may also have issues with locking, fsync, and mmap. Any fsync problems would affect data integrity during a crash, but nothing otherwise. (Of course, I'm happy to be corrected on any of these issues, if someone can point out particular parts of postgres that would fail over NFS.) cjs -- Curt Sampson <cjs@cynic.net> +81 90 7737 2974 http://www.netbsd.org Don't you know, in this new Dark Age, we're all light. --XTC
On Saturday 01 February 2003 15:48, Tom Lane wrote: > More and more bizarre. What is the hardware platform --- does it have TAS? NetBSD on a Pentium (i386 port) so yes, it does have TAS. I assume you were thinking about the spinlock emulation. I have been looking through backend/storage/lmgr/lwlock.c and backend/storage/lmgr/spin.c myself and can't find any place that it can get into an infinite loop without making a system call within the loop. It's very odd. Also odd, why would running over NFS have any bearing on it if we could find such a place? -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
"D'Arcy J.M. Cain" <darcy@druid.net> writes: > Also odd, why would running over NFS have any bearing on it if we > could find such a place? Yup, 'tis the question. The only theory I have been able to come up with is that there's something flaky about your network hardware, such that Postgres sometimes reads bad data from the NFS server. But the glaring problem with that theory is that bad data coming from a regular disk drive generally results in error messages or core dumps. Silent hangs would be a new behavior AFAIR. At this point I think you need to rebuild with --enable-debug and --enable-cassert (if you didn't already) and then capture some stack traces from the stuck backend. We have to find out what the backend thinks it's doing. BTW: *are* we certain it's associated with NFS, and not a hardware problem on your NetBSD box? Can you perform the same tests running the database off a local disk? regards, tom lane
On Sunday 02 February 2003 12:26, Tom Lane wrote: > "D'Arcy J.M. Cain" <darcy@druid.net> writes: > > Also odd, why would running over NFS have any bearing on it if we > > could find such a place? > > Yup, 'tis the question. The only theory I have been able to come up > with is that there's something flaky about your network hardware, Possible but two separate networks? > At this point I think you need to rebuild with --enable-debug and > --enable-cassert (if you didn't already) and then capture some > stack traces from the stuck backend. We have to find out what the > backend thinks it's doing. That was going to be my next step. > BTW: *are* we certain it's associated with NFS, and not a hardware > problem on your NetBSD box? Can you perform the same tests running > the database off a local disk? That box is running 5 production database engines on 5 different ports. This is the 6th one and the only difference is that it is running from the NFS mounted drive. -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
On Sunday 02 February 2003 12:26, Tom Lane wrote: > At this point I think you need to rebuild with --enable-debug and > --enable-cassert (if you didn't already) and then capture some > stack traces from the stuck backend. We have to find out what the > backend thinks it's doing. Well, it does appear to be working but it never finishes. Here are two backtraces. One was taken while it was running and the other after a kill -9. The primary key file should have had 322846720 bytes based on the database that I was copying in but it only had 4603904 after running the restore for 12 hours. The file seems to get to a static size and just stays there. I am running another test to confirm that.
(gdb) bt
#0  LWLockAcquire (lockid=7272, mode=LW_SHARED) at lwlock.c:236
#1  0x8110417 in LockBuffer (buffer=3626, mode=1) at bufmgr.c:2004
#2  0x80828ec in _bt_getbuf (rel=0x83a86f0, blkno=6, access=1) at nbtpage.c:321
#3  0x808559d in _bt_moveright (rel=0x83a86f0, buf=3538, keysz=1, scankey=0x83b90c0, access=1) at nbtsearch.c:159
#4  0x8085412 in _bt_search (rel=0x83a86f0, keysz=1, scankey=0x83b90c0, bufP=0xbfbfcb04, access=2) at nbtsearch.c:105
#5  0x807da06 in _bt_doinsert (rel=0x83a86f0, btitem=0x83ba12c, index_is_unique=1 '\001', heapRel=0x83a6b78) at nbtinsert.c:101
#6  0x8082f84 in btinsert (fcinfo=0xbfbfcb58) at nbtree.c:283
#7  0x815e7cd in OidFunctionCall5 (functionId=331, arg1=138053360, arg2=3217017956, arg3=3217017940, arg4=138124076, arg5=138046328) at fmgr.c:1247
#8  0x807c8f4 in index_insert (relation=0x83a86f0, datum=0xbfbfcc64, nulls=0xbfbfcc54 " ", heap_t_ctid=0x83b9b2c, heapRel=0x83a6b78) at indexam.c:193
#9  0x80d3d47 in ExecInsertIndexTuples (slot=0x83b9068, tupleid=0x83b9b2c, estate=0x83b9a48, is_update=0) at execUtils.c:668
#10 0x80b8645 in CopyFrom (rel=0x83a6b78, binary=0 '\000', oids=0 '\000', fp=0x0, delim=0x8193d36 "\t", null_print=0x8193d38 "\\N") at copy.c:927
#11 0x80b75cb in DoCopy (relname=0x83b11d0 "certificate", binary=0 '\000', oids=0 '\000', from=1 '\001', pipe=1 '\001', filename=0x0, delim=0x8193d36 "\t", null_print=0x8193d38 "\\N") at copy.c:336
#12 0x811ea7d in ProcessUtility (parsetree=0x83b11ec, dest=Remote, completionTag=0xbfbfcdfc "") at utility.c:341
#13 0x811cc46 in pg_exec_query_string (query_string=0x83b1038 "COPY \"certificate\" FROM stdin;", dest=Remote, parse_context=0x83676a0) at postgres.c:766
#14 0x811dce8 in PostgresMain (argc=5, argv=0xbfbfd008, username=0x833c525 "darcy") at postgres.c:1926
#15 0x8102e9f in DoBackend (port=0x833c400) at postmaster.c:2243
#16 0x8102859 in BackendStartup (port=0x833c400) at postmaster.c:1874
#17 0x8101bbf in ServerLoop () at postmaster.c:995
#18 0x8101782 in PostmasterMain (argc=1, argv=0x832d030) at postmaster.c:771
#19 0x80e188f in main (argc=1, argv=0xbfbfd780) at main.c:206
#20 0x8067559 in ___start ()
(gdb) cont
Continuing.
Program received signal SIGKILL, Killed.
0x8119a5d in LWLockAcquire (lockid=3587, mode=LW_SHARED) at lwlock.c:199
lwlock.c:199: No such file or directory.
(gdb) bt
#0  0x8119a5d in LWLockAcquire (lockid=3587, mode=LW_SHARED) at lwlock.c:199
#1  0x80828ec in _bt_getbuf (rel=0x83a86f0, blkno=404, access=1) at nbtpage.c:321
#2  0x808559d in _bt_moveright (rel=0x83a86f0, buf=3538, keysz=1, scankey=0x83b90c0, access=1) at nbtsearch.c:159
#3  0x8085412 in _bt_search (rel=0x83a86f0, keysz=1, scankey=0x83b90c0, bufP=0xbfbfcb04, access=2) at nbtsearch.c:105
#4  0x807da06 in _bt_doinsert (rel=0x83a86f0, btitem=0x83ba12c, index_is_unique=1 '\001', heapRel=0x83a6b78) at nbtinsert.c:101
#5  0x8082f84 in btinsert (fcinfo=0xbfbfcb58) at nbtree.c:283
#6  0x815e7cd in OidFunctionCall5 (functionId=331, arg1=138053360, arg2=3217017956, arg3=3217017940, arg4=138124076, arg5=138046328) at fmgr.c:1247
#7  0x807c8f4 in index_insert (relation=0x83a86f0, datum=0xbfbfcc64, nulls=0xbfbfcc54 " ", heap_t_ctid=0x83b9b2c, heapRel=0x83a6b78) at indexam.c:193
#8  0x80d3d47 in ExecInsertIndexTuples (slot=0x83b9068, tupleid=0x83b9b2c, estate=0x83b9a48, is_update=0) at execUtils.c:668
#9  0x80b8645 in CopyFrom (rel=0x83a6b78, binary=0 '\000', oids=0 '\000', fp=0x0, delim=0x8193d36 "\t", null_print=0x8193d38 "\\N") at copy.c:927
#10 0x80b75cb in DoCopy (relname=0x83b11d0 "certificate", binary=0 '\000', oids=0 '\000', from=1 '\001', pipe=1 '\001', filename=0x0, delim=0x8193d36 "\t", null_print=0x8193d38 "\\N") at copy.c:336
#11 0x811ea7d in ProcessUtility (parsetree=0x83b11ec, dest=Remote, completionTag=0xbfbfcdfc "") at utility.c:341
#12 0x811cc46 in pg_exec_query_string (query_string=0x83b1038 "COPY \"certificate\" FROM stdin;", dest=Remote, parse_context=0x83676a0) at postgres.c:766
#13 0x811dce8 in PostgresMain (argc=5, argv=0xbfbfd008, username=0x833c525 "darcy") at postgres.c:1926
#14 0x8102e9f in DoBackend (port=0x833c400) at postmaster.c:2243
#15 0x8102859 in BackendStartup (port=0x833c400) at postmaster.c:1874
#16 0x8101bbf in ServerLoop () at postmaster.c:995
#17 0x8101782 in PostmasterMain (argc=1, argv=0x832d030) at postmaster.c:771
#18 0x80e188f in main (argc=1, argv=0xbfbfd780) at main.c:206
#19 0x8067559 in ___start ()
> BTW: *are* we certain it's associated with NFS, and not a hardware > problem on your NetBSD box? Can you perform the same tests running > the database off a local disk? > > regards, tom lane -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
"D'Arcy J.M. Cain" <darcy@druid.net> writes: > Well, it does appear to be working but it never finishes. Here are two > backtraces. One was taken while it was running and the other after a kill > -9. The primary key file should have had 322846720 bytes based on the > database that I was copying in but it only had 4603904 after running the > restore for 12 hours. The file seems to get to a static size and just stays > there. I am running another test to confirm that. Hmm --- seems like it must be getting into an infinite loop, but where and why? Here is a test plan: 1. Run it, let it reach the point where the file size stops growing. 2. Attach to process with gdb. Repeatedly do 'fin' to finish out current function call, until the prompt doesn't come back any more. Whichever level of function didn't finish reasonably quickly is the one that's looping. 3. Control-C to get control back in gdb. Do 'fin' enough times to get back to the looping function, but not the extra time to let it run. Now, use 'next' repeatedly to see just what lines it's circling around in, and print out the values of its local variables as it does so. That info should move the investigation forward ... From looking at your existing dumps I will hazard a guess that _bt_moveright is looping ... but why? And why should that happen only with NFS? regards, tom lane
On Wednesday 05 February 2003 10:12, Tom Lane wrote: > "D'Arcy J.M. Cain" <darcy@druid.net> writes: > > Well, it does appear to be working but it never finishes. Here are two > > backtraces. One was taken while it was running and the other after a > > kill -9. The primary key file should have had 322846720 bytes based on > > the database that I was copying in but it only had 4603904 after running > > the restore for 12 hours. The file seems to get to a static size and > > just stays there. I am running another test to confirm that. > > Hmm --- seems like it must be getting into an infinite loop, but where > and why? Here is a test plan: Hmm. This time it passed that point but this happened: COPY "certificate" FROM stdin; NOTICE: copy: line 253677, bt_insertonpg[certificate_pkey]: parent page unfound - fixing branch ERROR: copy: line 253677, bt_fixlevel[certificate_pkey]: invalid item order(1) (need to recreate index) lost synchronization with server, resetting connection It then continued on. It is currently stuck on the next largest table in our system. I will try this if it hangs on that other table. -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
"D'Arcy J.M. Cain" <darcy@druid.net> writes: > Hmm. This time it passed that point but this happened: > COPY "certificate" FROM stdin; > NOTICE: copy: line 253677, bt_insertonpg[certificate_pkey]: parent page > unfound - fixing branch > ERROR: copy: line 253677, bt_fixlevel[certificate_pkey]: invalid item > order(1) (need to recreate index) Hoo boy. I was already suspecting data corruption in the index, and this looks like more of the same. My thoughts are definitely straying in the direction of "the NFS server is dropping bits, somehow". Both this and the (admittedly unproven) bt_moveright loop suggest corrupted values in the cross-page links that exist at the very end of each btree index page. I wonder if it is possible that, every so often, you are losing just the last few bytes of an NFS transfer? regards, tom lane
On Wednesday 05 February 2003 11:49, Tom Lane wrote: > "D'Arcy J.M. Cain" <darcy@druid.net> writes: > > Hmm. This time it passed that point but this happened: > > > > COPY "certificate" FROM stdin; > > NOTICE: copy: line 253677, bt_insertonpg[certificate_pkey]: parent page > > unfound - fixing branch > > ERROR: copy: line 253677, bt_fixlevel[certificate_pkey]: invalid item > > order(1) (need to recreate index) > > Hoo boy. I was already suspecting data corruption in the index, and > this looks like more of the same. My thoughts are definitely straying > in the direction of "the NFS server is dropping bits, somehow". > > Both this and the (admittedly unproven) bt_moveright loop suggest > corrupted values in the cross-page links that exist at the very end of > each btree index page. I wonder if it is possible that, every so often, > you are losing just the last few bytes of an NFS transfer? Yah, that's kind of what it looked like when I tried this before Christmas too although the actual errors differed. At that time I got a PostgreSQL error that implied that something that was just written was not there when it went back. Almost like a flushing issue. -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
"D'Arcy J.M. Cain" <darcy@druid.net> writes: > On Wednesday 05 February 2003 11:49, Tom Lane wrote: >> I wonder if it is possible that, every so often, >> you are losing just the last few bytes of an NFS transfer? > Yah, that's kind of what it looked like when I tried this before > Christmas too although the actual errors differd. The observed behavior could vary wildly depending on what data happened to get "read". Wild thought here: can you reduce the MTU on the LAN linking the NFS server to the NetBSD box? If so, does it help? regards, tom lane
Tom Lane wrote: <snip> > Hoo boy. I was already suspecting data corruption in the index, and > this looks like more of the same. My thoughts are definitely straying > in the direction of "the NFS server is dropping bits, somehow". > > Both this and the (admittedly unproven) bt_moveright loop suggest > corrupted values in the cross-page links that exist at the very end of > each btree index page. I wonder if it is possible that, every so often, > you are losing just the last few bytes of an NFS transfer? Hmmm... does anyone remember the name of that NFS testing tool the FreeBSD guys were using? Think it came from Apple. They used it to find and isolate bugs in the FreeBSD code a while ago. Sounds like it might be useful here. :-) Regards and best wishes, Justin Clift > regards, tom lane -- "My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there." - Indira Gandhi
On Wed, 2003-02-05 at 11:18, Tom Lane wrote: > "D'Arcy J.M. Cain" <darcy@druid.net> writes: > > On Wednesday 05 February 2003 11:49, Tom Lane wrote: > >> I wonder if it is possible that, every so often, > >> you are losing just the last few bytes of an NFS transfer? > > > Yah, that's kind of what it looked like when I tried this before > > Christmas too although the actual errors differd. > > The observed behavior could vary wildly depending on what data happened to > get "read". > > Wild thought here: can you reduce the MTU on the LAN linking the NFS > server to the NetBSD box? If so, does it help? > Tom, I'm curious as to why you think adjusting the MTU may have an effect on this. Lowering the MTU may actually increase fragmentation, lower efficiency, and even exacerbate the situation. Is this purely a diagnostic suggestion? Regards, -- Greg Copeland <greg@copelandconsulting.net> Copeland Computer Consulting
Justin Clift wrote: > Hmmm... does anyone remember the name of that NFS testing tool the > FreeBSD guys were using? Think it came from Apple. They used it to > find and isolate bugs in the FreeBSD code a while ago. > > Sounds like it might be useful here. > > :-) > You can find a write-up about it here: http://kerneltrap.org/node.php?id=327 The actual link to the source http://www.freebsd.org/cgi/cvsweb.cgi/src/tools/regression/fsx/ James
Greg Copeland <greg@copelandconsulting.net> writes: > On Wed, 2003-02-05 at 11:18, Tom Lane wrote: >> Wild thought here: can you reduce the MTU on the LAN linking the NFS >> server to the NetBSD box? If so, does it help? > I'm curious as to why you think adjusting the MTU may have an effect on > this. Lowering the MTU may actually increase fragmentation, lower > efficiency, and even exacerbate the situation. I'm thinking maybe one or both LAN cards have a problem with packets exceeding a certain size. > Is this purely a diagnostic suggestion? Well, if it changes anything then it would definitely show there's a hardware problem to fix... regards, tom lane
James Hubbard wrote: > Justin Clift wrote: > >> Hmmm... does anyone remember the name of that NFS testing tool the >> FreeBSD guys were using? Think it came from Apple. They used it to >> find and isolate bugs in the FreeBSD code a while ago. >> >> Sounds like it might be useful here. >> >> :-) > > You can find a write-up about it here: > http://kerneltrap.org/node.php?id=327 > > The actual link to the source > http://www.freebsd.org/cgi/cvsweb.cgi/src/tools/regression/fsx/ Thanks James. That's definitely the one. D'Arcy, if you want to test if your NFS layer is stable, this might really help. It's a single C file that gets compiled, and you run it against a remote NFS file. This is supposed to be one of those tools that will try to trip up the NFS layer in every possible way, without violating the spec, etc. Hope this is useful. :) Regards and best wishes, Justin Clift > James -- "My grandfather once told me that there are two kinds of people: those who work and those who take the credit. He told me to try to be in the first group; there was less competition there." - Indira Gandhi
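[Building and running it would look roughly like this; the operation count and scratch path are arbitrary:

    cc -O -o fsx fsx.c
    # hammer a scratch file on the NFS mount with pseudo-random reads,
    # writes, truncates and mmaps, verifying file contents as it goes
    ./fsx -N 100000 /mnt/filer/fsx.scratch
]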
Tom Lane wrote: > Greg Copeland <greg@copelandconsulting.net> writes: > > On Wed, 2003-02-05 at 11:18, Tom Lane wrote: > >> Wild thought here: can you reduce the MTU on the LAN linking the NFS > >> server to the NetBSD box? If so, does it help? > > > I'm curious as to why you think adjusting the MTU may have an effect on > > this. Lowering the MTU may actually increase fragmentation, lower > > efficiency, and even exacerbate the situation. > > I'm thinking maybe one or both LAN cards have a problem with packets > exceeding a certain size. But he's using NFS over TCP, so any traffic that gets truncated or dropped should simply result in a TCP retransmit (since the packet's data won't match its checksum anymore, and it'll get dropped on the floor). Of course, if the NFS layer is actually transferring data via UDP despite explicitly being told to mount via TCP, that's something else. It might be useful to verify via netstat that an actual TCP connection to the NFS server is being established and used. Makes me wonder if this might be a problem at the NFS protocol layer... -- Kevin Brown kevin@sysexperts.com
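[A quick way to check, since NFS lives on port 2049; an ESTABLISHED tcp entry to the filer should show up if the mount really is on TCP, while only a udp socket would suggest it silently fell back:

    netstat -an -f inet | grep 2049
]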
On Wednesday 05 February 2003 13:04, Ian Fry wrote: > > Wild thought here: can you reduce the MTU on the LAN linking the NFS > > server to the NetBSD box? If so, does it help? > > How about adjusting the read and write-size used by the NetBSD machine? I > think the default is 32k for both read and write on i386 machines now. > Perhaps try setting them back to 8k (it's the -r and -w flags to mount_nfs, > IIRC) Hey! That did it. I hadn't tried that before because I had tried using the tcp option to mount and the docs suggested that that would do more than reducing the block size. Besides, the man page didn't give the defaults and I was uncomfortable changing something when I didn't know from what. So, why does this fix it? It seems to me that it should have worked anyway. This feels rather fragile. I doubt that it is hardware related because I had tried it on the other ethernet interface in the machine which was on a completely different network than the one I am on now. What is the implication of smaller read and write size? Will I necessarily take a performance hit? -- D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves http://www.druid.net/darcy/ | and a sheep voting on +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner.
"D'Arcy J.M. Cain" <darcy@druid.net> writes: > On Wednesday 05 February 2003 13:04, Ian Fry wrote: >> How about adjusting the read and write-size used by the NetBSD machine? I >> think the default is 32k for both read and write on i386 machines now. >> Perhaps try setting them back to 8k (it's the -r and -w flags to mount_nfs, >> IIRC) > Hey! That did it. Hot diggety! > So, why does this fix it? I think now you file a bug report with the NetBSD kernel folk. My thoughts are running in the direction of a bug having to do with scattering a 32K read into multiple kernel disk-cache buffers or gathering together multiple cache buffer contents to form a 32K write. Unless NetBSD has changed from its heritage, the kernel disk cache buffers are 8K, and so an 8K NFS read or write would never cross a cache buffer boundary. But 32K would. Or it could be a similar bug on the NFS server's side? regards, tom lane
Thor Lancelot Simon <tls@rek.tjls.com> writes: >> Unless NetBSD has changed from its heritage, the kernel disk cache >> buffers are 8K, and so an 8K NFS read or write would never cross a >> cache buffer boundary. But 32K would. > I don't know what "heritage" you're referring to, but it has never been > the case that NetBSD's buffer cache has used fixed-size 8K disk buffers, > and I don't believe that it was ever the case for any Net2 or 4.4-derived > system. Could be. By "heritage" I meant BSD-without-any-adjective. It is perfectly clear from Leffler, McKusick et al. (_The Design and Implementation of the 4.3BSD UNIX Operating System_) that back then, 8K was the standard filesystem block size. However, I was just guessing that that might have anything to do with the problem. It does seem clear now that we are looking at a kernel or network bug, though. regards, tom lane
> Hmmm... does anyone remember the name of that NFS testing tool the > FreeBSD guys were using? Think it came from Apple. They used it to > find and isolate bugs in the FreeBSD code a while ago. fsx Chris
On Wed, Feb 05, 2003 at 09:24:48PM +0000, David Laight wrote: > > If he is using UDP rather than TCP > > as the transport layer, another potential issue is that 32K requests will > > end up as IP packets with a very large number of fragments, potentially > > exposing some kind of network stack bug in which the last fragment is > > dropped or corrupted. > > Actually it is worse than that, and IMHO 32k UDP requests are asking for > trouble. > > A 32k UDP datagram is about 22 ethernet packets. If ANY of them is > lost on the network, then the entire datagram is lost. NFS must > regenerate the request on a timeout. The receiving system won't > report that it is missing a fragment. As he stated several times, he has tested with TCP mounts and observed the same issue. So the above issue shouldn't be related. > There are also a lot of ethernet cards out there which don't have > enough buffer space for 32k of receive data. Not to mention the > fact that NFS can easily (at least on some systems) generate > concurrent requests for different parts of the same file. > > I would suggest reducing the size back to 8k, even that causes > trouble with some cards. If NetBSD as an NFS client is this fragile we have problems. The default read/write size shouldn't be 32kB if that is not going to work reliably. > It should also be realised that transmitting 22 full sized, back > to back frames on the ethernet doesn't do anything for sharing > the bandwidth between different users. The MAC layer has to be very > aggressive in order to get a packet in edgeways (so to speak). So what? If it is a switched network, which I assume it is since he was talking to the NetApp gigabit port earlier, then this is irrelevant. Even the $40 Fry's switches are more or less non-blocking. Even if he is saturating the local *hub*, it shouldn't cause NetBSD to fail, it would just be rude. :-) There could be some packet mangling on the network, checking the amount of retransmissions on either end of the TCP connection should give you an idea about that. -Andrew
On February 06, 2003 at 03:50, Justin Clift wrote: > Tom Lane wrote: > <snip> > >Hoo boy. I was already suspecting data corruption in the index, and > >this looks like more of the same. My thoughts are definitely straying > >in the direction of "the NFS server is dropping bits, somehow". > > > >Both this and the (admittedly unproven) bt_moveright loop suggest > >corrupted values in the cross-page links that exist at the very end of > >each btree index page. I wonder if it is possible that, every so often, > >you are losing just the last few bytes of an NFS transfer? > > Hmmm... does anyone remember the name of that NFS testing tool the > FreeBSD guys were using? Think it came from Apple. They used it to > find and isolate bugs in the FreeBSD code a while ago. > > Sounds like it might be useful here. > > :-) > fsx. See also <http://www.connectathon.org> hth, Byron
On Wed, Feb 05, 2003 at 12:18:29PM -0500, Tom Lane wrote: > "D'Arcy J.M. Cain" <darcy@druid.net> writes: > > On Wednesday 05 February 2003 11:49, Tom Lane wrote: > >> I wonder if it is possible that, every so often, > >> you are losing just the last few bytes of an NFS transfer? > > Yah, that's kind of what it looked like when I tried this before > > Christmas too although the actual errors differd. > Wild thought here: can you reduce the MTU on the LAN linking the NFS > server to the NetBSD box? If so, does it help? How about adjusting the read and write-size used by the NetBSD machine? I think the default is 32k for both read and write on i386 machines now. Perhaps try setting them back to 8k (it's the -r and -w flags to mount_nfs, IIRC) Ian.
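[Putting the two suggestions together, the mount line would look roughly like this; the server path and mount point are hypothetical, and -T asks mount_nfs for TCP:

    mount_nfs -T -r 8192 -w 8192 filer:/vol/vol0/pgdata /source/data
]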
On Wed, Feb 05, 2003 at 03:45:11PM -0500, Tom Lane wrote: > Thor Lancelot Simon <tls@rek.tjls.com> writes: > >> Unless NetBSD has changed from its heritage, the kernel disk cache > >> buffers are 8K, and so an 8K NFS read or write would never cross a > >> cache buffer boundary. But 32K would. > > > I don't know what "heritage" you're referring to, but it has never been > > the case that NetBSD's buffer cache has used fixed-size 8K disk buffers, > > and I don't believe that it was ever the case for any Net2 or 4.4-derived > > system. > > Could be. By "heritage" I meant BSD-without-any-adjective. It is > perfectly clear from Leffler, McKusick et al. (_The Design and > Implementation of the 4.3BSD UNIX Operating System_) that back then, > 8K was the standard filesystem block size. FWIW, the fact that the default block size for one particular on-disk filesystem happens to be 8K doesn't really imply anything about the design or implementation of the buffer cache, certainly not that it uses fixed-size buffers that are each 8K in size. This is particularly evident in this case, since there's not even any on-disk filesystem involved, and NFS doesn't really have a "block size" in the sense in which you seem to be using that term. I don't have my copy of the 4.3 book here for comparison, but the 4.4 book makes the data structures associated with the old-style buffer cache pretty clear: buffers of fixed virtual but variable physical size, each with memory pages attached as necessary, so a single buffer may cache anywhere from a single page to MAXPHYS (usually 64K) of data. This code isn't used for ordinary file data in NetBSD any longer, but it is in fact the way the Berkeley code works and it's how things worked in its various descendants for a long time. Of course, the way the 4.4BSD NFS code interfaces to the buffer cache is pretty strange, twisty, and horrible and there have been and probably still are bugs there. The one you suggest seems pretty odd, however, since there are no physical disk blocks involved as this system is an NFS *client*. -- Thor Lancelot Simon tls@rek.tjls.com But as he knew no bad language, he had called him all the names of common objects that he could think of, and had screamed: "You lamp! You towel! You plate!" and so on. --Sigmund Freud
On Wed, Feb 05, 2003 at 03:09:09PM -0500, Tom Lane wrote: > "D'Arcy J.M. Cain" <darcy@druid.net> writes: > > On Wednesday 05 February 2003 13:04, Ian Fry wrote: > >> How about adjusting the read and write-size used by the NetBSD machine? I > >> think the default is 32k for both read and write on i386 machines now. > >> Perhaps try setting them back to 8k (it's the -r and -w flags to mount_nfs, > >> IIRC) > > > Hey! That did it. > > Hot diggety! > > > So, why does this fix it? Who knows. One thing that I'd be interested to know is whether Darcy is using NFSv2 or NFSv3 -- 32k requests are not, strictly speaking, within the bounds of the v2 specification. If he is using UDP rather than TCP as the transport layer, another potential issue is that 32K requests will end up as IP packets with a very large number of fragments, potentially exposing some kind of network stack bug in which the last fragment is dropped or corrupted (I would suspect that the likelihood of such a bug in the NetApp stack is quite low, however). If feasible, it is probably better to use TCP as the transport and let it handle segmentation whether the request size is 8K or 32K. > I think now you file a bug report with the NetBSD kernel folk. My > thoughts are running in the direction of a bug having to do with > scattering a 32K read into multiple kernel disk-cache buffers or > gathering together multiple cache buffer contents to form a 32K write. That doesn't make much sense to me. Pages on i386 are 4K, so whether he does 8K writes or 32K writes, it will always come from multiple pages in the pagecache. > Unless NetBSD has changed from its heritage, the kernel disk cache > buffers are 8K, and so an 8K NFS read or write would never cross a > cache buffer boundary. But 32K would. I don't know what "heritage" you're referring to, but it has never been the case that NetBSD's buffer cache has used fixed-size 8K disk buffers, and I don't believe that it was ever the case for any Net2 or 4.4-derived system. > Or it could be a similar bug on the NFS server's side? That's conceivable. Of course, a client bug is quite possible, as well, but I don't think the mechanism you suggest is likely. -- Thor Lancelot Simon tls@rek.tjls.com But as he knew no bad language, he had called him all the names of common objects that he could think of, and had screamed: "You lamp! You towel! You plate!" and so on. --Sigmund Freud
> If he is using UDP rather than TCP > as the transport layer, another potential issue is that 32K requests will > end up as IP packets with a very large number of fragments, potentially > exposing some kind of network stack bug in which the last fragment is > dropped or corrupted. Actually it is worse than that, and IMHO 32k UDP requests are asking for trouble. A 32k UDP datagram is about 22 ethernet packets. If ANY of them is lost on the network, then the entire datagram is lost. NFS must regenerate the request on a timeout. The receiving system won't report that it is missing a fragment. If fragments are being lost, the receiving system also starts to hit a buffer crisis because of all the incomplete requests it is still hoping it might receive the missing fragment for. After all, the IP layer won't know the retransmission is anything special. There are also a lot of ethernet cards out there which don't have enough buffer space for 32k of receive data. Not to mention the fact that NFS can easily (at least on some systems) generate concurrent requests for different parts of the same file. I would suggest reducing the size back to 8k, even that causes trouble with some cards. It should also be realised that transmitting 22 full sized, back to back frames on the ethernet doesn't do anything for sharing the bandwidth between different users. The MAC layer has to be very aggressive in order to get a packet in edgeways (so to speak). David -- David Laight: david@l8s.co.uk
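[The arithmetic behind the "about 22 packets" figure: each fragment rides in an IP packet carrying at most the MTU minus a 20-byte IP header, so on standard 1500-byte Ethernet:

    echo $(( 32768 / (1500 - 20) ))   # => 22 full-size fragments, plus a final partial one
]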
On Wed, 5 Feb 2003, Tom Lane wrote: [TL: Could be. By "heritage" I meant BSD-without-any-adjective. It is [TL: perfectly clear from Leffler, McKusick et al. (_The Design and [TL: Implementation of the 4.3BSD UNIX Operating System_) that back then, [TL: 8K was the standard filesystem block size. "FS block size" != "Disk Buffer Size". Though 8k might have been the standard FS block size, it was possible -- and occasionally practiced -- to do 4k/512 filesystems, or 16k/2k filesystems, or M/N filesystems where { 4k < M < 16k (maybe 32k), log2(M) == int(log2(M)), log2(N) == int(log2(N)) and M/N <= 8 }. --*greywolf; -- NetBSD: making all computer hardware a commodity.
On Wed, 5 Feb 2003, D'Arcy J.M. Cain wrote: [DJC: This feels rather fragile. I doubt that it is hardware related because I had [DJC: tried it on the other ethernet interface in the machine which was on a [DJC: completely different network than the one I am on now. All I can offer up is that at one point I had to reduce to 16k NFSIO when I replaced a switch (you didn't replace a switch, did you?) between my i386 and my sparc (my le0 and the switch didn't play nicely together; once I got the hme0 in, everything was happy as a clam). [DJC: What is the implication of smaller read and write size? Will I [DJC: necessarily take a performance hit? I didn't start noticing observable degradation across 100TX until I dropped NFSIO to 4k (which I did purely for benchmarking statistics). The differences between 8k, 16k and 32k have not been noticeable to me. 32k IO would hang my system at one point; since that time, something appears to have been fixed. [DJC: -- [DJC: D'Arcy J.M. Cain <darcy@{druid|vex}.net> | Democracy is three wolves [DJC: http://www.druid.net/darcy/ | and a sheep voting on [DJC: +1 416 425 1212 (DoD#0082) (eNTP) | what's for dinner. [DJC: --*greywolf; -- NetBSD: Servers' choice!
I've been watching this thread since the beginning, and now that y'all brought up networking, I believe I may have some useful suggestions in that arena. Tom Lane <tgl@sss.pgh.pa.us> writes: > I'm thinking maybe one or both LAN cards have a problem with packets > exceeding a certain size. > Are all the intermediate network devices at layer 2 (switches)? If so, a simple look at counters for those ports involved would rule out or in any problems with those network devices. I'm sure that if you have an MTU of 1500 bytes across the board (on the hosts and the switch(es)) then you will not have a problem with fragmentation at that layer on 100 Mbit Ethernet. Make sure you're at 100baseTX-FDX. If you are using hubs, DO NOT use full duplex on your hosts. A hub can not function at full duplex, only half. If there are any intermediate layer 3 devices (routers), it's possible for them to fragment your packets. Verify the MTU on any of these devices as well as the appropriate duplex setting. Run netstat -s after passing a good bit of traffic between the hosts in question. Don't forget to do the math to determine error percentages. tcpdump could also reveal much about the packets such as their size and contents, whether they are fragments, if the DF bit is set, which host was the last to communicate, etc... A tcpdump along with your application trace may show you just the insight you needed to see. Do you have any packet filters between the devices? Make sure they're not dropping anything you need. I don't remember if NFS is one of these, but some things like to talk from high-port to high-port for [certain] things and high-port to low-port for other [certain] things. One thing I'd try that is a surefire way to determine if your network hardware is to blame, that is if you don't want to do all that crap above: Run your scenario with your two devices connected via an ethernet crossover cable and NICs hard-coded to 100baseTX-FDX. It'll rule out everything except that cable and your NICs. Speaking of NICs, some [really old] NICs may report they are running at full-duplex when they really are not and can not. Incrementing port error counters (specifically, frame-check-sequence and collisions) will give this away, though. > > Is this purely a diagnostic suggestion? > > Well, if it changes anything then it would definitely show there's a > hardware problem to fix... > --peace, ~~Mike.
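[As concrete starting points for the counter and capture checks above; interface and host names are hypothetical:

    netstat -s | more                             # per-protocol error and retransmit counters
    tcpdump -n -i fxp0 host filer and port 2049   # watch the NFS conversation itself
]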
On Thu, Jan 30, 2003 at 01:27:59PM -0600, Greg Copeland wrote: > That was going to be my question too. > > I thought NFS didn't have some of the requisite file system behaviors > (locking, flushing, etc. IIRC) for PostgreSQL to function correctly or > reliably. I don't know what locking scheme PostgreSQL uses, but in theory it should be possible to use it over NFS: - a fflush()/msync() should work the same way on a NFS filesystem as on a local filesystem, provided the client and server implements the NFS protocol properly - locking via temp files works over NFS, again provided the client and server implements the NFS protocol properly (this is why you can safely read your mailbox over NFS, for example). If PostgreSQL uses flock or fcntl, it's a problem. -- Manuel Bouyer <bouyer@antioche.eu.org> NetBSD: 24 years of experience will always make the difference --
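[A minimal sketch of the temp-file trick, which works because link(2) is a single atomic operation on the server; the paths are hypothetical:

    lock=/mnt/filer/db.lock
    tmp=$lock.$$
    echo $$ > "$tmp"
    # ln(1) fails if the target already exists, even over NFS
    if ln "$tmp" "$lock" 2>/dev/null; then
        echo "lock acquired"
    fi
    rm -f "$tmp"

A truly paranoid version re-checks the link count of $tmp after ln, since the reply to a successful NFS LINK can be lost and the retried request then reports a spurious failure.]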
[ On Friday, January 31, 2003 at 11:54:27 (-0500), D'Arcy J.M. Cain wrote: ] > Subject: Re: PostgreSQL, NetBSD and NFS > > On Thursday 30 January 2003 18:32, Simon J. Gerraty wrote: > > Is postgreSQL trying to lock a file perhaps? Would seem a sensible thing > > for it to be doing... > > Is that a problem? FWIW I am running statd and lockd on the NetBSD box. NetBSD's NFS implementation only supports locking as a _server_, not a client. http://www.unixcircle.com/features/nfs.php Optional for file locking (lockd+statd): lockd: Rpc.lockd is a daemon which provides file and record-locking services in an NFS environment. FreeBSD, NetBSD and OpenBSD file locking is only supported on server side. NFS server support for locking was introduced in NetBSD-1.5: http://www.netbsd.org/Releases/formal-1.5/NetBSD-1.5.html * Server part of NFS locking (implemented by rpc.lockd(8)) now works. and as you can also see from rpc.lockd/lockd.c: ---------------------------- revision 1.5 date: 2000/06/07 14:34:40; author: bouyer; state: Exp; lines: +67 -25 Implement file locking in lockd. All the stuff is done in userland, using fhopen() and flock(). This means that if you kill lockd, all locks will be relased (but you're supposed to kill statd at the same time, so remote hosts will know it and re-establish the lock). Tested against solaris 2.7 and linux 2.2.14 clients. Shared lock are not handled efficiently, they're serialised in lockd when they could be granted. ---------------------------- Terry Lambert has some proposed fixes to add NFS client level locking to the FreeBSD kernel: http://www.freebsd.org/~terry/DIFF.LOCKS.txt http://www.freebsd.org/~terry/DIFF.LOCKS.MAN http://www.freebsd.org/~terry/DIFF.LOCKS -- Greg A. Woods +1 416 218-0098; <g.a.woods@ieee.org>; <woods@robohack.ca> Planix, Inc. <woods@planix.com>; VE3TCP; Secrets of the Weird <woods@weird.com>