Thread: Winsock error 10035 while trying to upgrade from 8.0 to 8.2
I'm trying to upgrade a pretty big database (60G) from postgres 8.0 to postgres 8.2 on Windows 2000 Server (both versions running on the same machine on different ports). During the migration process, I always get an error at some point (never the same one):

LOG: could not receive data from client: Unknown winsock error 10035

which is followed by

LOG: incomplete message from client
ERROR: unexpected EOF on a client connexion
FATAL: invalid frontend message type 53

psql -U postgres -p 5433

After moving the 8.2 postgres instance to a WinXP Pro machine, the migration is successful.

I've searched Google but didn't find anything related to postgres.

cyril

Source database: postgres 8.0.9
Destination database: postgres 8.2.4
OS: Windows 2000 Server SP4

The migration is done on Linux using 8.2.4 binaries (since piping pg_dump output on Windows stops at the first ctrl-Z) with "pg_dump -h YYY XXX | psql -h YYY -p 5433 XXX"
Cyril VELTER wrote:
> I'm trying to upgrade a pretty big database (60G) from postgres 8.0 to
> postgres 8.2 on windows 2000 Server (both version running on the same machine
> on different ports). During the migration process, I always get an error at
> some point (never the same) :

Interesting. 10035 is "A non-blocking socket operation could not be completed immediately".

Question: Does this error come from the 8.0 or the 8.2 server?

Also, do you use SSL?

//Magnus
magnus@hagander.net wrote:
> Interesting. 10035 is "A non-blocking socket operation could not be
> completed immediately".
> Question: Does this error come from the 8.0 or the 8.2 server?

It comes from the 8.2 server message log.

> Also, do you use SSL?

No, I'm not. It's not even compiled into the server or the pg_dump binary.

The server is built on Windows using MSYS, simply with ./configure && make all && make install

I've been able to reproduce the problem 6 times (at random points in the process, but it never completes successfully). Is there any test I can do to help investigate the problem?

cyril
Cyril VELTER wrote:
> I've been able to reproduce the problem 6 times (at random points in the
> process, but it never completes successfully). Is there any test I can do to
> help investigate the problem ?

Sorry I haven't gotten back to you for a while.

Yeah, if you can attach a debugger to the backend (assuming you have a predictable backend it happens to - but if you're loading, you are using a single session, I assume?), add a breakpoint around the area of the problem, and get a backtrace from exactly where it shows up, that would help.

//Magnus
From: "Cyril VELTER"
> Yeah, if you can attach a debugger to the backend (assuming you have a
> predictable backend it happens to - but if you're loading, you are using
> a single session, I assume?), add a breakpoint around the area of the
> problem, and get a backtrace from exactly where it shows up, that would
> help.

Thanks for your reply. I'll try to do this. I've installed gdb on the problematic machine and recompiled postgres with debug symbols (configure --enable-debug).

I'm not very familiar with gdb. Could you give me some direction on setting the breakpoint? After running gdb on the postgres.exe file, I'm not able to set the breakpoint (b socket.c:574 gives me an error).

Searching the source files, it seems the error message is generated in port/win32/socket.c line 594.

Thanks,

cyril
From: Magnus Hagander
Cyril VELTER wrote:
> I'm not very familiar with gdb. Could you give some direction on setting the
> breakpoint. After running gdb on the postgres.exe file, I'm not able to set the
> breakpoint (b socket.c:574 give me an error).

Hmm, I keep forgetting that. There is some serious black magic required to get gdb to even approach working state on win32. I'm too used to working with the msvc build now. I've never actually got it working myself, but I know others have. Hopefully someone can speak up here? :-)

> Searching the source files, it seems the error message is generated in
> port/win32/socket.c line 594.

Right, but the important thing is which path down to that function it is generated in. Which is why a backtrace would help.

Looking at the code, the problem is probably somewhere in pgwin32_recv(). Now, it really shouldn't end up doing what you're seeing, but obviously it is.

Perhaps we just need to have it retry if it gets the WSAEWOULDBLOCK? Thoughts?

//Magnus
From: "Cyril VELTER"
> Hmm, I keep forgetting that. There is some serious black magic required
> to get gdb to even approach working state on win32. I'm too used to
> working with the msvc build now. I've never actually got it working
> myself, but I know others have. Hopefully someone can speak up here? :-)

I don't have msvc available.

> Right, but the important thing is which path down to that function is it
> generated in. Which is why a backtrace would help.

Yes, I understand that.

> Looking at the code, the problem is probably somewhere in
> pgwin32_recv(). Now, it really shouldn't end up doing what you're
> seeing, but obviously it is.

After looking at the code of pgwin32_recv(), I don't understand why pgwin32_waitforsinglesocket() is called with the FD_ACCEPT argument.

> Perhaps we just need to have it retry if it gets the WSAEWOULDBLOCK?
> Thoughts?

I've modified pgwin32_recv() to do that (repeat the pgwin32_waitforsinglesocket() / WSARecv while the error is WSAEWOULDBLOCK, without raising this error). I have an upgrade running right now (I will have the result in the next few hours).

cyril
> I've modified pgwin32_recv() to do that (repeat the
> pgwin32_waitforsinglesocket() / WSARecv while the error is WSAEWOULDBLOCK and
> not raising this error. I've an upgrade running right now (I will have the
> result in the next hours).

Replying to myself: the upgrade is not finished yet, but I can confirm that there are cases where pgwin32_waitforsinglesocket() returns and the WSARecv immediately fails. I've modified the end of pgwin32_recv():

    /* No error, zero bytes (win2000+) or error+WSAEWOULDBLOCK (<=nt4) */

    for (;;)
    {
        if (pgwin32_waitforsinglesocket(s, FD_READ | FD_CLOSE | FD_ACCEPT,
                                        INFINITE) == 0)
            return -1;

        r = WSARecv(s, &wbuf, 1, &b, &flags, NULL, NULL);
        if (r == SOCKET_ERROR)
        {
            printf("SOCKERROR");
            if (WSAGetLastError() != WSAEWOULDBLOCK)
            {
                TranslateSocketError();
                return -1;
            }
        }
        else
        {
            return b;
        }
    }

The printf("SOCKERROR") line has been hit two times.

Any thoughts?

Once this upgrade is finished, I will make another try removing FD_ACCEPT from the pgwin32_waitforsinglesocket() call.

cyril
Cyril VELTER wrote:
> Replying to myself, the upgrade is not finished yet, but I can confirm that
> there is cases where pgwin32_waitforsinglesocket() return and the WSARecv
> immediatly fail.
>
> The printf("SOCKERROR") line have been hit two times.
>
> Any though ?
>
> Once this upgrade is finished, I will make another try removing FD_ACCEPT from
> the pgwin32_waitforsinglesocket() call.

Hmm. That really isn't supposed to happen, but seems it is. Does it work when you add that loop, though? Spits out the message and works, or does it spit out the message and still not work?

I'm also a bit worried about it getting caught in a tight loop if the error codes are wrong, but probably it just goes back into waitfor.. and blocks the second time. Otherwise, you'd see screenfuls of that message.

Can you determine if it was hit two times right after each other, or if there was time between them?

//Magnus
> Hmm. That really isn't supposed to happen, but seems it is. Does it work
> when you add that loop, though? Spits out the message and works, or does
> it spit out the message and still not work?

OK, I have the results of my tests:

With the previous code, the message "SOCKERROR" is printed 5 times during the whole process (100 GB dump import with psql). There is one group of three and one group of two, but I don't have timestamps and am not sure whether they were printed in the same loop or not. The import finally succeeds.

The second test I did was to remove FD_ACCEPT. I get the message only once, but it still happens. The import is also successful.

> I'm also a bit worried about it getting caught in a tight loop if the
> error codes are wrong, but probably it just goes back into waitfor.. and
> blocks the second time. Otherwise, you'd see screenfuls of that message.
>
> Can you determine if it was hit two times right after each other, or if
> there was time between them?

For the first test I don't know the amount of time between them (I have two groups separated in the logs by other messages).

What do you think? It may be a bug in the Windows server installation I have (this machine has not been updated for some time; perhaps I should try to do that and see if the problem is still there. In the long run, I plan to upgrade to Windows 2003).

cyril
Cyril VELTER wrote:
> With the previous code, then message "SOCKERROR" is printed 5 times during the
> whole process (100 Gb dump import with psql). There one group of three and one
> group of two, but I don't have timestamps and am not sure if they are printing
> in the same loop or not. The import is finally successful.

Ok.

> The second test I have done is to remove FD_ACCEPT I still have the message
> one times, but it still happen. The import is also sucessfull.

Ok. So FD_ACCEPT is not the fix. Good, I didn't think it would be.

> For the first test I don't known the amount of time between them (I have two
> groups separeted in the logs with other messages).

Ok. I'm thinking of just sticking a minimal wait in there to protect against absolute runaway, but that should be enough I think.

> What do you think ? may be a bug in the windows server installation I have
> (this machines have not been updated for some times, perhaps I should try to do
> that and see if the problem is still there. In the long run, I plan to upgrade
> to windows 2003).

I don't *think* it should be a bug with your version, it doesn't look like it. But if you're not on the latest service pack, that's certainly possible. Please update to the latest service pack + updates from Windows Update / WSUS, and let me know if the problem persists.

Meanwhile, I'll try to cook up a patch.

//Magnus
From: "Cyril VELTER"
magnus@hagander.net wrote:
> I don't *think* it should be a bug with your version, it doesn't look
> like it. but if you're not on the latest service pack, that's certainly
> possible. Please update to latest servicepack + updates from Windows
> Update / WSUS, and let me know if the problem persists.

I AM on the latest service pack (on 2k it would be VERY OLD otherwise), but I only run Windows Update about once a year. I'll schedule an update in the next few weeks and keep you informed about the results.

> Meanwhile, I'll try to cook up a patch.

Thanks for your help,

cyril
On Tue, May 29, 2007 at 11:25:30PM +0200, Magnus Hagander wrote:
> I don't *think* it should be a bug with your version, it doesn't look
> like it. but if you're not on the latest service pack, that's certainly
> possible. Please update to latest servicepack + updates from Windows
> Update / WSUS, and let me know if the problem persists.
>
> Meanwhile, I'll try to cook up a patch.

I have applied a patch for this to HEAD and 8.2. It includes a small wait so we don't hit it too hard, and a limit of 5 retries before we simply give up - so we don't end up in an infinite loop.

//Magnus
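
For readers following the thread, the shape of the fix described above (retry on "would block", pause briefly between attempts, give up after 5 retries) might look roughly like the sketch below. This is not the patch that was committed to PostgreSQL. Every name in it (recv_with_bounded_retry, RECV_WOULD_BLOCK, the fake_socket test double) is hypothetical, and the Winsock calls are replaced by callbacks so the logic can be compiled and exercised without Windows.

```c
#include <assert.h>
#include <stddef.h>

/* All names here are hypothetical; this is not PostgreSQL's actual code. */
#define RECV_WOULD_BLOCK (-2)   /* stand-in for SOCKET_ERROR + WSAEWOULDBLOCK */
#define RECV_MAX_RETRIES 5      /* the retry limit mentioned in the thread */

/* One receive attempt: bytes read, -1 on a hard error, or RECV_WOULD_BLOCK. */
typedef int (*recv_attempt_fn)(void *ctx);
/* A short pause between attempts (a brief sleep or a wait on the socket). */
typedef void (*short_wait_fn)(void *ctx);

static int
recv_with_bounded_retry(recv_attempt_fn attempt, short_wait_fn wait_a_bit,
                        void *ctx)
{
    int tries;

    assert(attempt != NULL && wait_a_bit != NULL);

    for (tries = 0; tries < RECV_MAX_RETRIES; tries++)
    {
        int r = attempt(ctx);

        if (r != RECV_WOULD_BLOCK)
            return r;           /* data, EOF or a hard error: pass it through */
        wait_a_bit(ctx);        /* small wait so the loop cannot spin */
    }
    return -1;                  /* still "would block" after all tries: give up */
}

/* Test double: a socket that reports "would block" a fixed number of times. */
struct fake_socket
{
    int would_block_left;       /* remaining spurious "would block" results */
    int payload_len;            /* bytes returned once data is available */
    int attempts;               /* receive attempts made so far */
    int waits;                  /* pauses taken so far */
};

static int
fake_attempt(void *ctx)
{
    struct fake_socket *s = ctx;

    s->attempts++;
    if (s->would_block_left > 0)
    {
        s->would_block_left--;
        return RECV_WOULD_BLOCK;
    }
    return s->payload_len;
}

static void
fake_wait(void *ctx)
{
    struct fake_socket *s = ctx;

    s->waits++;
}
```

With a fake socket that reports "would block" twice, recv_with_bounded_retry(fake_attempt, fake_wait, &sock) returns the payload length on the third attempt. If the socket never becomes readable, the function returns -1 after five attempts instead of looping forever, which is the runaway case Magnus was worried about.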