Thread: Winsock error 10035 while trying to upgrade from 8.0 to 8.2

Winsock error 10035 while trying to upgrade from 8.0 to 8.2

From
"Cyril VELTER"
Date:

    I'm trying to upgrade a pretty big database (60 GB) from postgres 8.0 to
postgres 8.2 on Windows 2000 Server (both versions running on the same machine
on different ports). During the migration process, I always get an error at
some point (never the same one):

    LOG: could not receive data from client: Unknown winsock error 10035

    which is followed by

    LOG: incomplete message from client
    ERROR: unexpected EOF on a client connexion
    FATAL: invalid frontend message type 53 psql -U postgres -p 5433


    When I move the 8.2 postgres instance to a Windows XP Pro machine, the
migration is successful.

    I've searched google but didn't find anything related to postgres.


    cyril


    Source database : postgres 8.0.9
    Destination database : postgres 8.2.4
    OS : Windows 2000 Server SP4
    The migration is done on Linux using 8.2.4 binaries (since piping pg_dump
output on Windows stops at the first Ctrl-Z) with
"pg_dump -h YYY XXX | psql -h YYY -p 5433 XXX"


Re: Winsock error 10035 while trying to upgrade from 8.0 to 8.2

From
Magnus Hagander
Date:
Cyril VELTER wrote:
>
>     I'm trying to upgrade a pretty big database (60G) from postgres 8.0 to
> postgres 8.2 on windows 2000 Server (both version running on the same machine
> on different ports). During the migration process, I always get an error at
> some point (never the same) :

Interesting. 10035 is "A non-blocking socket operation could not be
completed immediately".
Question: Does this error come from the 8.0 or the 8.2 server?

Also, do you use SSL?

//Magnus


[Re] Re: Winsock error 10035 while trying to upgrade from 8.0 to 8.2

From
"Cyril VELTER"
Date:
> magnus@hagander.net wrote :
> Cyril VELTER wrote:
> >
> >     I'm trying to upgrade a pretty big database (60 GB) from postgres 8.0 to
> > postgres 8.2 on Windows 2000 Server (both versions running on the same machine
> > on different ports). During the migration process, I always get an error at
> > some point (never the same one):
>
> Interesting. 10035 is "A non-blocking socket operation could not be
> completed immediately".
> Question: Does this error come from the 8.0 or the 8.2 server?


    It comes from the 8.2 server message log


>
> Also, do you use SSL?


    No, I'm not. It's not even compiled into the server or the pg_dump binary.

    The server is built on Windows using MSYS, simply with ./configure && make all
&& make install


    I've been able to reproduce the problem 6 times (at random points in the
process, but it never completes successfully). Is there any test I can do to
help investigate the problem?


    cyril


Re: [Re] Re: Winsock error 10035 while trying to upgrade from 8.0 to 8.2

From
Magnus Hagander
Date:
Cyril VELTER wrote:
>     No, I'm not. It's not even compiled into the server or the pg_dump binary.
>
>     The server is built on Windows using MSYS, simply with ./configure && make all
> && make install
>
>
>     I've been able to reproduce the problem 6 times (at random points in the
> process, but it never completes successfully). Is there any test I can do to
> help investigate the problem?

Sorry I haven't gotten back to you for a while.

Yeah, if you can attach a debugger to the backend (assuming you have a
predictable backend it happens to - but if you're loading, you are using
a single session, I assume?), add a breakpoint around the area of the
problem, and get a backtrace from exactly where it shows up, that would
help.

//Magnus


> Sorry I haven't gotten back to you for a while.
>
> Yeah, if you can attach a debugger to the backend (assuming you have a
> predictable backend it happens to - but if you're loading, you are using
> a single session, I assume?), add a breakpoint around the area of the
> problem, and get a backtrace from exactly where it shows up, that would
> help.


    Thanks for your reply. I'll try to do this. I've installed gdb on the
problematic machine and recompiled postgres with debug symbols (configure
--enable-debug).

    I'm not very familiar with gdb. Could you give me some direction on setting the
breakpoint? After running gdb on the postgres.exe file, I'm not able to set the
breakpoint (b socket.c:574 gives me an error).

    Searching the source files, it seems the error message is generated in
port/win32/socket.c line 594.

    Thanks,

    cyril


Cyril VELTER wrote:
>     Thanks for your reply. I'll try to do this. I've installed gdb on the
> problematic machine and recompiled postgres with debug symbols (configure
> --enable-debug).
>
>     I'm not very familiar with gdb. Could you give me some direction on setting the
> breakpoint? After running gdb on the postgres.exe file, I'm not able to set the
> breakpoint (b socket.c:574 gives me an error).

Hmm, I keep forgetting that. There is some serious black magic required
to get gdb to even approach working state on win32. I'm too used to
working with the msvc build now. I've never actually got it working
myself, but I know others have. Hopefully someone can speak up here? :-)


>     Searching the source files, it seems the error message is generated in
> port/win32/socket.c line 594.

Right, but the important thing is which path down to that function it is
generated in. That is why a backtrace would help.

Looking at the code, the problem is probably somewhere in
pgwin32_recv(). Now, it really shouldn't end up doing what you're
seeing, but obviously it is.

Perhaps we just need to have it retry if it gets the WSAEWOULDBLOCK?
Thoughts?

//Magnus


> Hmm, I keep forgetting that. There is some serious black magic required
> to get gdb to even approach working state on win32. I'm too used to
> working with the msvc build now. I've never actually got it working
> myself, but I know others have. Hopefully someone can speak up here? :-)
>

    I don't have msvc available.

>
> >     Searching the source files, it seems the error message is generated in
> > port/win32/socket.c line 594.
>
> Right, but the important thing is which path down to that function is it
> generated in. Which is why a backtrace would help.

    Yes, I understand that.

>
> Looking at the code, the problem is probably somewhere in
> pgwin32_recv(). Now, it really shouldn't end up doing what you're
> seeing, but obviously it is.


    After looking at the code of pgwin32_recv(), I don't understand why
pgwin32_waitforsinglesocket() is called with the FD_ACCEPT argument.

>
> Perhaps we just need to have it retry if it gets the WSAEWOULDBLOCK?
> Thoughts?

    I've modified pgwin32_recv() to do that (repeat the
pgwin32_waitforsinglesocket() / WSARecv sequence while the error is
WSAEWOULDBLOCK, without raising the error). I have an upgrade running right now
(I will have the result in the next few hours).


    cyril


Re: Winsock error 10035 while trying to upgrade from 8.0 to 8.2

From
"Cyril VELTER"
Date:


    Replying to myself: the upgrade is not finished yet, but I can confirm that
there are cases where pgwin32_waitforsinglesocket() returns and the WSARecv
immediately fails. I've modified the end of pgwin32_recv():


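    /* No error, zero bytes (win2000+) or error+WSAEWOULDBLOCK (<=nt4) */

    for (;;)
    {
        /* Wait until the socket is signalled; bail out if the wait fails */
        if (pgwin32_waitforsinglesocket(s, FD_READ | FD_CLOSE | FD_ACCEPT,
                                        INFINITE) == 0)
            return -1;

        r = WSARecv(s, &wbuf, 1, &b, &flags, NULL, NULL);
        if (r == SOCKET_ERROR)
        {
            printf("SOCKERROR");    /* debug trace: receive failed after the wait */
            if (WSAGetLastError() != WSAEWOULDBLOCK)
            {
                /* Any other error is real: translate it and give up */
                TranslateSocketError();
                return -1;
            }
            /* WSAEWOULDBLOCK: fall through and wait on the socket again */
        }
        else
        {
            return b;    /* success: number of bytes received */
        }
    }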


    The printf("SOCKERROR") line has been hit twice.

    Any thoughts?

    Once this upgrade is finished, I will try again with FD_ACCEPT removed from
the pgwin32_waitforsinglesocket() call.


    cyril


Re: Winsock error 10035 while trying to upgrade from 8.0 to 8.2

From
Magnus Hagander
Date:
Cyril VELTER wrote:
>     The printf("SOCKERROR") line has been hit twice.
>
>     Any thoughts?
>
>     Once this upgrade is finished, I will try again with FD_ACCEPT removed from
> the pgwin32_waitforsinglesocket() call.

Hmm. That really isn't supposed to happen, but seems it is. Does it work
when you add that loop, though? Spits out the message and works, or does
it spit out the message and still not work?

I'm also a bit worried about it getting caught in a tight loop if the
error codes are wrong, but probably it just goes back into waitfor.. and
blocks the second time. Otherwise, you'd see screenfuls of that message.

Can you determine if it was hit two times right after each other, or if
there was time between them?

//Magnus

[Re] Re: Winsock error 10035 while trying to upgrade from 8.0 to 8.2

From
"Cyril VELTER"
Date:
> Hmm. That really isn't supposed to happen, but seems it is. Does it work
> when you add that loop, though? Spits out the message and works, or does
> it spit out the message and still not work?


    OK, I have the results of my tests:

    With the previous code, the message "SOCKERROR" is printed 5 times during the
whole process (100 GB dump import with psql). There is one group of three and one
group of two, but I don't have timestamps and am not sure whether they were
printed in the same loop or not. The import finally succeeds.

    For the second test I removed FD_ACCEPT. The message is printed only once, but
it still happens. The import is also successful.


>
> I'm also a bit worried about it getting caught in a tight loop if the
> error codes are wrong, but probably it just goes back into waitfor.. and
> blocks the second time. Otherwise, you'd see screenfuls of that message.
>
> Can you determine if it was hit two times right after each other, or if
> there was time between them?

    For the first test I don't know the amount of time between them (I have two
groups separated in the logs by other messages).


    What do you think? Maybe it's a bug in the Windows server installation I have
(this machine has not been updated for some time; perhaps I should try to do
that and see if the problem is still there. In the long run, I plan to upgrade
to Windows 2003).


    cyril


Re: [Re] Re: Winsock error 10035 while trying to upgrade from 8.0 to 8.2

From
Magnus Hagander
Date:
Cyril VELTER wrote:
>     OK, I have the results of my tests:
>
>     With the previous code, the message "SOCKERROR" is printed 5 times during the
> whole process (100 GB dump import with psql). There is one group of three and one
> group of two, but I don't have timestamps and am not sure whether they were
> printed in the same loop or not. The import finally succeeds.

Ok.


>     For the second test I removed FD_ACCEPT. The message is printed only once, but
> it still happens. The import is also successful.

Ok. So FD_ACCEPT is not the fix. Good, I didn't think it would be.


>> I'm also a bit worried about it getting caught in a tight loop if the
>> error codes are wrong, but probably it just goes back into waitfor.. and
>> blocks the second time. Otherwise, you'd see screenfuls of that message.
>>
>> Can you determine if it was hit two times right after each other, or if
>> there was time between them?
>
>     For the first test I don't know the amount of time between them (I have two
> groups separated in the logs by other messages).

Ok. I'm thinking of just sticking a minimal wait in there to protect
against absolute runaway, but that should be enough I think.


>     What do you think? Maybe it's a bug in the Windows server installation I have
> (this machine has not been updated for some time; perhaps I should try to do
> that and see if the problem is still there. In the long run, I plan to upgrade
> to Windows 2003).

I don't *think* it should be a bug with your version; it doesn't look
like it. But if you're not on the latest service pack, that's certainly
possible. Please update to the latest service pack + updates from Windows
Update / WSUS, and let me know if the problem persists.

Meanwhile, I'll try to cook up a patch.

//Magnus

magnus@hagander.net wrote:
> I don't *think* it should be a bug with your version; it doesn't look
> like it. But if you're not on the latest service pack, that's certainly
> possible. Please update to the latest service pack + updates from Windows
> Update / WSUS, and let me know if the problem persists.

    I AM on the latest service pack (on 2k it would be VERY OLD otherwise), but I
only run Windows Update about once a year. I'll schedule an update
in the coming weeks and keep you informed about the results.

> Meanwhile, I'll try to cook up a patch.


    thanks for your help

    cyril


Re: [Re] Re: Winsock error 10035 while trying to upgrade from 8.0 to 8.2

From
Magnus Hagander
Date:
On Tue, May 29, 2007 at 11:25:30PM +0200, Magnus Hagander wrote:
> >     What do you think? Maybe it's a bug in the Windows server installation I have
> > (this machine has not been updated for some time; perhaps I should try to do
> > that and see if the problem is still there. In the long run, I plan to upgrade
> > to Windows 2003).
>
> I don't *think* it should be a bug with your version; it doesn't look
> like it. But if you're not on the latest service pack, that's certainly
> possible. Please update to the latest service pack + updates from Windows
> Update / WSUS, and let me know if the problem persists.
>
> Meanwhile, I'll try to cook up a patch.

I have applied a patch for this to HEAD and 8.2. It includes a small wait
so we don't hit it too hard, and a limit of 5 retries before we simply give
up - so we don't end up in an infinite loop.

//Magnus
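
For readers who want to see the shape of the fix discussed in this thread
(retry the receive on WSAEWOULDBLOCK, with a small wait and a cap on the
number of attempts), here is a rough, portable sketch. It is not the committed
patch and does not use Winsock: sim_socket, sim_wait_readable(), sim_recv(),
sim_short_wait() and recv_with_bounded_retry() are hypothetical stand-ins
invented for illustration, and reading "5 retries" as five attempts in total
is an assumption. Only the numeric value 10035 for WSAEWOULDBLOCK comes from
the thread.

```c
#include <assert.h>
#include <stddef.h>

#define SIM_WOULDBLOCK 10035   /* numeric value of WSAEWOULDBLOCK */
#define MAX_ATTEMPTS   5       /* assumed reading of "a limit of 5 retries" */

/* Hypothetical stand-in for a socket; not a Winsock type. */
typedef struct
{
    int spurious_failures;     /* receives that fail before one succeeds */
    int waits;                 /* how many times we waited for readiness */
    int last_error;            /* stand-in for WSAGetLastError() */
} sim_socket;

/* Stand-in for pgwin32_waitforsinglesocket(): always reports "ready". */
static int sim_wait_readable(sim_socket *s)
{
    s->waits++;
    return 1;
}

/* Stand-in for WSARecv(): fails with "would block" a few times, then
 * fills the buffer and returns the number of bytes delivered. */
static int sim_recv(sim_socket *s, char *buf, int len)
{
    int i;

    if (s->spurious_failures > 0)
    {
        s->spurious_failures--;
        s->last_error = SIM_WOULDBLOCK;
        return -1;
    }
    for (i = 0; i < len; i++)
        buf[i] = 'x';
    s->last_error = 0;
    return len;
}

/* Placeholder for the "small wait"; real code would sleep briefly here. */
static void sim_short_wait(void)
{
}

/* Wait, receive, and retry only on "would block", at most MAX_ATTEMPTS
 * times. Returns bytes received, or -1 on a real error or exhaustion. */
int recv_with_bounded_retry(sim_socket *s, char *buf, int len)
{
    int attempt;

    assert(s != NULL && buf != NULL && len > 0);

    for (attempt = 0; attempt < MAX_ATTEMPTS; attempt++)
    {
        int r;

        if (!sim_wait_readable(s))
            return -1;              /* the wait itself failed */

        r = sim_recv(s, buf, len);
        if (r >= 0)
            return r;               /* success */

        if (s->last_error != SIM_WOULDBLOCK)
            return -1;              /* a real error: do not retry */

        sim_short_wait();           /* avoid spinning in a tight loop */
    }
    return -1;                      /* gave up after MAX_ATTEMPTS tries */
}
```

A caller would set up a sim_socket with a chosen number of spurious failures
and call recv_with_bounded_retry() on it. The cap addresses the tight-loop
worry raised earlier in the thread: if the error code never clears, the loop
ends after a fixed number of attempts instead of spinning forever.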