Thread: "incomplete startup packet" on SGI

"incomplete startup packet" on SGI

From: David Rysdam

I have a working 8.1 server running on Linux and I can connect to it
from other Linux clients.  I built postgresql 8.1 on an SGI (using
--without-readline but otherwise stock) and it compiled OK and installed
fine.  But when I try to connect to the Linux server I get "could not
send startup packet: transport endpoint is not connected" on the client
end and "incomplete startup packet" on the server end.  Connectivity
between the two machines is working.

I could find basically no useful references to the former and the only
references to the latter were portscans and the like.

Browsing the source, I see a couple of places the server message could
come from.  One relates to SSL, which the output from configure says is
turned off on both client and server.  The other is just a generic comm
error, but what could cause a partial failure like that?
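For context on the server side: as far as I can tell from the 8.1 postmaster, "incomplete startup packet" is logged when the socket reaches EOF before a complete startup packet has been read. That happens either before the 4-byte length word arrives (the branch whose complaint is suppressed after an SSL negotiation) or before the rest of the packet does. Either way the server saw a connection open and close without receiving the packet, which is consistent with the client's send failing. The sketch below shows that check over a byte buffer, assuming the v3 protocol layout (an Int32 length that counts itself, then an Int32 protocol version, 196608 for 3.0). It is a simplification, not the postmaster's code, and the enum and function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified classification of what arrived before EOF. */
enum startup_status {
    STARTUP_INCOMPLETE,   /* peer closed before a full packet arrived */
    STARTUP_BAD_LENGTH,   /* length word too small to hold a version */
    STARTUP_COMPLETE
};

/* Decode a 4-byte big-endian (network order) integer. */
static uint32_t read_be32(const unsigned char *p)
{
    return ((uint32_t) p[0] << 24) | ((uint32_t) p[1] << 16) |
           ((uint32_t) p[2] << 8) | (uint32_t) p[3];
}

/* Hypothetical helper: classify the n bytes received before EOF. */
static enum startup_status classify_startup_packet(const unsigned char *buf,
                                                   size_t n)
{
    assert(buf != NULL || n == 0);

    if (n < 4)
        return STARTUP_INCOMPLETE;      /* not even the length word */

    uint32_t len = read_be32(buf);      /* length includes itself */
    if (len < 8)
        return STARTUP_BAD_LENGTH;      /* no room for the version */
    if (n < len)
        return STARTUP_INCOMPLETE;      /* body was cut off */
    return STARTUP_COMPLETE;
}
```

A client whose send() fails outright never delivers even the length word, so the server-side view is the `n < 4` case.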

Re: "incomplete startup packet" on SGI

From: David Rysdam

Just finished building and installing on *Sun* (also
"--without-readline", not that I think that could be the issue): Works
fine.  So it's something to do with the SGI build in particular.



Re: "incomplete startup packet" on SGI

From: Douglas McNaught

David Rysdam <drysdam@ll.mit.edu> writes:

> Just finished building and installing on *Sun* (also
> "--without-readline", not that I think that could be the issue): Works
> fine.  So it's something to do with the SGI build in particular.

IRIX buggy, film at 11.  :)

-Doug

Re: "incomplete startup packet" on SGI

From: Tom Lane

David Rysdam <drysdam@ll.mit.edu> writes:
> Just finished building and installing on *Sun* (also
> "--without-readline", not that I think that could be the issue): Works
> fine.  So it's something to do with the SGI build in particular.

More likely it's something to do with weird behavior of the SGI kernel's
TCP stack.  I did a little googling for "transport endpoint is not
connected" without turning up anything obviously related, but that or
ENOTCONN is probably what you need to search on.

            regards, tom lane
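To make the ENOTCONN point concrete: the client-side text looks like a fixed prefix followed by the C library's description of errno, so the words after the colon are just strerror(ENOTCONN) on that platform. A minimal sketch of that formatting, assuming libpq builds the message this way; the macro and helper names are hypothetical.

```c
#include <assert.h>
#include <errno.h>   /* ENOTCONN; callers pass errno after a failed send() */
#include <stdio.h>
#include <string.h>

/* Hypothetical prefix, copied from the client-side message in this thread. */
#define STARTUP_SEND_PREFIX "could not send startup packet: "

/* Build "<prefix><strerror(err)>" into out, truncating if necessary. */
static void format_startup_send_error(char *out, size_t outlen, int err)
{
    assert(out != NULL && outlen > 0);
    snprintf(out, outlen, "%s%s", STARTUP_SEND_PREFIX, strerror(err));
}
```

With glibc on Linux, `format_startup_send_error(buf, sizeof buf, ENOTCONN)` produces "could not send startup packet: Transport endpoint is not connected"; other C libraries word the errno text differently, which is why searching on the symbolic name is more productive.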

Re: "incomplete startup packet" on SGI

From: David Rysdam

It's acting like a race condition or pointer problem.  When I add random
debug printfs or PQflush calls to libpq, it sometimes works.

Re: "incomplete startup packet" on SGI

From: David Rysdam

Not a race condition: there are no threads.
Not a memory-corruption bug: Electric Fence reports nothing.  And it
works when Electric Fence is running, whereas a binary that uses the same
libpq without linking efence does not work.


Re: "incomplete startup packet" on SGI

From: David Rysdam

I know nobody is interested in this, but I think I should document the
"solution" for anyone who finds this thread in the archives:  My theory
is that Irix is unable to keep up with how fast the postgresql client is
going and that the debug statements/efence stuff are slowing it down
enough that Irix can catch up and make sure the socket really is there,
connected and working.  To that end, I inserted a sleep(1) in
fe-connect.c just before the pqPacketSend(...startpacket...) stuff.
It's stupid and hacky, but gets me where I need to be and maybe this
hint will inspire somebody who knows (and cares) about Irix to find a
real fix.
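A less timing-dependent alternative to a fixed sleep(1) would be to wait until the kernel actually reports the socket as connected before sending the startup packet. The sketch below uses only standard POSIX calls (poll, getsockopt with SO_ERROR, getpeername). The function names are hypothetical, this is not libpq's code, and it has not been tried on IRIX, so whether it avoids the behaviour described above is an assumption. As far as I can tell, libpq's connection state machine already checks SO_ERROR before sending; the extra getpeername() call is the part that would directly probe for the ENOTCONN state.

```c
#define _POSIX_C_SOURCE 200809L

#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: start a non-blocking TCP connect to addr.
 * Returns the socket (the connect may still be in progress) or -1. */
static int start_nonblocking_connect(const struct sockaddr_in *addr)
{
    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;

    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        close(fd);
        return -1;
    }

    /* EINPROGRESS is the normal result for a non-blocking connect. */
    if (connect(fd, (const struct sockaddr *) addr, sizeof(*addr)) < 0 &&
        errno != EINPROGRESS) {
        int saved = errno;
        close(fd);
        errno = saved;
        return -1;
    }
    return fd;
}

/* Hypothetical helper: wait up to timeout_ms for fd to become connected.
 * Returns 0 only once the kernel reports a peer; otherwise -1 with errno
 * set (for example ETIMEDOUT, a pending SO_ERROR value, or ENOTCONN). */
static int wait_until_connected(int fd, int timeout_ms)
{
    assert(fd >= 0);

    struct pollfd pfd = { .fd = fd, .events = POLLOUT };
    int n = poll(&pfd, 1, timeout_ms);
    if (n < 0)
        return -1;                 /* poll() failed; errno already set */
    if (n == 0) {
        errno = ETIMEDOUT;         /* still not writable */
        return -1;
    }

    /* A writable socket may still carry a failed connect: check SO_ERROR. */
    int so_error = 0;
    socklen_t len = sizeof(so_error);
    if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &so_error, &len) < 0)
        return -1;
    if (so_error != 0) {
        errno = so_error;
        return -1;
    }

    /* Finally ask for the peer address; this fails with ENOTCONN if the
     * socket is still not connected, the error the SGI client reported. */
    struct sockaddr_in peer;
    socklen_t peer_len = sizeof(peer);
    if (getpeername(fd, (struct sockaddr *) &peer, &peer_len) < 0)
        return -1;

    return 0;
}
```

The idea would be to call wait_until_connected() at the point where the sleep(1) was inserted and only send the startup packet once it returns 0. Wiring that into fe-connect.c is not shown here, since libpq creates and connects the socket in a different step of its state machine.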