Thread: "incomplete startup packet" on SGI
I have a working 8.1 server running on Linux, and I can connect to it from other Linux clients. I built PostgreSQL 8.1 on an SGI (using --without-readline but otherwise stock); it compiled OK and installed fine. But when I try to connect to the Linux server I get "could not send startup packet: transport endpoint is not connected" on the client end and "incomplete startup packet" on the server end. Connectivity between the two machines is working. I could find basically no useful references to the former, and the only references to the latter were port scans and the like. Browsing the source, I see a couple of places the server's message could come from. One relates to SSL, which the output from configure says is turned off on both client and server. The other is just a generic communication error, but what could cause a partial failure like that?
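Background on what the server is complaining about: in frontend/backend protocol 3.0, which 8.1 speaks, the startup packet is the first message the client sends. It begins with a 4-byte big-endian length that counts itself, then the protocol version number 196608 (major 3, minor 0), then NUL-terminated parameter name/value pairs such as `user` and `database`, and ends with one extra NUL byte. The server can only report a complete packet once it has read as many bytes as that leading length promises. The following is a minimal sketch of that byte layout, not libpq's code: `build_startup_packet`, `put_cstring` and `put_uint32_be` are hypothetical names, and only the `user` and `database` parameters are handled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Protocol version 3.0: major version in the high 16 bits, minor in the low. */
#define PROTOCOL_3_0 ((uint32_t) 3 << 16)   /* 196608 */

/* Store v as a 4-byte big-endian (network order) integer. */
static void put_uint32_be(unsigned char *p, uint32_t v)
{
    p[0] = (unsigned char) (v >> 24);
    p[1] = (unsigned char) (v >> 16);
    p[2] = (unsigned char) (v >> 8);
    p[3] = (unsigned char) v;
}

/* Append s plus its terminating NUL at buf[*off]; returns 0 if it won't fit. */
static int put_cstring(unsigned char *buf, size_t cap, size_t *off, const char *s)
{
    size_t n = strlen(s) + 1;

    if (n > cap - *off)
        return 0;
    memcpy(buf + *off, s, n);
    *off += n;
    return 1;
}

/* Hypothetical helper (not libpq's code): build a protocol 3.0 startup
 * packet with "user" and, optionally, "database" parameters.
 * Returns the packet length, which is also the value of the leading
 * length word, or 0 if buf is too small. */
size_t build_startup_packet(unsigned char *buf, size_t cap,
                            const char *user, const char *database)
{
    size_t off = 8;   /* room for the length word and the protocol version */

    assert(buf != NULL && user != NULL);
    if (cap <= off)
        return 0;
    if (!put_cstring(buf, cap, &off, "user") ||
        !put_cstring(buf, cap, &off, user))
        return 0;
    if (database != NULL &&
        (!put_cstring(buf, cap, &off, "database") ||
         !put_cstring(buf, cap, &off, database)))
        return 0;
    if (off >= cap)
        return 0;
    buf[off++] = '\0';   /* an empty name ends the parameter list */

    put_uint32_be(buf, (uint32_t) off);   /* the length counts itself */
    put_uint32_be(buf + 4, PROTOCOL_3_0);
    return off;
}
```

For user `postgres` and database `template1` this yields a 42-byte packet whose first four bytes encode 42. If the connection fails before all of those bytes reach the server, it cannot assemble the whole packet, which is the situation "incomplete startup packet" describes.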
Just finished building and installing on *Sun* (also "--without-readline", not that I think that could be the issue): Works fine. So it's something to do with the SGI build in particular.
David Rysdam <drysdam@ll.mit.edu> writes: > Just finished building and installing on *Sun* (also > "--without-readline", not that I think that could be the issue): Works > fine. So it's something to do with the SGI build in particular. IRIX buggy, film at 11. :) -Doug
More likely it's something to do with weird behavior of the SGI kernel's TCP stack. I did a little googling for "transport endpoint is not connected" without turning up anything obviously related, but that or ENOTCONN is probably what you need to search on. regards, tom lane
It's acting like a race condition or pointer problem. When I add random debug printfs/PQflushs to libpq it sometimes works.
Not a race condition: no threads. Not a memory leak: Electric Fence says nothing. And it works when Electric Fence is running, whereas a binary that uses the same libpq without linking efence does not work.
I know nobody is interested in this, but I think I should document the "solution" for anyone who finds this thread in the archives. My theory is that Irix is unable to keep up with how fast the PostgreSQL client is going, and that the debug statements/efence stuff slow it down enough that Irix can catch up and make sure the socket really is there, connected and working. To that end, I inserted a sleep(1) in fe-connect.c just before the pqPacketSend(...startpacket...) stuff. It's stupid and hacky, but it gets me where I need to be, and maybe this hint will inspire somebody who knows (and cares) about Irix to find a real fix.
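The sleep(1) works around the symptom by giving the socket time to settle. A possibly less arbitrary variant, assuming the theory above is right and the socket is reported writable before the kernel actually considers it connected (untested on IRIX), is to wait until getpeername() succeeds rather than for a fixed second; ENOTCONN is the errno whose text is "transport endpoint is not connected". This is a sketch in generic POSIX socket code, not a libpq patch. `start_nonblocking_connect`, `wait_until_connected`, the 10 ms polling step and the timeout are all hypothetical choices.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <arpa/inet.h>
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <time.h>
#include <unistd.h>

/* Hypothetical helper: start a non-blocking TCP connect to ipv4:port.
 * Returns the socket (connect possibly still in progress) or -1. */
int start_nonblocking_connect(const char *ipv4, unsigned short port)
{
    struct sockaddr_in addr;
    int fd, flags;

    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1)
        return -1;

    fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return -1;
    flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0 ||
        (connect(fd, (struct sockaddr *) &addr, sizeof addr) < 0 &&
         errno != EINPROGRESS)) {
        close(fd);
        return -1;
    }
    return fd;
}

/* Hypothetical helper, not part of libpq: wait up to roughly timeout_ms
 * until fd is writable, has no pending socket error, AND getpeername()
 * succeeds.  Returns 0 when connected, -1 otherwise (errno set;
 * ETIMEDOUT if the time ran out). */
int wait_until_connected(int fd, int timeout_ms)
{
    const int step_ms = 10;   /* polling granularity (arbitrary) */
    int waited;

    assert(fd >= 0);
    for (waited = 0; waited < timeout_ms; waited += step_ms) {
        struct pollfd pfd;
        struct sockaddr_storage peer;
        socklen_t plen = sizeof peer;
        int soerr = 0;
        socklen_t elen = sizeof soerr;
        struct timespec backoff;
        int rc;

        pfd.fd = fd;
        pfd.events = POLLOUT;
        pfd.revents = 0;
        rc = poll(&pfd, 1, step_ms);
        if (rc < 0) {
            if (errno == EINTR)
                continue;
            return -1;
        }
        if (rc == 0)
            continue;   /* not writable yet */

        /* Writable: did the connect() itself fail? */
        if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &soerr, &elen) < 0)
            return -1;
        if (soerr != 0) {
            errno = soerr;
            return -1;
        }

        /* No pending error; only trust it once the kernel reports a peer. */
        if (getpeername(fd, (struct sockaddr *) &peer, &plen) == 0)
            return 0;
        if (errno != ENOTCONN)
            return -1;

        /* Still ENOTCONN ("transport endpoint is not connected"): back off. */
        backoff.tv_sec = 0;
        backoff.tv_nsec = step_ms * 1000000L;
        nanosleep(&backoff, NULL);
    }
    errno = ETIMEDOUT;
    return -1;
}
```

The timeout keeps a dead or unreachable server from hanging the client forever, which a bare retry loop would not. In a patched fe-connect.c, such a check would hypothetically sit where the sleep(1) went; whether it actually cures the IRIX symptom has not been verified.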