Thread: Strange result: UNIX vs. TCP/IP sockets
Hi all,

We've run into a rather odd problem here, and we're puzzling out what's going on. But while we do, I thought I'd see if anyone else has anything similar to report.

This is for PostgreSQL 7.2.4 on Solaris 8.

We have a query for which EXPLAIN ANALYSE on a local psql connection always returns a time of between about 325 msec and 850 msec (depending on other load, whether the result is in cache, &c. -- this is an aggregate query involving min() and count()).

If I connect using -h 127.0.0.1, however, I can _sometimes_ get the query to take as long as 1200 msec. The effect is sporadic (of course. If it were totally predictable, the computing gods wouldn't be having any fun with me), but it is certainly there off and on. (We discovered it because our application is regularly reporting times on this query roughly twice as long as I was able to get with psql, until I connected via TCP/IP.)

I'll have more to report as we investigate further -- at the moment, this has cropped up on a production system, and so we're trying to reproduce it in our test environment. Naturally, we're looking at the TCP/IP stack configuration, among other stuff. In the meantime, however, I wondered if anyone knows which bits I ought to be prodding at to look for sub-optimal libraries, &c.; or whether anyone else has run into similar problems on Solaris or elsewhere.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110
'K, this is based on "old information", and I don't know if Sun changed it 'yet again' ... but, when I was working at the University, one of our IT directors gave me a report that dealt with something Sun did (god, I'm so detailed here, eh?) to "mimic" how Microsoft broke the TCP/IP protocol ... the report was in relation to Web services, and how the change actually made Sun/Solaris appear to be slower than Microsoft ...

And Sun made this the 'default' setting, but it was disablable in /etc/system ...

Sorry for being so vague, but if I recall correctly, it had something to do with adding an extra ACK to each packet ... maybe, even as vague as the above is, it will jar a memory for someone else?

On Fri, 4 Jul 2003, Andrew Sullivan wrote:

> If I connect using -h 127.0.0.1, however, I can _sometimes_ get the
> query to take as long as 1200 msec.

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org
http://grotto11.com/blog/slash.html?+1039831658

Summary: IE and IIS cheat at TCP level by leaving out various SYN and ACK packets, thereby making IE requests to IIS servers blazingly fast, and making IE requests to non-IIS servers infuriatingly slow.

But since this only relates to making and breaking TCP connections, I don't think this is relevant for a larger query time. It's probably normal for a TCP connection to be slightly slower than a unix socket, but I don't think that's what Andrew is experiencing.

On 2003-07-04 14:35:18 -0300, The Hermit Hacker wrote:

> Sorry for being so vague, but if I recall correctly, it had something to
> do with adding an extra ACK to each packet ...

Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/
> If I connect using -h 127.0.0.1, however, I can _sometimes_ get the
> query to take as long as 1200 msec.

SSL played havoc with our system when using local loopback for the host on both Solaris 7 and 8. It was probably key renegotiation, which 7.4 has addressed.
On Fri, Jul 04, 2003 at 07:55:12PM +0200, Vincent van Leeuwen wrote:

> But since this only relates to making and breaking TCP connections,
> I don't think this is relevant for a larger query time. It's
> probably normal for a TCP connection to be slightly slower than a
> unix socket, but I don't think that's what Andrew is experiencing.

No, it's not. And my colleague Sorin Iszlai pointed out to me something else about it: we're getting different numbers reported by EXPLAIN ANALYSE itself. How is that even possible? If we try it here on a moderately-loaded Sun box, it seems we're able to reproduce it, as well.

How could it be that the transport affects the time for the query as reported by the back end?

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110
Andrew Sullivan <andrew@libertyrms.info> writes:

> How could it be the transport affects the time for the query as
> reported by the back end?

How much data is being sent back by the query?

Do you have SSL enabled? SSL encryption overhead is nontrivial, especially if any renegotiations happen.

regards, tom lane
On Fri, Jul 04, 2003 at 05:47:27PM -0400, Tom Lane wrote:

> Andrew Sullivan <andrew@libertyrms.info> writes:
> > How could it be the transport affects the time for the query as
> > reported by the back end?
>
> How much data is being sent back by the query?

In this case, it's an all-aggregate query:

    select count(*), min(id) from sometable where owner = int4;

(Yeah, yeah, I know. I didn't write it.) But it's the EXPLAIN ANALYSE that's reporting different times depending on the transport. That's what I find so strange.

> Do you have SSL enabled? SSL encryption overhead is nontrivial,
> especially if any renegotiations happen.

No.

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110
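For anyone wanting to compare the back end's own timings across transports, a rough sketch of pulling the reported runtime out of saved EXPLAIN ANALYSE output and summarising it per transport might look like the following. It assumes the output ends with a line of the form "Total runtime: 412.37 msec", which may be worded differently on other PostgreSQL versions; the function names and the sample strings are hypothetical and are not measurements from this thread.

```python
import re
import statistics

# Assumed shape of the summary line, e.g. "Total runtime: 412.37 msec".
# Other PostgreSQL versions may word this line differently; adjust as needed.
RUNTIME_RE = re.compile(r"Total runtime:\s*([0-9]+(?:\.[0-9]+)?)\s*ms(?:ec)?")


def reported_runtime_ms(explain_output):
    """Return the backend-reported runtime in msec, or None if no such line."""
    match = RUNTIME_RE.search(explain_output)
    return float(match.group(1)) if match else None


def summarize(samples_by_transport):
    """Map each transport label to (min, median, max) of its reported runtimes."""
    summary = {}
    for label, outputs in samples_by_transport.items():
        times = [t for t in map(reported_runtime_ms, outputs) if t is not None]
        if times:
            summary[label] = (min(times), statistics.median(times), max(times))
    return summary


if __name__ == "__main__":
    # Placeholder outputs for illustration only, not measurements from the thread.
    samples = {
        "unix socket": ["Total runtime: 400.00 msec", "Total runtime: 500.00 msec"],
        "tcp 127.0.0.1": ["Total runtime: 450.00 msec", "Total runtime: 900.00 msec"],
    }
    for label, (low, mid, high) in summarize(samples).items():
        print(f"{label}: min={low:.2f} median={mid:.2f} max={high:.2f} msec")
```

Comparing the spread (min, median, max) rather than a single run is the point here, since the effect described above is sporadic.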
Hi all,

You may remember in my last report, I said that it appeared that TCP/IP connections caused EXPLAIN ANALYSE to return (repeatably but not consistently) slower times than when connected over UNIX domain sockets.

This turns out to be false. We (well, Chris Browne, actually) ran some tests which demonstrated that the performance problem turned up over the UNIX socket, as well. It was just a statistical fluke that our smaller sample always found the problem on TCP/IP.

Of course, now we have some other work to do, but we can rule out the transport at least. Chalk one up for sane results. If we discover any more, I'll post it here.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110
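To see how easily a small sample can produce this kind of fluke, here is a back-of-the-envelope sketch. It assumes, purely for illustration, that each run is slow with the same probability no matter which transport is used, and that runs are independent; the function name and the 0.2 figure are made up and do not come from the tests described above.

```python
def prob_slow_only_on_one_side(p_slow, runs_per_transport):
    """Chance of seeing at least one slow run on transport A and none on B.

    Assumes every run is slow with probability p_slow, independently of the
    transport, and that the same number of runs is made on each transport.
    """
    if not 0.0 <= p_slow <= 1.0:
        raise ValueError("p_slow must be between 0 and 1")
    if runs_per_transport < 1:
        raise ValueError("runs_per_transport must be at least 1")
    # Probability that a batch of runs contains no slow run at all.
    none_slow = (1.0 - p_slow) ** runs_per_transport
    # A has at least one slow run, B has none.
    return (1.0 - none_slow) * none_slow


if __name__ == "__main__":
    # p_slow = 0.2 is an arbitrary illustrative value, not a figure from the thread.
    for runs in (3, 5, 10, 30):
        print(runs, round(prob_slow_only_on_one_side(0.2, runs), 3))
```

Under these assumptions the chance of blaming the wrong transport shrinks quickly as the number of runs grows, which fits with the larger test run finding the slow case on the UNIX socket as well.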