Thread: Strange result: UNIX vs. TCP/IP sockets

From
Andrew Sullivan
Date:
Hi all,

We've run into a rather odd problem here, and we're puzzling out
what's going on.  But while we do, I thought I'd see if anyone else
has anything similar to report.

This is for 7.2.4 on Solaris 8.

We have a query for which EXPLAIN ANALYSE on a local psql connection
always returns a time of between about 325 msec and 850 msec
(depending on other load, whether the result is in cache, &c. -- this
is an aggregates query involving min() and count()).

If I connect using -h 127.0.0.1, however, I can _sometimes_ get the
query to take as long as 1200 msec.  The effect is sporadic (of
course.  If it were totally predictable, the computing gods wouldn't
be having any fun with me), but it is certainly there off and on.
(We discovered it because our application is regularly reporting
times on this query roughly twice as long as I was able to get with
psql, until I connected via TCP/IP.)

I'll have more to report as we investigate further -- at the moment,
this has cropped up on a production system, and so we're trying to
reproduce it in our test environment.  Naturally, we're looking at
the TCP/IP stack configuration, among other stuff.  In the meantime,
however, I wondered if anyone knows which bits I ought to be prodding
at to look for sub-optimal libraries, &c.; or whether anyone else has
run into similar problems on Solaris or elsewhere.

A

--
----
Andrew Sullivan                         204-4141 Yonge Street
Liberty RMS                           Toronto, Ontario Canada
<andrew@libertyrms.info>                              M2P 2A8
                                         +1 416 646 3304 x110


Re: Strange result: UNIX vs. TCP/IP sockets

From
The Hermit Hacker
Date:
'K, this is based on "old information", I don't know if Sun changed it
'yet again' ... but, when I was working at the University, one of our IT
directors gave me a report that dealt with something Sun did (god, I'm so
detailed here, eh?) to "mimic" how Microsoft broke the TCP/IP protocol
... the report was in relation to Web services, and how the change
actually made Sun/Solaris appear to be slower than Microsoft ...

And Sun made this the 'default' setting, but it could be disabled in
/etc/system ...

Sorry for being so vague, but if I recall correctly, it had something to
do with adding an extra ACK to each packet ... maybe even as vague as the
above is, it will jar a memory for someone else?


On Fri, 4 Jul 2003, Andrew Sullivan wrote:

> Hi all,
>
> We've run into a rather odd problem here, and we're puzzling out
> what's going on.  But while we do, I thought I'd see if anyone else
> has anything similar to report.
>
> This is for 7.2.4 on Solaris 8.
>
> We have a query for which EXPLAIN ANALYSE on a local psql connection
> always returns a time of between about 325 msec and 850 msec
> (depending on other load, whether the result is in cache, &c. -- this
> is an aggregates query involving min() and count()).
>
> If I connect using -h 127.0.0.1, however, I can _sometimes_ get the
> query to take as long as 1200 msec.  The effect is sporadic (of
> course.  If it were totally predictable, the computing gods wouldn't
> be having any fun with me), but it is certainly there off and on.
> (We discovered it because our application is regularly reporting
> times on this query roughly twice as long as I was able to get with
> psql, until I connected via TCP/IP.)
>
> I'll have more to report as we investigate further -- at the moment,
> this has cropped up on a production system, and so we're trying to
> reproduce it in our test environment.  Naturally, we're looking at
> the TCP/IP stack configuration, among other stuff.  In the meantime,
> however, I wondered if anyone knows which bits I ought to be prodding
> at to look for sub-optimal libraries, &c.; or whether anyone else has
> run into similar problems on Solaris or elsewhere.

Marc G. Fournier                   ICQ#7615664               IRC Nick: Scrappy
Systems Administrator @ hub.org
primary: scrappy@hub.org           secondary: scrappy@{freebsd|postgresql}.org

Re: Strange result: UNIX vs. TCP/IP sockets

From
Vincent van Leeuwen
Date:
http://grotto11.com/blog/slash.html?+1039831658

Summary: IE and IIS cheat at TCP level by leaving out various SYN and ACK
packets, thereby making IE requests from IIS servers blazingly fast, and
making IE requests to non-IIS servers infuriatingly slow.

But since this only relates to making and breaking TCP connections, I don't
think this is relevant for a larger query time. It's probably normal for a TCP
connection to be slightly slower than a unix socket, but I don't think that's
what Andrew is experiencing.

On 2003-07-04 14:35:18 -0300, The Hermit Hacker wrote:
>
> 'K, this is based on "old information", I don't know if Sun changed it
> 'yet again' ... but, when I was working at the University, one of our IT
> directors gave me a report that dealt with something Sun did (god, I'm so
> detailed here, eh?) to "mimic" how Microsoft broke the TCP/IP protocol
> ... the report was in relation to Web services, and how the change
> actually made Sun/Solaris appear to be slower than Microsoft ...
>
> And Sun made this the 'default' setting, but it could be disabled in
> /etc/system ...
>
> Sorry for being so vague, but if I recall correctly, it had something to
> do with adding an extra ACK to each packet ... maybe even as vague as the
> above is, it will jar a memory for someone else?

Vincent van Leeuwen
Media Design - http://www.mediadesign.nl/

Re: Strange result: UNIX vs. TCP/IP sockets

From
Rod Taylor
Date:
> If I connect using -h 127.0.0.1, however, I can _sometimes_ get the
> query to take as long as 1200 msec.  The effect is sporadic (of

SSL plays havoc with our system when using local loopback for the host
on both Solaris 7 and 8.  It was probably key renegotiation, which 7.4
has addressed.


Re: Strange result: UNIX vs. TCP/IP sockets

From
Andrew Sullivan
Date:
On Fri, Jul 04, 2003 at 07:55:12PM +0200, Vincent van Leeuwen wrote:

> But since this only relates to making and breaking TCP connections,
> I don't think this is relevant for a larger query time. It's
> probably normal for a TCP connection to be slightly slower than a
> unix socket, but I don't think that's what Andrew is experiencing.

No, it's not.  And my colleague Sorin Iszlai pointed out to me
something else about it: we're getting different numbers reported by
EXPLAIN ANALYSE itself.  How is that even possible?

If we try it here on a moderately-loaded Sun box, it seems we're able
to reproduce it, as well.

How could it be the transport affects the time for the query as
reported by the back end?

A



Re: Strange result: UNIX vs. TCP/IP sockets

From
Tom Lane
Date:
Andrew Sullivan <andrew@libertyrms.info> writes:
> How could it be the transport affects the time for the query as
> reported by the back end?

How much data is being sent back by the query?

Do you have SSL enabled?  SSL encryption overhead is nontrivial,
especially if any renegotiations happen.

            regards, tom lane

Re: Strange result: UNIX vs. TCP/IP sockets

From
Andrew Sullivan
Date:
On Fri, Jul 04, 2003 at 05:47:27PM -0400, Tom Lane wrote:
> Andrew Sullivan <andrew@libertyrms.info> writes:
> > How could it be the transport affects the time for the query as
> > reported by the back end?
>
> How much data is being sent back by the query?

In this case, it's an all-aggregate query:

select count(*), min(id) from sometable where owner = int4;

(Yeah, yeah, I know.  I didn't write it.)

But it's the EXPLAIN ANALYSE that's reporting different times
depending on the transport.  That's what I find so strange.
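
For anyone wanting to compare the reported times side by side, here is a
minimal sketch in Python.  It assumes you have already captured the EXPLAIN
ANALYSE output of each run as text (once over the UNIX socket, once with
-h 127.0.0.1) and that the summary line reads "Total runtime: N msec", or
"Execution Time: N ms" on newer releases.  The function names and the sample
strings are hypothetical placeholders, not measurements from our system.

```python
import re
import statistics

# Matches the runtime summary line in EXPLAIN ANALYSE output.
# Assumption: older releases print "Total runtime: N msec",
# newer ones print "Execution Time: N ms".
RUNTIME_RE = re.compile(
    r"(?:Total runtime|Execution Time):\s*([0-9]+(?:\.[0-9]+)?)\s*(?:msec|ms)"
)


def parse_runtime_ms(explain_output):
    """Return the reported runtime in milliseconds, or None if absent."""
    match = RUNTIME_RE.search(explain_output)
    return float(match.group(1)) if match else None


def summarize(samples_ms):
    """Summarize a list of runtimes (ms) as count, min, median and max."""
    return {
        "n": len(samples_ms),
        "min": min(samples_ms),
        "median": statistics.median(samples_ms),
        "max": max(samples_ms),
    }


# Hypothetical captured outputs, one string per run, grouped by transport.
unix_runs = ["Total runtime: 325.10 msec", "Total runtime: 850.00 msec"]
tcp_runs = ["Total runtime: 410.25 msec", "Total runtime: 1200.00 msec"]

for label, runs in (("unix", unix_runs), ("tcp", tcp_runs)):
    times = [parse_runtime_ms(r) for r in runs]
    print(label, summarize(times))
```

With only a couple of runs per transport the summaries say very little; the
point is to collect many runs on each side before comparing.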

> Do you have SSL enabled?  SSL encryption overhead is nontrivial,
> especially if any renegotiations happen.

No.



Re: Strange result: UNIX vs. TCP/IP sockets

From
Andrew Sullivan
Date:
Hi all,

You may remember in my last report, I said that it appeared that
TCP/IP connections caused EXPLAIN ANALYSE to return (repeatably but
not consistently) slower times than when connected over UNIX domain
sockets.

This turns out to be false.  We (well, Chris Browne, actually) ran
some tests which demonstrated that the performance problem turned up
over the UNIX socket, as well.  It was just a statistical fluke that
our smaller sample always found the problem on TCP/IP.
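
To see how easily a small sample can mislead in this way, here is a rough
sketch in Python.  It assumes each run is slow with some fixed probability,
independently of the transport, and computes the chance that slow runs show
up on TCP/IP but never on the UNIX socket.  The 10% slow-run rate and the
function name are assumptions for illustration only, not figures from our
tests.

```python
def prob_slow_only_on_tcp(p_slow, runs_per_transport):
    """Chance that at least one TCP/IP run is slow while no UNIX-socket run is.

    Assumes every run is independently slow with probability p_slow,
    whichever transport is used.
    """
    none_slow = (1.0 - p_slow) ** runs_per_transport
    at_least_one_slow = 1.0 - none_slow
    return at_least_one_slow * none_slow


# Assumed slow-run rate of 10%; compare small and large sample sizes.
for n in (5, 10, 50):
    print(n, round(prob_slow_only_on_tcp(0.1, n), 4))
```

Under these assumptions the misleading outcome is fairly likely with a
handful of runs per transport and becomes rare as the number of runs grows.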

Of course, now we have some other work to do, but we can rule out the
transport at least.  Chalk one up for sane results.  If we discover
any more, I'll post it here.

A