Thread: TCP network cost
Recently I've been working on improving the performance of a system that
delivers files stored in postgresql as bytea data. I was surprised at
just how much of a penalty I find moving from a domain-socket connection to
a TCP connection, even on localhost. For one particular 40MB file (nothing
outrageous) I see ~2.5 sec. to download w/ the domain socket, but ~45 sec.
for a TCP connection (whether localhost, the name of localhost, or from
another machine 5 hops away on campus, over gigabit LAN). Similar numbers
for 8.2.3 or 8.3.6 (on Linux/Debian etch + backports).

So, why the 20-fold penalty for using TCP? Any clues on how to trace
what's up in the network IO stack?

Ross
--
Ross Reedstrom, Ph.D.                                 reedstrm@rice.edu
Systems Engineer & Admin, Research Scientist          phone: 713-348-6166
The Connexions Project      http://cnx.org            fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE
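For reference, the difference being measured here is only whether libpq is pointed at the
Unix-domain socket or at a TCP address. A rough sketch of the timing comparison, with
hypothetical database/table/column names and psycopg2 assumed as the driver, might look like:

# Time one bytea fetch over the Unix-domain socket vs. TCP to localhost.
# Database, table, and column names below are hypothetical.
import time
import psycopg2

def fetch_once(dsn):
    con = psycopg2.connect(dsn)
    cur = con.cursor()
    start = time.time()
    cur.execute("SELECT file FROM files WHERE fileid = 1")
    data = cur.fetchone()[0]
    elapsed = time.time() - start
    cur.close()
    con.close()
    return len(data), elapsed

# Omitting host= makes libpq use the Unix-domain socket; host=localhost forces TCP.
for dsn in ("dbname=mydb user=myuser port=5433",
            "dbname=mydb user=myuser port=5433 host=localhost"):
    nbytes, secs = fetch_once(dsn)
    print("%-55s %9d bytes in %.2f s" % (dsn, nbytes, secs))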
On Feb 17, 2009, at 12:04 AM, Ross J. Reedstrom wrote:

> Recently I've been working on improving the performance of a system that
> delivers files stored in postgresql as bytea data. I was surprised at
> just how much of a penalty I find moving from a domain-socket connection to
> a TCP connection, even on localhost. For one particular 40MB file (nothing
> outrageous) I see ~2.5 sec. to download w/ the domain socket, but ~45 sec.
> for a TCP connection (whether localhost, the name of localhost, or from
> another machine 5 hops away on campus, over gigabit LAN). Similar numbers
> for 8.2.3 or 8.3.6 (on Linux/Debian etch + backports).
>
> So, why the 20-fold penalty for using TCP? Any clues on how to trace
> what's up in the network IO stack?
Try running tests with ttcp to eliminate any PostgreSQL overhead and find out the real bandwidth between the two machines. If its results are also slow, you know the problem is TCP related and not PostgreSQL related.
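If ttcp isn't handy, a rough equivalent sketched in Python gives the same kind of raw-TCP
number (host and port are hypothetical; run one function on each box):

# Minimal raw-TCP throughput check, independent of PostgreSQL.
import socket, time

PORT = 12345               # hypothetical port
CHUNK = 64 * 1024
TOTAL = 40 * 1024 * 1024   # roughly the size of the problem file

def serve():
    # Run on the "server" box: accept one connection and stream TOTAL bytes.
    srv = socket.socket()
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind(("", PORT))
    srv.listen(1)
    conn, _ = srv.accept()
    sent = 0
    buf = b"x" * CHUNK
    while sent < TOTAL:
        conn.sendall(buf)
        sent += CHUNK
    conn.close()
    srv.close()

def fetch(host):
    # Run on the client box: read until EOF and report throughput.
    s = socket.create_connection((host, PORT))
    got = 0
    start = time.time()
    while True:
        chunk = s.recv(CHUNK)
        if not chunk:
            break
        got += len(chunk)
    secs = time.time() - start
    s.close()
    print("%d bytes in %.3f s (%.1f MB/s)" % (got, secs, got / secs / 1e6))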
Cheers,
Rusty
--
Rusty Conover
InfoGears Inc / GearBuyer.com / FootwearBuyer.com
On Tue, 17 Feb 2009, Rusty Conover wrote:

> On Feb 17, 2009, at 12:04 AM, Ross J. Reedstrom wrote:
>
>> Recently I've been working on improving the performance of a system that
>> delivers files stored in postgresql as bytea data. I was surprised at
>> just how much of a penalty I find moving from a domain-socket connection to
>> a TCP connection, even on localhost. For one particular 40MB file (nothing
>> outrageous) I see ~2.5 sec. to download w/ the domain socket, but ~45 sec.
>> for a TCP connection (whether localhost, the name of localhost, or from
>> another machine 5 hops away on campus, over gigabit LAN). Similar numbers
>> for 8.2.3 or 8.3.6 (on Linux/Debian etch + backports).
>>
>> So, why the 20-fold penalty for using TCP? Any clues on how to trace
>> what's up in the network IO stack?
>
> Try running tests with ttcp to eliminate any PostgreSQL overhead and find out
> the real bandwidth between the two machines. If its results are also slow,
> you know the problem is TCP related and not PostgreSQL related.

Note that he saw problems even on localhost. In the last couple of months I've
seen a lot of discussion on the linux-kernel list about the performance of
localhost. Unfortunately those fixes are only in the 2.6.27.x and 2.6.28.x
-stable kernels.

David Lang
On Mon, Feb 16, 2009 at 11:04 PM, Ross J. Reedstrom <reedstrm@rice.edu> wrote:
> Recently I've been working on improving the performance of a system that
> delivers files stored in postgresql as bytea data. I was surprised at
> just how much of a penalty I find moving from a domain-socket connection to
> a TCP connection, even on localhost. For one particular 40MB file (nothing
> outrageous) I see ~2.5 sec. to download w/ the domain socket, but ~45 sec.
> for a TCP connection (whether localhost, the name of localhost, or from
> another machine 5 hops away on campus, over gigabit LAN). Similar numbers
> for 8.2.3 or 8.3.6 (on Linux/Debian etch + backports).
>
> So, why the 20-fold penalty for using TCP? Any clues on how to trace
> what's up in the network IO stack?

TCP has additional overhead as well as going through the IP stack, which for
non-tuned Linux kernels is pretty limiting. Long story short, there are things
in /proc you can use to increase buffers and window sizes which will help with
large TCP streams (like a 40MB file, for example). There's a lot of
documentation on the net for how to tune the Linux IP stack so I won't repeat
it here; a small sketch of where those knobs live follows below.

Now, having your DB box 5 hops away is going to add a lot of latency, and any
packet loss is going to kill TCP throughput, especially if you increase window
sizes. I'd recommend something like "mtr" to map the network traffic (make
sure you run it both ways in case you have an asymmetric routing situation)
for a long period of time to look for hiccups.

--
Aaron Turner
http://synfin.net/
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows

Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
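The "knobs in /proc" referred to above are the usual sysctl tunables; a minimal read-only
sketch for inspecting them (sysctl names are the standard ones on a 2.6-era Linux kernel,
raising them needs root and sysctl -w):

# Print the Linux TCP tuning knobs usually adjusted for large transfers.
tunables = [
    "/proc/sys/net/core/rmem_max",
    "/proc/sys/net/core/wmem_max",
    "/proc/sys/net/ipv4/tcp_rmem",            # min / default / max receive buffer
    "/proc/sys/net/ipv4/tcp_wmem",            # min / default / max send buffer
    "/proc/sys/net/ipv4/tcp_window_scaling",
]

for path in tunables:
    try:
        with open(path) as f:
            print("%-45s %s" % (path, f.read().strip()))
    except IOError:
        print("%-45s (not present on this kernel)" % path)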
On Tue, Feb 17, 2009 at 12:20:02AM -0700, Rusty Conover wrote:
>
> Try running tests with ttcp to eliminate any PostgreSQL overhead and
> find out the real bandwidth between the two machines. If its results
> are also slow, you know the problem is TCP related and not PostgreSQL
> related.

I did in fact run a simple netcat client/server pair and verified that I
can transfer that file in 0.12 sec on localhost (or hostname), 0.35 sec over
the net, so the TCP stack and network are not to blame. This is purely an
issue inside the postgresql code, I believe.

On Tue, Feb 17, 2009 at 10:13:40AM -0800, Aaron Turner wrote:
>
> TCP has additional overhead as well as going through the IP stack
> which for non-tuned Linux kernels is pretty limiting.

Right. Already tuned those so long ago that I failed to mention it. Note the
'bare' transfer times added above. Nothing to write home about (~3Mb/sec) but
another order of magnitude faster than the postgresql transfer.

> Long story short, there are things in /proc you can use to increase
> buffers and window sizes which will help with large TCP streams (like
> a 40MB file, for example). There's a lot of documentation on the net
> for how to tune the Linux IP stack so I won't repeat it here.
>
> Now, having your DB box 5 hops away is going to add a lot of latency
> and any packet loss is going to kill TCP throughput, especially if you
> increase window sizes. I'd recommend something like "mtr" to map the
> network traffic (make sure you run it both ways in case you have an
> asymmetric routing situation) for a long period of time to look for
> hiccups.

The 5 hops is on campus, gigabit all the way, w/ reasonable routing - and
not the issue: I see the same times from another machine attached to the
same switch (which is the real use-case, actually).

Ross
--
Ross Reedstrom, Ph.D.                                 reedstrm@rice.edu
Systems Engineer & Admin, Research Scientist          phone: 713-348-6166
The Connexions Project      http://cnx.org            fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE
On Feb 17, 2009, at 1:04 PM, Ross J. Reedstrom wrote:
> On Tue, Feb 17, 2009 at 12:20:02AM -0700, Rusty Conover wrote:
>> Try running tests with ttcp to eliminate any PostgreSQL overhead and
>> find out the real bandwidth between the two machines. If its results
>> are also slow, you know the problem is TCP related and not PostgreSQL
>> related.
>
> I did in fact run a simple netcat client/server pair and verified that I
> can transfer that file in 0.12 sec on localhost (or hostname), 0.35 sec over
> the net, so the TCP stack and network are not to blame. This is purely an
> issue inside the postgresql code, I believe.
What is the client software you're using? libpq?
Rusty
--
Rusty Conover
InfoGears Inc / GearBuyer.com / FootwearBuyer.com
On Tue, Feb 17, 2009 at 01:59:55PM -0700, Rusty Conover wrote:
>
> On Feb 17, 2009, at 1:04 PM, Ross J. Reedstrom wrote:
>
> What is the client software you're using? libpq?

python w/ psycopg (or psycopg2), which wraps libpq. Same results w/
either version.

I think I'll try network sniffing to see if I can find where the
delays are happening.

Ross
On Tue, Feb 17, 2009 at 03:14:55PM -0600, Ross J. Reedstrom wrote:
> On Tue, Feb 17, 2009 at 01:59:55PM -0700, Rusty Conover wrote:
> >
> > What is the client software you're using? libpq?
> >
>
> python w/ psycopg (or psycopg2), which wraps libpq. Same results w/
> either version.

It's not python networking per se that's at fault: sending the file via a
SimpleHTTPServer and fetching w/ wget takes on the order of 0.5 sec as well.

> I think I'll try network sniffing to see if I can find where the
> delays are happening.

I'm no TCP/IP expert, but some packet capturing and wireshark analysis makes
me suspicious about flow control. The 'netcat' transfer shows lots of packets
from server -> client, w/ deltaTs of 8 - 200 usec (that's micro-sec), mostly
in the 10-20 range. The client -> server 'ack's seem bursty, happening only
every 50-100 packets, then a few back-to-back, all taking 10-20 usec.

I also see occasional lost packets, retransmits, and TCP Window Updates in
this stream. The FIN packet comes after 8553 packets.

For the libpq-driven transfer, I see lots of packets flowing both ways. It
seems about every other packet from server to client is 'ack'ed. Each of
these 'ack's takes 10 usec to send, but seems to cause the transfer to
'reset', since the next packet from the server doesn't arrive for 2-2.5 ms
(that's milli-sec!). FIN happens at 63155 packets.

No lost packets, no renegotiation, etc.

Capturing a localhost transfer shows the same pattern, although now almost
every single packet from server -> client takes ~3 ms.

So, TCP experts out there, what's the scoop? Is libpq/psycopg being very
conservative, or am I barking up the wrong tree? Are there network socket
properties I need to be tweaking?

Does framing up for TCP just take that long when the bits are coming from
the DB? I assume the unix-domain socket case still uses the full postgresql
messaging protocol, but wouldn't need to worry about network byte order, etc.

All the postgres tunable knobs I can see seem to talk about disk IO, rather
than net IO. Can someone point me at some doco about net IO?

Ross
--
Ross Reedstrom, Ph.D.                                 reedstrm@rice.edu
Systems Engineer & Admin, Research Scientist          phone: 713-348-6166
The Connexions Project      http://cnx.org            fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE
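Incidentally, the inter-packet delta analysis described above can be scripted rather than read
by eye. A minimal sketch, assuming the third-party dpkt library and a capture written with
something like "tcpdump -w db.pcap" (file name hypothetical):

# Rough inter-packet timing summary for a pcap of the server -> client transfer.
import dpkt

PCAP_FILE = "db.pcap"   # hypothetical capture file

prev_ts = None
deltas = []
with open(PCAP_FILE, "rb") as f:
    for ts, buf in dpkt.pcap.Reader(f):
        if prev_ts is not None:
            deltas.append(ts - prev_ts)
        prev_ts = ts

if deltas:
    deltas.sort()
    print("packets:    %d" % (len(deltas) + 1))
    print("median gap: %.6f s" % deltas[len(deltas) // 2])
    print("max gap:    %.6f s" % deltas[-1])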
On Tue, Feb 17, 2009 at 2:30 PM, Ross J. Reedstrom <reedstrm@rice.edu> wrote:
> On Tue, Feb 17, 2009 at 03:14:55PM -0600, Ross J. Reedstrom wrote:
>> On Tue, Feb 17, 2009 at 01:59:55PM -0700, Rusty Conover wrote:
>>>
>>> What is the client software you're using? libpq?
>>>
>>
>> python w/ psycopg (or psycopg2), which wraps libpq. Same results w/
>> either version.
>
> It's not python networking per se that's at fault: sending the file via a
> SimpleHTTPServer and fetching w/ wget takes on the order of 0.5 sec as well.
>
>> I think I'll try network sniffing to see if I can find where the
>> delays are happening.
>
> I'm no TCP/IP expert, but some packet capturing and wireshark analysis
> makes me suspicious about flow control. The 'netcat' transfer shows lots
> of packets from server -> client, w/ deltaTs of 8 - 200 usec (that's
> micro-sec), mostly in the 10-20 range. The client -> server 'ack's seem
> bursty, happening only every 50-100 packets, then a few back-to-back,
> all taking 10-20 usec.
>
> I also see occasional lost packets, retransmits, and TCP Window Updates
> in this stream. The FIN packet comes after 8553 packets.
>
> For the libpq-driven transfer, I see lots of packets flowing both ways.
> It seems about every other packet from server to client is 'ack'ed. Each
> of these 'ack's takes 10 usec to send, but seems to cause the transfer to
> 'reset', since the next packet from the server doesn't arrive for 2-2.5
> ms (that's milli-sec!). FIN happens at 63155 packets.
>
> No lost packets, no renegotiation, etc.
>
> Capturing a localhost transfer shows the same pattern, although now
> almost every single packet from server -> client takes ~3 ms.
>
> So, TCP experts out there, what's the scoop? Is libpq/psycopg being very
> conservative, or am I barking up the wrong tree? Are there network
> socket properties I need to be tweaking?
>
> Does framing up for TCP just take that long when the bits are coming
> from the DB? I assume the unix-domain socket case still uses the full
> postgresql messaging protocol, but wouldn't need to worry about
> network byte order, etc.
>
> All the postgres tunable knobs I can see seem to talk about disk IO,
> rather than net IO. Can someone point me at some doco about net IO?

What's the negotiated window size? That's the amount of data allowed "in
flight" without an ack. The fact that acks happen regularly shouldn't be a
problem, but if the sender is stalling because it has a small window,
waiting for an ack to be received, that could cause a large slowdown.

Do the acks include any data? If so, it's indicative of the PG networking
protocol overhead - probably not much you can do about that.

Without looking at a pcap myself, I'm not sure I can help out any more.

--
Aaron Turner
http://synfin.net/
http://tcpreplay.synfin.net/ - Pcap editing and replay tools for Unix & Windows

Those who would give up essential Liberty, to purchase a little temporary
Safety, deserve neither Liberty nor Safety.
    -- Benjamin Franklin
"Ross J. Reedstrom" <reedstrm@rice.edu> writes: > On Tue, Feb 17, 2009 at 12:20:02AM -0700, Rusty Conover wrote: >> >> Try running tests with ttcp to eliminate any PostgreSQL overhead and >> find out the real bandwidth between the two machines. If its results >> are also slow, you know the problem is TCP related and not PostgreSQL >> related. > > I did in fact run a simple netcat client/server pair and verified that I > can transfer that file on 0.12 sec localhost (or hostname), 0.35 over the > net, so TCP stack and network are not to blame. This is purely inside > the postgresql code issue, I believe. There's not much Postgres can do to mess up TCP/IP. The only things that come to mind are a) making lots of short-lived connections and b) getting caught by Nagle when doing lots of short operations and blocking waiting on results. What libpq (or other interface) operations are you doing exactly? [also, your Mail-Followup-To has a bogus email address in it. Please don't do that] -- Gregory Stark EnterpriseDB http://www.enterprisedb.com Ask me about EnterpriseDB's On-Demand Production Tuning
> python w/ psycopg (or psycopg2), which wraps libpq. Same results w/
> either version.

I've seen psycopg2 saturate a 100 Mbps ethernet connection (direct connection
with crossover cable) between postgres server and client during a
benchmark... I had to change the benchmark to not retrieve a large TEXT
column to remove this bottleneck... This was last year so versions are
probably different, but I don't think this matters a lot...

> Note the 'bare' transfer times added above. Nothing to write home about
> (~3Mb/sec) but another order of magnitude faster than the postgresql
> transfer.

You should test by sending a large (>100 MB) amount of data through netcat.
This should give you your maximum wire speed. Use /dev/zero as the source
(and /dev/null as the sink), and use "pv" (pipe viewer) to measure
throughput:

box 1 : pv < /dev/zero | nc -lp 12345
box 2 : nc (ip) 12345 > /dev/null

On gigabit LAN you should get 100 MB/s, on 100BaseT about 10 MB/s. If you
don't get that, there is a problem somewhere (bad cable, bad NIC, slow
switch/router, etc). Monitor CPU during this test (vmstat). Usage should be
low.
[note: sending a message that's been sitting in 'drafts' since last week]

Summary: the C client and the large-object API from python both send the bits
in reasonable time, but I suspect there's still room for improvement in libpq
over TCP: I'm suspicious of the 6x difference. Detailed analysis will probably
find it's all down to memory allocation and extra copying of bits around
(client side).

Ross

On Wed, Feb 18, 2009 at 01:44:23PM +0000, Gregory Stark wrote:
>
> There's not much Postgres can do to mess up TCP/IP. The only things that come
> to mind are a) making lots of short-lived connections and b) getting caught by
> Nagle when doing lots of short operations and blocking waiting on results.

The hint re: Nagle sent me off hunting. It looks like libpq _should_ be
setting NODELAY on both sides of the socket. However, tcptrace output does
show (what I understand to be) the stereotypical every-other-packet-acked
stairstep of a delayed-ack/Nagle interaction (as described here:
http://www.stuartcheshire.org/papers/NagleDelayedAck/ ). Walking through the
libpq code, though, it sets NODELAY, so Nagle should be out of the picture.
This may be a red herring, though. See below.

> What libpq (or other interface) operations are you doing exactly?

I'm using psycopg from python. My cut-down test case is:

import psycopg
from mx import DateTime   # assumed source of DateTime.now() used below

con = psycopg.connect('dbname=mydb user=myuser port=5433 host=myhost')
cur = con.cursor()
start = DateTime.now()
cur.execute("""select file from files where fileid=1""")
data = cur.fetchone()[0]
end = DateTime.now()
f = open('/dev/null', 'w')
f.write(data)
f.close()
cur.close()
print "tcp socket: %s" % str(end - start)

I've since written a minimal C app, and it's doing much better, down to about
7 sec for a local TCP connection (either localhost or hostname). So, I get to
blame the psycopg wrapper for ~30 sec of delay. I'm suspicious of memory
allocation, myself.

The tcp traces (tcpdump + tcptrace + xplot are a cool set of tools, btw)
indicate that the backend's taking ~0.35 sec to process the query and start
sending bits, and using a domain socket w/ that code gets the file in 1.3 -
1.4 sec, so I'm still seeing a 6-fold slowdown for going via TCP (6 sec. vs.
1 sec.). Sending the raw file via apache (localhost) takes ~200 ms.

Moving to a large-object based implementation would seem to confirm that:
psycopg2 (snapshot of svn head) manages to pull a lo version of the file in
times equivalent to the C client (7 sec local). I'll probably move the system
to use that, since there's really almost no use-case for access to the
insides of these files from SQL.

> [also, your Mail-Followup-To has a bogus email address in it. Please don't do
> that]

Hmm, not on purpose. I'll take a look.
On Thu, Feb 19, 2009 at 02:09:04PM +0100, PFC wrote:
>
> > python w/ psycopg (or psycopg2), which wraps libpq. Same results w/
> > either version.
>
> I've seen psycopg2 saturate a 100 Mbps ethernet connection (direct
> connection with crossover cable) between postgres server and client during
> a benchmark... I had to change the benchmark to not retrieve a large TEXT
> column to remove this bottleneck... This was last year so versions are
> probably different, but I don't think this matters a lot...

Here's the core of the problem: I in fact need to transfer exactly that: a
large single field (bytea in my case). I suspect psycopg[12] is having issues
w/ memory allocation, but that's just an unsupported gut feeling. The final
upshot is that I need to restructure my config to use the large-object API
(and hence a snapshot of psycopg2) to get decent throughput.

> You should test by sending a large (>100 MB) amount of data through
> netcat. This should give you your maximum wire speed. Use /dev/zero as
> the source (and /dev/null as the sink), and use "pv" (pipe viewer) to
> measure throughput:
>
> box 1 : pv < /dev/zero | nc -lp 12345
> box 2 : nc (ip) 12345 > /dev/null
>
> On gigabit LAN you should get 100 MB/s, on 100BaseT about 10 MB/s.

112 MB/s, and 233 MB/s for localhost. Thanks for the pointer to pv: looks
like a nice tool. Investigating this problem has led me to a number of nice
'old school' tools; the others are tcptrace and xplot.org. I've been
hand-reading tcpdump output, or clicking around in ethereal/wireshark. I like
tcptrace's approach.

Ross
--
Ross Reedstrom, Ph.D.                                 reedstrm@rice.edu
Systems Engineer & Admin, Research Scientist          phone: 713-348-6166
The Connexions Project      http://cnx.org            fax: 713-348-3665
Rice University MS-375, Houston, TX 77005
GPG Key fingerprint = F023 82C8 9B0E 2CC6 0D8E  F888 D3AE 810E 88F0 BEDE
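A sketch of what the large-object path mentioned above looks like from psycopg2 (connection
parameters and the file_loid column are hypothetical; requires a psycopg2 with lobject
support, i.e. the svn snapshot discussed in the thread or any later release):

# Read a file stored as a large object instead of a bytea column.
import psycopg2

con = psycopg2.connect("dbname=mydb user=myuser host=myhost port=5433")
cur = con.cursor()
# Assume the large object's OID was recorded in the files table at import time.
cur.execute("SELECT file_loid FROM files WHERE fileid = 1")
loid = cur.fetchone()[0]

lob = con.lobject(loid, 'r')   # open the large object for reading
data = lob.read()              # stream it via the lo_* functions
lob.close()
con.commit()                   # large-object access runs inside a transaction
print("read %d bytes" % len(data))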
"Ross J. Reedstrom" <reedstrm@rice.edu> writes: > Summary: C client and large-object API python both send bits in > reasonable time, but I suspect there's still room for improvement in > libpq over TCP: I'm suspicious of the 6x difference. Detailed analysis > will probably find it's all down to memory allocation and extra copying > of bits around (client side) I wonder if the backend isn't contributing to the problem too. It chops its sends up into 8K units, which doesn't seem to create huge overhead in my environment but maybe it does in yours. It'd be interesting to see what results you get from the attached quick-and-dirty patch (against HEAD, but it should apply back to at least 8.1). regards, tom lane Index: src/backend/libpq/pqcomm.c =================================================================== RCS file: /cvsroot/pgsql/src/backend/libpq/pqcomm.c,v retrieving revision 1.199 diff -c -r1.199 pqcomm.c *** src/backend/libpq/pqcomm.c 1 Jan 2009 17:23:42 -0000 1.199 --- src/backend/libpq/pqcomm.c 23 Feb 2009 21:09:45 -0000 *************** *** 124,129 **** --- 124,130 ---- static void pq_close(int code, Datum arg); static int internal_putbytes(const char *s, size_t len); static int internal_flush(void); + static int internal_send(const char *bufptr, size_t len); #ifdef HAVE_UNIX_SOCKETS static int Lock_AF_UNIX(unsigned short portNumber, char *unixSocketName); *************** *** 1041,1046 **** --- 1042,1056 ---- if (PqSendPointer >= PQ_BUFFER_SIZE) if (internal_flush()) return EOF; + + /* + * If buffer is empty and we'd fill it, just push the data immediately + * rather than copying it into PqSendBuffer. + */ + if (PqSendPointer == 0 && len >= PQ_BUFFER_SIZE) + return internal_send(s, len); + + /* Else put (some of) the data into the buffer */ amount = PQ_BUFFER_SIZE - PqSendPointer; if (amount > len) amount = len; *************** *** 1075,1090 **** static int internal_flush(void) { static int last_reported_send_errno = 0; ! char *bufptr = PqSendBuffer; ! char *bufend = PqSendBuffer + PqSendPointer; while (bufptr < bufend) { int r; ! r = secure_write(MyProcPort, bufptr, bufend - bufptr); if (r <= 0) { --- 1085,1115 ---- static int internal_flush(void) { + int r; + + r = internal_send(PqSendBuffer, PqSendPointer); + + /* + * On error, we drop the buffered data anyway so that processing can + * continue, even though we'll probably quit soon. + */ + PqSendPointer = 0; + + return r; + } + + static int + internal_send(const char *bufptr, size_t len) + { static int last_reported_send_errno = 0; ! const char *bufend = bufptr + len; while (bufptr < bufend) { int r; ! r = secure_write(MyProcPort, (void *) bufptr, bufend - bufptr); if (r <= 0) { *************** *** 1108,1118 **** errmsg("could not send data to client: %m"))); } - /* - * We drop the buffered data anyway so that processing can - * continue, even though we'll probably quit soon. - */ - PqSendPointer = 0; return EOF; } --- 1133,1138 ---- *************** *** 1120,1126 **** bufptr += r; } - PqSendPointer = 0; return 0; } --- 1140,1145 ----
Excellent. I'll take a look at this and report back here.

Ross

On Mon, Feb 23, 2009 at 04:17:00PM -0500, Tom Lane wrote:
> "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
> > Summary: the C client and the large-object API from python both send the bits
> > in reasonable time, but I suspect there's still room for improvement in
> > libpq over TCP: I'm suspicious of the 6x difference. Detailed analysis
> > will probably find it's all down to memory allocation and extra copying
> > of bits around (client side)
>
> I wonder if the backend isn't contributing to the problem too. It chops
> its sends up into 8K units, which doesn't seem to create huge overhead
> in my environment but maybe it does in yours. It'd be interesting to see
> what results you get from the attached quick-and-dirty patch (against
> HEAD, but it should apply back to at least 8.1).
>
> 			regards, tom lane
Ross J. Reedstrom escribió:
> Excellent. I'll take a look at this and report back here.
>
> Ross
>
> On Mon, Feb 23, 2009 at 04:17:00PM -0500, Tom Lane wrote:
>> "Ross J. Reedstrom" <reedstrm@rice.edu> writes:
>>> Summary: the C client and the large-object API from python both send the bits
>>> in reasonable time, but I suspect there's still room for improvement in
>>> libpq over TCP: I'm suspicious of the 6x difference. Detailed analysis
>>> will probably find it's all down to memory allocation and extra copying
>>> of bits around (client side)
>> I wonder if the backend isn't contributing to the problem too. It chops
>> its sends up into 8K units, which doesn't seem to create huge overhead
>> in my environment but maybe it does in yours. It'd be interesting to see
>> what results you get from the attached quick-and-dirty patch (against
>> HEAD, but it should apply back to at least 8.1).
>>
>> 			regards, tom lane

Hello, I have been having a problem like this on Debian machines, and I have
discovered that (at least in my case) the problem only arises when I am using
"ssl = true" in postgresql.conf, even though I am using clear TCP connections
to localhost to perform my query. If I disable ssl in the configuration, my
localhost query time goes from 4200ms to 110ms. The same parameter does not
have this effect on my Arch Linux development machine, so maybe you should
see how this parameter affects your setup, Ross.

My original post to the general list is at
http://archives.postgresql.org/pgsql-general/2009-02/msg01297.php for more
information.

Regards,
Miguel Angel.
Linos <info@linos.es> writes:
> Hello, I have been having a problem like this on Debian machines, and I have
> discovered that (at least in my case) the problem only arises when I am using
> "ssl = true" in postgresql.conf, even though I am using clear TCP connections
> to localhost to perform my query. If I disable ssl in the configuration, my
> localhost query time goes from 4200ms to 110ms.

Does that number include connection startup overhead? (If it doesn't, it'd be
pretty strange.) Ross's problem is not about startup overhead, unless I've
misread him completely.

			regards, tom lane
Tom Lane escribió:
> Linos <info@linos.es> writes:
>> Hello, I have been having a problem like this on Debian machines, and I have
>> discovered that (at least in my case) the problem only arises when I am using
>> "ssl = true" in postgresql.conf, even though I am using clear TCP connections
>> to localhost to perform my query. If I disable ssl in the configuration, my
>> localhost query time goes from 4200ms to 110ms.
>
> Does that number include connection startup overhead? (If it doesn't,
> it'd be pretty strange.) Ross's problem is not about startup overhead,
> unless I've misread him completely.
>
> 			regards, tom lane

This difference is in the runtime of the query; I get it with the \timing
parameter in psql. It is from a table that has 300 small PNGs (one for every
row in the table) in a bytea column, but the problem grows with any large
result anyway. I have attached pcap files in the general list, but the
differences are like this:

ssl enabled:
`psql -d database`:              SELECT * FROM TABLE (110 ms with \timing)
`psql -d database -h localhost`: SELECT * FROM TABLE (4200 ms with \timing)

ssl disabled:
`psql -d database`:              SELECT * FROM TABLE (110 ms with \timing)
`psql -d database -h localhost`: SELECT * FROM TABLE (120-130 ms with \timing)

Anyway, I don't know if this applies to Ross's problem, but reading his post,
and after seeing that he is using Debian and has a problem with speed over
TCP to localhost, I suppose that maybe he has the same problem.

Regards,
Miguel Angel
Linos <info@linos.es> writes:
> Tom Lane escribió:
>> Does that number include connection startup overhead? (If it doesn't,
>> it'd be pretty strange.)

> This difference is in the runtime of the query; I get it with the \timing
> parameter in psql.

That's just weird --- ssl off should be ssl off no matter which knob you use
to turn it off. Are you sure it's really off in the slow connections?

			regards, tom lane
Tom Lane escribió:
> Linos <info@linos.es> writes:
>> Tom Lane escribió:
>>> Does that number include connection startup overhead? (If it doesn't,
>>> it'd be pretty strange.)
>
>> This difference is in the runtime of the query; I get it with the \timing
>> parameter in psql.
>
> That's just weird --- ssl off should be ssl off no matter which knob you
> use to turn it off. Are you sure it's really off in the slow connections?
>
> 			regards, tom lane

Maybe I am missing something. I use the same command to connect from
localhost, "psql -d database -h localhost", and in the pcap files I have
captured the protocol is in the clear (with either "ssl = false" or
"ssl = true"), but on the Debian machine with "ssl = true" in postgresql.conf
you can see big time jumps between data packets in the pcap file. Does the
psql command line automatically enable ssl if the server supports it? But if
that is the case I should see encrypted traffic in the pcap files, and the
problem should be the same on Arch Linux, no? If you want me to run a local
test, explain the steps and I will try it here.

Regards,
Miguel Angel.
Linos <info@linos.es> writes:
> Tom Lane escribió:
>> That's just weird --- ssl off should be ssl off no matter which knob you
>> use to turn it off. Are you sure it's really off in the slow connections?

> Maybe I am missing something. I use the same command to connect from
> localhost, "psql -d database -h localhost", and in the pcap files I have
> captured the protocol is in the clear (with either "ssl = false" or
> "ssl = true"), but on the Debian machine with "ssl = true" in
> postgresql.conf you can see big time jumps between data packets in the
> pcap file. Does the psql command line automatically enable ssl if the
> server supports it?

Yeah, the default behavior is to do SSL if supported; see PGSSLMODE.
Non-TCP connections never do SSL, though. One possibility to check
is that one of the two distros has altered the default value of
PGSSLMODE.

> But if that is the case I should see encrypted traffic in the pcap files,

I would suppose so, so there's something that doesn't quite add up here.

			regards, tom lane
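For completeness, the client can also force the issue regardless of what the server offers,
either by setting the PGSSLMODE environment variable before running psql or by putting
sslmode in the connection string. A minimal sketch with psycopg2 and hypothetical connection
parameters (sslmode is the standard libpq parameter: 'disable', 'allow', 'prefer', or
'require'):

# Force SSL off from the client side, independent of the server's default.
import psycopg2

con = psycopg2.connect("dbname=database host=localhost sslmode=disable")
cur = con.cursor()
cur.execute("SHOW ssl")          # what the server itself has configured
print("server ssl setting: %s" % cur.fetchone()[0])
con.close()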
Tom Lane wrote:
> Linos <info@linos.es> writes:
>> Tom Lane escribió:
>>> That's just weird --- ssl off should be ssl off no matter which knob you
>>> use to turn it off. Are you sure it's really off in the slow connections?
>
>> Maybe I am missing something. I use the same command to connect from
>> localhost, "psql -d database -h localhost", and in the pcap files I have
>> captured the protocol is in the clear (with either "ssl = false" or
>> "ssl = true"), but on the Debian machine with "ssl = true" in
>> postgresql.conf you can see big time jumps between data packets in the
>> pcap file. Does the psql command line automatically enable ssl if the
>> server supports it?
>
> Yeah, the default behavior is to do SSL if supported; see PGSSLMODE.
> Non-TCP connections never do SSL, though. One possibility to check
> is that one of the two distros has altered the default value of
> PGSSLMODE.

IIRC, Debian ships with a default certificate for the postgres installation,
so it can actually *use* SSL by default. I don't know if other distros do
that - I think most require you to actually create a certificate yourself.

//Magnus
Magnus Hagander escribió:
> Tom Lane wrote:
>> Linos <info@linos.es> writes:
>>> Tom Lane escribió:
>>>> That's just weird --- ssl off should be ssl off no matter which knob you
>>>> use to turn it off. Are you sure it's really off in the slow connections?
>>> Maybe I am missing something. I use the same command to connect from
>>> localhost, "psql -d database -h localhost", and in the pcap files I have
>>> captured the protocol is in the clear (with either "ssl = false" or
>>> "ssl = true"), but on the Debian machine with "ssl = true" in
>>> postgresql.conf you can see big time jumps between data packets in the
>>> pcap file. Does the psql command line automatically enable ssl if the
>>> server supports it?
>> Yeah, the default behavior is to do SSL if supported; see PGSSLMODE.
>> Non-TCP connections never do SSL, though. One possibility to check
>> is that one of the two distros has altered the default value of
>> PGSSLMODE.
>
> IIRC, Debian ships with a default certificate for the postgres
> installation, so it can actually *use* SSL by default. I don't know if
> other distros do that - I think most require you to actually create a
> certificate yourself.
>
> //Magnus

Yeah, I have tested with the PGSSLMODE environment variable and it makes the
difference when it is set. Debian ships with a cert that makes SSL enabled by
default, but Arch Linux does not. I was seeing "unreassembled packet" for the
data packets from postgresql in wireshark, so I thought the traffic was the
same, but obviously one connection is using ssl and the other not. And I had
not noticed before now, but psql gives the hint that it is connected by ssl
with the line "conexión SSL (cifrado: DHE-RSA-AES256-SHA, bits: 256)" after
connecting. I did not know that enabling ssl would carry this speed penalty,
going from 110 ms to 4200 ms. Thanks Tom and Magnus for the help.

Regards,
Miguel Angel.