Thread: long transfer time for binary data
I noticed transferring a large object or bytea data between client and server takes a long time.

For example: an image with a real size of 11 MB could be read on the server side (explain analyze) in 81 ms. Fine.

But on the client side the result was completed after 6.7 seconds without ssl compression and 4.5 seconds with ssl compression (both via 100 MBit ethernet).

SSL compression no longer seems to be a good idea, since it has become a security risk. It's still possible with pgadmin, but afaik not with java/jdbc.

Are there any other solutions available to display my images in my client application more quickly? Or are there planned improvements to postgresql (transferring the real binary data)?

Best regards
Johannes
On 01/20/2016 03:29 PM, Johannes wrote:
> I noticed transferring a large object or bytea data between client and
> server takes a long time.
> For example: An image with a real size of 11 MB could be read on server
> side (explain analyze) in 81ms. Fine.
>
> But on client side the result was completed after 6.7 seconds without
> ssl compression and 4.5 seconds with ssl compression (both via 100MBit
> ethernet).
>
> SSL compression seems to be not a good idea anymore, since this had
> become a security risk. Its still possible with pgadmin, but afaik not
> with java/jdbc.
>
> Are there any other solutions available to display my images in my
> client application more quickly? Or are there planned improvements to
> postgresql (transferring the real binary data)?
>
> Best regards
> Johannes

Yep, that's slow. The ssl compression is very odd if the image is jpeg'ish and already compressed. If it's a bitmap or uncompressed tif then it's not so surprising.

A few tests you could try:

1) copy the same 11 meg file from server to client via regular file copy and time it. At 100 Mbit/s it should take about a second. If it takes 6 you have network problems, not PG problems.

2) try it via the psql command line (or at least something other than java), to see if it's java that's the problem.

3) watch wireshark/tcpdump, maybe you'll see something glaring that'll point you in the right direction.

-Andy

PS: I've never heard that ssl compression was a security risk, got links/proof?
On Wed, 20 Jan 2016 22:29:07 +0100, Johannes <jotpe@posteo.de> wrote:
> I noticed transferring a large object or bytea data between client and
> server takes a long time.
> For example: An image with a real size of 11 MB could be read on server
> side (explain analyze) in 81ms. Fine.
>
> But on client side the result was completed after 6.7 seconds without
> ssl compression and 4.5 seconds with ssl compression (both via 100MBit
> ethernet).

I think at ~4 seconds you're actually running pretty close to the limit of what is possible.

Remember that, even assuming the execution plan is accurate and also is representative of an average request, your 81ms image fetch may be arbitrarily delayed due to server load.

Even a quiet network has overhead: layers of packet headers, TCP checksums (CRC) and ack packets, etc. ... it's quite hard to sustain more than 95% of the theoretical bandwidth even on a full duplex private subnet. So figure 11MB of data will take ~1.2 seconds under _optimal_ conditions. Any competing traffic will just slow it down.

Also note that if the image data was stored already compressed, additionally trying to use connection level compression may expand the data and increase the transmission time, as well as adding processing overhead at both ends.

And then the client has to convert the image from the storage format into a display compatible bitmap and get it onto the screen.

> Are there any other solutions available to display my images in my
> client application more quickly? Or are there planned improvements to
> postgresql (transferring the real binary data)?

You don't say what the client platform/software is or what format the images are. 11MB is (equivalent to) 1500+ pixels square depending on pixel/color depth. That's a relatively large image - even from a local file, rendering that would take a couple of seconds. Add a couple more seconds for request turn-around and there is your time gone.
BMP and GIF, respectively, are the formats that are quickest to render. If your stored images are in a different format, it might be worth converting them to one of these.

GIF and _some_ BMP formats support direct compression of the pixel data. If you find you must store the pixel data uncompressed, you can always gzip the resulting image file and store that.

Then don't use connection level compression. With images stored already compressed the transmitted size is minimized, and you will only ever decompress (on the client) data in the critical path to the display.

Hope this helps,
George
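George's ~1.2 second estimate can be sanity-checked with a little arithmetic. The sketch below is purely illustrative (the class name and the payload-efficiency figure are mine, not from the thread): it assumes full-size Ethernet frames carrying 1448 bytes of TCP payload out of 1538 bytes on the wire, i.e. roughly 94% efficiency, and ignores ACK traffic and latency.

```java
public class WireTime {
    // Best-case transfer time for a payload over a link, assuming
    // ~94% payload efficiency per full-size Ethernet frame
    // (1448 TCP payload bytes out of 1538 bytes on the wire).
    static double bestCaseSeconds(double payloadBytes, double linkBitsPerSec) {
        double efficiency = 1448.0 / 1538.0;
        return payloadBytes * 8 / (linkBitsPerSec * efficiency);
    }

    public static void main(String[] args) {
        // The 11 MB image over 100 Mbit/s: about one second.
        System.out.printf("%.2f s%n",
                bestCaseSeconds(11.0 * 1024 * 1024, 100e6));
    }
}
```

That comes out near one second, which matches the 1.3 s scp measurement reported later in the thread far better than the 4-6 s database transfers, supporting the view that the network itself is not the bottleneck.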
Here are some transferring measurements (from server to client) with the same file.

scp
  +ssl -compression   1.3 sec
  +ssl +compression   4.6 sec

pgadmin (select lo_get(12345);)
  -ssl                3.4 sec
  +ssl +compression   5.5 sec
  +ssl -compression   4.5 sec

psql (select lo_get(12345);)
  +ssl -compression   6.0 sec
  -ssl                4.4 sec

java/jdbc (only while(in.read(buf,0,len)))
  +ssl -compression   6.0 sec
  -ssl                3.0 sec (+ 1.8 sec for new Image())

Here is a link for insecure ssl compression:
https://en.wikipedia.org/wiki/Comparison_of_TLS_implementations#Compression

Best Regards
Johannes

Am 21.01.2016 um 03:33 schrieb Andy Colson:
> On 01/20/2016 03:29 PM, Johannes wrote:
>> [...]
>
> Yep, that's slow. The ssl compression is very odd if the image is
> jpeg'ish and already compressed. If its a bitmap or uncompressed tif
> then its not so surprising.
>
> A few tests you could try:
>
> 1) copy the same 11 meg file from server to client via regular file copy
> and time it. At 100 Mbit/s it should take about a second. If it takes
> 6 you have network problems, not PG problems.
>
> 2) try it via psql command line (or at least something other than java),
> to see if its java thats the problem.
>
> 3) watch wireshark/tcpdump, maybe you'll see something glaring that'll
> point you in the right direction.
>
> -Andy
>
> PS: I've never heard that ssl compression was a security risk, got
> links/proof?
> Am 21.01.2016 um 03:33 schrieb Andy Colson:
>> [...]

On 01/21/2016 03:59 PM, Johannes wrote:
> Here are some transferring measurements (from server to client) with the
> same file.
>
> scp
> +ssl -compression 1.3 sec
> +ssl +compression 4.6 sec
>
> pgadmin
> select lo_get(12345);
> -ssl 3.4 sec
> +ssl +compression 5.5 sec
> +ssl -compression 4.5 sec
>
> psql
> select lo_get(12345);
> +ssl -compression 6.0 sec
> -ssl 4.4 sec
>
> java/jdbc
> only while(in.read(buf,0,len))
> +ssl -compression 6.0 sec
> -ssl 3.0 sec (+ 1.8 sec for new Image())
>
> Here is a link for insecure ssl compression:
> https://en.wikipedia.org/wiki/Comparison_of_TLS_implementations#Compression
>
> Best Regards
> Johannes

Please don't top post.

Thanks for the link on ssl compression, I'd not seen that before. I'm going to have to read up.

Your numbers ... look ... odd. scp +compression is slower? pgadmin -ssl and psql -ssl and java -ssl are all different speeds? ssl always adds extra time? Maybe a high latency thing? If you ping the other box what sort of times do you get? Maybe the extra ssl handshakes up front + high latency is causing it. You could try a shared/cached ssh connection to avoid the overhead.

Best case though, your file copy was 1.3 seconds and with PG it was 3 seconds. Even getting ssl fixed, you probably won't get faster than 3 seconds. Is that enough?

-Andy
Andy Colson wrote:
> On 01/21/2016 03:59 PM, Johannes wrote:
>> Here are some transferring measurements (from server to client) with the
>> same file.
>> [...]
>
> Thanks for the link on ssl compression, I'd not seen that before. I'm going to have to read up.
>
> Your numbers ... look ... odd. scp +compression is slower? pgadmin -ssl and psql -ssl and java -ssl
> are all different speeds? ssl always adds extra time? Maybe a high latency thing? If you ping the
> other box what sort of times do you get? Maybe the extra ssl handshakes up front + high latency is
> causing it. You could try a shared/cached ssh connection to avoid the overhead.

Johannes' measurements make sense to me. In situations where network bandwidth is not the bottleneck, you will be slower with SSL compression than without. The time you lose is the time the CPU needs to compress and decompress the data.

I observed that behaviour in one of our systems that transferred byteas with images over SSL, which led me to introduce the "sslcompression" connection parameter into PostgreSQL.

The variation of times between different clients could be random (are these differences consistent across repeated runs?), could be caused by different SSL implementations in Java and OpenSSL, or by additional processing in pgAdmin III.

Yours,
Laurenz Albe
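Laurenz's point - that compressing already-compressed data burns CPU and can even grow the payload - is easy to demonstrate with java.util.zip. This sketch is illustrative only (class and method names are mine): random bytes stand in for JPEG's already-entropy-coded data, zeroes for an uncompressed bitmap.

```java
import java.io.ByteArrayOutputStream;
import java.util.Random;
import java.util.zip.Deflater;

public class DoubleCompress {
    // Deflate-compress a buffer and return the compressed size in bytes.
    static int deflatedSize(byte[] input) {
        Deflater d = new Deflater();
        d.setInput(input);
        d.finish();
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[8192];
        while (!d.finished()) {
            out.write(buf, 0, d.deflate(buf));
        }
        d.end();
        return out.size();
    }

    public static void main(String[] args) {
        byte[] bitmapLike = new byte[1 << 20];  // highly compressible: all zeroes
        byte[] jpegLike = new byte[1 << 20];    // incompressible: random bytes,
        new Random(42).nextBytes(jpegLike);     // like already-compressed image data

        System.out.println("zero bytes  -> " + deflatedSize(bitmapLike)); // tiny
        System.out.println("random data -> " + deflatedSize(jpegLike));   // slightly BIGGER than 1 MiB
    }
}
```

So connection-level compression helps an uncompressed bitmap enormously, but for a JPEG it adds CPU time at both ends and slightly more bytes on the wire - consistent with the 4.5 s vs 6.7 s numbers at the top of the thread.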
Am 21.01.2016 um 08:44 schrieb George Neuner:
> On Wed, 20 Jan 2016 22:29:07 +0100, Johannes <jotpe@posteo.de> wrote:
>> [...]
>
> I think at ~4 seconds you're actually running pretty close to the
> limit of what is possible.
> [...]
>
> You don't say what is the client platform/software or what format are
> the images. 11MB is (equivalent to) 1500+ pixels square depending on
> pixel/color depth. That's a relatively large image - even from a
> local file, rendering that would take a couple of seconds.
> [...]
>
> Hope this helps,
> George

Thanks for the explanation. I'm writing a client software in java/jdbc. Most images are in jpeg format. Some have high quality, most medium.

Rendering this 11MB image in eog (Eye of GNOME) takes 0.5 sec; in GIMP it is very fast. In Java the object creation takes nearly all the time, the drawing is done very quickly.

The size of the binary string representation of this image is 22MB. I guess there is no other special transfer mechanism for binary data than plain text via SQL, or is there?

Best regards
Johannes
Johannes wrote:
> psql
> select lo_get(12345);
> +ssl -compression 6.0 sec
> -ssl 4.4 sec

psql requests results in text format, so that SELECT does not really test the transfer of binary data. With bytea_output set to 'hex', contents are inflated by 2x.

Can you tell how fast this goes for you, as a comparison point:
  \lo_export 12345 /dev/null
?

Many client interfaces use the text format, but you want to avoid that if possible with large bytea contents. In addition to putting twice the data on the wire, the server has to convert the bytes to hex and the client has to do the reverse operation, a complete waste of CPU time on both ends.

At the SQL level, DECLARE name BINARY CURSOR FOR query can help to force results in binary, but as the doc says:

http://www.postgresql.org/docs/current/static/sql-declare.html

"Binary cursors should be used carefully. Many applications, including psql, are not prepared to handle binary cursors and expect data to come back in the text format."

Personally I don't have experience with JDBC, but looking at the doc:

https://jdbc.postgresql.org/documentation/94/binary-data.html

I see this:

"To use the Large Object functionality you can use either the LargeObject class provided by the PostgreSQL™ JDBC driver, or by using the getBLOB() and setBLOB() methods."

If the data lives on the server as large objects, I would think that this LargeObject class has the best potential for retrieving them efficiently, as opposed to "SELECT lo_get(oid)" which looks like it could trigger the undesirable round-trip to the text format. You may want to test that or bring it up as a question to JDBC folks.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
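Daniel's 2x figure comes straight from the hex text format: every byte becomes two ASCII characters, plus a "\x" prefix. The sketch below is not PostgreSQL's actual encoder, just the same arithmetic, to show where an 11 MB image turns into ~22 MB of text on the wire (class and method names are mine).

```java
public class HexInflation {
    // Encode bytes the way bytea_output = 'hex' does:
    // "\x" followed by two lowercase hex digits per byte.
    static String toHexLiteral(byte[] data) {
        StringBuilder sb = new StringBuilder(2 + data.length * 2);
        sb.append("\\x");
        for (byte b : data) {
            sb.append(String.format("%02x", b));
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        byte[] sample = new byte[1024];            // stand-in for image bytes
        // 1024 bytes of binary become 2050 chars of text: 2 + 2 per byte.
        System.out.println(toHexLiteral(sample).length());  // 2050
        // Scaled up, 11 MB of binary becomes ~22 MB of text --
        // the inflation Johannes observed.
    }
}
```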
On Fri, 22 Jan 2016 22:05:24 +0100, Johannes <jotpe@posteo.de> wrote:

> Thanks for explanation. Im writing a client software in java/jdbc. Most
> images are in jpeg format. Some have high quality, most medium.

Unfortunately I'm not terribly conversant in Java ... I can read/understand it, but I rarely write any.

> Rendering this 11MB Image in eog (Eye Of Gome) takes 0.5 sec, in GIMP it
> is very fast.

I'm not familiar with EoG, but GIMP uses both SIMD code to process the image and OpenGL (which in turn uses your GPU if possible) to draw directly to the video buffer. That makes a big difference vs drawing to a generic GUI window context.

> In Java the object createion takes nearly all time, the drawing is done
> very quickly.

I have zero experience with jdbc - but if it's anything like ODBC, then it may be reading the images inefficiently (at least by default). In ODBC, connections have a settable MTU size - BLOBs that are bigger than 1 MTU get transferred in pieces. That is fine if you don't know the size of the object beforehand, but it can be much slower than necessary if you do (or can approximate it). ODBC's default MTU is quite small by today's multimedia data standards.

If it's something other than this - e.g., you need to process the image faster from Java - then I'm afraid you'll have to look to others for help.

> The size of the binary string representation of this image is 22MB. I
> guess there are not other special transfer mechanism for binary data
> than plain text via sql, or?

You said originally it was a bytea column? If so, the BLOB shouldn't be any longer than the original image file. It would be different if you stored the image in a text column, e.g., as escaped ASCII or as base64 encoded, etc.

Characters in Java are 16-bit values. If you convert the BLOB into a printable string [or your debugger does to view it], that string will be twice as long as the binary.

Hope this helps,
George
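George's last point - 16-bit Java chars making a "stringified" BLOB twice the size of the binary - can be shown in a few lines. This is only an illustration (the byte values are the JPEG file-header magic, used as sample data); the length stays one char per byte, but the UTF-16 representation of those chars takes twice the bytes.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharWidth {
    public static void main(String[] args) {
        byte[] blob = {(byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0}; // JPEG magic

        // Mapping each byte to one char keeps the *length* equal...
        String asText = new String(blob, StandardCharsets.ISO_8859_1);
        System.out.println(asText.length());                               // 4

        // ...but Java chars are 16-bit, so as UTF-16 it is twice the bytes:
        System.out.println(asText.getBytes(StandardCharsets.UTF_16BE).length); // 8

        // The ISO-8859-1 round-trip itself is lossless for raw bytes:
        byte[] back = asText.getBytes(StandardCharsets.ISO_8859_1);
        System.out.println(Arrays.equals(blob, back));                     // true
    }
}
```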
Am 23.01.2016 um 01:25 schrieb Daniel Verite:
> Johannes wrote:
>> [...]
>
> psql requests results in text format so that SELECT does not
> really test the transfer of binary data.
> With bytea_output to 'hex', contents are inflated by 2x.
>
> Can you tell how fast this goes for you, as a comparison point:
> \lo_export 12345 /dev/null
> ?
> [...]

\lo_export 12345 /dev/null is completed in 0.86 seconds.

I save my images as large objects, which afaik are in practice not readable with a binary cursor (we should use the lo_* functions). And of course I already use the LargeObjectManager of the postgresql jdbc library.

You said the server has to convert the bytes to a hex string before sending them over the wire. In my understanding bytea values are stored as strings and may be compressed in TOAST storage.

> The bytea data type allows storage of binary strings [1]

What is correct?

Your post gave me the hint. I found a binary transfer parameter available in the postgresql jdbc library [2], [3]. But turning it on doesn't speed anything up. It seems the binary transfer mode is active by default. The byte counter (iptables -v) is nearly as big as the image itself. It is already optimal.

                                      packets  bytes
psql +ssl                             8514     23M
psql -ssl                             8179     23M
pgadmin -ssl                          11716    33M
pgadmin +ssl -compress                12196    34M
pgadmin +ssl +compress                12193    34M
java jdbc +ssl                        14037    24M
java jdbc -ssl                        5622     12M (3.1 seconds)
java jdbc -ssl binarytransfer=true    5615     12M (3.1 seconds)

In fact I do not understand what the bottleneck is. OK, my server runs on a Raspberry Pi 2 B+; that's maybe not the best hardware. But the scp command could be finished from there in 1.3 seconds, so the bottleneck is not the network speed. And also not the USB disk drive.

Maybe it is the slow java program? I pointed my java program at my local postgresql instance (with the same image as a large object, same mtu, no loopback device, no unix socket, but a better system) and it was finished in 400 ms. The java program is out too.

Did I forget anything? I'm afraid I have to live with it and may use thumbnail images.

Best regards

[1] http://www.postgresql.org/docs/current/static/datatype-binary.html
[2] https://wiki.postgresql.org/wiki/JDBC-BinaryTransfer
[3] https://jdbc.postgresql.org/documentation/94/connect.html#connection-parameters
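The thread's `while(in.read(buf,0,len))` loop is shorthand - as written it would not compile, since InputStream.read returns an int. For reference, a minimal self-contained version of the drain loop is sketched below. With pgjdbc the stream would come from the LargeObject API (which needs a live server), so a ByteArrayInputStream stands in here; the helper name is mine.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;

public class Drain {
    // Read an InputStream to the end with a reasonably large buffer.
    // With pgjdbc this would be the stream obtained from a large object;
    // here a ByteArrayInputStream stands in so the sketch is self-contained.
    static byte[] readFully(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buf = new byte[64 * 1024];   // bigger buffer = fewer read calls
        int n;
        while ((n = in.read(buf, 0, buf.length)) != -1) {
            out.write(buf, 0, n);
        }
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        byte[] fake = new byte[11 * 1024 * 1024];   // stand-in for the image
        byte[] got = readFully(new ByteArrayInputStream(fake));
        System.out.println(got.length);             // 11534336
    }
}
```

With the wire transfer already near its floor, the remaining 1.8 seconds for `new Image()` suggests decode/creation cost on the client, which this loop cannot help with.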
On 1/23/2016 2:19 PM, Johannes wrote:
> I save my images as large object, which afaik is in practise not
> readable with a binary cursor (we should use the lo_* functions). And of
> course I already use the LargeObjectManager of the postgresql jdbc library.

afaik, Large Objects are completely independent of the other mode stuff. They are stored and transmitted in binary.

I haven't read this whole ongoing thread, just glanced at messages as they passed by over the past week or whatever, but I have to say, I would NOT be storing 11MB images directly in SQL. Rather, I would store them on a file server and access them with nfs or https or whatever is most appropriate for the nature of the application. I would store the location and metadata in SQL.

--
john r pierce, recycling bits in santa cruz
Am 23.01.2016 um 23:38 schrieb John R Pierce:
> On 1/23/2016 2:19 PM, Johannes wrote:
>> I save my images as large object, which afaik is in practise not
>> readable with a binary cursor (we should use the lo_* functions). And of
>> course I already use the LargeObjectManager of the postgresql jdbc
>> library.
>
> afaik, Large Objects are completely independent of the other mode
> stuff. they are stored and transmitted in binary.

Depends on the client. It can be transferred as text or binary. And the data is sliced into bytea segments [1] and afaik stored as a binary string.

> I haven't read this whole ongoing thread, just glanced at messages as
> they passed by over the past week or whatever, but I have to say, I
> would NOT be storing 11MB images directly in SQL, rather, I would store
> it on a file server, and access it with nfs or https or whatever is most
> appropriate for the nature of the application. I would store the
> location and metadata in SQL.

The 11MB file is the biggest image; the rest are normal. I know about the arguments, but there are pros I want to use in production (transactions, integrity). But if it fails (amount of space? slow import?) I may exclude the image data.

[1] http://www.postgresql.org/docs/9.5/static/catalog-pg-largeobject.html
Johannes wrote:
> \lo_export 12345 /dev/null is completed in 0.86 seconds.

If it's an 11MB file through a 100Mbits/s network, that's pretty much the best that can be expected. I would think the above is the baseline against which the other methods should be compared.

> I save my images as large object, which afaik is in practise not
> readable with a binary cursor (we should use the lo_* functions). And of
> course I already use the LargeObjectManager of the postgresql jdbc library.

Sounds good.

> You said, the server has to convert the bytes to hex string before
> sending it over the wire.

Only in certain contexts. SELECT lo_get(oid) is a query that returns bytea, so if the client requests results in text format, they will be transferred as text, and it's the responsibility of the client to convert them back to bytes (or not, who knows, maybe the client wants hexadecimal).

But lo_get is an exception, and besides a late addition (9.4 I believe). Generally, client-side access to large object functions doesn't use a client-side SQL query; it's done through the "Function Call sub-protocol" described here:

http://www.postgresql.org/docs/current/static/protocol-flow.html#AEN108750

and the result comes back as binary.

Presumably the JDBC LargeObjectManager uses that method.

Best regards,
--
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite
Am 25.01.2016 um 19:59 schrieb Daniel Verite:
> Johannes wrote:
>> \lo_export 12345 /dev/null is completed in 0.86 seconds.
>
> If it's an 11MB file through a 100Mbits/s network, that's
> pretty much the best that can be expected.
>
> I would think the above is the baseline against which
> the other methods should be compared.
> [...]

I thought \lo_export could only run on the server side (like \copy vs. COPY). The 0.86 seconds was the runtime from server to server disk. Running it from the client (transfers only 12M):

real    0m3.386s
user    0m0.308s
sys     0m0.176s

Best regards
Johannes