Thread: Patch for binary receive of basic array types
Hi,

I coded a patch that allows receiving binary-coded int2, int4, int8, float4, float8, and text arrays.

I tried a small read-only benchmark that repeatedly loads the same rows from the database and records the speed. The test uses the resultSet.getArray(1).getArray() API, not resultSet.getArray(1).getResultSet(), which is slower. I tried 4 different types of data to better observe the speed differences.

column    data                       txt                        bin                        data inc  speed inc
int[]     '{1,2,3}'                  rowlen=7,  4338.39 ops/s   rowlen=44, 4477.75 ops/s   528.57%    3.21%
int[][]   '{{1000,2},{NULL,-4000}}'  rowlen=23, 3315.53 ops/s   rowlen=56, 4149.75 ops/s   143.48%   25.16%
float4[]  '{3.141,-523.23,1e5,0}'    rowlen=32, 2803.35 ops/s   rowlen=52, 4410.70 ops/s    62.50%   57.34%
text[]    '{abba,\"fa\\\"b\"}'       rowlen=14, 3987.68 ops/s   rowlen=36, 4462.47 ops/s   157.14%   11.91%

summary: speed-up 3-57%; row length on the wire typically increases between 50% and 200% (smaller increases for larger arrays or complex types).

I think the speed-up is enough to justify using binary transfers for arrays by default, even though the amount of data on the wire increases.

-Mikko
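The "data inc" and "speed inc" columns above are plain percentage increases from the txt measurement to the bin measurement. A minimal sketch of the arithmetic (class and method names are mine, not from the patch):

```java
// Hypothetical helper reproducing the derived columns in the table above.
public class BenchDelta {
    // Percentage increase from the txt measurement to the bin measurement.
    public static double pctIncrease(double txt, double bin) {
        return (bin - txt) / txt * 100.0;
    }

    public static void main(String[] args) {
        // int[] '{1,2,3}': rowlen 7 -> 44 bytes, 4338.39 -> 4477.75 ops/s
        System.out.printf("data inc:  %.2f%%%n", pctIncrease(7, 44));            // 528.57%
        System.out.printf("speed inc: %.2f%%%n", pctIncrease(4338.39, 4477.75)); // 3.21%
    }
}
```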
Attachment
Mikko Tiihonen <mikko.tiihonen@nitorcreations.com> wrote:

> summary: speed-up 3-57%, row-length on the wire increases
> typically between 50% and 200% (smaller increases for larger
> arrays or complex types)
>
> I think the speed-up is enough to justify using of binary
> transfers for arrays by default even though the data amount on the
> wire is increased.

What would it take to get the other behavior if you have a network bottleneck but CPU time to spare?

-Kevin
On 09/24/2011 12:49 AM, Kevin Grittner wrote:
> Mikko Tiihonen <mikko.tiihonen@nitorcreations.com> wrote:
>
>> summary: speed-up 3-57%, row-length on the wire increases
>> typically between 50% and 200% (smaller increases for larger
>> arrays or complex types)
>>
>> I think the speed-up is enough to justify using of binary
>> transfers for arrays by default even though the data amount on the
>> wire is increased.
>
> What would it take to get the other behavior if you have a network
> bottleneck but CPU time to spare?

I did some (too many) measurements & calculations, which are below.

In general, we could add a mechanism to add/remove specific OIDs from binary transfer mode as a configuration option to the JDBC driver. The same mechanism could be used by application code that uses custom types inside PostgreSQL to take advantage of binary transfers.

There will always be trade-offs between txt and bin encoding, and sometimes the differences are large enough to warrant changing the driver defaults.

----
Measurements:

For '{1,2,3}' stored in an int4[] column:
  txt form: column 7 bytes, whole response TCP packet 109 bytes
  bin form: column 44 bytes, whole response TCP packet 146 bytes
  While the column size increased 500%, the TCP packet increased 34%.

For '{1,2,3,....,99,100}' stored in an int4[] column:
  txt form: column 293 bytes, whole response TCP packet 395 bytes
  bin form: column 820 bytes, whole response TCP packet 922 bytes
  While the column size increased 180%, the TCP packet increased 130%.

For '{10000,10001,10002,..,10098,10098}' stored in an int4[] column:
  txt form: column 601 bytes, whole response TCP packet 703 bytes
  bin form: column 820 bytes, whole response TCP packet 922 bytes
  While the column size increased 32%, the TCP packet increased 31%.

Calculations:

The txt encoding uses 2-7 bytes per value in int2[].
The bin encoding uses 6 bytes per value in int2[] + 19 extra overhead bytes.
The txt encoding uses 2-12 bytes per value in int4[].
The bin encoding uses 8 bytes per value in int4[] + 19 extra overhead bytes.

The txt encoding uses 2-22 bytes per value in int8[].
The bin encoding uses 12 bytes per value in int8[] + 19 extra overhead bytes.

For very small arrays, and especially arrays containing small numeric values, the txt encoding uses fewer bytes but is slower to generate and parse. In the extreme case of storing only single-digit numbers in int8[], the TCP packet size of the bin encoding can be 6x that of the txt encoding. In the extreme case of storing only 21-digit negative numbers in int8[], the TCP packet size of the txt encoding can be 1.8x that of the bin encoding.

Using multidimensional arrays usually gives a small advantage to the bin format.

Over a 100 Mbit line one can transfer 500k-5000k txt int8 array values per second, or 800k bin int8 array values per second.
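The per-value figures above follow from PostgreSQL's binary array wire format: a 12-byte fixed header (number of dimensions, null-bitmap flag, element type OID), 8 bytes per dimension (size and lower bound), and a 4-byte length prefix before each element, where a NULL element carries only the length word (-1). A sketch (names are mine) that reproduces the measured int4[] column sizes:

```java
// Hypothetical size calculator for the binary (bin) form of an int4[] column.
public class BinaryArraySize {
    public static int int4ArrayBytes(int ndim, int elements, int nulls) {
        int header = 12 + 8 * ndim;       // ndim + hasnull flag + element OID, plus size/lbound per dimension
        int nonNull = (elements - nulls) * (4 + 4); // 4-byte length prefix + 4-byte int4 payload
        int nullOnly = nulls * 4;         // NULL elements are just the length word (-1)
        return header + nonNull + nullOnly;
    }

    public static void main(String[] args) {
        System.out.println(int4ArrayBytes(1, 3, 0));   // '{1,2,3}' -> 44, matching rowlen=44 above
        System.out.println(int4ArrayBytes(1, 100, 0)); // 100-element int4[] -> 820, matching above
        System.out.println(int4ArrayBytes(2, 4, 1));   // '{{1000,2},{NULL,-4000}}' -> 56, matching rowlen=56
    }
}
```

The same structure gives the per-value costs quoted above: 4+2 = 6 bytes per int2, 4+4 = 8 per int4, 4+8 = 12 per int8.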
> Mikko Tiihonen wrote:
>
> In general could add a mechanism to add/remove specific oids from
> the binary transfer mode as a configuration option to the jdbc
> driver.
>
> The same mechanism could be used for by application code that uses
> custom types inside PostgreSQL to take advantage of binary
> transfers.
>
> There will always be trade-offs between txt and bin encoding and
> sometimes the differences are large enough to change from driver
> defaults.

Yeah, all I'm saying is that when there are such large trade-offs between CPU usage and network traffic requirements, it might be a good idea to allow a user to choose with some connection option. For perspective, just within our organization we have cases where we're communicating between busy machines on the same 1Gbps switch, and cases where we're communicating between nearly-idle machines over a sometimes-saturated 3Mbps WAN connection. This change would be a big win for one and could potentially be a big problem in the other.

-Kevin
Patch for allowing explicit enable/disable of binary transfer for each OID type
From: Mikko Tiihonen
On 09/24/2011 05:46 PM, Kevin Grittner wrote:
>> Mikko Tiihonen wrote:
>>
>> In general could add a mechanism to add/remove specific oids from
>> the binary transfer mode as a configuration option to the jdbc
>> driver.
>>
>> The same mechanism could be used for by application code that uses
>> custom types inside PostgreSQL to take advantage of binary
>> transfers.
>>
>> There will always be trade-offs between txt and bin encoding and
>> sometimes the differences are large enough to change from driver
>> defaults.
>
> Yeah, all I'm saying is that when there are such large trade-offs
> between CPU usage and network traffic requirements, it might be a
> good idea to allow a user to choose with some connection option. For
> perspective, just within our organization we have cases where we're
> communicating between busy machines on the same 1Gbps switch, and
> cases where we're communicating between nearly-idle machines over a
> sometimes-saturated 3Mbps WAN connection. This change would be a big
> win for one and could potentially be a big problem in the other.

Here is a patch that adds binaryTransferEnable and binaryTransferDisable connection options, allowing driver users to override the defaults. Each option accepts a comma-separated list of OID names (these must match the names in org.postgresql.core.Oid) or any numeric OID value. For example, one can use:

  binaryTransferDisable=INT4_ARRAY,1016
  binaryTransferEnable=TEXT

This will also be useful if, for some reason, the binary encoding of a type changes in the server: binary transfer can be disabled for that one type as a workaround, allowing an old JDBC driver to keep working while waiting for a new driver release.

-Mikko
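Assuming the option names behave as described above, usage from application code would look something like this sketch (the JDBC URL is a placeholder, and the class name is mine):

```java
import java.util.Properties;

// Hypothetical illustration of passing the patch's connection options to the driver.
public class BinaryTransferOptions {
    public static Properties buildProps() {
        Properties props = new Properties();
        // Disable binary transfer for int4[] (by Oid constant name) and OID 1016 (numeric form)
        props.setProperty("binaryTransferDisable", "INT4_ARRAY,1016");
        // Explicitly enable binary transfer for TEXT
        props.setProperty("binaryTransferEnable", "TEXT");
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildProps();
        // With a live server one would then open the connection with these properties:
        // Connection conn = DriverManager.getConnection("jdbc:postgresql://host/db", props);
        System.out.println(props.getProperty("binaryTransferDisable")); // INT4_ARRAY,1016
    }
}
```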