Thread: Patch for binary receive of basic array types

Patch for binary receive of basic array types

From
Mikko Tiihonen
Date:
Hi,

I coded a patch that allows receiving binary-encoded int2, int4, int8, float4, float8, and text arrays.


I tried a small read-only benchmark that repeatedly loads the same rows from the database and records the speed.
The test uses the resultSet.getArray(1).getArray() API, not the slower resultSet.getArray(1).getResultSet().
I tried four different kinds of data to better observe the speed differences.

column   data                            bytes     speed               bytes     speed            data inc  speed inc
int[]    '{1,2,3}'                 text: rowlen=7  4338.39 ops/s, bin: rowlen=44 4477.75 ops/s => 528.57%    3.21%
int[][]  '{{1000,2},{NULL,-4000}}' text: rowlen=23 3315.53 ops/s, bin: rowlen=56 4149.75 ops/s => 143.48%   25.16%
float4[] '{3.141,-523.23,1e5,0}'   text: rowlen=32 2803.35 ops/s, bin: rowlen=52 4410.70 ops/s =>  62.50%   57.34%
text[]   '{abba,\"fa\\\"b\"}'      text: rowlen=14 3987.68 ops/s, bin: rowlen=36 4462.47 ops/s => 157.14%   11.91%

summary: speed-up 3-57%; row length on the wire typically increases between 50% and 200% (smaller increases for larger
arrays or complex types)


I think the speed-up is enough to justify using binary transfers for arrays by default, even though the amount of data
on the wire is increased.

-Mikko

Attachment

Re: Patch for binary receive of basic array types

From
"Kevin Grittner"
Date:
Mikko Tiihonen <mikko.tiihonen@nitorcreations.com> wrote:

> summary: speed-up 3-57%, row-length on the wire increases
> typically between 50% and 200% (smaller increases for larger
> arrays or complex types)
>
> I think the speed-up is enough to justify using of binary
> transfers for arrays by default even though the data amount on the
> wire is increased.

What would it take to get the other behavior if you have a network
bottleneck but CPU time to spare?

-Kevin

Re: Patch for binary receive of basic array types

From
Mikko Tiihonen
Date:
On 09/24/2011 12:49 AM, Kevin Grittner wrote:
> Mikko Tiihonen<mikko.tiihonen@nitorcreations.com>  wrote:
>
>> summary: speed-up 3-57%, row-length on the wire increases
>> typically between 50% and 200% (smaller increases for larger
>> arrays or complex types)
>>
>> I think the speed-up is enough to justify using of binary
>> transfers for arrays by default even though the data amount on the
>> wire is increased.
>
> What would it take to get the other behavior if you have a network
> bottleneck but CPU time to spare?

I did some (too many) measurements & calculations, which are below.

In general, we could add a mechanism to add/remove specific oids from
binary transfer mode as a configuration option of the jdbc driver.

The same mechanism could be used by application code that uses custom types
inside PostgreSQL to take advantage of binary transfers.

There will always be trade-offs between txt and bin encoding, and sometimes
the differences are large enough to warrant changing the driver defaults.

----

Measurements:
for '{1,2,3}' stored in int4[] column
in txt form: column in bytes=7,  whole response tcp packet in bytes=109
in bin form: column in bytes=44, whole response tcp packet in bytes=146
While the column size increased 529%, the tcp packet increased only 34%.

for '{1,2,3,....,99,100}' stored in int4[] column
in txt form: column in bytes=293,  whole response tcp packet in bytes=395
in bin form: column in bytes=820, whole response tcp packet in bytes=922
While the column size increased 180%, the tcp packet increased 133%.

for '{10000,10001,10002,..,10098,10099}' stored in int4[] column
in txt form: column in bytes=601,  whole response tcp packet in bytes=703
in bin form: column in bytes=820, whole response tcp packet in bytes=922
While the column size increased 36%, the tcp packet increased 31%.

Calculations:
The txt encoding uses  2-7 bytes per value in int2[].
The bin encoding uses    6 bytes per value in int2[] + 19 extra overhead bytes.

The txt encoding uses 2-12 bytes per value in int4[].
The bin encoding uses    8 bytes per value in int4[] + 19 extra overhead bytes.

The txt encoding uses 2-22 bytes per value in int8[].
The bin encoding uses   12 bytes per value in int8[] + 19 extra overhead bytes.
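
As a sanity check, the measured byte counts above can be reproduced from simple
formulas. The sketch below is my own reconstruction for a one-dimensional int4[]
with no NULLs, not code from the patch; the fixed 20 bytes are the 12-byte array
header (ndim, null flag, element oid) plus one 8-byte dimension entry:

```java
// Illustrative wire-size formulas for a 1-D int4[] column with no NULLs.
// These reproduce the measurements quoted above; they are a reconstruction,
// not code from the patch.
public final class ArrayWireSize {

    // txt form: '{' + comma-separated decimal values + '}'
    static int txtBytesInt4(int[] values) {
        int n = 2;                             // surrounding '{' and '}'
        for (int v : values) {
            n += Integer.toString(v).length(); // decimal digits (and sign)
        }
        n += Math.max(0, values.length - 1);   // one ',' between values
        return n;
    }

    // bin form: 12-byte array header + 8 bytes per dimension
    // + per element: 4-byte length prefix + 4 bytes of int4 data
    static int binBytesInt4(int elementCount) {
        return 12 + 8 + elementCount * (4 + 4);
    }

    public static void main(String[] args) {
        int[] small = {1, 2, 3};
        System.out.println(txtBytesInt4(small));        // prints 7, as measured
        System.out.println(binBytesInt4(small.length)); // prints 44, as measured
    }
}
```

The same formulas give 293 vs 820 bytes for {1..100} and 601 vs 820 bytes for
{10000..10099}, matching the measurements above.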

For very small arrays, and especially arrays containing small numeric values,
the txt encoding uses fewer bytes but is slower to generate and parse.

For the extreme case of storing only single-digit numbers in int8[], the tcp packet
size of bin can be 6x the size of the txt encoding.

For the extreme case of storing only 21-digit negative numbers in int8[], the tcp packet
size of txt can be 1.8x the size of the bin encoding.

Using multidimensional arrays usually works slightly in favor of the bin format.

Over a 100 Mbit line one can transfer 500k to 5000k txt-encoded int8 array values
per second, or about 800k bin-encoded int8 array values per second.
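
The extreme-case ratios and throughput figures follow directly from the per-value
byte counts for int8[] (txt: 2-22 bytes per value, bin: 12 bytes per value). A
quick back-of-the-envelope check, my arithmetic only, ignoring TCP framing (raw
link arithmetic gives ~1.0M bin values/s; the ~800k above presumably includes
protocol overhead):

```java
// Back-of-the-envelope check of the int8[] extreme-case ratios and
// throughput figures, using the per-value byte counts quoted above and
// ignoring the small fixed array overhead and TCP framing.
public final class Int8ArrayEnvelope {
    static final int BIN_PER_VALUE = 12;     // 4-byte length prefix + 8-byte int8
    static final int TXT_MIN_PER_VALUE = 2;  // single digit + separator
    static final int TXT_MAX_PER_VALUE = 22; // longest negative value + separator

    public static void main(String[] args) {
        double linkBytesPerSec = 100e6 / 8;  // 100 Mbit/s line ~ 12.5 MB/s

        // worst cases for each encoding
        System.out.printf("bin can be %.1fx txt%n",
                (double) BIN_PER_VALUE / TXT_MIN_PER_VALUE);   // prints 6.0x
        System.out.printf("txt can be %.1fx bin%n",
                (double) TXT_MAX_PER_VALUE / BIN_PER_VALUE);   // prints 1.8x

        // values per second the raw link can carry
        System.out.printf("bin values/s: %.0f%n",
                linkBytesPerSec / BIN_PER_VALUE);
        System.out.printf("txt values/s: %.0f to %.0f%n",
                linkBytesPerSec / TXT_MAX_PER_VALUE,
                linkBytesPerSec / TXT_MIN_PER_VALUE);
    }
}
```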

Re: Patch for binary receive of basic array types

From
"Kevin Grittner"
Date:
> Mikko Tiihonen  wrote:

> In general could add a mechanism to add/remove specific oids from
> the binary transfer mode as a configuration option to the jdbc
> driver.
>
> The same mechanism could be used for by application code that uses
> custom types inside PostgreSQL to take advantage of binary
> transfers.
>
> There will always be trade-offs between txt and bin encoding and
> sometimes the differences are large enough to change from driver
> defaults.

Yeah, all I'm saying is that when there are such large trade-offs
between CPU usage and network traffic requirements, it might be a
good idea to allow a user to choose with some connection option.  For
perspective, just within our organization we have cases where we're
communicating between busy machines on the same 1Gbps switch, and
cases where we're communicating between nearly-idle machines over a
sometimes-saturated 3Mbps WAN connection.  This change would be a big
win for one and could potentially be a big problem in the other.

-Kevin

Patch for allowing explicit enable/disable of binary transfer for each OID type

From
Mikko Tiihonen
Date:
On 09/24/2011 05:46 PM, Kevin Grittner wrote:
>> Mikko Tiihonen  wrote:
>
>> In general could add a mechanism to add/remove specific oids from
>> the binary transfer mode as a configuration option to the jdbc
>> driver.
>>
>> The same mechanism could be used for by application code that uses
>> custom types inside PostgreSQL to take advantage of binary
>> transfers.
>>
>> There will always be trade-offs between txt and bin encoding and
>> sometimes the differences are large enough to change from driver
>> defaults.
>
> Yeah, all I'm saying is that when there are such large trade-offs
> between CPU usage and network traffic requirements, it might be a
> good idea to allow a user to choose with some connection option.  For
> perspective, just within our organization we have cases where we're
> communicating between busy machines on the same 1Gbps switch, and
> cases where we're communicating between nearly-idle machines over a
> sometimes-saturated 3Mbps WAN connection.  This change would be a big
> win for one and could potentially be a big problem in the other.

Here is a patch that adds binaryTransferEnable and
binaryTransferDisable options, allowing the driver user to override the
defaults.

It accepts a comma-separated list of OID names (must match the names in
org.postgresql.core.Oid) or any numeric oid value.

For example one can use
binaryTransferDisable=INT4_ARRAY,1016
binaryTransferEnable=TEXT
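
Assuming the options are passed as ordinary JDBC connection properties (the
property names are the ones above; the host and database names are
placeholders), usage might look like this:

```java
import java.util.Properties;

// Hypothetical usage of the patch's options via ordinary JDBC connection
// properties. The property names come from the patch description; the
// connection URL is a placeholder.
public final class BinaryTransferConfig {
    public static Properties buildProps() {
        Properties props = new Properties();
        // force these two types back to text transfer
        props.setProperty("binaryTransferDisable", "INT4_ARRAY,1016");
        // opt this type into binary transfer
        props.setProperty("binaryTransferEnable", "TEXT");
        return props;
    }

    public static void main(String[] args) {
        Properties props = buildProps();
        // Connection conn = DriverManager.getConnection(
        //         "jdbc:postgresql://dbhost/mydb", props); // needs a server
        System.out.println(props.getProperty("binaryTransferDisable"));
    }
}
```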

It will also be useful if the binary encoding of a type ever changes in the
server: one can keep using an old jdbc driver by disabling binary transfer
for that one type as a workaround while waiting for a new jdbc driver
release.

-Mikko

Attachment