Bytea network traffic: binary vs text result format - Mailing list pgsql-interfaces

From Miha D. Puc
Subject Bytea network traffic: binary vs text result format
Date
Msg-id 466299A7.20403@eba.si
Whole thread Raw
Responses Re: Bytea network traffic: binary vs text result format
Re: Bytea network traffic: binary vs text result format
List pgsql-interfaces
Hi!

There was some debate recently about using text or binary format.
There's people who would like to use it but have trouble converting
binary encoded results into native types and there's people that say
there's not much performance difference.

I'd like to stress that performance is very different over slow
network.  The biggest difference is for byte where the text format
performance is about 3.5 times worse at inserts and updates and about
2.9 times worse at selects . Here's the reasoning:

In text format bytea are escaped using PQescapeBytea. In an average
binary stream about 2/3 would be escaped. Each escaped byte becomes of
form \\ooo at upload and of form \ooo for download, so the size of the
escaped stream is 1/3 + 2/3 * 5 = 11/3 = 3.6 and 1/3 + 2/3 * 4 = 3
respectively.

Here are the results of my test. I inserted and selected an OpenOffice
document of size 2Mb over a 2M/512K cable.
text format: insert: 120.1s select: 24.9s
binary format: insert: 33.5s select: 8.6s
factor:  insert: 3.6 select: 2.9

The difference between the test and the above calculation comes from the
estimate that 2/3 of bytes are escaped where in fact 95 out of 256 are
escaped (63%).

So there is a need (people asking) and reason (performance) to use
binary format. But there's a huge drawback - the conversions. It's easy
for varchar, not too bad for basic types (int, float, bool), effort is
needed for timestamp, date, time and numeric is a pain.

So with all the above there should be a utility for conversion between
binary format and native types and/or string format in libpq.

Regards,
Miha Puc




pgsql-interfaces by date:

Previous
From: "Francisco Figueiredo Jr."
Date:
Subject: Re: Question on NpgsqlParameter object
Next
From: Markus Schiltknecht
Date:
Subject: Re: Bytea network traffic: binary vs text result format