Re: libpq compression - Mailing list pgsql-hackers

From Tomas Vondra
Subject Re: libpq compression
Date
Msg-id 739641a6-f64d-0f2b-8483-34ab23e2dcd6@2ndquadrant.com
Whole thread Raw
In response to Re: libpq compression  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses Re: libpq compression
List pgsql-hackers


On 2/15/19 3:03 PM, Konstantin Knizhnik wrote:
> 
> 
> On 15.02.2019 15:42, Peter Eisentraut wrote:
>> On 2018-06-19 09:54, Konstantin Knizhnik wrote:
>>> The main drawback of streaming compression is that you can not
>>> decompress some particular message without decompression of all previous
>>> messages.
>> It seems this would have an adverse effect on protocol-aware connection
>> proxies: They would have to uncompress everything coming in and
>> recompress everything going out.
>>
>> The alternative of compressing each packet individually would work much
>> better: A connection proxy could peek into the packet header and only
>> uncompress the (few, small) packets that it needs for state and routing.
>>
> Individual compression of each message depreciate all idea of libpq 
> compression. Messages are two small to efficiently compress each of
> them separately. So using streaming compression algorithm is
> absolutely necessary here.
> 

Hmmm, I see Peter was talking about "packets" while you're talking about
"messages". Are you talking about the same thing?

Anyway, I was going to write about the same thing - that per-message
compression would likely eliminate most of the benefits - but I'm
wondering if it's actually true. That is, how much will the compression
ratio drop if we compress individual messages?

Obviously, if there are just tiny messages, it might easily eliminate
any benefits (and in fact it would add overhead). But I'd say we're way
more interested in transferring large data sets (result sets, data for
copy, etc.) and presumably those messages are much larger. So maybe we
could compress just those, somehow?

> Concerning possible problem with proxies I do not think that it is
> really a problem.
> Proxy is very rarely located somewhere in the "middle" between client
> and database servers.
> It is usually launched either in the same network as DBMS client (for
> example, if client is application server) either in the same network
> with database server.
> In both cases there is not so much sense to pass compressed traffic
> through the proxy.
> If proxy and DBMS server are located in the same network, then proxy
> should perform decompression and send
> decompressed messages to the database server.
> 

I don't think that's entirely true. It makes perfect sense to pass
compressed traffic in various situations - even in local network the
network bandwidth matters quite a lot, these days. Because "local
network" may be "same availability zone" or "same DC" etc.

That being said, I'm not sure it's a big deal / issue when the proxy has
to deal with compression.

Either it's fine to forward decompressed data, so the proxy performs
just decompression, which requires much less CPU. (It probably needs to
compress data in the opposite direction, but it's usually quite
asymmetric - much more data is sent in one direction). Or the data has
to be recompressed, because it saves enough network bandwidth. It's
essentially a trade-off between using CPU and network bandwidth.

IMHO it'd be nonsense to adopt the per-message compression based merely
on the fact that it might be easier to handle on proxies. We need to
know if we can get reasonable compression ratio with that approach,
because if not then it's useless that it's more proxy-friendly.

Do the proxies actually need to recompress the data? Can't they just
decompress it to determine which messages are in the data, and then
forward the original compressed stream? That would be much cheaper,
because decompression requires much less CPU. Although, some proxies
(like connection pools) probably have to compress the connections
independently ...

> Thank you very much for noticing this problem with compatibility
> compression and protocol-aware connection proxies.
> I have wrote that current compression implementation (zpq_stream.c) can
> be used not only for libpq backend/frontend, but
> also for compression any other streaming data. But I could not imaging
> what other data sources can require compression.
> And proxy is exactly such case: it also needs to compress/decompress
> messages.
> It is one more argument to make interface of zpq_stream as simple as
> possible and encapsulate all inflating/deflating logic in this code.
> It can be achieved by passing arbitrary rx/tx function to zpq_create
> function.
> 

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: Using POPCNT and other advanced bit manipulation instructions
Next
From: Konstantin Knizhnik
Date:
Subject: Re: libpq compression