Re: libpq compression - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: libpq compression
Date
Msg-id 943c3ef5-47aa-3480-75c7-421ea81e6f65@garret.ru
Whole thread Raw
In response to Re: libpq compression  (Daniil Zakhlystov <usernamedt@yandex-team.ru>)
List pgsql-hackers

On 24.11.2020 22:47, Daniil Zakhlystov wrote:
> I completely agree that backward-compatibility is important here.
>
> I think that it is a good idea to clarify how the compression establishment works in the current version of the
patch:
>
> 1. Frontend send the startup packet which may look like this:
>
> _pq_.compression = 'zlib,zstd' (I omitted the variations with compression levels for clarity)
>
> Then, on the backend, there are two possible cases:
> 2.1 If the backend is too old and doesn't know anything about the compression or if the compression is disabled on
thebackend, it just ignores the compression parameter
 
> 2.2 In the other case, the backend intersects the client compression method with its own supported ones and responds
withcompressionAck message which contains the index of the chosen compression method (or '-1' if it doesn't support any
ofthe methods provided).
 
>
> If the frontend receives the compressionAck message, there is also two cases:
> 3.1 If compressionAck contains '-1', do not initiate compression
> 3.2 In the other case, initialize the chosen compression method immediately.
>
> My idea is that we can add new compression approaches in the future and initialize them differently on step 3.2.
>
> For example, in the case of switchable compression:
>
> 1. Client sends a startup packet with _pq_.compression = 'switchable,zlib,zstd' - it means that client wants
switchablecompression or permanent zlib/zstd compression.
 
>
> Again, two main cases on the backend:
> 2.1 Backend doesn't know about any compression or compression turned off => ignore the _pq_.compression
>
> 2.2.1 If the backend doesn't have switchable compression implemented, it won't have 'switchable' in his supported
methods.So it will simply discard this method in the process of the intersection of the client and frontend compression
methodsand respond with some compressionAck message - choose permanent zlib, zstd, or nothing (-1).
 
>
> 2.2.2 If the backend supports switchable on the fly compression, it will have 'switchable' in his supported methods
soit may choose 'switchable' in his compressionAck response.
 
>
> After that, on the frontend side:
> 3.1 If compressionAck contains '-1', do not initiate compression
>
> 3.2.1 If compressionAck has 'zstd' or 'zlib' as the chosen compression method, init permanent streaming compression
immediately.
>
> 3.2.2 If compressionAck has 'switchable' as the chosen compression method, init the switchable compression.
Initializationmay involve sending some additional messages to the backend to negotiate the details like the supported
switchableon the fly compression methods or any other details.
 
>
> The same applies to the compression with the different algorithms in each direction. We can call it, for example,
'directional-specific'and init differently on step 3.2. The key is that we don't even have to decide the exact
initializationprotocol for 'switchable' and 'direction-specific'. It may be added in the future.
 
>
> Basically, this is what I’ve meant in my previous message about the future expansion of the current design, I hope
thatI managed to clarify it.
 
>
> Thanks,
>
> Daniil Zakhlystov
>
>> On Nov 24, 2020, at 11:35 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>>
>> On Tue, Nov 24, 2020 at 12:35 PM Daniil Zakhlystov
>> <usernamedt@yandex-team.ru> wrote:
>>> To sum up, I think that the current implementation already introduces good benefits. As I proposed in the Usability
review,we may introduce the new approaches later as separate compression 'algorithms'.
 
>> I don't think the current patch is so close to being committable that
>> we shouldn't be considering what we really want to have here. It's one
>> thing to say, well, this patch is basically done, let's not start
>> redesigning it now. But that's not the case here. For example, I don't
>> see any committer accepting the comments in zpq_stream.c as adequate,
>> or the documentation, either. Some comments that have been made
>> previously, like Andres's remark about the non-standard message
>> construction in pq_configure(), have not been addressed, and I do not
>> think any committer is going to agree with the idea that the novel
>> method chosen by the patch is superior here, not least but not only
>> because it seems like it's endian-dependent. That function also uses
>> goto, which anybody thinking of committing this will surely try to get
>> rid of, and I'm pretty sure the sscanf() isn't good enough to reject
>> trailing garbage, and the error message that follows is improperly
>> capitalized. I'm sure there's other stuff, too: this is just based on
>> a quick look.
>>
>> Before we start worrying about any of that stuff in too much detail, I
>> think it makes a lot of sense to step back and consider the design.
>> Honestly, the work of changing the design might be smaller than the
>> amount of cleanup the patch needs. But even if it's larger, it's
>> probably not vastly larger. And in any case, I quite disagree with the
>> idea that we should commit to a user-visible interface that exposes a
>> subset of the functionality that we needed and then try to glue the
>> rest of the functionality on top of it later. If we make a libpq
>> connection option called compression that controls the type of
>> compression that is used in both direction, then how exactly would we
>> extend that later to allow for different compression in the two
>> directions? Some syntax like compression=zlib/none, where the value
>> before the slash controls one direction and the value after the slash
>> controls the other? Maybe. But on the other hand, maybe it's better to
>> have separate connection options for client compression and server
>> compression. Or, maybe the kind of compression used by the server
>> should be controlled via a GUC rather than a connection option. Or,
>> maybe none of that is right and we should stick with the approach the
>> patch currently takes. But it's not like we can do something for v1
>> and then just change things randomly later: there will be
>> backward-compatibility to worry about. So the time to talk about the
>> general approach here is now, before anything gets committed, before
>> the project has committed itself to any particular design. If we
>> decide in that discussion that certain things can be left for the
>> future, that's fine. If we've have discussed how they could be added
>> without breaking backward compatibility, even better. But we can't
>> just skip over having that discussion.
>>
>> --
>> Robert Haas
>> EDB: http://www.enterprisedb.com


I attach new version of the patch with one more bug fix and minimal 
support of future protocol extension.
Now if you define environment PGCOMPRESSION='zstd' (or zlib) then it is 
possible to pass all regression tests with enabled compression.
I wonder if we are going to support protocol compression in PG14?
If so, I will try to fix all reported issues with code style and include 
in handshake protocol some general mechanism which allow in future to 
support some advanced features.
Right now client sends to the server list of the supported protocols (as 
comma separated list with with optional compression level, i.e. 
"ztsd:10,zlib") and server replies with index of used compression 
algorithm in this list (or -1 if compression is disabled).

The minimal change in handshake protocol from my point of view is to add 
to client message arbitrary extension part which is not currently 
interpreted by server:
"ztsd:10,zlib;arbitrary text"

So all text after ';' will be currently ignored by backend but may be 
interpreted in future.
When old backend connects to new server (extended part is missed), then 
server responds with algorithm index (as it is done now)
When new backend connects to old server, then server just ignores 
extension and responds with algorithm index.
When new backend connects to new server, then server may interpret 
extended part and return either algorithm index, either some larger 
response and based on message
size new client will recognize that server is using extended response
format and properly interpret response.

I do not think that we really need to discuss now advanced features 
(like switching algorithm or compression level on the fly or be able to 
use different algorithms in different directions).
But I am open for this discussion. I have only one suggestion: before 
discussing any advanced feature, let's try to formulate
1. Why do we need it (what are the expected advantages)?
2. How it can be used?

Assume that we will wan to implement smarter algorithm choice.
The question number 1  is which algorithms we are going to support.
Right now I have supported zstd (demonstrating best quality/speed 
results on most  payloads I have tested) and zlib (available almost 
everywhere and supported by default by Postgres).
We can also add lz4 which is also very fast and may be more 
compact/popular than zstd.
I do not know some other algorithms which are much better for traffic 
compression than this two.
There are certainly type specific compression algorithms, but the only 
really useful algorithm I know is compression of monotonic sequence of 
integers and floats.
Not sure that it is relevant for libpq.
So there seems to be sense to switch lz4 with zstd on the fly or use lz4 
for compression of traffic from server to client and zstd - from client 
to server (or visa versa).

And more important question - if we really want to switch algorithms on 
the fly: who and how will do it?
Do we want user to explicitly control it (something like "\compression 
on"  psql command)?
Or there should be some API for application?
How it can be supported for example by JDBC driver?
I do not have answers for this questions...


P.S.
I am grateful for the criticism.
Sorry if some my answers are considered as impolite.
I just a developer and for me writing code is much easier than writing 
e-mails.

Attachment

pgsql-hackers by date:

Previous
From: Justin Pryzby
Date:
Subject: allow to \dtS+ pg_toast.*
Next
From: "Bossart, Nathan"
Date:
Subject: Re: A few new options for CHECKPOINT