Re: libpq compression - Mailing list pgsql-hackers

From Daniil Zakhlystov
Subject Re: libpq compression
Date
Msg-id 0B5A312E-8FC7-42FF-A208-8BDA50727B40@yandex-team.ru
Whole thread Raw
In response to Re: libpq compression  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: libpq compression  (Konstantin Knizhnik <knizhnik@garret.ru>)
List pgsql-hackers
I completely agree that backward-compatibility is important here.

I think that it is a good idea to clarify how the compression establishment works in the current version of the patch:

1. Frontend send the startup packet which may look like this:

_pq_.compression = 'zlib,zstd' (I omitted the variations with compression levels for clarity)

Then, on the backend, there are two possible cases:
2.1 If the backend is too old and doesn't know anything about the compression or if the compression is disabled on the
backend,it just ignores the compression parameter 
2.2 In the other case, the backend intersects the client compression method with its own supported ones and responds
withcompressionAck message which contains the index of the chosen compression method (or '-1' if it doesn't support any
ofthe methods provided). 

If the frontend receives the compressionAck message, there is also two cases:
3.1 If compressionAck contains '-1', do not initiate compression
3.2 In the other case, initialize the chosen compression method immediately.

My idea is that we can add new compression approaches in the future and initialize them differently on step 3.2.

For example, in the case of switchable compression:

1. Client sends a startup packet with _pq_.compression = 'switchable,zlib,zstd' - it means that client wants switchable
compressionor permanent zlib/zstd compression. 

Again, two main cases on the backend:
2.1 Backend doesn't know about any compression or compression turned off => ignore the _pq_.compression

2.2.1 If the backend doesn't have switchable compression implemented, it won't have 'switchable' in his supported
methods.So it will simply discard this method in the process of the intersection of the client and frontend compression
methodsand respond with some compressionAck message - choose permanent zlib, zstd, or nothing (-1). 

2.2.2 If the backend supports switchable on the fly compression, it will have 'switchable' in his supported methods so
itmay choose 'switchable' in his compressionAck response. 

After that, on the frontend side:
3.1 If compressionAck contains '-1', do not initiate compression

3.2.1 If compressionAck has 'zstd' or 'zlib' as the chosen compression method, init permanent streaming compression
immediately.

3.2.2 If compressionAck has 'switchable' as the chosen compression method, init the switchable compression.
Initializationmay involve sending some additional messages to the backend to negotiate the details like the supported
switchableon the fly compression methods or any other details. 

The same applies to the compression with the different algorithms in each direction. We can call it, for example,
'directional-specific'and init differently on step 3.2. The key is that we don't even have to decide the exact
initializationprotocol for 'switchable' and 'direction-specific'. It may be added in the future. 

Basically, this is what I’ve meant in my previous message about the future expansion of the current design, I hope that
Imanaged to clarify it. 

Thanks,

Daniil Zakhlystov

> On Nov 24, 2020, at 11:35 PM, Robert Haas <robertmhaas@gmail.com> wrote:
>
> On Tue, Nov 24, 2020 at 12:35 PM Daniil Zakhlystov
> <usernamedt@yandex-team.ru> wrote:
>> To sum up, I think that the current implementation already introduces good benefits. As I proposed in the Usability
review,we may introduce the new approaches later as separate compression 'algorithms'. 
>
> I don't think the current patch is so close to being committable that
> we shouldn't be considering what we really want to have here. It's one
> thing to say, well, this patch is basically done, let's not start
> redesigning it now. But that's not the case here. For example, I don't
> see any committer accepting the comments in zpq_stream.c as adequate,
> or the documentation, either. Some comments that have been made
> previously, like Andres's remark about the non-standard message
> construction in pq_configure(), have not been addressed, and I do not
> think any committer is going to agree with the idea that the novel
> method chosen by the patch is superior here, not least but not only
> because it seems like it's endian-dependent. That function also uses
> goto, which anybody thinking of committing this will surely try to get
> rid of, and I'm pretty sure the sscanf() isn't good enough to reject
> trailing garbage, and the error message that follows is improperly
> capitalized. I'm sure there's other stuff, too: this is just based on
> a quick look.
>
> Before we start worrying about any of that stuff in too much detail, I
> think it makes a lot of sense to step back and consider the design.
> Honestly, the work of changing the design might be smaller than the
> amount of cleanup the patch needs. But even if it's larger, it's
> probably not vastly larger. And in any case, I quite disagree with the
> idea that we should commit to a user-visible interface that exposes a
> subset of the functionality that we needed and then try to glue the
> rest of the functionality on top of it later. If we make a libpq
> connection option called compression that controls the type of
> compression that is used in both direction, then how exactly would we
> extend that later to allow for different compression in the two
> directions? Some syntax like compression=zlib/none, where the value
> before the slash controls one direction and the value after the slash
> controls the other? Maybe. But on the other hand, maybe it's better to
> have separate connection options for client compression and server
> compression. Or, maybe the kind of compression used by the server
> should be controlled via a GUC rather than a connection option. Or,
> maybe none of that is right and we should stick with the approach the
> patch currently takes. But it's not like we can do something for v1
> and then just change things randomly later: there will be
> backward-compatibility to worry about. So the time to talk about the
> general approach here is now, before anything gets committed, before
> the project has committed itself to any particular design. If we
> decide in that discussion that certain things can be left for the
> future, that's fine. If we've have discussed how they could be added
> without breaking backward compatibility, even better. But we can't
> just skip over having that discussion.
>
> --
> Robert Haas
> EDB: http://www.enterprisedb.com




pgsql-hackers by date:

Previous
From: Andrew Gierth
Date:
Subject: Re: mark/restore failures on unsorted merge joins
Next
From: Tom Lane
Date:
Subject: Re: mark/restore failures on unsorted merge joins