Re: libpq compression - Mailing list pgsql-hackers
From: Konstantin Knizhnik
Subject: Re: libpq compression
Date:
Msg-id: 852c6ee2-5971-dc56-923b-f8eeea6b8961@postgrespro.ru
In response to: Re: libpq compression (Robbie Harwood <rharwood@redhat.com>)
Responses: Re: libpq compression
List: pgsql-hackers
On 22.06.2018 18:59, Robbie Harwood wrote:
> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
>
>> On 21.06.2018 20:14, Robbie Harwood wrote:
>>> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
>>>> On 21.06.2018 17:56, Robbie Harwood wrote:
>>>>> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
>>>>>> On 20.06.2018 23:34, Robbie Harwood wrote:
>>>>>>> Konstantin Knizhnik <k.knizhnik@postgrespro.ru> writes:
>>>>>>>
>>>>>>> Well, that's a design decision you've made. You could put
>>>>>>> lengths on chunks that are sent out - then you'd know exactly how
>>>>>>> much is needed. (For instance, 4 bytes of network-order length
>>>>>>> followed by a complete payload.) Then you'd absolutely know
>>>>>>> whether you have enough to decompress or not.
>>>>>> Do you really suggest sending an extra header for each chunk of
>>>>>> data? Please notice that a chunk can be as small as one message: a
>>>>>> dozen bytes, because libpq is used for client-server communication
>>>>>> with a request-reply pattern.
>>>>> I want you to think critically about your design. I *really* don't
>>>>> want to design it for you - I have enough stuff to be doing. But
>>>>> again, the design I gave you doesn't necessarily need that - you
>>>>> just need to properly buffer incomplete data.
>>>> Right now secure_read may return any number of available bytes. But
>>>> when using streaming compression, it can happen that the available
>>>> number of bytes is not enough to perform decompression. This is why
>>>> we may need to try to fetch an additional portion of data. This is
>>>> how zpq_stream works now.
>>> No, you need to buffer and wait until you're called again. Which is
>>> to say: decompress() shouldn't call secure_read(). secure_read()
>>> should call decompress().
>> In this case I will have to implement this code twice: both for the
>> backend and the frontend, i.e. for secure_read/secure_write and
>> pqsecure_read/pqsecure_write.
> Likely, yes.
> You can see that this is how TLS does it (which you should be using
> as a model, architecture-wise).
>
>> Frankly speaking, I was very upset by the design of the libpq
>> communication layer in Postgres: there are two different
>> implementations of almost the same stuff for the backend and the
>> frontend.
> Changing the codebases so that more could be shared is not necessarily
> a bad idea; however, it is a separate change from compression.
>
>>>> I do not understand how it is possible to implement it in a
>>>> different way, and what is wrong with the current implementation.
>>> The most succinct thing I can say is: absolutely don't modify
>>> pq_recvbuf(). I gave you pseudocode for how to do that. All of your
>>> logic should be *below* the secure_read/secure_write functions.
>>>
>>> I cannot approve code that modifies pq_recvbuf() in the manner you
>>> currently do.
>> Well, I understand your arguments. But please also consider my
>> argument above (about avoiding code duplication).
>>
>> In any case, the secure_read function is called only from
>> pq_recvbuf(), just as pqsecure_read is called only from pqReadData.
>> So I do not see a principal difference between handling compression
>> in secure_read and in pq_recvbuf, and do not understand why it is
>> "destroying the existing model".
>>
>> Frankly speaking, I would really like to destroy the existing model,
>> moving all system-dependent stuff in Postgres to a SAL and avoiding
>> this awful mix of code sharing and duplication between backend and
>> frontend. But that is another story and I do not want to discuss it
>> here.
> I understand you want to avoid code duplication. I will absolutely
> agree that the current setup makes it difficult to share code between
> postmaster and libpq clients. But the way I see it, you have two
> choices:
>
> 1. Modify the code to make code sharing easier. Once this has been
> done, *then* build a compression patch on top, with the nice new
> architecture.
>
> 2.
> Leave the architecture as-is and add compression support.
> (Optionally, you can make code sharing easier at a later point.)
>
> Fundamentally, I think you value code sharing more than I do. So while
> I might advocate for (2), you might personally prefer (1).
>
> But right now you're not doing either of those.
>
>> If we are speaking about the "right design", then neither your
>> suggestion nor my implementation is good, and I do not see a
>> principal difference between them.
>>
>> The right approach is using the "decorator pattern": this is how
>> streams are designed in .Net/Java. You can construct a pipe of
>> "secure", "compressed" and whatever else streams.
> I *strongly* disagree, but I don't think you're seriously suggesting
> this.
>
>>>>>>> My idea was the following: the client wants to use compression,
>>>>>>> but the server may reject this attempt (for any reason: it
>>>>>>> doesn't support it, has no proper compression library, does not
>>>>>>> want to spend CPU on decompression, ...). Right now the
>>>>>>> compression algorithm is hardcoded, but in the future client and
>>>>>>> server may negotiate to choose the proper compression protocol.
>>>>>>> This is why I prefer to perform negotiation between client and
>>>>>>> server to enable compression.
>>>>>>>
>>>>>>> Well, for negotiation you could put the name of the algorithm
>>>>>>> you want in the startup. It doesn't have to be a boolean for
>>>>>>> compression, and then you don't need an additional round-trip.
>>>>>> Sorry, I can only repeat the arguments I already mentioned:
>>>>>> - in the future it may be possible to specify the compression
>>>>>> algorithm
>>>>>> - even with a boolean compression option the server may have
>>>>>> reasons to reject the client's request to use compression
>>>>>>
>>>>>> Extra flexibility is always a good thing if it doesn't cost too
>>>>>> much. And an extra round of negotiation when enabling compression
>>>>>> seems to me not too high a price for it.
>>>>> You already have this flexibility even without negotiation.
>>>>> I don't want you to lose your flexibility. The protocol looks
>>>>> like this:
>>>>>
>>>>> - Client sends connection option "compression" with the list of
>>>>> algorithms it wants to use (comma-separated, or something).
>>>>>
>>>>> - The first packet the server sends can be compressed with one of
>>>>> those algorithms (or none, if it doesn't want to turn on
>>>>> compression).
>>>>>
>>>>> No additional round-trips needed.
>>>> This is exactly how it works now... The client includes the
>>>> compression option in the connection string, and the server replies
>>>> with a special message ('Z') if it accepts the request to compress
>>>> traffic between this client and server.
>>> No, it's not. You don't need this message. If the server receives a
>>> compression request, it should just turn compression on (or not),
>>> and then have the client figure out whether it got compression back.
>> How will it manage to do that? It receives some reply, and first of
>> all it should know whether it has to be decompressed or not.
> You can tell whether a message is compressed by looking at it. The way
> the protocol works, every message has a type associated with it: a
> single byte, like 'R', that says what kind of message it is.

A compressed message can contain any sequence of bytes, including 'R' :)

> Thanks,
> --Robbie

--
Konstantin Knizhnik
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company