libpq compression (part 3) - Mailing list pgsql-hackers
From | Jacob Burroughs |
---|---|
Subject | libpq compression (part 3) |
Date | |
Msg-id | CACzsqT4cJG0kaCbz24Sd=GAEgiQDpzU8yuD6vF25zo870+3M6g@mail.gmail.com Whole thread Raw |
Responses |
Re: libpq compression (part 3)
Re: libpq compression (part 3) |
List | pgsql-hackers |
Hello PG developers! I would like to introduce an updated take on libpq protocol-level compression, building off off the work in https://www.postgresql.org/message-id/aad16e41-b3f9-e89d-fa57-fb4c694bec25@postgrespro.ru and the followon work in https://www.postgresql.org/message-id/ABAA09C6-BB95-47A5-890D-90353533F9AC@yandex-team.ru along with all of the (nice and detailed) feedback and discussion therein. The first patch in the stack replaces `secure_read` and `secure_write` (and their frontend counterparts) with an "IO stream" abstraction, which should address a lot of the concerns from all parties around secure_read. The fundamental idea of an IO stream is that it is a linked list of "IoStreamProcessor"s. The "base" processor is the actual socket-backed API, and then SSL, GSSAPI, and compression can all add layers on top of that base that add functionality and rely on the layer below to read/write data. This structure makes it easy to add compression on top of either plain or encrypted sockets through a unified, unconditional API, and also makes it easy for callers to use plain, plain-compressed, secure, and secure-compressed communication channels equivalently. The second patch is the refactored implementation of compression itself, with ZSTD support merged into the main patch because the configuration-level work is now already merged in master. There was a good bit of rebasing, housekeeping, and bugfixing (including fixing lz4 by making it now be explicitly buffered inside ZStream), along with taking into account a lot of the feedback from this mailing list. I reworked the API to use the general compression processing types and methods from `common/compression`. This change also refactors the protocol to require the minimum amount of new message types and exchanges possible, while also enabling one-directional compression. The compression "handshaking" process now looks as follows: 1. Client sends startup packet with `_pq_.libpq_compression = alg1;alg2` 2. At this point, the server can immediately begin compressing packets to the client with any of the specified algorithms it supports if it so chooses 3. Server includes `libpq_compression` in the automatically sent `ParameterStatus` messages before handshaking 4. At this point, the client can immediately begin compressing packets to the server with any of the supported algorithms Both the server and client will prefer to compress using the first algorithm in their list that the other side supports, and we explicitly support `none` in the algorithm list. This allows e.g. a client to use `none;gzip` and a server to use `zstd;gzip;lz4`, and then the client will not compress its data but the server will send its data using gzip. Each side uses its own configured compression level (if set), since compression levels affect compression effort much more than decompression effort. This change also allows connections to succeed if compression was requested but not available (most of the time, I imagine that a client would prefer to just not use compression if the server doesn't support it; unlike SSL, it's a nice to have not an essential. If a client application actually really *needs* compression, that can still be facilitated by explicitly checking the negotiated compression methods.) The third patch adds the traffic monitoring statistics that had been in the main patch of the previous series. I've renamed them and changed slightly where to measure the actual raw network bytes and the "logical" protocol bytes, which also means this view can measure SSL/GSSAPI overhead (not that that's a particularly *important* thing to measure, but it's worth nothing what the view will actually measure. The fourth patch adds a TAP test that validates all of the compression methods and compression negotiation. Ideally it would probably be part of patch #2, but it uses the monitoring from #3 to be able to validate that compression is actually working. The fifth patch is just a placeholder to allow running the test suite with compression maximally enabled to work out any kinks. I believe this patch series is ready for detailed review/testing, with one caveat: as can be seen here https://cirrus-ci.com/build/6732518292979712 , the build is passing on all platforms and all tests except for the primary SSL test on Windows. After some network-level debugging, it appears that we are bumping into a version of the issues seen here https://www.postgresql.org/message-id/flat/CA%2BhUKG%2BOeoETZQ%3DQw5Ub5h3tmwQhBmDA%3DnuNO3KG%3DzWfUypFAw%40mail.gmail.com , where on Windows some SSL error messages end up getting swallowed up by the the process exiting and closing the socket with a RST rather than a nice clean shutdown. I may have the cause/effect wrong here, but the issues appear before the compression is actually fully set up in the client used and would appear to be a side effect of timing differences and/or possibly size differences in the startup packet. Any pointers on how to resolve this would be appreciated. It does reproduce on Windows fairly readily, though any one particular test still sometimes succeeds, and the relevant SSL connection failure message reliably shows up in Wireshark. Also please let me know if I have made any notable mailing list/patch etiquette/format/structure errors. This is my first time submitting a patch to a mailing-list driven open source project and while I have tried to carefully review the various wiki guides I'm sure I didn't take everything in perfectly. Thanks, Jacob Burroughs
Attachment
pgsql-hackers by date: