Re: libpq compression - Mailing list pgsql-hackers
From | Konstantin Knizhnik |
---|---|
Subject | Re: libpq compression |
Date | |
Msg-id | a6dbaab3-a9fd-7f57-7fbe-d0c088cf2e4a@garret.ru Whole thread Raw |
In response to | Re: libpq compression (Justin Pryzby <pryzby@telsasoft.com>) |
List | pgsql-hackers |
On 12.01.2021 4:20, Justin Pryzby wrote:
On Mon, Jan 11, 2021 at 04:53:51PM +0300, Konstantin Knizhnik wrote:On 09.01.2021 23:31, Justin Pryzby wrote:I suggest that there should be an enum of algorithms, which is constant across all servers. They would be unconditionally included and not #ifdef depending on compilation options.I do not think that it is possible (even right now, it is possible to build Postgres without zlib support). Also if new compression algorithms are added, then in any case we have to somehow handle situation when old client is connected to new server and visa versa.I mean an enum of all compression supported in the master branch, starting with ZLIB = 1. I think this applies to libpq compression (which requires client and server to handle a common compression algorithm), and pg_dump (output of which needs to be read by pg_restore), but maybe not TOAST, the patch for which supports extensions, with dynamically allocated OIDs.
Sorry, I do not understand the goal of introducing this enum.
Algorithms are in any case specified by name. And internally there is no need in such enum,
at least in libpq compression.
In config.sgml, it says that libpq_compression defaults to on (on the server side), but in libpq.sgml it says that it defaults to off (on the client side). Is that what's intended ? I would've thought the defaults would match, or that the server would enforce a default more conservative than the client's (the DBA should probably have to explicitly enable compression, and need to "opt-in" rather than "opt-out").Yes, it is intended behavior: libpq_compression GUC allows to prohibit compression requests fro clients if due to some reasons (security, CPU consumption is not desired). But by default server should support compression if it is requested by client.It's not clear to me if that's true.. It may be what's convenient for you, especially during development, but that doesn't mean it's safe or efficient or what's generally desirable to everyone.
Definitely it is only my point of view, may be DBA will have different opinions.
I think that compression is not dangerousness or resource consuming feature which should be disabled by default.
At least there is on GUC prohibiting SSL connections and them are even more expensive.
In any case, I will be glad to collect votes whether compression requests should be be default rejected by server or not.
But client should not request compression by default: it makes sense only for queries returning large result sets or transferring a lot of data (liek COPY).I think you're making assumptions about everyone's use of the tools, and it's better if the DBA makes that determination. The clients aren't generally under the admin's control, and if they don't request compression, then it's not used. If they request compression, then the DBA still has control over whether it's allowed. We agree that it should be disabled by default, but I suggest that it's most flexible if client's makes the request and allow the server to decide. By default the server should deny/ignore the request, with the client gracefully falling back to no compression.
I think that compression will be most efficient for internal connections (i.e. replication, bulk data loading, ...)
which are mostly controlled by DBA.
I mostly agree with you here, except that compression level 1 gives little effect.Compression would have little effect on most queries, especially at default level=1.
All my experiments both bith page level compression and libpq compression shows that default compression level
provides optimal balance between compression ratio and compression speed.
In Daniil's benchmark the difference between compression ration 1 and 19 in zstd is only 30%.
Right now "compression" parameter accepts not only boolean values but alsolist of suggested algorithms with optional compression level, like "zstd:1,zlib"You're talking about the client compression param. I'm suggesting that the server should allow fine-grained control of what compression algs are permitted at *runtime*. This would allow distributions to compile with all compression libraries enabled at configure time, and still allow an DBA to disable one without recompiling.
Frankly speaking I do not see much sense in it.
My point of view is the following: there are several well known and widely used compression algorithms now:
zlib, zstd, lz4. There are few others but results of various my benchmarks (mostly for page level compression)
shows that zstd provides best compression ratio/speed balance and lz4 - the best speed.
zlib is available almost everywhere and postgres by default is configured with zlib.
So it can be considered as default compression algorithm available everywhere.
If Postgres was built with zstd or lz4 (not currently supported for libpq compression) then it should be used instead of zlib
because it is faster and provides better compression quality.
So I do not see any other arguments for enabling or disabling some particular compression algorithms by DBA or suggesting it by client.
That's possible with environment variable or connection string. I'm proposing to simplify change of its default when recompiling, just like this one: src/interfaces/libpq/fe-connect.c:#define DefaultHost "localhost" Think this would be a 2 line change, and makes the default less hardcoded.
Sorry, my intention was the following: there is list of supported algorithms in order of decreasing their efficiency
(yes, I realize that in general case it may be not always possible to provide partial order on the set of supported compression algorithms,
but right now with zstd and zlib it is trivial).
Client sends to the server its list of supported algorithms (or explicitly specified by user) and server choose most efficient and supported among them.
So there is no need to specify some default in this schema. We can change order of algorithms in the array to affect choice of default algorithm.
Makes sense. I add this check.Your patch warns if *none* of the specified algorithms are supported, but I wonder if it should warn if *any* of them are unsupported (like if someone writes libz instead of zlib, or zst/libzstd instead of zstd, which I guess about half of us will do).Ugh... Handling mistyping in connection string seems to be not so good idea (from my point of view). And situation when some of algorithms is not supported by server seems to normal (if new client connects to old server). So I do not want to produce warning in this case.I'm not talking about warning if an alg is "not supported by server", but rather if it's "not supported by client". I think there should be a warning if a connection string specifies two libraries, but one is misspelled.
I think it's good if it *can* be automatic. But it's also good if the DBA can implement a simple "policy" about what's (not) supported. If the user requests a compression and it's not known on the *client* side I think it should warn. BTW I think the compression should be shown in psql \conninfo
Good point: done.
+ if ((unsigned)index >= conn->n_compressors) + { + appendPQExpBuffer(&conn->errorMessage, + libpq_gettext( + "server returns incorrect compression aslogirhm index: %d\n"), Now, you're checking that the index is not too large, but not that it's not too small.
Sorry, it was my "fad" to write most optimal code which is completely not relevant in this place.
I specially use unsigned comparison to perform check using just one comparison instead of two.
I have added comment here.
Right now, when I try to connect to an unpatched server, I get: psql: error: expected authentication request from server, but received vThank you for pointing it: fixed.I tested that I can connect to unpatced server, thanks. I have to confess that I don't know how this works, or what it has to do with what the commit message claims it does ? Are you saying I can use zlib in one direction and zstd in the other ? How would I do that ?
Sorry, this feature was suggested by Andres and partly implemented by Daniil.
I always think that it is useless and overkill feature.
New version of the patch with the suggested changes is attached.
Attachment
pgsql-hackers by date: