Re: libpq compression - Mailing list pgsql-hackers

From Konstantin Knizhnik
Subject Re: libpq compression
Date
Msg-id a6dbaab3-a9fd-7f57-7fbe-d0c088cf2e4a@garret.ru
Whole thread Raw
In response to Re: libpq compression  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers


On 12.01.2021 4:20, Justin Pryzby wrote:
On Mon, Jan 11, 2021 at 04:53:51PM +0300, Konstantin Knizhnik wrote:
On 09.01.2021 23:31, Justin Pryzby wrote:
I suggest that there should be an enum of algorithms, which is constant across
all servers.  They would be unconditionally included and not #ifdef depending
on compilation options.
I do not think that it is possible (even right now, it is possible to build
Postgres without zlib support).
Also if new compression algorithms  are added, then in any case we have to
somehow handle situation when
old client is connected to new server and visa versa.
I mean an enum of all compression supported in the master branch, starting with
ZLIB = 1.  I think this applies to libpq compression (which requires client and
server to handle a common compression algorithm), and pg_dump (output of which
needs to be read by pg_restore), but maybe not TOAST, the patch for which
supports extensions, with dynamically allocated OIDs.

Sorry, I do not understand the goal of introducing this enum.
Algorithms are in any case specified by name. And internally there is no need in such enum,
at least in libpq compression.

In config.sgml, it says that libpq_compression defaults to on (on the server
side), but in libpq.sgml it says that it defaults to off (on the client side).
Is that what's intended ?  I would've thought the defaults would match, or that
the server would enforce a default more conservative than the client's (the DBA
should probably have to explicitly enable compression, and need to "opt-in"
rather than "opt-out").
Yes, it is intended behavior:  libpq_compression GUC allows to prohibit
compression requests fro clients if due to some reasons (security, CPU
consumption is not desired).
But by default server should support compression if it is requested by
client.
It's not clear to me if that's true..   It may be what's convenient for you,
especially during development, but that doesn't mean it's safe or efficient or
what's generally desirable to everyone.

Definitely it is only my point of view, may be DBA will have different opinions.
I think that compression is not dangerousness or resource consuming feature which should be disabled by default.
At least there is on GUC prohibiting SSL connections and them are even more expensive.
In any case, I will be glad to collect votes whether compression requests should be be default rejected by server or not.

But client should not request compression by default: it makes sense only for
queries returning large result sets or transferring a lot of data (liek
COPY).
I think you're making assumptions about everyone's use of the tools, and it's
better if the DBA makes that determination.  The clients aren't generally under
the admin's control, and if they don't request compression, then it's not used.
If they request compression, then the DBA still has control over whether it's
allowed.

We agree that it should be disabled by default, but I suggest that it's most
flexible if client's makes the request and allow the server to decide.  By
default the server should deny/ignore the request, with the client gracefully
falling back to no compression.

I think that compression will be most efficient for internal connections (i.e. replication, bulk data loading, ...)
which are mostly controlled by DBA.
Compression would have little effect on most queries, especially at default
level=1.
I mostly agree with you here, except that compression level 1 gives little effect.
All my experiments both bith page level compression and libpq compression shows that default compression level
provides optimal balance between compression ratio and compression speed.

In Daniil's benchmark the difference between compression ration 1 and 19 in zstd is only 30%.

Right now "compression" parameter accepts not only boolean values but also
list of suggested algorithms with optional compression level, like
"zstd:1,zlib"
You're talking about the client compression param.  I'm suggesting that the
server should allow fine-grained control of what compression algs are permitted
at *runtime*.  This would allow distributions to compile with all compression
libraries enabled at configure time, and still allow an DBA to disable one
without recompiling.

Frankly speaking I do not see much sense in it.
My point of view is the following: there are several well known and widely used compression algorithms now:
zlib, zstd, lz4. There are few others but results of various my benchmarks (mostly for page level compression)
shows that zstd provides best compression ratio/speed  balance and lz4 - the best speed.

zlib is available almost everywhere and postgres by default is configured with zlib.
So it can be considered as default compression algorithm available everywhere.
If Postgres was built with zstd or lz4 (not currently supported for libpq compression) then it should be used instead of zlib
because it is faster and provides better compression quality.

So I do not see any other arguments for enabling or disabling some particular compression algorithms by DBA or suggesting it by client.

That's possible with environment variable or connection string.

I'm proposing to simplify change of its default when recompiling, just like this one:
src/interfaces/libpq/fe-connect.c:#define DefaultHost           "localhost"

Think this would be a 2 line change, and makes the default less hardcoded.

Sorry, my intention was the following: there is list of supported algorithms in order of decreasing their efficiency
(yes, I realize that in general case it may be not always possible to provide partial order on the set of supported compression algorithms,
but right now with zstd and zlib it is trivial).
Client sends to the server its list of supported algorithms (or explicitly specified by user) and server choose most efficient and supported among them.
So there is no need to specify some default in this schema. We can change order of algorithms in the array to affect choice of default algorithm.

Your patch warns if *none* of the specified algorithms are supported, but I
wonder if it should warn if *any* of them are unsupported (like if someone
writes libz instead of zlib, or zst/libzstd instead of zstd, which I guess
about half of us will do).
Ugh... Handling mistyping in connection string seems to be not so good idea
(from my point of view).
And situation when some of algorithms is not supported by server seems to
normal (if new client connects to old server).
So I do not want to produce warning in this case.
I'm not talking about warning if an alg is "not supported by server", but
rather if it's "not supported by client".  I think there should be a warning if
a connection string specifies two libraries, but one is misspelled.
Makes sense. I  add this check.


I think it's good if it *can* be automatic.  But it's also good if the DBA can
implement a simple "policy" about what's (not) supported.  If the user requests
a compression and it's not known on the *client* side I think it should warn.

BTW I think the compression should be shown in psql \conninfo

Good point: done.



+                  if ((unsigned)index >= conn->n_compressors)
+                  {
+                          appendPQExpBuffer(&conn->errorMessage,
+                                                            libpq_gettext(
+                                                                    "server returns incorrect compression aslogirhm  index: %d\n"),

Now, you're checking that the index is not too large, but not that it's not too
small.

Sorry, it was my "fad" to write most optimal code which is completely not relevant in this place.
I specially use unsigned comparison to perform check using just one comparison instead of two.
I have added comment here.
Right now, when I try to connect to an unpatched server, I get:
psql: error: expected authentication request from server, but received v
Thank you for pointing it: fixed.
I tested that I can connect to unpatced server, thanks.

I have to confess that I don't know how this works, or what it has to do with
what the commit message claims it does ?

Are you saying I can use zlib in one direction and zstd in the other ?  How
would I do that ?

Sorry, this feature was suggested by Andres and partly implemented by Daniil.
I always think that it is useless and overkill feature.


New version of the patch with the suggested changes is attached.
Attachment

pgsql-hackers by date:

Previous
From: Andrew Dunstan
Date:
Subject: Re: Cirrus CI (Windows help wanted)
Next
From: vignesh C
Date:
Subject: Re: Added schema level support for publication.