Re: libpq compression - Mailing list pgsql-hackers

From Daniil Zakhlystov
Subject Re: libpq compression
Date
Msg-id 66AB9CC3-4EDA-4443-8F82-F96D8919F942@yandex-team.ru
Whole thread Raw
In response to Re: libpq compression  (vignesh C <vignesh21@gmail.com>)
Responses Re: libpq compression  (Daniil Zakhlystov <usernamedt@yandex-team.ru>)
List pgsql-hackers
Hi!

I made some noticeable changes to the patch and fixed the previously mentioned issues.

On Fri, Mar 19, 2021 at 16:28 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> Previously, I suggested that the server should have a "policy" GUC defining
> which compression methods are allowed.  Possibly including compression "level".
> For example, the default might be to allow zstd, but only up to level 9.

Now libpq_compression GUC server setting controls the available client-server traffic compression methods.
It allows specifying the exact list of the allowed compression algorithms.

For example, to allow only and zlib, set the setting to "zstd,zlib". Also, maximal allowed compression level can be
specifiedfor each  
method, e.g. "zstd:1,zlib:2" setting will set the maximal compression level for zstd to 1 and zlib to 2.
If a client requests the compression with a higher compression level, it will be set to the maximal allowed one.
The default (and recommended) maximal compression level for each algorithm is 1.

On Fri, Mar 19, 2021 at 16:28 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> There's commit messages claiming that the compression can be "asymmetric"
> between client-server vs server-client, but the commit seems to be unrelated,
> and the protocol documentation doesn't describe how this works.

On Dec 10, 2020, at 1:39 AM, Robert Haas <robertmhaas@gmail.com> wrote:
> Another idea is that you could have a new message type that says "hey,
> the payload of this is 1 or more compressed messages." It uses the
> most-recently set compression method. This would make switching
> compression methods easier since the SetCompressionMethod message
> itself could always be sent uncompressed and/or not take effect until
> the next compressed message


I rewrote the compression initialization logic. Now it is asymmetric and the compression method is changeable on the
fly.

Now compression initialization consists only of two phases:
1. Client sends startup packet with _pq_.compression containing the list of compression algorithms specified by the
clientwith an optional 
specification of compression level (e.g. “zstd:2,zlib:1”)
2. The server intersects the requested compression algorithms with the allowed ones (controlled via the
libpq_compressionserver config setting). 
If the intersection is not empty, the server responds with CompressionAck containing the final list of the compression
algorithmsthat can be used for the compression of libpq messages between the client and server. 
If the intersection is empty (server does not accept any of the requested algorithms), then it replies with
CompressionAckcontaining the empty list. 

After sending the CompressionAck message, the server can send the SetCompressionMethod message to set the current
compressionalgorithm for server-to-client traffic compression. 
After receiving the CompressionAck message, the client can send the SetCompressionMethod message to set the current
compressionalgorithm for client-to-server traffic compression. 

SetCompressionMethod message contains the index of the compression algorithm in the final list of the compression
algorithmswhich is generated during compression initialization (described above). 

Compressed data is wrapped into CompressedData messages.

Rules for compressing the protocol messages are defined in zpq_stream.c. For each protocol message, the preferred
compressionalgorithm can be chosen. 

On Wed, Apr 21, 2021 at 15:35 AM Ian Zagorskikh <izagorskikh@cloudlinux.com> wrote:
> Let me drop my humble 2 cents in this thread. At this moment I checked only
0001-Add-zlib-and-zstd-streaming-compressionpatch. With no order: 
>
> * No checks for failures. For example, value from malloc() is not checked for NULL and used as-is right after the
call.Also as a result possibly NULL values passed into ZSTD functions which are explicitly not NULL-tolerant and so
dereferencepointers without checks. 
>
> * Used memory management does not follow the schema used in the common module. Direct calls to std C malloc/free are
hardcoded. AFAIU this is not backend friendly. Looking at the code around I believe they should be wrapped to
ALLOC/FREElocal macroses that depend on FRONTEND define. As it's done for example. in src/common/hmac.c. 
>
> * If we're going to fix our code to be palloc/pfree friendly there's also a way to pass our custom allocator
callbacksinside ZSTD functions. By default ZSTD uses malloc/free but it can be overwritten by the caller in e.g.
ZSTD_createDStream_advanced(ZSTD_customMemcustomMem) versions of API . IMHO that would be good. If a 3rd party
componentallows us to inject  a custom memory allocator and we do have it why not use this feature? 
>
> * Most zs_foo() functions do not check ptr arguments, though some like zs_free() do. If we're speaking about a
"commonAPI" that can be used by a wide range of different modules around the project, such a relaxed way to treat input
argumentsIMHO can be an evil. Or at least add Assert(ptr) assertions so they can be catched up in debug mode. 
>
> * For zs_create() function which must be called first to create context for further operations, a compression method
ispassed as integer. This method is used inside zs_create() as index inside an array of compress implementation
descriptors.There are several problems: 
> 1) Method ID value is not checked for validity. By passing an invalid method ID we can easily get out of array
bounds.
> 2) Content of array depends on on configuration options e.g. HAVE_LIBZSTD, HAVE_LIBZ etc. So index inside this array
isnot persistent. For some config options combination method ID with value 0 may refer to ZSTD and for others to zlib. 
> 3) There's no defines/enums/etc in public z_stream.h header that would define which value should be passed to create
aspecific compressor. Users have to guess it or check the code. 

Fixed almost all of these issues, except the malloc/free-related stuff (will fix this later).

On Fri, Mar 19, 2021 at 11:02 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> Also, I'm not sure, but I think you may find that the zstd configure.ac should
> use pkgconfig.  This allowed the CIs to compile these patches.  Without
> pkg-config, the macos CI couldn't find (at least) LZ4.k
> https://commitfest.postgresql.org/32/2813/
> https://commitfest.postgresql.org/32/3015/


Now --with-zstd uses pkg-config to link the ZSTD library and works correctly on macos.

I would appreciate hearing your thoughts on the new version of the patch,

Daniil Zakhlystov




pgsql-hackers by date:

Previous
From: "Daniel Verite"
Date:
Subject: Re: [WIP] UNNEST(REFCURSOR): allowing SELECT to consume data from a REFCURSOR
Next
From: Bruce Momjian
Date:
Subject: Re: pg_upgrade does not upgrade pg_stat_statements properly