Re: libpq compression - Mailing list pgsql-hackers

From Ian Zagorskikh
Subject Re: libpq compression
Date
Msg-id CAByvzpb4JX_F1BuOpHxc1XxBXmCeXBeaE-TOvLOE+=5Y7MGXeQ@mail.gmail.com
Whole thread Raw
In response to libpq compression  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses Re: libpq compression  (Ian Zagorskikh <izagorskikh@cloudlinux.com>)
Re: libpq compression  (Daniil Zakhlystov <usernamedt@yandex-team.ru>)
List pgsql-hackers
All, thanks!

On Wed, Apr 21, 2021 at 10:47 AM Michael Paquier <michael@paquier.xyz> wrote:
 
Patch reviews had better be posted on the community lists.  This way,
if the patch is left dead by the authors (things happen in life), then
somebody else could move on with the patch without having to worry
about this kind of issues twice.
--
Michael

Let me drop my humble 2 cents in this thread. At this moment I checked only 0001-Add-zlib-and-zstd-streaming-compression patch. With no order:

* No checks for failures. For example, value from malloc() is not checked for NULL and used as-is right after the call. Also as a result possibly NULL values passed into ZSTD functions which are explicitly not NULL-tolerant and so dereference pointers without checks.

* Used memory management does not follow the schema used in the common module. Direct calls to std C malloc/free are hard coded. AFAIU this is not backend friendly. Looking at the code around I believe they should be wrapped to ALLOC/FREE local macroses that depend on FRONTEND define. As it's done for example. in src/common/hmac.c.

* If we're going to fix our code to be palloc/pfree friendly there's also a way to pass our custom allocator callbacks inside ZSTD functions. By default ZSTD uses malloc/free but it can be overwritten by the caller in e.g. ZSTD_createDStream_advanced(ZSTD_customMem customMem) versions of API . IMHO that would be good. If a 3rd party component allows us to inject  a custom memory allocator and we do have it why not use this feature?

* Most zs_foo() functions do not check ptr arguments, though some like zs_free() do. If we're speaking about a "common API" that can be used by a wide range of different modules around the project, such a relaxed way to treat input arguments IMHO can be an evil. Or at least add Assert(ptr) assertions so they can be catched up in debug mode.

* For zs_create() function which must be called first to create context for further operations, a compression method is passed as integer. This method is used inside zs_create() as index inside an array of compress implementation descriptors. There are several problems:
1) Method ID value is not checked for validity. By passing an invalid method ID we can easily get out of array bounds.
2) Content of array depends on on configuration options e.g. HAVE_LIBZSTD, HAVE_LIBZ etc. So index inside this array is not persistent. For some config options combination method ID with value 0 may refer to ZSTD and for others to zlib.
3) There's no defines/enums/etc in public z_stream.h header that would define which value should be passed to create a specific compressor. Users have to guess it or check the code.

* I have a feeling after reading comments for ZSTD compress/decompress functions that their return value is treated in a wrong way handling only success path. Though need more checks/POCs on that.

In general, IMHO the idea of a generic compress/decompress API is very good and useful. But specific implementation needs some code cleanup/refactoring.

--
Best Regards,
Ian Zagorskikh

pgsql-hackers by date:

Previous
From: Fujii Masao
Date:
Subject: Re: Feature improvement: can we add queryId for pg_catalog.pg_stat_activity view?
Next
From: Stefan Keller
Date:
Subject: Re: ML-based indexing ("The Case for Learned Index Structures", a paper from Google)