Hello all,
I’ve finally read the whole thread (it was huge). It is extremely sad that this patch has been stuck without progress for such a long time. It seems that the main problem in the discussion is that everyone has their own view of which problems should be solved with this patch. Here are some of the positions (not all of them):
1. Add compression for networks with bad bandwidth (and keep the patch as simple and maintainable as possible) - the author’s position.
2. Don’t change the current network protocol and related code much.
3. Refactor the compression API (and network compression as well).
4. Solve cloud providers’ problems: trade CPU utilisation for network bandwidth on demand and vice versa.
All of these requirements have a different nature and sometimes conflict with each other. Without clearly formed requirements this patch will never be released.
Anyway, I have rebased it onto the current master branch, applied pgindent, tested it on macOS and fixed a macOS-specific problem with strcpy in build_compressors_list(): strcpy has undefined behaviour when the source and destination strings overlap (a minimal sketch follows the hunk below).
- *client_compressors = src = dst = strdup(value);
+ *client_compressors = src = strdup(value);
+ dst = strdup(value);
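To make the overlap problem concrete, here is a small self-contained sketch of what goes wrong and how an independent copy avoids it. The compressor names and the tokenising loop below are made up for illustration and are not the actual code of build_compressors_list(); the only point is the strcpy() overlap.

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Illustrative only: compact a comma-separated list of compressor names,
 * keeping the ones we "support".  Not the patch's actual code.
 */
int
main(void)
{
	const char *value = "zstd,bogus,zlib";

	/*
	 * The old pattern was effectively
	 *     src = dst = strdup(value);
	 * so the strcpy() below copied within one buffer with overlapping
	 * source and destination, which is undefined behaviour (C99 7.21.2.3).
	 * The fix gives dst its own, independent copy of the string.
	 */
	char	   *src = strdup(value);
	char	   *dst = strdup(value);
	char	   *out = dst;

	for (char *tok = strtok(src, ","); tok != NULL; tok = strtok(NULL, ","))
	{
		if (strcmp(tok, "zstd") == 0 || strcmp(tok, "zlib") == 0)
		{
			if (out != dst)
				*out++ = ',';
			strcpy(out, tok);	/* safe: tok and out never overlap now */
			out += strlen(tok);
		}
	}
	*out = '\0';

	printf("%s\n", dst);		/* prints: zstd,zlib */

	free(src);
	free(dst);
	return 0;
}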
According to my very simple tests with randomly generated data, zstd gives about 3x compression (zlib has a slightly worse compression ratio and slightly higher CPU utilisation). That seems to be a normal ratio for any streaming data - Greenplum also uses zstd/zlib to compress append-optimised tables, and the compression ratio there is usually about 3-5x. Also, according to my Greenplum experience, the most commonly used zstd compression level is 1, while for zlib the level is usually in the range of 1-5. CPU usage and execution time were not affected much compared to uncompressed data (but my tests were very simple and should not be treated as reliable).
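For reference, the kind of quick check I have in mind looks roughly like the sketch below. This is not the exact test I ran: the data generator, buffer size, vocabulary and level 1 are arbitrary choices for illustration, and it assumes libzstd is installed (build with something like cc ratio.c -lzstd).

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zstd.h>

int
main(void)
{
	const size_t src_size = 8 * 1024 * 1024;	/* 8 MB of test data */
	char	   *src = calloc(1, src_size);

	/*
	 * Pseudo-random but compressible input: words drawn from a small
	 * vocabulary, roughly imitating textual row data.
	 */
	const char *words[] = {"select", "insert", "update", "tuple",
						   "backend", "protocol", "compression", "zstd"};
	size_t		nwords = sizeof(words) / sizeof(words[0]);

	srand(42);
	for (size_t i = 0; i + 16 < src_size; )
	{
		const char *w = words[rand() % nwords];
		size_t		len = strlen(w);

		memcpy(src + i, w, len);
		i += len;
		src[i++] = ' ';
	}

	size_t		bound = ZSTD_compressBound(src_size);
	char	   *dst = malloc(bound);

	/* Compress with zstd level 1 and report the resulting ratio. */
	size_t		csize = ZSTD_compress(dst, bound, src, src_size, 1);

	if (ZSTD_isError(csize))
	{
		fprintf(stderr, "zstd error: %s\n", ZSTD_getErrorName(csize));
		return 1;
	}
	printf("original: %zu, compressed: %zu, ratio: %.2fx\n",
		   src_size, csize, (double) src_size / csize);

	free(src);
	free(dst);
	return 0;
}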
Best regards,
Denis Smirnov | Developer
sd@arenadata.io
Arenadata | Godovikova 9-17, Moscow 129085 Russia