Re: libpq compression (part 2) - Mailing list pgsql-hackers
From | Justin Pryzby |
---|---|
Subject | Re: libpq compression (part 2) |
Date | |
Msg-id | 20220101172504.GW24477@telsasoft.com Whole thread Raw |
In response to | libpq compression (part 2) (Daniil Zakhlystov <usernamedt@yandex-team.ru>) |
Responses |
Re: libpq compression (part 2)
|
List | pgsql-hackers |
Thanks for working on this. The patch looks to be in good shape - I hope more people will help to review and test it. I took the liberty of creating a new CF entry. http://cfbot.cputube.org/daniil-zakhlystov.html +zpq_should_compress(ZpqStream * zpq, char msg_type, uint32 msg_len) +{ + return zpq_choose_compressor(zpq, msg_type, msg_len) == -1; I think this is backwards , and should say != -1 ? As written, the server GUC libpq_compression defaults to "on", and the client doesn't request compression. I think the server GUC should default to off. I failed to convince Kontantin about this last year. The reason is that 1) it's a new feature; 2) with security implications. An admin should need to "opt in" to this. I still wonder if this should be controlled by a new "TYPE" in pg_hba (rather than a GUC); that would make it exclusive of SSL. However, I also think that in the development patches both client and server should enable compression by default, to allow it to be exercized by cfbot. For the other compression patches I worked on, I had an 0001 commit where the feature default was off, plus following patches which enabled the feature by default - only for cfbot and not intended for commit. +/* GUC variable containing the allowed compression algorithms list (separated by comma) */ +char *libpq_compress_algorithms; => The global variable is conventionally initialized to the same default as guc.c (even though the GUC default value will be applied at startup). + * - NegotiateProtocolVersion in cases when server does not support protocol compression + * Anything else probably means it's not Postgres on the other end at all. */ - if (!(beresp == 'R' || beresp == 'E')) + if (!(beresp == 'R' || beresp == 'E' || beresp == 'z' || beresp == 'v')) => I think NegotiateProtocolVersion and 'v' are from an old version of the patch and no longer used ? There's a handful of compiler warnings: | z_stream.c:311:5: warning: ISO C90 forbids mixed declarations and code [-Wdeclaration-after-statement] + ereport(LOG, (errmsg("failed to parse configured compression setting: %s", libpq_compress_algorithms))); + return 0; => Since March, errmsg doesn't need extra parenthesis around it (e3a87b4). + * Report current transaction start timestamp as the specified value. + * Zero means there is no active transaction. => The comment doesn't correspond to the function (copy+paste). + return id >= 0 && id < (sizeof(zs_algorithms) / sizeof(*zs_algorithms)); + size_t n_algorithms = sizeof(zs_algorithms) / sizeof(*zs_algorithms); => You can use lengthof() for these. Maybe pg_stat_network_traffic should show the compression? It'd have to show the current client-server and server-client values. Or maybe that's not useful since it can change dynamically (unless it were reset when the compression method was changed). I think the psql conninfo Compressor/Decompressor line may be confusing. Maybe it should talk about Client->Server and Server->Client. Maybe it should be displayed on a single line. Actually, right now the \conninfo part is hardly useful, since the compressor is allocated lazily/"on demand", so it shows the compression of some previous command, but maybe not the last command, and maybe not the next command... I'm not sure it's needed to allow the compression to be changed dynamically, and the generality to support a different compression mthod for each libpq message type seems excessive. Maybe it's enough to check that the message type is one of VALID_LONG_MESSAGE_TYPE and its length is long enough. I wonder whether the asymmetric compression idea is useful. The only application I can see for this is that a server might allow to *decompress* data from a client, but may not want to *compress* data to a client. What happenes if the compression doesn't decrease the message size ? I don't see anything that allows sending the original, raw data. The advantage would be that the remote side doesn't incur the overhead of decompression. I hit this assertion with PGCOMPRESSION=lz4 (but not zlib), and haven't yet found the cause. TRAP: FailedAssertion("decBytes > 0", File: "z_stream.c", Line: 315, PID: 21180) postgres: pryzbyj postgres 127.0.0.1(46154) idle(ExceptionalCondition+0x99)[0x5642137806d9] postgres: pryzbyj postgres 127.0.0.1(46154) idle(+0x632bfe)[0x5642137d4bfe] postgres: pryzbyj postgres 127.0.0.1(46154) idle(zs_read+0x2b)[0x5642137d4ebb] postgres: pryzbyj postgres 127.0.0.1(46154) idle(zpq_read+0x192)[0x5642137d1172] postgres: pryzbyj postgres 127.0.0.1(46154) idle(+0x378b8a)[0x56421351ab8a] postgres: pryzbyj postgres 127.0.0.1(46154) idle(pq_getbyte+0x1f)[0x56421351c12f] postgres: pryzbyj postgres 127.0.0.1(46154) idle(PostgresMain+0xf96)[0x56421365fcc6] postgres: pryzbyj postgres 127.0.0.1(46154) idle(+0x428b69)[0x5642135cab69] postgres: pryzbyj postgres 127.0.0.1(46154) idle(PostmasterMain+0xd1c)[0x5642135cbbac] postgres: pryzbyj postgres 127.0.0.1(46154) idle(main+0x220)[0x5642132f9a40] /lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7fb70ef45b97] Maybe you should reset the streams between each compression message (even if it's using the same compression algorithm). This might allow better compression. You could either unconditionally call zs_compressor_free()/ zs_create_compressor(). Or, add a new, optional interface for resetting the stream (and if that isn't set for a given compression method, then free+create the stream). For LZ4, that'd be LZ4_resetStream_fast(), but only for v1.9.0+... Some minor formatting issues: - There are spaces rather than tabs in a few files; you can maybe fix it by piping the file through unexpand -t4. - pointers in function declarations should have no space after the stars: +zs_buffered(ZStream * zs) - There's a few extraneous whitespace changes: ConfigureNamesEnum, pq_buffer_has_data, libpq-int.h. - Some lines are too long: git log -2 -U1 '*.[ch]' |grep -E '.{80}|^diff' |less - Some 1-line "if" statements would be easier to read without braces {}. - Some multi-line comments should have "*/" on a separate line: git log -3 -p |grep -E '^\+ \*.*\*\/' git log -3 -p '*.c' |grep -E '^\+[^/]*\*.*\*\/' -- Justin
Attachment
- 0001-Implement-libpq-compression.patch
- 0002-Add-ZSTD-support.patch
- 0003-Avoid-buffer-overflow-found-by-valgrind.patch
- 0004-fix-compiler-warnings.patch
- 0005-respect-LZ4-compression-level.patch
- 0006-zpq_choose_compressor.patch
- 0007-conninfo-on-one-line.patch
- 0008-Enable-zlib-compression-by-default.patch
pgsql-hackers by date: