Re: libpq compression - Mailing list pgsql-hackers

From Daniil Zakhlystov
Subject Re: libpq compression
Date
Msg-id 160623927937.7563.15974279758478581796.pgcf@coridan.postgresql.org
In response to Re: libpq compression  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses Re: libpq compression  (Robert Haas <robertmhaas@gmail.com>)
Re: libpq compression  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
List pgsql-hackers
The following review has been posted through the commitfest application:
make installcheck-world:  tested, failed
Implements feature:       tested, passed
Spec compliant:           tested, failed
Documentation:            tested, failed

Submission review
--
Is the patch in a patch format which has context? (eg: context diff format)
NO, needs to be fixed
Does it apply cleanly to the current git master?
YES
Does it include reasonable tests, necessary doc patches, etc?
Has docs, but tests are missing

Usability review
--
At the moment, the patch supports per-connection (permanent) compression. The frontend can specify the desired compression algorithms and compression levels and then negotiate with the backend which compression algorithm will be used. In its current state, the patch is missing the ability to enable or disable compression on the backend side, which I think is not great from a usability standpoint.
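As an illustration, here is a minimal sketch (my own, not taken from the patch) of how a libpq client could request per-connection compression, assuming the 'compression' connection option added by the patch and the "zlib,zstd:5" value format quoted from protocol.sgml later in this review:

/* Sketch: request compression for a single connection (hypothetical usage). */
#include <stdio.h>
#include "libpq-fe.h"

int
main(void)
{
    /* ask the server to negotiate zlib, or zstd at level 5, for this connection */
    PGconn *conn = PQconnectdb("host=localhost dbname=postgres "
                               "compression=zlib,zstd:5");

    if (PQstatus(conn) != CONNECTION_OK)
        fprintf(stderr, "connection failed: %s", PQerrorMessage(conn));

    PQfinish(conn);
    return 0;
}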
 

Regarding on-the-fly configurable compression and different compression algorithms for each direction - these two ideas are promising but tend to make the implementation more complex. However, the current implementation can be extended to support these approaches in the future. For example, we can expose switchable on-the-fly compression as a ‘switchable’ algorithm and negotiate it like a regular compression algorithm (just as we currently negotiate ‘zstd’ and ‘zlib’). The ‘switchable’ algorithm may then introduce new messages to the Postgres protocol to make the on-the-fly compression magic work.
 

The same applies to Robert’s idea of the different compression algorithms for different directions - we can introduce it later as a new compression algorithm with new specific protocol messages.
 

Does the patch actually implement that?
YES

Do we want that?
YES

Do we already have it?
NO

Does it follow SQL spec, or the community-agreed behavior?
To be discussed

Does it include pg_dump support (if applicable)?
not applicable

Are there dangers?
A CRIME-like attack is theoretically possible when compression is used together with SSL.

Have all the bases been covered?
To be discussed


Feature test
--

I’ve applied the patch, compiled, and tested it with configure options --enable-cassert and --enable-debug turned on. I’ve tested the following scenarios:
 

1. make check
=======================
 All 201 tests passed. 
=======================

2. make check-world
initially failed with:
============== running regression test queries        ==============
test postgres_fdw                 ... FAILED     4465 ms
============== shutting down postmaster               ==============
======================
1 of 1 tests failed. 
======================
The differences that caused some tests to fail can be viewed in the
file "/xxx/xxx/review/postgresql/contrib/postgres_fdw/regression.diffs".  A copy of the test summary that you see above
is saved in the file "/xxx/xxx/review/postgresql/contrib/postgres_fdw/regression.out".
 

All tests passed after replacing ‘gsslib, target_session_attrs’ with ‘gsslib, compression, target_session_attrs’ in line 8914 of postgresql/contrib/postgres_fdw/expected/postgres_fdw.out
 

3. simple psql utility usage
psql -d "host=xxx port=5432 dbname=xxx user=xxx compression=1" 

4. pgbench tpcb-like w/ SSL turned ON
pgbench "host=xxx port=5432 dbname=xxx user=xxx sslmode=require compression=1" --builtin tpcb-like -t 70 --jobs=32
--client=700

5. pgbench tpcb-like w/ SSL turned OFF
pgbench "host=xxx port=5432 dbname=xxx user=xxx sslmode=disable compression=1" --builtin tpcb-like -t 70 --jobs=32
--client=700

6. pgbench initialization w/ SSL turned ON
pgbench "host=xxx port=5432 dbname=xxx user=xxx sslmode=require compression=1" -i -s 500

7. pgbench initialization w/ SSL turned OFF
pgbench "host=xxx port=5432 dbname=xxx user=xxx sslmode=disable compression=1" -i -s 500

8. Streaming physical replication. Recovery-related parameters:
recovery_target_timeline = 'latest'
primary_conninfo = 'host=xxx port=5432 user=repl application_name=xxx compression=1'
primary_slot_name = 'xxx'
restore_command = 'some command'

9. This compression has been implemented in an experimental build of the Odyssey connection pooler and tested with a configuration of ~1500 synthetic simultaneous clients and ~300 GB databases.

During the testing, I’ve reported and fixed some of the issues.

Does the feature work as advertised?
YES
Are there corner cases the author has failed to consider?
NO
Are there any assertion failures or crashes?
NO


Performance review
--

Does the patch slow down simple tests?
NO

If it claims to improve performance, does it?
YES

Does it slow down other things?
Using compression may add CPU overhead. This mostly depends on the compression algorithm and the chosen compression level. During testing with the ZSTD algorithm at compression level 1, there was about 10% CPU overhead in read/write-balanced scenarios and almost no overhead in mostly-read scenarios.
 


Coding review
--

In protocol.sgml:
>        It can be just boolean values enabling or disabling compression
>         ("true"/"false", "on"/"off", "yes"/"no", "1"/"0"), "auto" or explicit list of compression algorithms
>         separated by comma with optional specification of compression level: "zlib,zstd:5".

But in fe-protocol3.c:
>    if (pg_strcasecmp(value, "true") == 0 ||
>        pg_strcasecmp(value, "yes") == 0 ||
>        pg_strcasecmp(value, "on") == 0 ||
>        pg_strcasecmp(value, "any") == 0 ||
>        pg_strcasecmp(value, "1") == 0)
>    {
I believe there is some mismatch - in the docs there is an “auto” parameter, but in the code “auto” is missing while “any” exists. Actually, I propose to remove both the “auto” and “any” parameters because they work the same way as “true/on/yes/1” but appear to be something else.
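A possible shape of that change, as a sketch only (the helper name and placement are mine, not from the patch), keeping the documented boolean spellings and dropping “any”/“auto”:

/* Sketch: accept only the documented boolean spellings of the option value. */
#include "postgres_fe.h"        /* for pg_strcasecmp() */

static bool
compression_value_is_boolean_true(const char *value)
{
    return pg_strcasecmp(value, "true") == 0 ||
           pg_strcasecmp(value, "yes") == 0 ||
           pg_strcasecmp(value, "on") == 0 ||
           pg_strcasecmp(value, "1") == 0;
}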
 


In fe-protocol3.c:
>#define pq_read_conn(conn)    \
>    (conn->zstream        \
>     ? zpq_read(conn->zstream, conn->inBuffer + conn->inEnd,            \
>                conn->inBufSize - conn->inEnd)                \
>     : pqsecure_read(conn, conn->inBuffer + conn->inEnd,                \
>                     conn->inBufSize - conn->inEnd))
I think there should be a comment explaining the logic for choosing the read function. The same goes for the zpq_write calls. Also, pq_read_conn is defined as a macro, but there is no macro for pq_write_conn.
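For symmetry, a companion macro could look roughly like the following. This is only a sketch: I am assuming that zpq_write() mirrors the zpq_read() signature, which may not match the actual patch.

/*
 * Sketch of a pq_write_conn macro, mirroring pq_read_conn: route outgoing
 * data through the compression stream when one is active, otherwise use the
 * regular secure-write path.
 */
#define pq_write_conn(conn, ptr, len)    \
    ((conn)->zstream        \
     ? zpq_write((conn)->zstream, (ptr), (len))    \
     : pqsecure_write((conn), (ptr), (len)))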
 

In configure.ac:
> if test "$with_zstd" = yes; then
>   AC_CHECK_LIB(zstd, ZSTD_decompressStream, [],
>                [AC_MSG_ERROR([zstd library not found
> If you have zstd already installed, see config.log for details on the
> failure.  It is possible the compiler isn't looking in the proper directory.
> Use --without-zstd to disable zstd support.])])
> fi

> if test "$with_zstd" = yes; then
>   AC_CHECK_HEADER(zstd.h, [], [AC_MSG_ERROR([zstd header not found
> If you have zstd already installed, see config.log for details on the
> failure.  It is possible the compiler isn't looking in the proper directory.
> Use --without-zstd to disable zstd support.])])
> fi

It looks like the lines suggesting --without-zstd are incorrect.

In fe-connect.c:
> if (index == (char)-1)
> {
>     appendPQExpBuffer(&conn->errorMessage,
>     libpq_gettext(
>     "server is not supported requested compression algorithms %s\n"),
>     conn->compression);
>     goto error_return;
> }
Right now this error might be displayed in two cases:
1. The backend supports compression, but it is somehow disabled/turned off.
2. The backend supports compression, but does not support the requested algorithms.

I think that it is a good idea to differentiate these two cases. Maybe define the following behavior somewhere in the docs:

“When connecting to an older backend which does not support compression, or when the backend supports compression but for some reason wants to disable it, the backend will simply ignore the _pq_.compression parameter and won’t send the compressionAck message to the frontend.”
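With that behavior in place, the fe-connect.c error path above would only be reached in the second case, and its message could say so explicitly. A rough sketch (the wording is mine, not from the patch):

/*
 * Sketch only: if the backend silently ignores _pq_.compression when it does
 * not want compression, a missing compressionAck simply means "proceed
 * uncompressed", and index == -1 can get a more precise message.
 */
if (index == (char) -1)
{
    appendPQExpBuffer(&conn->errorMessage,
                      libpq_gettext("server does not support any of the requested compression algorithms: %s\n"),
                      conn->compression);
    goto error_return;
}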
 


To sum up, I think that the current implementation already provides good benefits. As I proposed in the Usability review, we may introduce the new approaches later as separate compression 'algorithms'.
 

Thanks,
Daniil Zakhlystov
