Re: libpq compression - Mailing list pgsql-hackers

From Daniil Zakhlystov
Subject Re: libpq compression
Date
Msg-id 161609580905.28624.5304095609680400810.pgcf@coridan.postgresql.org
In response to libpq compression  (Konstantin Knizhnik <k.knizhnik@postgrespro.ru>)
Responses Re: libpq compression  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
The following review has been posted through the commitfest application:
make installcheck-world:  tested, passed
Implements feature:       tested, passed
Spec compliant:           tested, passed
Documentation:            tested, passed

Hi,

I've compared the different libpq compression approaches in the streaming physical replication scenario.

Test setup
Three hosts: the first runs pg_restore, the second is the master, and the third is the standby replica.
In each test run, I ran pg_restore of the IMDB database
(https://dataverse.harvard.edu/dataset.xhtml?persistentId=doi:10.7910/DVN/2QYZBT)
and measured the received traffic on the standby replica.

Also, I've enlarged the ZPQ_BUFFER_SIZE buffer in all versions, because the previous small buffer size (8192 bytes) led to more
system calls for socket read/write and to poor compression in the chunked-reset scenario.
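The buffer-size effect can be illustrated with a rough sketch (Python's stdlib zlib stands in for ZSTD streaming compression, and the payload is synthetic, so the absolute numbers are illustrative only): if the compressor is flushed at each buffer boundary, a smaller buffer forces more flushes, and each flush byte-aligns the stream and cuts the current block short, so the compressed output grows.

```python
import zlib

# Synthetic, highly compressible payload (~88 KB).
data = b"the quick brown fox jumps over the lazy dog " * 2000

def compressed_size(buffer_size):
    # Stream the payload through one shared compressor, flushing at
    # each buffer boundary the way a socket write would.
    comp = zlib.compressobj()
    total = 0
    for i in range(0, len(data), buffer_size):
        total += len(comp.compress(data[i:i + buffer_size]))
        total += len(comp.flush(zlib.Z_SYNC_FLUSH))
    return total

small_buf = compressed_size(8192)
large_buf = compressed_size(65536)
print(small_buf, large_buf)  # the larger buffer flushes less often and compresses better
```

The dictionary is preserved across flushes in both cases; only the flush frequency differs, which is why the gap here is modest compared to resetting the context entirely.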

Scenarios:

chunked
use streaming compression, wrap compressed data into CompressedData messages, and preserve the compression context
between multiple CompressedData messages.
https://github.com/usernamedt/libpq_compression/tree/chunked-compression

chunked-reset
use streaming compression, wrap compressed data into CompressedData messages, and reset the compression context on each
CompressedData message.
https://github.com/usernamedt/libpq_compression/tree/chunked-reset

permanent
use streaming compression and send the raw compressed stream without any wrapping.
https://github.com/usernamedt/libpq_compression/tree/permanent-w-enlarged-buffer
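The three strategies can be sketched side by side (again with stdlib zlib standing in for ZSTD; the message contents and the 5-byte per-message header are hypothetical stand-ins, not the patch's actual framing): a shared context with per-message flushes pays only flush and header overhead over a raw stream, while resetting the context per message throws away the dictionary and compresses far worse.

```python
import zlib

# Hypothetical stand-in for a sequence of replication messages.
messages = [b"INSERT INTO title VALUES (%d, 'movie');" % i for i in range(500)]

def chunked(reset):
    # Wrap each message's compressed bytes individually; with
    # reset=True, start a fresh compression context per message
    # (the chunked-reset scenario).
    comp = zlib.compressobj()
    total = 0
    for msg in messages:
        if reset:
            comp = zlib.compressobj()
        body = comp.compress(msg) + comp.flush(zlib.Z_SYNC_FLUSH)
        total += 5 + len(body)  # assumed 5-byte CompressedData-style header
    return total

def permanent():
    # Raw compressed stream, no per-message wrapping at all.
    comp = zlib.compressobj()
    return len(comp.compress(b"".join(messages)) + comp.flush())

print(permanent(), chunked(reset=False), chunked(reset=True))
```

This mirrors the measured ordering: permanent is the smallest, chunked preserves the context and stays close behind, and chunked-reset loses the cross-message dictionary and falls furthest behind at higher compression levels.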

Tested compression levels
ZSTD, level 1
ZSTD, level 5
ZSTD, level 9

Scenario         Replica rx, mean, MB
uncompressed     6683.6

ZSTD, level 1
Scenario         Replica rx, mean, MB
chunked-reset    2726
chunked          2694
permanent        2694.3

ZSTD, level 5
Scenario         Replica rx, mean, MB
chunked-reset    2234.3
chunked          2123
permanent        2115.3

ZSTD, level 9
Scenario         Replica rx, mean, MB
chunked-reset    2153.6
chunked          1943
permanent        1941.6

The full report, with additional data and resource usage graphs, is available here:
https://docs.google.com/document/d/1a5bj0jhtFMWRKQqwu9ag1PgDF5fLo7Ayrw3Uh53VEbs

Based on these results, I suggest sticking with the chunked compression approach,
which offers more flexibility and adds almost no overhead compared to permanent compression.
Also, we may later introduce a setting to control whether the compression context is reset on each message, without
breaking backward compatibility.

--
Daniil Zakhlystov

The new status of this patch is: Ready for Committer
