Re: wal_compression=zstd - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: wal_compression=zstd
Msg-id YiM63/0LybPYqSUN@paquier.xyz
In response to Re: wal_compression=zstd  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: wal_compression=zstd  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Fri, Mar 04, 2022 at 08:08:03AM -0500, Robert Haas wrote:
> On Fri, Mar 4, 2022 at 6:44 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
>> In my 1-off test, it gets 610/633 = 96% of the benefit at 209/273 = 77% of the
>> cost.

Hmm, it may be good to start afresh and compile numbers in a single
chart.  I did that here with some numbers on the user and system CPU:
https://www.postgresql.org/message-id/YMmlvyVyAFlxZ+/H@paquier.xyz

For this test, regarding ZSTD, the lowest level did not differ much
from the default level, and at the highest level the user CPU spiked
for little gain in compression.  All the ZSTD levels compressed more
than LZ4, with more CPU used in each case, but my impression is that
choosing between the default level and a level lower than the default
won't matter much in terms of compression gains and CPU usage.
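For context, the setting under discussion is the wal_compression GUC; a minimal postgresql.conf fragment, assuming the set of values that this work ended up shipping with in PostgreSQL 15 (the per-level syntax debated in this thread is not shown):

```
# postgresql.conf -- accepted values: off, pglz, lz4, zstd
# (lz4 and zstd require the server to be built with the matching library)
wal_compression = zstd
```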

> I agree with Michael. Your 1-off test is exactly that, and the results
> will have depended on the data you used for the test. I'm not saying
> we could never decide to default to a compression level other than the
> library's default, but I do not think we should do it casually or as
> the result of any small number of tests. There should be a strong
> presumption that the authors of the library have a good idea what is
> sensible in general unless we can explain compellingly why our use
> case is different from typical ones.
>
> There's an ease-of-use concern here too. It's not going to make things
> any easier for users to grok if zstd is available in different parts
> of the system but has different defaults in each place. It wouldn't be
> the end of the world if that happened, but neither would it be ideal.

I'd like to believe that anyone who writes their own compression
algorithm has a good idea of the default behavior they want to offer,
so we could keep things simple and trust them.  Now, I would not
object to seeing some fresh numbers, and assuming that all FPIs have
the same page size, we could go down to designing a couple of test
cases that produce a fixed number of FPIs and measure their
compressibility in a single session.

Repeatability and randomness of the data both count here.  We could
have, for example, one case with a set of 5~7 int attributes; a second
with text values that include random data, up to 10~12 bytes each,
counting on the tuple headers to provide some compressible data; and a
third with more repeatable data, like one int column populated with
generate_series().  Just to give an idea.
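The three cases above could be sketched roughly as follows (table names, column counts, and row counts are illustrative choices, not from the thread; the idea is to generate a comparable volume of FPIs per case and compare pg_waldump --stats output across wal_compression settings):

```
-- Case 1: a handful of int attributes (moderately compressible).
CREATE TABLE t_ints (a int, b int, c int, d int, e int, f int);
INSERT INTO t_ints
  SELECT i, i + 1, i + 2, i + 3, i + 4, i + 5
  FROM generate_series(1, 100000) AS i;

-- Case 2: short random text values (hard to compress by themselves;
-- the repeated tuple headers still give the compressor something).
CREATE TABLE t_random (v text);
INSERT INTO t_random
  SELECT substr(md5(random()::text), 1, 12)
  FROM generate_series(1, 100000);

-- Case 3: highly repeatable data.
CREATE TABLE t_series (v int);
INSERT INTO t_series
  SELECT i FROM generate_series(1, 100000) AS i;

-- A checkpoint forces full-page writes on the first touch of each
-- page afterwards, so a follow-up UPDATE on each table produces the
-- fixed batch of FPIs to measure.
CHECKPOINT;
```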
--
Michael
