Re: wal_compression=zstd - Mailing list pgsql-hackers

From Justin Pryzby
Subject Re: wal_compression=zstd
Date
Msg-id 20220304114405.GK27651@telsasoft.com
Whole thread Raw
In response to Re: wal_compression=zstd  (Michael Paquier <michael@paquier.xyz>)
Responses Re: wal_compression=zstd  (Robert Haas <robertmhaas@gmail.com>)
Re: wal_compression=zstd  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On Fri, Mar 04, 2022 at 04:19:32PM +0900, Michael Paquier wrote:
> On Tue, Feb 22, 2022 at 05:19:48PM -0600, Justin Pryzby wrote:
> 
> > As writen, this patch uses zstd level=1 (whereas the ZSTD's default compress
> > level is 6).
> 
> Why?  ZSTD using this default has its reasons, no?  And it would be
> consistent to do the same for ZSTD as for the other two methods.

In my 1-off test, it gets 610/633 = 96% of the benefit at 209/273 = 77% of the
cost.

postgres=# SET wal_compression= 'zstd-6';
postgres=# \set QUIET \\ \timing on \\ SET max_parallel_maintenance_workers=0; SELECT pg_stat_reset_shared('wal');
begin;CREATE INDEX ON t(a); rollback; SELECT * FROM pg_stat_wal;
 
Duración: 2729,017 ms (00:02,729)
wal_bytes        | 6102403

postgres=# SET wal_compression= 'zstd-1';
postgres=# \set QUIET \\ \timing on \\ SET max_parallel_maintenance_workers=0; SELECT pg_stat_reset_shared('wal');
begin;CREATE INDEX ON t(a); rollback; SELECT * FROM pg_stat_wal;
 
Duración: 2090,459 ms (00:02,090)
wal_bytes        | 6330269

It may be relevant that we're only compressing 8k [0].
It would probably pay off to preserve a compression "dictionary" to be re-used
across FPIs.

> -        The supported methods are <literal>pglz</literal> and
> -        <literal>lz4</literal> (if <productname>PostgreSQL</productname> was
> -        compiled with <option>--with-lz4</option>). The default value is
> -        <literal>off</literal>. Only superusers can change this setting.
> +        The supported methods are <literal>pglz</literal>, and
> +        (if configured when <productname>PostgreSQL</productname>was built)
> +        <literal>lz4</literal> and zstd.
> +        The default value is <literal>off</literal>.
> +        Only superusers can change this setting.
> 
> This is making the docs less verbose.  I think that you should keep
> the mention to respectively --with-lz4 and --with-zstd after each
> value.

I changed this relative to your latest zstd patch in July since it reads better
to me.  YMMV.

On Tue, Feb 22, 2022 at 05:19:48PM -0600, Justin Pryzby wrote:
>> 0001 adds support for zstd
>> 0002 makes more efficient our use of bits in the WAL header
>> 0003 makes it the default compression, to exercise on CI (windows will fail).
>> 0004 adds support for zstd-level
>> 0005-6 are needed to allow recovery tests to pass when wal compression is enabled;
>> 0007 (not included) also adds support for zlib.  I'm of the impression nobody
>>      cares about this, otherwise it would've been included 5-10 years ago.

> 0003, defaulting to zstd, would require a lot of benchmarking to be
> justified.  

Maybe you didn't see that I wrote that it was for CI ?
(In any case, it's impossible to do that without first taking care of 005-6).

> and 0004 to extend compression to support a level 
> I have argued against making the code more complicated for such things in the
> past, with reloptions.

I suppose that you're referring to reloptions for toast compression, which was
about controlling LZ4 compression level.

https://www.postgresql.org/message-id/flat/CAFiTN-vMHU_yakMgNo90SUih_6LtvmqZ5hpQb2iTgQtVyOjyFA@mail.gmail.com

I think you're right that it's not interesting to control the compression level
of LZ4 - but there's no reason to say the same thing of zstd.  We're already
debating which is the "right" level to use (1 or 6).  I don't think there is a
"right" level - some people will want to trade more CPU for disk writes and
some people won't.



pgsql-hackers by date:

Previous
From: Masahiko Sawada
Date:
Subject: Re: Add the replication origin name and commit-LSN to logical replication worker errcontext
Next
From: Ashutosh Sharma
Date:
Subject: Re: Report checkpoint progress with pg_stat_progress_checkpoint (was: Report checkpoint progress in server logs)