Re: multithreaded zstd backup compression for client and server - Mailing list pgsql-hackers

From: Robert Haas
Subject: Re: multithreaded zstd backup compression for client and server
Date:
Msg-id: CA+TgmoZCd0RaFAQm-uB7mEbhdbM4B-gzYYiUjg1751U9fgRwQA@mail.gmail.com
In response to: Re: multithreaded zstd backup compression for client and server (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Wed, Mar 23, 2022 at 7:31 PM Andres Freund <andres@anarazel.de> wrote:
> I found the following section in the manual [1]:
>
>     ZSTD_c_nbWorkers=400,    /* Select how many threads will be spawned to compress in parallel.
>                               * When nbWorkers >= 1, triggers asynchronous mode when invoking ZSTD_compressStream*() :
>                               * ZSTD_compressStream*() consumes input and flush output if possible, but immediately gives back control to caller,
>                               * while compression is performed in parallel, within worker thread(s).
>                               * (note : a strong exception to this rule is when first invocation of ZSTD_compressStream2() sets ZSTD_e_end :
>                               *  in which case, ZSTD_compressStream2() delegates to ZSTD_compress2(), which is always a blocking call).
>                               * More workers improve speed, but also increase memory usage.
>                               * Default value is `0`, aka "single-threaded mode" : no worker is spawned,
>                               * compression is performed inside Caller's thread, and all invocations are blocking */
>
> "ZSTD_compressStream*() consumes input ... immediately gives back control"
> pretty much confirms that.

I saw that too, but I didn't consider it conclusive. It would be nice
if their documentation had a bit more detail on what's really
happening.
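
As far as I can tell from those header comments, the caller-side loop in
asynchronous mode looks roughly like the sketch below. This is just my
reading of the documented API, not the actual basebackup code, and the
buffer size and error handling are simplified:

#include <stdio.h>
#include <string.h>
#include <zstd.h>

/* Sketch of zstd's asynchronous streaming mode (not the basebackup code). */
static void
compress_chunk(ZSTD_CCtx *cctx, const char *data, size_t len, FILE *out,
               ZSTD_EndDirective mode)
{
    ZSTD_inBuffer in = { data, len, 0 };
    char buf[8192];

    for (;;)
    {
        ZSTD_outBuffer outb = { buf, sizeof(buf), 0 };

        /*
         * With nbWorkers >= 1 this call hands the input off to worker
         * threads and returns quickly; compressed output may only show up
         * on later calls.
         */
        size_t remaining = ZSTD_compressStream2(cctx, &outb, &in, mode);

        if (ZSTD_isError(remaining))
        {
            fprintf(stderr, "zstd: %s\n", ZSTD_getErrorName(remaining));
            return;
        }
        fwrite(buf, 1, outb.pos, out);

        /* Done when input is consumed (and, for ZSTD_e_end, fully flushed). */
        if (mode == ZSTD_e_end ? remaining == 0 : in.pos == in.size)
            break;
    }
}

int
main(void)
{
    ZSTD_CCtx  *cctx = ZSTD_createCCtx();
    const char *payload = "some data to compress";

    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 3);
    /* Needs a libzstd built with multithreading; otherwise this has no effect. */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 4);

    compress_chunk(cctx, payload, strlen(payload), stdout, ZSTD_e_continue);
    compress_chunk(cctx, NULL, 0, stdout, ZSTD_e_end);

    ZSTD_freeCCtx(cctx);
    return 0;
}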

> Do we care about zstd's memory usage here? I think it's OK to mostly ignore
> work_mem/maintenance_work_mem here, but I could also see limiting concurrency
> so that estimated memory usage would fit into work_mem/maintenance_work_mem.

I think it's possible that we want to do nothing and possible that we
want to do something, but I think it's very unlikely that the thing we
want to do is related to maintenance_work_mem. Say we soft-cap the
compression level to the one which we think will fit within
maintenance_work_mem. I think the most likely outcome is that people
will not get the compression level they request and be confused about
why that has happened. It also seems possible that we'll be wrong
about how much memory will be used - say, because somebody changes the
library behavior in a new release - and will limit it to the wrong
level. If we're going to do anything here, I think it should be to
limit based on the compression level itself and not based on how much
memory we think that level will use.
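
To make that concrete, capping on the level itself could be as simple as
something like the sketch below; max_allowed_level is a made-up knob for
illustration, not an existing setting:

#include <zstd.h>

/*
 * Sketch of capping the requested zstd level directly, rather than via an
 * estimate of its memory use.  "max_allowed_level" stands in for whatever
 * policy knob we might invent.
 */
static int
clamp_zstd_level(int requested, int max_allowed_level)
{
    /* The library itself publishes the legal range. */
    int min_level = ZSTD_minCLevel();
    int max_level = ZSTD_maxCLevel();

    if (requested < min_level)
        requested = min_level;
    if (requested > max_level)
        requested = max_level;
    if (requested > max_allowed_level)
        requested = max_allowed_level;

    return requested;
}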

But that leaves the question of whether we should even try to impose
some kind of limit, and there I'm not sure. It feels like it might be
overengineered, because we're only talking about users who have
replication privileges, and if those accounts are subverted there are
big problems anyway. I think if we imposed a governance system here it
would get very little use. On the other hand, I think that the higher
zstd compression levels of 20+ can actually use a ton of memory, so we
might want to limit access to those somehow. Apparently on the command
line you have to say --ultra -- not sure if there's a corresponding
API call or if that's a guard that's built specifically into the CLI.
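
The library does at least publish its legal level range, so a guard of our
own could key off that; here's a quick sketch. Treating 20+ as the "ultra"
range mirrors the CLI and is my assumption, not something the library
exposes by name:

#include <stdio.h>
#include <zstd.h>

int
main(void)
{
    /* Both of these are public libzstd calls. */
    ZSTD_bounds bounds = ZSTD_cParam_getBounds(ZSTD_c_compressionLevel);

    if (!ZSTD_isError(bounds.error))
        printf("compression level bounds: %d .. %d\n",
               bounds.lowerBound, bounds.upperBound);
    printf("ZSTD_maxCLevel() = %d\n", ZSTD_maxCLevel());

    /*
     * If we wanted to gate the high-memory levels ourselves, we would need
     * our own threshold; 20 here mirrors the CLI's --ultra cutoff but is an
     * assumption on my part.
     */
    int requested = 22;
    if (requested >= 20)
        fprintf(stderr, "note: level %d is in the high-memory range\n",
                requested);

    return 0;
}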

> Perhaps it's related to the amounts of memory fed to ZSTD_compressStream2() in
> one invocation? I recall that there's some differences between basebackup
> client / serverside around buffer sizes - but that's before all the recent-ish
> changes...

That thought occurred to me too but I haven't investigated yet.
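
For whoever digs into it, the library's recommended per-call streaming
buffer sizes are easy to print and compare against whatever the client and
server actually pass; a trivial sketch:

#include <stdio.h>
#include <zstd.h>

int
main(void)
{
    /*
     * libzstd's recommended sizes for one streaming call.  Comparing these
     * against the chunk sizes we feed to ZSTD_compressStream2() on the
     * client and server sides would be the first thing to check.
     */
    printf("recommended input buffer:  %zu bytes\n", ZSTD_CStreamInSize());
    printf("recommended output buffer: %zu bytes\n", ZSTD_CStreamOutSize());
    return 0;
}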

-- 
Robert Haas
EDB: http://www.enterprisedb.com


