Re: multithreaded zstd backup compression for client and server - Mailing list pgsql-hackers

From Robert Haas
Subject Re: multithreaded zstd backup compression for client and server
Date
Msg-id CA+TgmoZms1KvLSXsAhXwKJ0FzY-mVe-B3=m-sePW6i-0k8vrTA@mail.gmail.com
Whole thread Raw
In response to Re: multithreaded zstd backup compression for client and server  (Andres Freund <andres@anarazel.de>)
Responses Re: multithreaded zstd backup compression for client and server  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Wed, Mar 23, 2022 at 5:14 PM Andres Freund <andres@anarazel.de> wrote:
> The most likely source of problem would errors thrown while zstd threads are
> alive. Should make sure that that can't happen.
>
> What is the lifetime of the threads zstd spawns? Are they tied to a single
> compression call? A single ZSTD_createCCtx()? If the latter, how bulletproof
> is our code ensuring that we don't leak such contexts?

I haven't found any real documentation explaining how libzstd manages
its threads. I am assuming that it is tied to the ZSTD_CCtx, but I
don't know. I guess I could try to figure it out from the source code.
Anyway, what we have now is  a PG_TRY()/PG_CATCH() block around the
code that uses the basink which will cause bbsink_zstd_cleanup() to
get called in the event of an error. That will do ZSTD_freeCCtx().

It's probably also worth mentioning here that even if, contrary to
expectations, the compression threads hang around to the end of time
and chill, in practice nobody is likely to run BASE_BACKUP and then
keep the connection open for a long time afterward. So it probably
wouldn't really affect resource utilization in real-world scenarios
even if the threads never exited, as long as they didn't, you know,
busy-loop in the background. And I assume the actual library behavior
can't be nearly that bad. This is a pretty mainstream piece of
software.

> If they're short-lived, are we compressing large enough batches to not waste a
> lot of time starting/stopping threads?

Well, we're using a single ZSTD_CCtx for an entire base backup. Again,
I haven't found documentation explaining with libzstd is actually
doing, but it's hard to see how we could make the batch any bigger
than that. The context gets reset for each new tablespace, which may
or may not do anything to the compression threads.

> > but that's not to say that there couldn't be problems.  I worry a bit that
> > the mere presence of threads could in some way mess things up, but I don't
> > know what the mechanism for that would be, and I don't want to postpone
> > shipping useful features based on nebulous fears.
>
> One thing that'd be good to tests for is cancelling in-progress server-side
> compression.  And perhaps a few assertions that ensure that we don't escape
> with some threads still running. That'd have to be platform dependent, but I
> don't see a problem with that in this case.

More specific suggestions, please?

> > For both parallel and non-parallel zstd compression, I see differences
> > between the compressed size depending on where the compression is
> > done. I don't know whether this is an expected behavior of the zstd
> > library or a bug. Both files uncompress OK and pass pg_verifybackup,
> > but that doesn't mean we're not, for example, selecting different
> > compression levels where we shouldn't be. I'll try to figure out
> > what's going on here.
> >
> > zstd, client-side: 1.7GB, 17 seconds
> > zstd, server-side: 1.3GB, 25 seconds
> > parallel zstd, 4 workers, client-side: 1.7GB, 7.5 seconds
> > parallel zstd, 4 workers, server-side: 1.3GB, 7.2 seconds
>
> What causes this fairly massive client-side/server-side size difference?

You seem not to have read what I wrote about this exact point in the
text which you quoted.

> Will this cause test failures on systems with older zstd?

I put a bunch of logic in the test case to try to avoid that, so
hopefully not, but if it does, we can adjust the logic.

-- 
Robert Haas
EDB: http://www.enterprisedb.com



pgsql-hackers by date:

Previous
From: Tomas Vondra
Date:
Subject: Re: logical decoding and replication of sequences
Next
From: Peter Eisentraut
Date:
Subject: Re: XID formatting and SLRU refactorings (was: Add 64-bit XIDs into PostgreSQL 15)