Re: multithreaded zstd backup compression for client and server - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: multithreaded zstd backup compression for client and server
Date:
Msg-id: CA+TgmoZms1KvLSXsAhXwKJ0FzY-mVe-B3=m-sePW6i-0k8vrTA@mail.gmail.com
In response to: Re: multithreaded zstd backup compression for client and server (Andres Freund <andres@anarazel.de>)
Responses: Re: multithreaded zstd backup compression for client and server
List: pgsql-hackers
On Wed, Mar 23, 2022 at 5:14 PM Andres Freund <andres@anarazel.de> wrote:
> The most likely source of problems would be errors thrown while zstd threads
> are alive. Should make sure that that can't happen.
>
> What is the lifetime of the threads zstd spawns? Are they tied to a single
> compression call? A single ZSTD_createCCtx()? If the latter, how bulletproof
> is our code ensuring that we don't leak such contexts?

I haven't found any real documentation explaining how libzstd manages its
threads. I am assuming that their lifetime is tied to the ZSTD_CCtx, but I
don't know. I guess I could try to figure it out from the source code.

Anyway, what we have now is a PG_TRY()/PG_CATCH() block around the code that
uses the bbsink, which will cause bbsink_zstd_cleanup() to get called in the
event of an error. That will do ZSTD_freeCCtx().

It's probably also worth mentioning here that even if, contrary to
expectations, the compression threads hang around to the end of time and
chill, in practice nobody is likely to run BASE_BACKUP and then keep the
connection open for a long time afterward. So it probably wouldn't really
affect resource utilization in real-world scenarios even if the threads never
exited, as long as they didn't, you know, busy-loop in the background. And I
assume the actual library behavior can't be nearly that bad. This is a pretty
mainstream piece of software.

> If they're short-lived, are we compressing large enough batches to not waste
> a lot of time starting/stopping threads?

Well, we're using a single ZSTD_CCtx for an entire base backup. Again, I
haven't found documentation explaining what libzstd is actually doing, but
it's hard to see how we could make the batch any bigger than that. The
context gets reset for each new tablespace, which may or may not do anything
to the compression threads.

> > but that's not to say that there couldn't be problems. I worry a bit that
> > the mere presence of threads could in some way mess things up, but I don't
> > know what the mechanism for that would be, and I don't want to postpone
> > shipping useful features based on nebulous fears.
>
> One thing that'd be good to test for is cancelling in-progress server-side
> compression. And perhaps a few assertions that ensure that we don't escape
> with some threads still running. That'd have to be platform dependent, but I
> don't see a problem with that in this case.

More specific suggestions, please?

> > For both parallel and non-parallel zstd compression, I see differences
> > between the compressed size depending on where the compression is
> > done. I don't know whether this is an expected behavior of the zstd
> > library or a bug. Both files uncompress OK and pass pg_verifybackup,
> > but that doesn't mean we're not, for example, selecting different
> > compression levels where we shouldn't be. I'll try to figure out
> > what's going on here.
> >
> > zstd, client-side: 1.7GB, 17 seconds
> > zstd, server-side: 1.3GB, 25 seconds
> > parallel zstd, 4 workers, client-side: 1.7GB, 7.5 seconds
> > parallel zstd, 4 workers, server-side: 1.3GB, 7.2 seconds
>
> What causes this fairly massive client-side/server-side size difference?

You seem not to have read what I wrote about this exact point in the text
which you quoted.

> Will this cause test failures on systems with older zstd?

I put a bunch of logic in the test case to try to avoid that, so hopefully
not, but if it does, we can adjust the logic.

--
Robert Haas
EDB: http://www.enterprisedb.com
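
The libzstd usage being discussed above follows the general shape sketched
below. This is a minimal standalone sketch, not the backend's bbsink_zstd
code: the 4-worker setting, the compression level, the output buffer size,
and the dummy "tablespace" inputs are illustrative assumptions, and the
backend's PG_TRY()/PG_CATCH() cleanup is only represented by the final
ZSTD_freeCCtx() call.

/*
 * zstd_cctx_sketch.c
 *
 * Sketch of the pattern: one ZSTD_CCtx reused for the whole backup,
 * ZSTD_c_nbWorkers for library-managed worker threads, a session-only
 * reset between tablespaces, and ZSTD_freeCCtx() as the cleanup that the
 * error path must always be able to reach.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <zstd.h>

/* Compress one buffer as a complete frame, discarding the output. */
static void
compress_one_frame(ZSTD_CCtx *cctx, const void *data, size_t len)
{
    char        outbuf[8192];
    ZSTD_inBuffer in = {data, len, 0};
    size_t      remaining;

    do
    {
        ZSTD_outBuffer out = {outbuf, sizeof(outbuf), 0};

        remaining = ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_end);
        if (ZSTD_isError(remaining))
        {
            fprintf(stderr, "zstd error: %s\n", ZSTD_getErrorName(remaining));
            exit(1);
        }
        /* a real sink would forward out.dst[0 .. out.pos) here */
    } while (remaining != 0);
}

int
main(void)
{
    const char *tablespaces[] = {"data for tablespace one",
                                 "data for tablespace two"};
    ZSTD_CCtx  *cctx = ZSTD_createCCtx();

    if (cctx == NULL)
        return 1;

    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, 3);

    /*
     * nbWorkers > 0 asks libzstd to spawn and manage its own worker
     * threads; whether that takes effect (or errors out) depends on the
     * library having been built with multithreading support.
     */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, 4);

    for (int i = 0; i < 2; i++)
    {
        compress_one_frame(cctx, tablespaces[i], strlen(tablespaces[i]));

        /*
         * Start a fresh frame for the next tablespace while keeping the
         * configured parameters (and, presumably, any worker threads).
         */
        ZSTD_CCtx_reset(cctx, ZSTD_reset_session_only);
    }

    /* The error-recovery cleanup described above boils down to this call. */
    ZSTD_freeCCtx(cctx);
    return 0;
}

Under those assumptions this builds with "cc zstd_cctx_sketch.c -lzstd";
reusing a single context this way is what lets any thread startup cost be
amortized over the whole backup rather than paid per compression call.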