Re: multithreaded zstd backup compression for client and server - Mailing list pgsql-hackers

From Andres Freund
Subject Re: multithreaded zstd backup compression for client and server
Date
Msg-id 20220323211428.frazv4typmnm2t3x@alap3.anarazel.de
Whole thread Raw
In response to multithreaded zstd backup compression for client and server  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: multithreaded zstd backup compression for client and server  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2022-03-23 16:34:04 -0400, Robert Haas wrote:
> Therefore, I think it might be safe to just ...  turn this on. One reason I
> think that is that this whole approach was recommended to me by Andres ...

I didn't do a super careful analysis of the issues... But I do think it's
pretty much the one case where it "should" be safe.

The most likely source of problem would errors thrown while zstd threads are
alive. Should make sure that that can't happen.


What is the lifetime of the threads zstd spawns? Are they tied to a single
compression call? A single ZSTD_createCCtx()? If the latter, how bulletproof
is our code ensuring that we don't leak such contexts?

If they're short-lived, are we compressing large enough batches to not waste a
lot of time starting/stopping threads?


> but that's not to say that there couldn't be problems.  I worry a bit that
> the mere presence of threads could in some way mess things up, but I don't
> know what the mechanism for that would be, and I don't want to postpone
> shipping useful features based on nebulous fears.

One thing that'd be good to tests for is cancelling in-progress server-side
compression.  And perhaps a few assertions that ensure that we don't escape
with some threads still running. That'd have to be platform dependent, but I
don't see a problem with that in this case.



> For both parallel and non-parallel zstd compression, I see differences
> between the compressed size depending on where the compression is
> done. I don't know whether this is an expected behavior of the zstd
> library or a bug. Both files uncompress OK and pass pg_verifybackup,
> but that doesn't mean we're not, for example, selecting different
> compression levels where we shouldn't be. I'll try to figure out
> what's going on here.
>
> zstd, client-side: 1.7GB, 17 seconds
> zstd, server-side: 1.3GB, 25 seconds
> parallel zstd, 4 workers, client-side: 1.7GB, 7.5 seconds
> parallel zstd, 4 workers, server-side: 1.3GB, 7.2 seconds

What causes this fairly massive client-side/server-side size difference?



> +    /*
> +     * We check for failure here because (1) older versions of the library
> +     * do not support ZSTD_c_nbWorkers and (2) the library might want to
> +     * reject unreasonable values (though in practice it does not seem to do
> +     * so).
> +     */
> +    ret = ZSTD_CCtx_setParameter(streamer->cctx, ZSTD_c_nbWorkers,
> +                                 compress->workers);
> +    if (ZSTD_isError(ret))
> +    {
> +        pg_log_error("could not set compression worker count to %d: %s",
> +                     compress->workers, ZSTD_getErrorName(ret));
> +        exit(1);
> +    }

Will this cause test failures on systems with older zstd?


Greetings,

Andres Freund



pgsql-hackers by date:

Previous
From: Pavel Borisov
Date:
Subject: Re: [PATCH] Remove workarounds to format [u]int64's
Next
From: "Euler Taveira"
Date:
Subject: Re: logical replication restrictions