Sounds like something else becomes the bottleneck once you have process-max at 30. I suspect you could reduce that process-max value and still see about the same time with zstd. Ultimately, if you want it to be faster, you’ll need to figure out what the bottleneck is (seemingly not CPU, and unlikely to be memory, so that leaves network or storage) and address that.
We’ve seen numbers approaching 10 TB/hr with lots of processes, zstd, and fast storage on high-end physical hardware.
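
To make the "reduce process-max and see if the time changes" idea concrete, here is a rough Python sketch. It only does arithmetic on timings you have measured yourself and does not call any backup tool. The function names, the 5% tolerance, and the example timings are all made up for illustration; substitute your own numbers.

```python
def tb_per_hour_to_mb_per_sec(tb_per_hour: float) -> float:
    """Convert TB/hr to MB/s using decimal units (1 TB = 1,000,000 MB)."""
    return tb_per_hour * 1_000_000 / 3600


def smallest_sufficient_process_max(timings: dict[int, float],
                                    tolerance: float = 0.05) -> int:
    """Return the smallest process-max whose run time is within
    `tolerance` (as a fraction) of the fastest run measured.

    `timings` maps a process-max value to a measured duration in seconds.
    """
    best = min(timings.values())
    threshold = best * (1 + tolerance)
    return min(p for p, secs in timings.items() if secs <= threshold)


# Hypothetical measurements (seconds), NOT real results: one run per
# process-max value, same data set, same compression type.
example_timings = {4: 3600.0, 8: 1900.0, 16: 1250.0, 30: 1230.0}

if __name__ == "__main__":
    # With the made-up numbers above, going from 16 to 30 processes barely
    # helps, which suggests something other than CPU is the limit.
    print(smallest_sufficient_process_max(example_timings))
    # Sustained rate the network and storage would need for 10 TB/hr.
    print(round(tb_per_hour_to_mb_per_sec(10), 2))
```

If the smallest sufficient value comes out well below 30, extra processes are just waiting on network or storage, and comparing the required MB/s against what the link and disks can actually sustain should show which of the two it is.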