Re: design for parallel backup - Mailing list pgsql-hackers

From Robert Haas
Subject Re: design for parallel backup
Date
Msg-id CA+Tgmobc9MqRvwOOZcd9cxX8fNuMN8eKDMmywsuyLeg8ri+Vjg@mail.gmail.com
In response to Re: design for parallel backup  (Andres Freund <andres@anarazel.de>)
Responses Re: design for parallel backup  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Wed, Apr 22, 2020 at 2:06 PM Andres Freund <andres@anarazel.de> wrote:
> I also can see a case for using N backends and one connection, but I
> think that'll be too complicated / too much bound by locking around the
> socket etc.

Agreed.

> Oh? I find it *extremely* exciting here. This is pretty close to the
> worst case compressibility-wise, and zstd takes only ~22% of the time
> that gzip does, while still delivering better compression.  A nearly 5x
> improvement in compression times seems pretty exciting to me.
>
> Or do you mean for zstd over lz4, rather than anything over gzip?  1.8x
> -> 2.3x is a pretty decent improvement still, no? And being able to do
> it in 1/3 of the wall time seems pretty helpful.

I meant the latter thing, not the former. I'm taking it as given that
we don't want gzip as the only option. Yes, 1.8x -> 2.3x is decent,
but not as earth-shattering as 8.8x -> ~24x.
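
As a side note on the arithmetic, a time fraction such as the "~22% of
the time" quoted above converts to a speedup factor by taking its
reciprocal. The following is only a minimal sketch; the function name is
hypothetical and the 0.22 input is just the approximate figure from the
quoted text, not a new measurement.

```python
def speedup_from_time_fraction(fraction: float) -> float:
    """Return the speedup factor for a run that takes `fraction` of the baseline time.

    Hypothetical helper for illustration only; not part of any PostgreSQL tool.
    """
    if fraction <= 0:
        raise ValueError("time fraction must be positive")
    return 1.0 / fraction


# Approximate figure from the quoted text: zstd at ~22% of gzip's time.
zstd_vs_gzip = speedup_from_time_fraction(0.22)
print(f"{zstd_vs_gzip:.1f}x")  # prints "4.5x"
```

So "~22% of the time" works out to roughly 4.5x, which is consistent with
the "nearly 5x" description once rounding is allowed for.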

In any case, I lean towards adding both lz4 and zstd as options, so I
guess we're not really disagreeing here.

> > Parallel zstd still compresses somewhat better than single-core lz4,
> > but the difference in compression ratio is far less, and the amount of
> > CPU you have to burn in order to get that extra compression is pretty
> > large.
>
> It's "just" a ~2x difference for "level 1" compression, right? For
> having 1.9GiB less to write / permanently store of a 16GiB base
> backup that doesn't seem that bad to me.

Sure, sure. I'm just saying some people may not be OK with ramping up
to 10 or more compression threads on their master server, if it's
already heavily loaded, and maybe only has 4 vCPUs or whatever, so we
should have lighter-weight options for those people. I'm not trying to
argue against zstd or against the idea of ramping up large numbers of
compression threads, just saying that lz4 looks awfully nice for
people who need some compression but are tight on CPU cycles.

> I agree we should pick one. I think tar is not a great choice. .zip
> seems like it'd be a significant improvement - but not necessarily
> optimal.

Other ideas?

> > I don't want to get so caught up in advanced features here that we
> > don't make any useful progress at all. If we can add better features
> > without a large complexity increment, and without drawing objections
> > from others on this list, great. If not, I'm prepared to summarily
> > jettison it as nice-to-have but not essential.
>
> Just to be clear: I am not at all advocating tying a change of the
> archive format to compression method / parallelism changes or anything.

Good, thanks.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


