Re: design for parallel backup - Mailing list pgsql-hackers

From	Amit Kapila
Subject	Re: design for parallel backup
Date	April 21, 2020 04:50:01
Msg-id	CAA4eK1LQuN6LCaiHduaF0L47DUQ2e5cL_QNMytPRVUNidOA0dw@mail.gmail.com
In response to	Re: design for parallel backup (Andres Freund <andres@anarazel.de>)
Responses	Re: design for parallel backup
List	pgsql-hackers

On Tue, Apr 21, 2020 at 2:40 AM Andres Freund <andres@anarazel.de> wrote:
>
> On 2020-04-20 16:36:16 -0400, Robert Haas wrote:
>
> > If a backup client - either current or hypothetical - is compressing
> > and encrypting, then it doesn't have either a filesystem I/O or a
> > network I/O in progress while it's doing so. You take not only the hit
> > of the time required for compression and/or encryption, but also use
> > that much less of the available network and/or I/O capacity.
>
> I don't think it's really the time for network/file I/O that's the
> issue. Sure memcpy()'ing from the kernel takes time, but compared to
> encryption/compression it's not that much.  Especially for compression,
> it's not really lack of cycles for networking that prevent a higher
> throughput, it's that after buffering a few MB there's just no point
> buffering more, given compression will plod along with 20-100MB/s.
>

Compression, being mostly CPU-bound, quite likely benefits more from
parallelism than network I/O does, but I am not sure we can just ignore
the benefit of utilizing the network bandwidth.  In our case, after
copying the data from the network we write it to disk, so while one
worker is doing filesystem I/O, the network can be used by another
parallel worker processing a different part of the data.
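
To make the overlap argument concrete, here is a minimal
back-of-the-envelope sketch.  It is not taken from the patch: the
function name backup_makespan, the unit costs, and the model itself
(one exclusive network link, one exclusive disk, chunks handed to
workers round-robin) are all assumptions made purely for illustration.

```python
def backup_makespan(n_chunks, t_net, t_disk, n_workers):
    """Return the finish time of a toy backup under a hypothetical model.

    Assumptions (illustrative only): the network and the disk are each a
    single exclusive resource, every chunk costs t_net to receive and then
    t_disk to write, and chunks are handed to workers round-robin.
    """
    net_free = 0                      # time at which the network is next idle
    disk_free = 0                     # time at which the disk is next idle
    worker_free = [0] * n_workers     # time at which each worker is next idle
    finish = 0
    for chunk in range(n_chunks):
        w = chunk % n_workers
        # Receive: needs both the worker and the network to be idle.
        recv_end = max(net_free, worker_free[w]) + t_net
        net_free = recv_end
        # Write: starts once the chunk is received and the disk is idle.
        write_end = max(recv_end, disk_free) + t_disk
        disk_free = write_end
        worker_free[w] = write_end
        finish = max(finish, write_end)
    return finish


serial = backup_makespan(n_chunks=4, t_net=1, t_disk=1, n_workers=1)
overlapped = backup_makespan(n_chunks=4, t_net=1, t_disk=1, n_workers=2)
print(serial, overlapped)  # 8 5
```

In this toy model the second worker helps only because one worker's
disk write overlaps the other worker's network read.  Whether real
hardware behaves this way is something only measurements can show.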

Also, some users may not want their data compressed at all, for
example because the decompression overhead makes restore slower, and
for them a fast restore is much more critical than a compressed or
fast backup.  For such users, the parallelism during backup being
discussed in this thread will still be helpful.

OTOH, I think that without some measurements it is difficult to say
that we get a significant benefit from parallelizing the backup
without compression.  I have scanned the other thread [1] where the
patch for parallel backup was discussed and didn't find any
performance numbers, so having some performance data with that patch
would probably give us a better understanding of the value of
introducing parallelism in the backup.

[1] - https://www.postgresql.org/message-id/CADM=JehKgobEknb+_nab9179HzGj=9EiTzWMOd2mpqr_rifm0Q@mail.gmail.com

-- 
With Regards,
Amit Kapila.
EnterpriseDB: http://www.enterprisedb.com
