
From Robert Haas
Subject Re: design for parallel backup
Msg-id CA+TgmoYgNd-SyK8bDwZAYtKLdf1PrVZL2gPeEfxxkSkeDaUH_g@mail.gmail.com
In response to Re: design for parallel backup  (Andres Freund <andres@anarazel.de>)
Responses Re: design for parallel backup  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On Tue, Apr 21, 2020 at 4:14 PM Andres Freund <andres@anarazel.de> wrote:
> It was local TCP. The speeds I can reach are faster than the 10GiB/s
> (unidirectional) I can do between the laptop & workstation, so testing
> it over "actual" network isn't informative - I basically can reach line
> speed between them with any method.

Is that really a conclusive test, though? In the case of either local
TCP or a fast local interconnect, you'll have negligible latency. It
seems at least possible that saturating the available bandwidth is
harder on a higher-latency connection. Cross-region data center
connections figure to have way higher latency than a local wired
network, let alone the loopback interface.
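
For what it's worth, a rough way to check that locally might be to
inject some artificial latency on the loopback interface and re-run the
same measurement. A minimal sketch (Linux, needs root; the 50ms is just
an illustrative stand-in for a cross-region RTT):

sudo tc qdisc add dev lo root netem delay 50ms
# ... re-run the pg_basebackup | pv test here ...
sudo tc qdisc del dev lo root netem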

> It was in kernel buffer cache. But I can reach 100% utilization of
> storage too (which is slightly slower than what I can do over unix
> socket).
>
> pg_basebackup --manifest-checksums=none -h /tmp/ -D- -Ft -cfast -Xnone |pv -B16M -r -a > /dev/null
> 2.59GiB/s
> find /srv/dev/pgdev-dev/base/ -type f -exec dd if={} bs=32k status=none \; |pv -B16M -r -a > /dev/null
> 2.53GiB/s
> find /srv/dev/pgdev-dev/base/ -type f -exec cat {} + |pv -B16M -r -a > /dev/null
> 2.42GiB/s
>
> I tested this with a -s 5000 DB, FWIW.

But that's not a real test either, because you're not writing the data
anywhere. It's going to be a whole lot easier to saturate the read
side if the write side is always zero latency.
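
If it's not too much trouble, it might be interesting to point the same
pipeline at a file on a separate disk instead of /dev/null, so that the
write side has real latency. Something like this, where the destination
path is just a placeholder:

pg_basebackup --manifest-checksums=none -h /tmp/ -D- -Ft -cfast -Xnone \
    | pv -B16M -r -a > /path/to/other/disk/base.tar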

> > How do you expect to take advantage of I/O parallelism without
> > multiple processes/connections?
>
> Which kind of I/O parallelism are you thinking of? Independent
> tablespaces? Or devices that can handle multiple in-flight IOs? WRT the
> latter, at least linux will keep many IOs in-flight for sequential
> buffered reads.

Both. I know that the kernel will prefetch for sequential reads, but
it won't know what file you're going to access next, so I think you'll
tend to stall when you reach the end of each file. It also seems
possible that on a large disk array, you could read N files at a time
with greater aggregate bandwidth than you can read a single file.
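
A crude way to test that second point outside of pg_basebackup might be
to read several files concurrently and compare the aggregate rate with
your single-stream numbers. Since we only care about throughput into
/dev/null, the interleaved output doesn't matter; e.g. with 8 readers:

find /srv/dev/pgdev-dev/base/ -type f -print0 \
    | xargs -0 -P8 -n1 cat \
    | pv -B16M -r -a > /dev/null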

> > It seems to me that the interesting cases may involve having lots of
> > available CPUs and lots of disk spindles, but a comparatively slow
> > pipe between the machines.
>
> Hm, I'm not sure I am following. If network is the bottleneck, we'd
> immediately fill the buffers, and that'd be that?
>
> ISTM all of this is only really relevant if either pg_basebackup or
> walsender is the bottleneck?

I agree that if neither pg_basebackup nor walsender is the bottleneck,
parallelism is unlikely to be very effective. I have realized as a
result of your comments that I actually don't care intrinsically about
parallel backup; what I actually care about is making backups very,
very fast. I suspect that parallelism is a useful means to that end,
but I interpret your comments as questioning that, and specifically
drawing attention to the question of where the bottlenecks might be.
So I'm trying to think about that.

> I think it's fairly obvious that we need faster compression - and that
> while we clearly can win a lot by just using a faster
> algorithm/implementation than standard zlib, we'll likely also need
> parallelism in some form.  I'm doubtful that using multiple connections
> and multiple backends is the best way to achieve that, but it'd be a
> way.

I think it has a good chance of being pretty effective, but it's
certainly worth casting about for other possibilities that might
deliver more benefit or be less work. In terms of better compression,
I did a little looking around and it seems like LZ4 is generally
agreed to be a lot faster than gzip, and also significantly faster
than most other things that one might choose to use. On the other
hand, the compression ratio may not be as good; e.g.
https://facebook.github.io/zstd/ cites a 2.1 ratio (on some data set)
for lz4 and a 2.9 ratio for zstd. While zstd's compression and
decompression speeds are slower than lz4's, they are close enough that
you might be able to make up the difference by using roughly 2x the
cores for compression and 3x for decompression. I don't know if that sort of
thing is worth considering. If your limitation is the line speed, and
you have CPU cores to burn, a significantly higher compression
ratio means significantly faster backups. On the other hand, if you're
backing up over the LAN and the machine is heavily taxed, that's
probably not an appealing trade.
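
To make that concrete with made-up numbers: on a ~1Gbit/s pipe
(~120MB/s on the wire), a 2.1 ratio moves ~250MB/s of logical data and
a 2.9 ratio moves ~350MB/s, i.e. roughly 40% higher effective
throughput for the better ratio, if the cores are there to pay for it.
And at least with zstd, the extra cores needn't imply extra
connections, since the command-line tool (and the library) can compress
with multiple worker threads. A rough sketch, with the host and file
names obviously just placeholders:

pg_basebackup --manifest-checksums=none -h /tmp/ -D- -Ft -cfast -Xnone \
    | zstd -T8 -3 \
    | ssh backuphost 'cat > base.tar.zst'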

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


