
From: Robert Haas
Subject: Re: design for parallel backup
Date:
Msg-id: CA+TgmoZQCoCyPv6fGoovtPEZF98AXCwYDnSB0=p5XtxNY68r_A@mail.gmail.com
In response to: Re: design for parallel backup  (Andres Freund <andres@anarazel.de>)
Responses: Re: design for parallel backup  (Andres Freund <andres@anarazel.de>)
List: pgsql-hackers
On Tue, Apr 21, 2020 at 6:57 PM Andres Freund <andres@anarazel.de> wrote:
> I agree that trying to make backups very fast is a good goal (or well, I
> think not very slow would be a good descriptor for the current
> situation). I am just trying to make sure we tackle the right problems
> for that. My gut feeling is that we have to tackle compression first,
> because without addressing that "all hope is lost" ;)

OK. I have no objection to the idea of starting with (1) server side
compression and (2) a better compression algorithm. However, I'm not
very sold on the idea of relying on parallelism that is specific to
compression. I think that parallelism across the whole operation -
multiple connections, multiple processes, etc. - may be a more
promising approach than trying to parallelize specific stages of the
process. I am not sure about that, though; I'm open to the possibility
that it's wrong.

Leaving out all the three and four digit wall times from your table:

> method  level   parallelism     wall-time       cpu-user-time   cpu-kernel-time size            rate    format
> pigz    1       10              34.35           364.14          23.55           3892401867      16.6    .gz
> zstd    1       1               82.95           67.97           11.82           2853193736      22.6    .zstd
> zstd    1       10              25.05           151.84          13.35           2847414913      22.7    .zstd
> zstd    6       10              43.47           374.30          12.37           2745211100      23.5    .zstd
> zstd    6       20              32.50           468.18          13.44           2745211100      23.5    .zstd
> zstd    9       20              57.99           949.91          14.13           2606535138      24.8    .zstd
> lz4     1       1               49.94           36.60           13.33           7318668265      8.8     .lz4
> pixz    1       10              92.54           925.52          37.00           1199499772      53.8    .xz

It's notable that almost all of the fast wall times here are with
zstd; the surviving entries with pigz and pixz are with ten-way
parallelism, and both pigz and lz4 have worse compression ratios than
zstd. My impression, though, is that LZ4 might be getting a bit of a
raw deal here because of the repetitive nature of the data. Based on
some reading I did yesterday, and a bit of general hand-waving, I
theorize that the compression ratios might be closer together on a more
realistic data set. It's also notable that lz4 -1 is BY FAR the winner
in terms of absolute CPU consumption. So I kinda wonder whether
supporting both LZ4 and ZSTD might be the way to go, especially since
once we have the LZ4 code we might be able to use it for other things,
too.
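
To illustrate why supporting both doesn't seem like a big lift, the
one-shot APIs of the two libraries are quite similar. Something along
these lines, I think -- just a sketch, not tied to any actual patch,
and the function names are mine:

#include <lz4frame.h>
#include <zstd.h>

/*
 * Sketch only: compress a buffer with libzstd.  Returns 0 on error.
 * The caller is assumed to have sized dst with ZSTD_compressBound().
 */
static size_t
compress_with_zstd(void *dst, size_t dstcap,
                   const void *src, size_t srclen, int level)
{
    size_t      ret = ZSTD_compress(dst, dstcap, src, srclen, level);

    return ZSTD_isError(ret) ? 0 : ret;
}

/*
 * Sketch only: the same thing with liblz4's frame format, using the
 * default preferences.  The caller is assumed to have sized dst with
 * LZ4F_compressFrameBound().
 */
static size_t
compress_with_lz4(void *dst, size_t dstcap,
                  const void *src, size_t srclen)
{
    size_t      ret = LZ4F_compressFrame(dst, dstcap, src, srclen, NULL);

    return LZ4F_isError(ret) ? 0 : ret;
}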

> One thing this reminded me of is whether using a format (tar) that
> doesn't allow efficient addressing of individual files is a good idea
> for base backups. The compression rates very likely will be better when
> not compressing tiny files individually, but at the same time it'd be
> very useful to be able to access individual files more efficiently than
> O(N). I can imagine that being important for some cases of incremental
> backup assembly.

Yeah, being able to operate directly on the compressed version of the
file would be very useful, but I'm not sure that we have great options
available there. I think the only widely-used format that supports
that is ".zip", and I'm not too sure about emitting zip files.
Apparently, pixz also supports random access to archive members, and
it did have one entry that survived my arbitrary cut in the table
above, but the last release was in 2015, and it seems to be only a
command-line tool, not a library. It also depends on libarchive and
liblzma, which is not awful, but I'm not sure we want to suck in that
many dependencies. But that's really a secondary thing: I can't
imagine us depending on something that hasn't had a release in 5
years, and has less than 300 total commits.

Now, it is based on xz/liblzma, and those seem to have some built-in
indexing capabilities which it may be leveraging, so possibly we could
roll our own. I'm not too sure about that, though, and it would limit
us to using only that form of compression.

Other options include, perhaps, (1) emitting a tarfile of compressed
files instead of a compressed tarfile, and (2) writing our own index
files. When we begin emitting the tarfile, we don't know what files
we're going to find or how big they will be, so we can't really emit
a directory at the beginning of the file. Even if we thought we knew,
files can disappear or be truncated before we get around to archiving
them. However, when we reach the end of the file, we do know what we
included and how big it was, so possibly we could generate an index
for each tar file, or include something in the backup manifest.
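
Just to make that concrete, I'm imagining something like one entry per
archive member, written out after the tar stream is finished. The
struct below is purely illustrative -- nothing like it exists today,
and the field names are invented (the types are the usual c.h
typedefs):

/* Invented for illustration; not part of any existing format. */
typedef struct BackupArchiveIndexEntry
{
    char        path[MAXPGPATH];    /* member name within the tar file */
    uint64      header_offset;      /* offset of the member's tar header */
    uint64      raw_size;           /* file size before compression */
    uint64      stored_size;        /* size as stored in the archive */
} BackupArchiveIndexEntry;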

> The other big benefit is that zstd's library has multi-threaded
> compression built in, whereas that's not the case for other libraries
> that I am aware of.

Wouldn't it be a problem to let the backend become multi-threaded, at
least on Windows?
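
For reference, my understanding is that using that feature looks
roughly like the sketch below (error handling mostly omitted, and the
function name is mine), and the important point is that the worker
threads are created inside libzstd itself:

#include <zstd.h>

/*
 * Sketch only: compress one buffer using libzstd's built-in threading.
 * Returns the number of compressed bytes, or 0 on error.  dst is
 * assumed to have been sized with ZSTD_compressBound().
 */
static size_t
compress_with_zstd_workers(void *dst, size_t dstcap,
                           const void *src, size_t srclen,
                           int level, int nworkers)
{
    ZSTD_CCtx  *cctx = ZSTD_createCCtx();
    ZSTD_inBuffer in = {src, srclen, 0};
    ZSTD_outBuffer out = {dst, dstcap, 0};
    size_t      remaining;

    ZSTD_CCtx_setParameter(cctx, ZSTD_c_compressionLevel, level);

    /*
     * The multi-threading knob.  As I understand it, this spawns worker
     * threads inside libzstd, and it returns an error (rather than
     * failing at runtime) if the library was built without thread
     * support.
     */
    ZSTD_CCtx_setParameter(cctx, ZSTD_c_nbWorkers, nworkers);

    do
    {
        remaining = ZSTD_compressStream2(cctx, &out, &in, ZSTD_e_end);
    } while (remaining != 0 && !ZSTD_isError(remaining));

    ZSTD_freeCCtx(cctx);
    return ZSTD_isError(remaining) ? 0 : out.pos;
}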

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


