refactoring basebackup.c - Mailing list pgsql-hackers

From Robert Haas
Subject refactoring basebackup.c
Date
Msg-id CA+TgmoZGwR=ZVWFeecncubEyPdwghnvfkkdBe9BLccLSiqdf9Q@mail.gmail.com
Whole thread Raw
Responses Re: refactoring basebackup.c
Re: refactoring basebackup.c
Re: refactoring basebackup.c
List pgsql-hackers
Hi,

I'd like to propose a fairly major refactoring of the server's
basebackup.c. The current code isn't horrific or anything, but the
base backup mechanism has grown quite a few features over the years
and all of the code knows about all of the features. This is going to
make it progressively more difficult to add additional features, and I
have a few in mind that I'd like to add, as discussed below and also
on several other recent threads.[1][2] The attached patch set shows
what I have in mind. It needs more work, but I believe that there's
enough here for someone to review the overall direction, and even some
of the specifics, and hopefully give me some useful feedback.

This patch set is built around the idea of creating two new
abstractions, a base backup sink -- or bbsink -- and a base backup
archiver -- or bbarchiver.  Each of these works like a foreign data
wrapper or custom scan or TupleTableSlot. That is, there's a table of
function pointers that act like method callbacks. Every implementation
can allocate a struct of sufficient size for its own bookkeeping data,
and the first member of the struct is always the same, and basically
holds the data that all implementations must store, including a
pointer to the table of function pointers. If we were using C++,
bbarchiver and bbsink would be abstract base classes.

They represent closely-related concepts, so much so that I initially
thought we could get by with just one new abstraction layer. I found
on experimentation that this did not work well, so I split it up into
two and that worked a lot better. The distinction is this: a bbsink is
something to which you can send a bunch of archives -- currently, each
would be a tarfile -- and also a backup manifest. A bbarchiver is
something to which you send every file in the data directory
individually, or at least the ones that are getting backed up, plus
any that are being injected into the backup (e.g. the backup_label).
Commonly, a bbsink will do something with the data and then forward it
to a subsequent bbsink, or a bbarchiver will do something with the
data and then forward it to a subsequent bbarchiver or bbsink. For
example, there's a bbarchiver_tar object which, like any bbarchiver,
sees all the files and their contents as input. The output is a
tarfile, which gets send to a bbsink. As things stand in the patch set
now, the tar archives are ultimately sent to the "libpq" bbsink, which
sends them to the client.

In the future, we could have other bbarchivers. For example, we could
add "pax", "zip", or "cpio" bbarchiver which produces archives of that
format, and any given backup could choose which one to use. Or, we
could have a bbarchiver that runs each individual file through a
compression algorithm and then forwards the resulting data to a
subsequent bbarchiver. That would make it easy to produce a tarfile of
individually compressed files, which is one possible way of creating a
seekable achive.[3] Likewise, we could have other bbsinks. For
example, we could have a "localdisk" bbsink that cause the server to
write the backup somewhere in the local filesystem instead of
streaming it out over libpq. Or, we could have an "s3" bbsink that
writes the archives to S3. We could also have bbsinks that compresses
the input archives using some compressor (e.g. lz4, zstd, bzip2, ...)
and forward the resulting compressed archives to the next bbsink in
the chain. I'm not trying to pass judgement on whether any of these
particular things are things we want to do, nor am I saying that this
patch set solves all the problems with doing them. However, I believe
it will make such things a whole lot easier to implement, because all
of the knowledge about whatever new functionality is being added is
centralized in one place, rather than being spread across the entirety
of basebackup.c. As an example of this, look at how 0010 changes
basebackup.c and basebackup_tar.c: afterwards, basebackup.c no longer
knows anything that is tar-specific, whereas right now it knows about
tar-specific things in many places.

Here's an overview of this patch set:

0001-0003 are cleanup patches that I have posted for review on
separate threads.[4][5] They are included here to make it easy to
apply this whole series if someone wishes to do so.

0004 is a minor refactoring that reduces by 1 the number of functions
in basebackup.c that know about the specifics of tarfiles. It is just
a preparatory patch and probably not very interesting.

0005 invents the bbsink abstraction.

0006 creates basebackup_libpq.c and moves all code that knows about
the details of sending archives via libpq there. The functionality is
exposed for use by basebackup.c as a new type of bbsink, bbsink_libpq.

0007 creates basebackup_throttle.c and moves all code that knows about
throttling backups there. The functionality is exposed for use by
basebackup.c as a new type of bbsink, bbsink_throttle. This means that
the throttling logic could be reused to throttle output to any final
destination. Essentially, this is a bbsink that just passes everything
it gets through to the next bbsink, but with a rate limit. If
throttling's not enabled, no bbsink_throttle object is created, so all
of the throttling code is completely out of the execution pipeline.

0008 creates basebackup_progress.c and moves all code that knows about
progress reporting there. The functionality is exposed for use by
basebackup.c as a new type of bbsink, bbsink_progress. Since the
abstraction doesn't fit perfectly in this case, some extra functions
are added to work around the problem. This is not entirely elegant,
but I don't think it's still an improvement over what we have now, and
I don't have a better idea.

0009 invents the bbarchiver abstraction.

0010 invents two new bbarchivers, a tar bbarchiver and a tarsize
bbarchiver, and refactors basebackup.c to make use of them. The tar
bbarchiver puts the files it sees into tar archives and forwards the
resulting archives to a bbsink. The tarsize bbarchiver is used to
support the PROGRESS option to the BASE_BACKUP command. It just
estimates the size of the backup by summing up the file sizes without
reading them. This approach is good for a couple of reasons. First,
without something like this, it's impossible to keep basebackup.c from
knowing something about the tar format, because the PROGRESS option
doesn't just figure out how big the files to be backed up are: it
figures out how big it thinks the archives will be, and that involves
tar-specific considerations. This area needs more work, as the whole
idea of measuring progress by estimating the archive size is going to
break down as soon as server-side compression is in the picture.
Second, this makes the code path that we use for figuring out the
backup size details much more similar to the path we use for
performing the actual backup. For instance, with this patch, we
include the exact same files in the calculation that we will include
in the backup, and in the same order, something that's not true today.
The basebackup_tar.c file added by this patch is sadly lacking in
comments, which I will add in a future version of the patch set. I
think, though, that it will not be too unclear what's going on here.

0011 invents another new kind of bbarchiver. This bbarchiver just
eavesdrops on the stream of files to facilitate backup manifest
construction, and then forwards everything through to a subsequent
bbarchiver. Like bbsink_throttle, it can be entirely omitted if not
used. This patch is a bit clunky at the moment and needs some polish,
but it is another demonstration of how these abstractions can be used
to simplify basebackup.c, so that basebackup.c only has to worry about
determining what should be backed up and not have to worry much about
all the specific things that need to be done as part of that.

Although this patch set adds quite a bit of code on net, it makes
basebackup.c considerably smaller and simpler, removing more than 400
lines of code from that file, about 20% of the current total. There
are some gratifying changes vs. the status quo. For example, in
master, we have this:

sendDir(const char *path, int basepathlen, bool sizeonly, List *tablespaces,
                bool sendtblspclinks, backup_manifest_info *manifest,
                const char *spcoid)

Notably, the sizeonly flag makes the function not do what the name of
the function suggests that it does. Also, we've got to pass some extra
fields through to enable specific features. With the patch set, the
equivalent function looks like this:

archive_directory(bbarchiver *archiver, const char *path, int basepathlen,
                                  List *tablespaces, bool sendtblspclinks)

The question "what should I do with the directories and files we find
as we recurse?" is now answered by the choice of which bbarchiver to
pass to the function, rather than by the values of sizeonly, manifest,
and spcoid. That's not night and day, but I think it's better,
especially as you imagine adding more features in the future. The
really important part, for me, is that you can make the bbarchiver do
anything you like without needing to make any more changes to this
function. It just arranges to invoke your callbacks. You take it from
there.

One pretty major question that this patch set doesn't address is what
the user interface for any of the hypothetical features mentioned
above ought to look like, or how basebackup.c ought to support them.
The syntax for the BASE_BACKUP command, like the contents of
basebackup.c, has grown organically, and doesn't seem to be very
scalable. Also, the wire protocol - a series of CopyData results which
the client is entirely responsible for knowing how to interpret and
about which the server provides only minimal information - doesn't
much lend itself to extensibility. Some careful design work is likely
needed in both areas, and this patch does not try to do any of it. I
am quite interested in discussing those questions, but I felt that
they weren't the most important problems to solve first.

What do you all think?

Thanks,

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company

[1] http://postgr.es/m/CA+TgmoZubLXYR+Pd_gi3MVgyv5hQdLm-GBrVXkun-Lewaw12Kg@mail.gmail.com
[2] http://postgr.es/m/CA+TgmoYr7+-0_vyQoHbTP5H3QGZFgfhnrn6ewDteF=kUqkG=Fw@mail.gmail.com
[3] http://postgr.es/m/CA+TgmoZQCoCyPv6fGoovtPEZF98AXCwYDnSB0=p5XtxNY68r_A@mail.gmail.com
and following
[4] http://postgr.es/m/CA+TgmoYq+59SJ2zBbP891ngWPA9fymOqntqYcweSDYXS2a620A@mail.gmail.com
[5] http://postgr.es/m/CA+TgmobWbfReO9-XFk8urR1K4wTNwqoHx_v56t7=T8KaiEoKNw@mail.gmail.com



pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Back-branch minor release notes are up for review
Next
From: Robert Haas
Date:
Subject: Re: refactoring basebackup.c