
From Andres Freund
Subject Re: where should I stick that backup?
Date 2020-04-12 19:17:02
Msg-id 20200412191702.ul7ohgv5gus3tsvo@alap3.anarazel.de
In response to Re: where should I stick that backup?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Hi,

On 2020-04-11 16:22:09 -0400, Robert Haas wrote:
> On Fri, Apr 10, 2020 at 3:38 PM Andres Freund <andres@anarazel.de> wrote:
> > Wouldn't there be state like a S3/ssh/https/... connection? And perhaps
> > a 'backup_id' in the backup metadata DB that one would want to update
> > at the end?
> 
> Good question. I don't know that there would be but, uh, maybe? It's
> not obvious to me why all of that would need to be done using the same
> connection, but if it is, the idea I proposed isn't going to work very
> nicely.

Well, it depends on what you want to support. If you're only interested
in supporting tarball mode ([1]), *maybe* you can get away without
longer lived sessions (but I'm doubtful). But if you're interested in
also supporting archiving plain files, then the cost of establishing
sessions, and the latency penalty of having to wait for command
completion would imo be prohibitive.  A lot of solutions for storing
backups can achieve pretty decent throughput, but have very significant
latency. That's of course in addition to network latency itself.
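
To put rough numbers on that (a back-of-envelope illustration, not a
measurement): at 100ms of per-file round trip, archiving 500k relation
segments one synchronous command at a time would spend 500000 * 0.1s =
50000s, i.e. roughly 14 hours, on latency alone, independent of the
achievable bandwidth.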


[1] I don't think we should restrict it that way. Would make it much
  more complicated to support incremental backup, pg_rewind,
  deduplication, etc.


> More generally, can you think of any ideas for how to structure an API
> here that are easier to use than "write some C code"? Or do you think
> we should tell people to write some C code if they want to
> compress/encrypt/relocate their backup in some non-standard way?

> For the record, I'm not against eventually having more than one way to
> do this, maybe a shell-script interface for simpler things and some
> kind of API for more complex needs (e.g. NetBackup integration,
> perhaps). And I did wonder if there was some other way we could do
> this.

I'm doubtful that an API based on string replacement is the way to
go. It's hard for me to see how that's not either going to substantially
restrict the way the "tasks" are done, or yield a very complicated
interface.

I wonder whether the best approach here could be that pg_basebackup (and
perhaps other tools) opens pipes to/from a subcommand and communicates
with that subtask over them, using a textual ([2]) description of
tasks. Like:

backup mode=files base_directory=/path/to/data/directory
backup_file name=base/14037/16396.14 size=1073741824
backup_file name=pg_wal/XXXX size=16777216

or

backup mode=tar base_directory=/path/to/data/directory
backup_tar name=dir.tar size=983498875687487
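
To make that concrete, here's a minimal sketch of how a target command
could consume such a stream on stdin. The commands and key=value fields
just mirror the example above; nothing here is an existing interface:

/*
 * Hypothetical sketch of a backup target command's input loop. The
 * command names and fields mirror the example above; this is not an
 * existing pg_basebackup interface.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
    char        line[4096];

    while (fgets(line, sizeof(line), stdin) != NULL)
    {
        char       *cmd = strtok(line, " \n");

        if (cmd == NULL)
            continue;

        if (strcmp(cmd, "backup_file") == 0)
        {
            const char *name = "?";
            long long   size = -1;
            char       *kv;

            /* remaining tokens are key=value pairs */
            while ((kv = strtok(NULL, " \n")) != NULL)
            {
                if (strncmp(kv, "name=", 5) == 0)
                    name = kv + 5;
                else if (strncmp(kv, "size=", 5) == 0)
                    size = atoll(kv + 5);
            }
            /* a real target would now archive the file's contents */
            fprintf(stderr, "would archive %s (%lld bytes)\n", name, size);
        }
        else if (strcmp(cmd, "done") == 0)
            break;
    }
    return 0;
}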

The obvious problem with that proposal is that we don't want to
unnecessarily store the incoming data on the system pg_basebackup is
running on, just for the subcommand to get access to them. More on that
in a second.

A huge advantage of a scheme like this would be that it wouldn't have to
be specific to pg_basebackup. It could just as well work directly on the
server, avoiding an unnecessary loop through the network, and could
e.g. integrate with filesystem snapshots etc.  That'd also avoid having
to build the 'archive target' once with server libraries, and once with
client libraries.

One reason I think something like this could be advantageous over a C
API is that it's quite feasible to implement it in a number of different
languages, including shell if really desired, without needing to provide
a C API via an FFI.

It'd also make it quite natural to split compression out of
pg_basebackup's main process; doing everything in one process is IME
what currently makes pg_basebackup's compression not really feasible to
use.


There's various ways we could address the issue of how the subcommand
can access the file data. The most flexible probably would be to rely on
exchanging file descriptors between basebackup and the subprocess (these
days all supported platforms have that, I think).  Alternatively we
could invoke the subcommand before really starting the backup, ask it
how many files it'd like to receive in parallel, and then restart the
subcommand with that number of file descriptors open.
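
For reference, the fd exchange itself would be the standard SCM_RIGHTS
mechanism over a Unix domain socket. A minimal sketch of the sending
side (plain POSIX, not pg_basebackup code; error handling elided):

/*
 * Minimal sketch: hand a file descriptor to another process over a
 * Unix domain socket via an SCM_RIGHTS control message.
 */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

int
send_fd(int sock, int fd)
{
    struct msghdr msg = {0};
    struct iovec iov;
    char        dummy = 'F';    /* must transfer at least one data byte */
    char        cmsgbuf[CMSG_SPACE(sizeof(int))];
    struct cmsghdr *cmsg;

    iov.iov_base = &dummy;
    iov.iov_len = 1;
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;
    msg.msg_control = cmsgbuf;
    msg.msg_controllen = sizeof(cmsgbuf);

    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    return sendmsg(sock, &msg, 0) < 0 ? -1 : 0;
}

The receiver does the inverse with recvmsg(), pulling its copy of the
descriptor out of the control message; the kernel installs it as a new
fd in the receiving process, which is why the fd numbers on the TC side
of the trace below differ from BB's.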

If we relied on FDs, here's an example of how a trace between
pg_basebackup (BB) and a backup target command (TC) could look:

BB: required_capabilities fd_send files
BB: provided_capabilities fd_send file_size files tar
TC: required_capabilities fd_send files file_size
BB: backup mode=files base_directory=/path/to/data/directory
BB: backup_file method=fd name=base/14037/16396.1 size=1073741824
BB: backup_file method=fd name=base/14037/16396.2 size=1073741824
BB: backup_file method=fd name=base/14037/16396.3 size=1073741824
TC: fd name=base/14037/16396.1 (contains TC fd 4)
TC: fd name=base/14037/16396.2 (contains TC fd 5)
BB: backup_file method=fd name=base/14037/16396.4 size=1073741824
TC: fd name=base/14037/16396.3 (contains TC fd 6)
BB: backup_file method=fd name=base/14037/16396.5 size=1073741824
TC: fd name=base/14037/16396.4 (contains TC fd 4)
TC: fd name=base/14037/16396.5 (contains TC fd 5)
BB: done
TC: done
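
Sketched from the BB side, the per-file step of that trace could look
roughly like this (hypothetical code; assumes the send_fd() helper
sketched above plus a stdio pipe for the command channel):

/*
 * Hypothetical pg_basebackup-side sequence matching the trace above:
 * announce each file on the command pipe, then pass its descriptor
 * over the Unix domain socket.
 */
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

extern int send_fd(int sock, int fd);   /* from the sketch above */

static void
offer_file(FILE *cmd_pipe, int fd_sock, const char *name)
{
    struct stat st;
    int         fd = open(name, O_RDONLY);

    if (fd < 0)
        return;                 /* real code would report an error */
    if (fstat(fd, &st) < 0)
    {
        close(fd);
        return;
    }

    fprintf(cmd_pipe, "backup_file method=fd name=%s size=%lld\n",
            name, (long long) st.st_size);
    fflush(cmd_pipe);

    send_fd(fd_sock, fd);
    close(fd);                  /* target command now has its own copy */
}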



[2] yes, I already hear json. A line-delimited format would have some
advantages though.

Greetings,

Andres Freund


