Re: where should I stick that backup? - Mailing list pgsql-hackers

From David Steele
Subject Re: where should I stick that backup?
Date
Msg-id 8d106ed1-10d7-f94f-8e4d-860865c55269@pgmasters.net
In response to Re: where should I stick that backup?  (Andres Freund <andres@anarazel.de>)
Responses Re: where should I stick that backup?  (Andres Freund <andres@anarazel.de>)
List pgsql-hackers
On 4/12/20 3:17 PM, Andres Freund wrote:
> 
>> More generally, can you think of any ideas for how to structure an API
>> here that are easier to use than "write some C code"? Or do you think
>> we should tell people to write some C code if they want to
>> compress/encrypt/relocate their backup in some non-standard way?
> 
>> For the record, I'm not against eventually having more than one way to
>> do this, maybe a shell-script interface for simpler things and some
>> kind of API for more complex needs (e.g. NetBackup integration,
>> perhaps). And I did wonder if there was some other way we could do
>> this.
> 
> I'm doubtful that an API based on string replacement is the way to
> go. It's hard for me to see how that's not either going to substantially
> restrict the way the "tasks" are done, or yield a very complicated
> interface.
> 
> I wonder whether the best approach here could be that pg_basebackup (and
> perhaps other tools) opens pipes to/from a subcommand and over the pipe
> it communicates with the subtask using a textual ([2]) description of
> tasks. Like:
> 
> backup mode=files base_directory=/path/to/data/directory
> backup_file name=base/14037/16396.14 size=1073741824
> backup_file name=pg_wal/XXXX size=16777216
> or
> backup mode=tar
> base_directory /path/to/data/
> backup_tar name=dir.tar size=983498875687487

This is pretty much what pgBackRest does. We call them "local" processes 
and they do most of the work during backup/restore/archive-get/archive-push.
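
Purely as an illustration (this is not pgBackRest code, and the parsing and 
output below are assumptions), a subcommand consuming that kind of 
line-delimited task stream could be quite small:

/*
 * Hypothetical sketch of a subcommand reading line-delimited task
 * descriptions like the ones above from stdin.  The "backup_file
 * name=... size=..." keywords come from the quoted example; everything
 * else is invented for illustration.
 */
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

int
main(void)
{
    char line[4096];

    while (fgets(line, sizeof(line), stdin) != NULL)
    {
        char *task;
        char *kv;

        line[strcspn(line, "\n")] = '\0';   /* strip trailing newline */

        /* first token is the task, the rest are key=value pairs */
        task = strtok(line, " ");
        if (task == NULL)
            continue;

        if (strcmp(task, "backup_file") == 0)
        {
            const char *name = NULL;
            long long size = 0;

            while ((kv = strtok(NULL, " ")) != NULL)
            {
                if (strncmp(kv, "name=", 5) == 0)
                    name = kv + 5;
                else if (strncmp(kv, "size=", 5) == 0)
                    size = atoll(kv + 5);
            }

            /* this is where the file would be compressed/encrypted/relocated */
            if (name != NULL)
                fprintf(stderr, "would process %s (%lld bytes)\n", name, size);
        }
        /* "backup mode=..." and other task types would be handled similarly */
    }

    return 0;
}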

> The obvious problem with that proposal is that we don't want to
> unnecessarily store the incoming data on the system pg_basebackup is
> running on, just for the subcommand to get access to them. More on that
> in a second.

We also implement "remote" processes so the local processes can get data 
that doesn't happen to be local, i.e. on a remote PostgreSQL cluster.

> A huge advantage of a scheme like this would be that it wouldn't have to
> be specific to pg_basebackup. It could just as well work directly on the
> server, avoiding an unnecessary loop through the network. Which
> e.g. could integrate with filesystem snapshots etc.  Without needing to
> build the 'archive target' once with server libraries, and once with
> client libraries.

Yes -- needing to store the data locally or stream it through one main 
process is a major bottleneck.

Working on the server is key because it allows you to compress before 
transferring the data. With parallel processing it is trivial to flood a 
network. We have a recent example from a community user of backing up 
25TB in 4 hours. Compression on the server makes this possible (and a 
fast network, in this case).

For security reasons, it's also nice to be able to encrypt data before 
it leaves the database server. Calculating checksums/size at the source 
is also ideal.

> One reason I think something like this could be advantageous over a C
> API is that it's quite feasible to implement it from a number of
> different languages, including shell if really desired, without needing
> to provide a C API via a FFI.

We migrated from Perl to C and kept our local/remote protocol the same, 
which really helped. There were times when the C code was talking to a 
Perl local/remote and vice versa. The idea is certainly workable in our 
experience.

> It'd also make it quite natural to split out compression from
> pg_basebackup's main process, which IME currently makes it not really
> feasible to use pg_basebackup's compression.

This is a major advantage.

> There's various ways we could address the issue for how the subcommand
> can access the file data. The most flexible probably would be to rely on
> exchanging file descriptors between basebackup and the subprocess (these
> days all supported platforms have that, I think).  Alternatively we
> could invoke the subcommand before really starting the backup, and ask
> how many files it'd like to receive in parallel, and restart the
> subcommand with that number of file descriptors open.

We don't exchange FDs. Each local is responsible for getting the data 
from PostgreSQL or the repo based on knowing the data source and a path. 
For pg_basebackup, however, I'd imagine each local would want a 
replication connection with the ability to request specific files that 
were passed to it by the main process.
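
For reference, the FD exchange Andres describes is not much code on 
platforms that have it. Here's a minimal sketch (SCM_RIGHTS over a 
Unix-domain socket; the socketpair()/fork() setup and all error handling 
are omitted and assumed):

/*
 * Pass an open file descriptor to the subprocess over a Unix-domain
 * socket using SCM_RIGHTS ancillary data.
 */
#include <string.h>
#include <sys/socket.h>
#include <sys/uio.h>

static void
send_fd(int sock, int fd)
{
    struct msghdr msg = {0};
    struct cmsghdr *cmsg;
    union
    {
        char buf[CMSG_SPACE(sizeof(int))];
        struct cmsghdr align;
    } ctrl;
    char dummy = 'F';
    struct iovec iov = {.iov_base = &dummy, .iov_len = 1};

    /* at least one byte of real data must accompany the control message */
    msg.msg_iov = &iov;
    msg.msg_iovlen = 1;

    /* attach the file descriptor as ancillary data */
    msg.msg_control = ctrl.buf;
    msg.msg_controllen = sizeof(ctrl.buf);
    cmsg = CMSG_FIRSTHDR(&msg);
    cmsg->cmsg_level = SOL_SOCKET;
    cmsg->cmsg_type = SCM_RIGHTS;
    cmsg->cmsg_len = CMSG_LEN(sizeof(int));
    memcpy(CMSG_DATA(cmsg), &fd, sizeof(int));

    sendmsg(sock, &msg, 0);
}

The appeal of that approach is that the subcommand can read the file 
contents directly without needing its own connection back to the server.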

> [2] yes, I already hear JSON. A line-delimited format would have some
> advantages though.

We use JSON, but each protocol request/response is linefeed-delimited. 
For example, here's what it looks like when the main process asks a 
local process to back up a specific file:


{"{"cmd":"backupFile","param":["base/32768/33001",true,65536,null,true,0,"pg_data/base/32768/33001",false,0,3,"20200412-213313F",false,null]}"}

And the local responds with:


{"{"out":[1,65536,65536,"6bf316f11d28c28914ea9be92c00de9bea6d9a6b",{"align":true,"error":[0,[3,5],7],"valid":false}]}"}

We use arrays for parameters but of course these could be done with 
objects for more readability.
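
For example, an object form of the same request might look something like 
this (the field names below are invented purely for illustration, not our 
actual keys):

{"cmd":"backupFile","param":{"file":"base/32768/33001","size":65536,"repoFile":"pg_data/base/32768/33001","backupLabel":"20200412-213313F"}}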

We are considering a move to HTTP since many services (e.g. S3, GCS, 
Azure) require it (so we implement it anyway) and we're not sure it makes 
sense to maintain our own protocol format. That said, we'd still prefer 
to use JSON for our payloads (as GCS does) rather than XML (as S3 does).

Regards,
-- 
-David
david@pgmasters.net


