Re: where should I stick that backup? - Mailing list pgsql-hackers

From David Steele
Subject Re: where should I stick that backup?
Date
Msg-id f6d3048d-99a1-8258-23d1-db8a9fa93506@pgmasters.net
In response to Re: where should I stick that backup?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 4/12/20 11:04 AM, Robert Haas wrote:
> On Sun, Apr 12, 2020 at 10:09 AM Magnus Hagander <magnus@hagander.net> wrote:
>> There are certainly cases for it. It might not be that they have to be the same connection, but still be the
>> same session, meaning before the first time you perform some step of authentication, get a token, and then use
>> that for all the files. You'd need somewhere to maintain that state, even if it doesn't happen to be a socket.
>> But there are definitely plenty of cases where keeping an open socket can be a huge performance gain --
>> especially when it comes to not re-negotiating encryption etc.
> 
> Hmm, OK.

When we implemented connection-sharing for S3 in pgBackRest it was a 
significant performance boost, even for large files since they must be 
uploaded in parts. The same goes for files transferred over SSH, though 
in this case the overhead is per-file and can be mitigated with control 
master.
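To illustrate why connection sharing pays off, here is a minimal sketch in Python, using plain HTTP against a throwaway local server standing in for an object store like S3 (the server, handler, and paths are all hypothetical, purely for demonstration). Three uploads go over one socket, so the TCP (and in real life, TLS) setup cost is paid once rather than per file or per part:

```python
import http.client
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer

connections = []  # one entry per TCP connection the server accepts

class Handler(BaseHTTPRequestHandler):
    protocol_version = "HTTP/1.1"  # enable keep-alive so the socket is reused

    def setup(self):
        connections.append(self.client_address)
        super().setup()

    def do_PUT(self):
        # Drain the upload, then acknowledge it.
        self.rfile.read(int(self.headers.get("Content-Length", 0)))
        self.send_response(200)
        self.send_header("Content-Length", "0")
        self.end_headers()

    def log_message(self, *args):  # keep the demo quiet
        pass

server = HTTPServer(("127.0.0.1", 0), Handler)
threading.Thread(target=server.serve_forever, daemon=True).start()

# One connection, three uploads.
conn = http.client.HTTPConnection("127.0.0.1", server.server_port)
for part in range(3):
    conn.request("PUT", f"/backup/part{part}", body=b"x" * 1024)
    resp = conn.getresponse()
    resp.read()  # must drain the response before reusing the connection
    assert resp.status == 200
conn.close()
server.shutdown()

assert len(connections) == 1  # all three PUTs shared one socket
```

With per-file connections the server would have accepted three sockets here; with real TLS and authentication the saved handshakes dominate for small files.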

We originally (late 2013) implemented everything with command-line 
tools during the POC phase. The idea was to get something viable quickly 
and then improve as needed. At the time our config file had entries 
something like this:

[global:command]
compress=/usr/bin/gzip --stdout %file%
decompress=/usr/bin/gzip -dc %file%
checksum=/usr/bin/shasum %file% | awk '{print $1}'
manifest=/opt/local/bin/gfind %path% -printf '%P\t%y\t%u\t%g\t%m\t%T@\t%i\t%s\t%l\n'
psql=/Library/PostgreSQL/9.3/bin/psql -X %option%

[db]
psql_options=--cluster=9.3/main

[db:command:option]
psql=--port=6001

These appear to be for macOS, but Linux would be similar.
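The %file%-style placeholders in those config entries were substituted before the command ran. A minimal sketch of that kind of substitution (a hypothetical helper, not actual pgBackRest code) might look like:

```python
import shlex

def render_command(template: str, substitutions: dict) -> list:
    """Replace %name% placeholders in a command template, then split it
    into an argv list. Hypothetical helper, not pgBackRest code."""
    for name, value in substitutions.items():
        template = template.replace("%" + name + "%", value)
    return shlex.split(template)

# Example: the 'decompress' entry from the config above.
argv = render_command("/usr/bin/gzip -dc %file%", {"file": "base.tar.gz"})
assert argv == ["/usr/bin/gzip", "-dc", "base.tar.gz"]
```

Note that the split happens after substitution here, so a filename containing spaces would break into multiple arguments -- exactly the kind of escaping problem that makes this approach hard to get right.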

This *did* work, but it was really hard to debug when things went wrong, 
the per-file cost was high, and the slight differences between the 
command-line tools on different platforms were maddening. For example, 
lots of versions of 'find' would error if a file disappeared while 
building the manifest, which is a pretty common occurrence in PostgreSQL 
(most newer distros had an option to fix this). I know that doesn't 
apply here, but it's an example. Also, debugging was complicated with so 
many processes: with any degree of parallelism the process list got 
pretty crazy, fsync was not happening, etc. It's been a long time, but I 
don't have any good memories of the solution that used all command-line 
tools.

Once we had a POC that solved our basic problem, i.e. backing up about 
50TB of data reasonably efficiently, we immediately started working on a 
version that did not rely on command-line tools and we never looked 
back. Currently the only command-line tool we use is ssh.

I'm sure it would be possible to create a solution that worked better 
than ours, but I'm pretty certain it would still be hard for users to 
make it work correctly and to prove it worked correctly.

>> For compression and encryption, it could perhaps be as simple as "the command has to be a pipe on both input
>> and output" and basically send the response back to pg_basebackup.
>>
>> But that won't help if the target is to relocate things...
> 
> Right. And, also, it forces things to be sequential in a way I'm not
> too happy about. Like, if we have some kind of parallel backup, which
> I hope we will, then you can imagine (among other possibilities)
> getting files for each tablespace concurrently, and piping them
> through the output command concurrently. But if we emit the result in
> a tarfile, then it has to be sequential; there's just no other choice.
> I think we should try to come up with something that can work in a
> multi-threaded environment.
> 
>> That is one way to go for it -- and in a case like that, I'd suggest the shellscript interface would be an
>> implementation of the other API. A number of times through the years I've bounced ideas around for what to do
>> with archive_command with different people (never quite to the level of "it's time to write a patch"), and it's
>> mostly come down to some sort of shlib api where in turn we'd ship a backwards compatible implementation that
>> would behave like archive_command. I'd envision something similar here.
> 
> I agree. Let's imagine that there are a conceptually unlimited number
> of "targets" and "filters". Targets and filters accept data via the
> same API, but a target is expected to dispose of the data, whereas a
> filter is expected to pass it, via that same API, to a subsequent
> filter or target. So filters could include things like "gzip", "lz4",
> and "encrypt-with-rot13", whereas targets would include things like
> "file" (the thing we have today - write my data into some local
> files!), "shell" (which writes my data to a shell command, as
> originally proposed), and maybe eventually things like "netbackup" and
> "s3". Ideally this will all eventually be via a loadable module
> interface so that third-party filters and targets can be fully
> supported, but perhaps we could consider that an optional feature for
> v1. Note that there is quite a bit of work to do here just to
> reorganize the code.
> 
> I would expect that we would want to provide a flexible way for a
> target or filter to be passed options from the pg_basebackup command
> line. So one might for example write this:
> 
> pg_basebackup --filter='lz4 -9' --filter='encrypt-with-rot13
> rotations=2' --target='shell ssh rhaas@depository pgfile
> create-exclusive - %f.lz4'
> 
> The idea is that the first word of the filter or target identifies
> which one should be used, and the rest is just options text in
> whatever form the provider cares to accept them; but with some
> %<character> substitutions allowed, for things like the file name.
> (The aforementioned escaping problems for things like filenames with
> spaces in them still need to be sorted out, but this is just a sketch,
> so while I think it's quite solvable, I am going to refrain from
> proposing a precise solution here.)

This is basically the solution we have landed on after many iterations.

We implement two types of filters, In and InOut.  The In filters process 
data and produce a result, e.g. SHA1, size, page checksum, etc. The 
InOut filters modify data, e.g. compression, encryption. Yeah, the names 
could probably be better...

I have attached our filter interface (filter.intern.h) as a concrete 
example of how this works.
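To illustrate the distinction, here is a hypothetical sketch in Python (not the actual C interface in filter.intern.h): an In filter observes the stream and produces a side result, while an InOut filter transforms the stream, and both can be chained ahead of a target:

```python
import hashlib
import zlib

class Sha1In:
    """An 'In' filter: observes the stream and produces a result
    (a digest), but passes the data through unchanged."""
    def __init__(self):
        self.hash = hashlib.sha1()
    def process(self, data: bytes) -> bytes:
        self.hash.update(data)
        return data
    def result(self) -> str:
        return self.hash.hexdigest()

class GzipInOut:
    """An 'InOut' filter: transforms the stream (here, zlib compression
    standing in for gzip/lz4)."""
    def __init__(self):
        self.compressor = zlib.compressobj()
    def process(self, data: bytes) -> bytes:
        return self.compressor.compress(data)
    def flush(self) -> bytes:
        return self.compressor.flush()

# A pipeline: checksum the raw data, then compress it into a 'target'
# buffer that stands in for a storage driver.
sha1, gz, target = Sha1In(), GzipInOut(), bytearray()
for chunk in (b"block one ", b"block two"):
    target += gz.process(sha1.process(chunk))
target += gz.flush()

assert zlib.decompress(bytes(target)) == b"block one block two"
```

The point of the shared process() signature is that In and InOut filters compose in any order, so a backup can checksum, compress, and encrypt in one pass over the data.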

We call 'targets' storage and have a standard interface for creating 
storage drivers. I have also attached our storage interface 
(storage.intern.h) as a concrete example of how this works.

Note that for just performing a backup this is overkill, but once you 
consider verify it is pretty much the minimum storage interface needed, 
in our experience.
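To make that concrete, here is a hypothetical minimal sketch in Python (the real interface is the attached storage.intern.h): put/get alone are roughly enough to write a backup, while list/exists are the sort of operations verify starts to need:

```python
from abc import ABC, abstractmethod

class StorageDriver(ABC):
    """Hypothetical minimal storage interface, not the pgBackRest one.
    put/get suffice for writing a backup; list/exists support verify."""
    @abstractmethod
    def put(self, path: str, data: bytes) -> None: ...
    @abstractmethod
    def get(self, path: str) -> bytes: ...
    @abstractmethod
    def list(self, prefix: str) -> list: ...
    @abstractmethod
    def exists(self, path: str) -> bool: ...

class MemoryDriver(StorageDriver):
    """In-memory driver, standing in for posix/sftp/s3 drivers."""
    def __init__(self):
        self.files = {}
    def put(self, path, data):
        self.files[path] = data
    def get(self, path):
        return self.files[path]
    def list(self, prefix):
        return sorted(p for p in self.files if p.startswith(prefix))
    def exists(self, path):
        return path in self.files

store = MemoryDriver()
store.put("backup/base/PG_VERSION", b"9.3\n")
assert store.exists("backup/base/PG_VERSION")
assert store.list("backup/") == ["backup/base/PG_VERSION"]
```

Because every driver presents the same interface, code like verify can walk a repository without knowing whether it lives on local disk or in S3.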

Regards,
-- 
-David
david@pgmasters.net

