Re: where should I stick that backup? - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: where should I stick that backup?
Date
Msg-id 20200409224448.GC26811@momjian.us
In response to Re: where should I stick that backup?  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Thu, Apr  9, 2020 at 04:15:07PM -0400, Stephen Frost wrote:
> Greetings,
> 
> * Bruce Momjian (bruce@momjian.us) wrote:
> > I think we need to step back and look at the larger issue.  The real
> > argument goes back to the Unix command-line API vs the VMS/Windows API. 
> > The former has discrete parts that can be stitched together, while the
> > VMS/Windows API presents a more duplicative but more holistic API for
> > every piece.  We have discussed using shell commands for
> > archive_command, and even more recently, for the server pass phrase.  
> 
> When it comes to something like the server pass phrase, it seems much
> more reasonable to consider using a shell script (though still perhaps
> not ideal) because it's not involved directly in ensuring that the data
> is reliably stored and it's pretty clear that if it doesn't work the
> worst thing that happens is that the database doesn't start up, but it
> won't corrupt any data or destroy it or do other bad things.

Well, the pass phrase relates to security, so it is important too.  I
don't think the _importance_ of the action is the deciding issue.
Rather, I think it is how well the action fits the shell script API.

> > To get more specific, I think we have to understand how the
> > _requirements_ of the job match the shell script API, with stdin,
> > stdout, stderr, return code, and command-line arguments.  Looking at
> > archive_command, the command-line arguments allow specification of file
> > names, but quoting can be complex.  The error return code and stderr
> > output seem to work fine.  There is no clean API for fsync or for
> > testing whether the file exists, so all of that has to be done by
> > hand in one command line.  This is why many users use pre-written
> > archive_command shell scripts.
> 
> We aren't really considering all of the use-cases though.  Specifically,
> things like pushing to S3 or GCS require, at least, good retry logic,
> and that's without starting to think about things like high-rate systems
> (spawning lots of new processes isn't free, particularly if they're
> written in shell script, but any interpreted language is expensive) and
> wanting to parallelize.

Good point, but if there are multiple APIs, it makes shell script
flexibility even more useful.
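
To make the pieces discussed above concrete (file-exists test, copy,
fsync, return code, stderr, plus the retry logic mentioned for remote
targets), here is a minimal Python sketch.  It is not an existing
PostgreSQL tool: the names archive_file and archive_with_retry, the
retry count, and the delay are assumptions made up for illustration,
and it writes to a local directory rather than S3 or GCS.

```python
import os
import shutil
import sys
import time


def archive_file(src, dest_dir):
    """Copy src into dest_dir, refusing to overwrite, and fsync the result."""
    dest = os.path.join(dest_dir, os.path.basename(src))
    # File-exists test: never overwrite an already archived file.
    # (There is still a small race between this check and the rename.)
    if os.path.exists(dest):
        raise FileExistsError(dest)
    tmp = dest + ".tmp"
    with open(src, "rb") as fin, open(tmp, "wb") as fout:
        shutil.copyfileobj(fin, fout)
        fout.flush()
        os.fsync(fout.fileno())  # flush file contents to stable storage
    os.rename(tmp, dest)
    # fsync the directory so the rename itself is durable (POSIX systems).
    dir_fd = os.open(dest_dir, os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
    return dest


def archive_with_retry(src, dest_dir, attempts=3, delay=1.0):
    """Return 0 on success, 1 on failure, like a shell exit status."""
    for attempt in range(1, attempts + 1):
        try:
            archive_file(src, dest_dir)
            return 0
        except FileExistsError as exc:
            # Retrying cannot help here, so fail immediately.
            print(f"archive: {exc} already exists", file=sys.stderr)
            return 1
        except OSError as exc:
            print(f"archive: attempt {attempt} failed: {exc}", file=sys.stderr)
            if attempt < attempts:
                time.sleep(delay)
    return 1
```

A wrapper script would pass the two command-line arguments to
archive_with_retry() and hand its result to sys.exit(), since
archive_command treats a zero exit status as success and anything else
as failure.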

> > This brings up a few questions:
> > 
> > *  Should we have split apart archive_command into file-exists, copy,
> > fsync-file?  Should we add that now?
> 
> No..  The right approach to improving on archive command is to add a way
> for an extension to take over that job, maybe with a complete background
> worker of its own, or perhaps a shared library that can be loaded by the
> archiver process, at least if we're talking about how to allow people to
> extend it.

That seems quite vague, which is the issue we had years ago when
considering doing archive_command as a link to a C library.

> Potentially a better answer is to just build this stuff into PG; things
> like "archive WAL to S3/GCS with these credentials" are what an awful
> lot of users want.  There are then some who want "archive first to this
> other server, and then archive to S3/GCS", or more complex options.

Yes, we certainly know how to do a file system copy, but what about
copying files to other things like S3?  I don't know how we would do
that and allow users to change things like file paths or URLs.
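
For comparison, archive_command already lets users vary the destination
through its %p (path of the file to archive) and %f (file name only)
placeholders.  The sketch below shows that style of substitution applied
to a destination template.  It is only an illustration: expand_template
is a hypothetical helper, the bucket name in the usage note is invented,
and nothing here is an existing PostgreSQL setting or API.

```python
def expand_template(template, path, filename):
    """Expand %p, %f and %% in one left-to-right pass over the template."""
    substitutions = {"p": path, "f": filename, "%": "%"}
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        # A '%' followed by a known letter is replaced; anything else,
        # including a trailing or unrecognized '%', is copied as-is.
        if ch == "%" and i + 1 < len(template) and template[i + 1] in substitutions:
            out.append(substitutions[template[i + 1]])
            i += 2
        else:
            out.append(ch)
            i += 1
    return "".join(out)
```

For example, expand_template("s3://example-bucket/wal/%f",
"pg_wal/000000010000000000000001", "000000010000000000000001") yields
"s3://example-bucket/wal/000000010000000000000001".  Credentials, retry
policy, and the transfer itself would still need their own settings,
which is the open question raised above.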

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


