Re: where should I stick that backup? - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: where should I stick that backup?
Date
Msg-id 20200409195700.GA26811@momjian.us
In response to Re: where should I stick that backup?  (Magnus Hagander <magnus@hagander.net>)
Responses Re: where should I stick that backup?
List pgsql-hackers
On Mon, Apr  6, 2020 at 07:32:45PM +0200, Magnus Hagander wrote:
> On Mon, Apr 6, 2020 at 4:45 PM Stephen Frost <sfrost@snowman.net> wrote:
> > For my 2c, at least, introducing more shell commands into critical parts
> > of the system is absolutely the wrong direction to go in.
> > archive_command continues to be a mess that we refuse to clean up or
> > even properly document and the project would be much better off by
> > trying to eliminate it rather than add in new ways for users to end up
> > with bad or invalid backups.
> 
> I think the bigger problem with archive_command more comes from how
> it's defined to work tbh. Which leaves a lot of things open.
> 
> This sounds to me like a much narrower use-case, which makes it a lot
> more OK. But I agree we have to be careful not to get back into that
> whole mess. One thing would be to clearly document such things *from
> the beginning*, and not try to retrofit it years later like we ended
> up doing with archive_command.
> 
> And as Robert mentions downthread, the fsync() issue is definitely a
> real one, but if that is documented clearly ahead of time, that's a
> reasonable level foot-gun I'd say.

I think we need to step back and look at the larger issue.  The real
argument goes back to the Unix command-line API vs the VMS/Windows API. 
The former has discrete parts that can be stitched together, while the
VMS/Windows API presents a more duplicative but more holistic API for
every piece.  We have discussed using shell commands for
archive_command and, more recently, for the server passphrase.

To get more specific, I think we have to understand how the
_requirements_ of the job match the shell script API, with stdin,
stdout, stderr, return code, and command-line arguments.  Looking at
archive_command, the command-line arguments allow specification of file
names, but quoting can be complex.  The error return code and stderr
output seem to work fine.  There is no clean API for fsync or for testing
whether the file already exists, so all of that has to be done by hand in
a single command line.  This is why many users rely on pre-written
archive_command shell scripts.
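
To make the three hand-done pieces concrete, here is a rough sketch of
what such a pre-written script typically has to do: test whether the
target exists, copy, and fsync, reporting failure through stderr and the
return code.  It is only an illustration, not anything PostgreSQL ships.
The names archive_wal and fsync_path are hypothetical, and it assumes a
POSIX system where a directory can be opened read-only and fsync'ed.

```python
import os
import shutil
import sys


def fsync_path(path):
    """Flush a file or directory to stable storage (POSIX assumption)."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def archive_wal(src, dest_dir):
    """Copy src into dest_dir; return 0 on success, 1 on failure.

    Hypothetical helper: the return value mimics a shell exit status.
    """
    dest = os.path.join(dest_dir, os.path.basename(src))
    tmp = dest + ".tmp"

    # Step 1: file-exists test, so an archived file is never overwritten.
    if os.path.exists(dest):
        print(f"refusing to overwrite {dest}", file=sys.stderr)
        return 1

    try:
        # Step 2: copy under a temporary name.
        shutil.copyfile(src, tmp)
        # Step 3: fsync the data, rename into place, fsync the directory
        # so the rename itself is durable.
        fsync_path(tmp)
        os.rename(tmp, dest)
        fsync_path(dest_dir)
    except OSError as exc:
        print(f"archive failed: {exc}", file=sys.stderr)
        return 1
    return 0
```

To use something like this from archive_command, a thin wrapper would
still have to pass the command-line arguments to archive_wal() and exit
with its return value, which is where the quoting concerns come back in.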

This brings up a few questions:

*  Should we have split archive_command apart into separate file-exists,
copy, and fsync-file steps?  Should we add that now?

*  How well does this backup requirement match with the shell command
API?

-- 
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +


