Re: where should I stick that backup? - Mailing list pgsql-hackers
From | Bruce Momjian <bruce@momjian.us>
---|---
Subject | Re: where should I stick that backup?
Date | Thu, 9 Apr 2020
Msg-id | 20200409224448.GC26811@momjian.us
In response to | Re: where should I stick that backup? (Stephen Frost <sfrost@snowman.net>)
Responses | Re: where should I stick that backup?
| Re: where should I stick that backup?
List | pgsql-hackers
On Thu, Apr 9, 2020 at 04:15:07PM -0400, Stephen Frost wrote:
> Greetings,
>
> * Bruce Momjian (bruce@momjian.us) wrote:
> > I think we need to step back and look at the larger issue.  The real
> > argument goes back to the Unix command-line API vs. the VMS/Windows
> > API.  The former has discrete parts that can be stitched together,
> > while the latter presents a more duplicative but more holistic API
> > for every piece.  We have discussed using shell commands for
> > archive_command, and more recently for the server pass phrase.
>
> When it comes to something like the server pass phrase, it seems much
> more reasonable to consider using a shell script (though still perhaps
> not ideal), because it isn't directly involved in ensuring that the
> data is reliably stored.  It's pretty clear that if it doesn't work,
> the worst thing that happens is that the database doesn't start up; it
> won't corrupt or destroy any data, or do other bad things.

Well, the pass phrase relates to security, so it is important too.  I
don't think the _importance_ of the action is the deciding issue,
though.  Rather, it is how well the action fits the shell script API.

> > To get more specific, I think we have to understand how the
> > _requirements_ of the job match the shell script API, with its stdin,
> > stdout, stderr, return code, and command-line arguments.  Looking at
> > archive_command, the command-line arguments allow specification of
> > file names, but quoting can be complex.  The error return code and
> > stderr output seem to work fine.  There is no clean API for fsync or
> > for testing whether the file exists, so all of that has to be
> > hand-written in a single command line.  This is why many users rely
> > on pre-written archive_command shell scripts.
>
> We aren't really considering all of the use cases, though.  In
> particular, pushing to S3 or GCS requires, at a minimum, good retry
> logic, and that's before we think about high-rate systems (spawning
> lots of new processes isn't free, particularly for shell scripts, and
> any interpreted language is expensive) and about parallelizing.

Good point, but if there are multiple APIs to target, that makes shell
script flexibility even more useful.
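To illustrate the hand-written logic described above, a typical wrapper
looks roughly like this.  This is only a sketch: the /mnt/archive path
and the script name are made up, while %p and %f are the documented
archive_command placeholders, passed here as arguments:

    #!/bin/sh
    # Hypothetical wrapper, set as: archive_command = 'archive_wal.sh %p %f'
    # $1 = %p (path to the WAL segment), $2 = %f (its file name)
    set -e
    dst="/mnt/archive/$2"    # the archive directory is an assumption
    test ! -f "$dst"         # hand-rolled file-exists check: never overwrite
    cp "$1" "$dst"           # the copy itself
    sync -d "$dst"           # hand-rolled fsync (GNU coreutils sync(1))

A nonzero exit from any step makes the whole command fail, which is how
the return-code part of the API gets used.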
> > This brings up a few questions:
> >
> > *  Should we have split apart archive_command into file-exists, copy,
> >    and fsync-file commands?  Should we add that now?
>
> No.  The right approach to improving on archive_command is to add a way
> for an extension to take over that job, maybe with a complete background
> worker of its own, or perhaps with a shared library that can be loaded
> by the archiver process, at least if we're talking about how to let
> people extend it.

That seems quite vague, which is the same problem we ran into years ago
when we considered turning archive_command into a hook into a C library.
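To make the question above concrete, such a split might have looked
something like the fragment below.  None of these settings exist; they
are purely hypothetical:

    # Hypothetical postgresql.conf fragment -- these GUCs do not exist.
    # Each step of today's single archive_command becomes its own command,
    # and the server, not the script, decides what each result means.
    archive_exists_command = 'test -f /mnt/archive/%f'
    archive_copy_command   = 'cp %p /mnt/archive/%f'
    archive_fsync_command  = 'sync -d /mnt/archive/%f'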
> Potentially a better answer is to just build this stuff into Postgres:
> things like "archive WAL to S3/GCS with these credentials" are what an
> awful lot of users want.  There are also some who want "archive first
> to this other server, and then archive to S3/GCS", or more complex
> options.

Yes, we certainly know how to do a file system copy, but what about
copying files to other things like S3?  I don't know how we would do
that while still letting users change things like file paths or URLs.
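For comparison, even a minimal version of the retry logic Stephen
mentions above is awkward to write by hand.  A sketch, with a made-up
bucket name and the aws CLI standing in for whatever uploader is
actually used:

    #!/bin/sh
    # Hypothetical S3 wrapper: archive_command = 'archive_wal_s3.sh %p %f'
    n=0
    until aws s3 cp "$1" "s3://example-wal-bucket/$2"; do
        n=$((n + 1))
        test "$n" -ge 5 && exit 1    # give up; the archiver retries later
        sleep $((1 << n))            # back off: 2, 4, 8, 16 seconds
    done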
--
  Bruce Momjian  <bruce@momjian.us>        https://momjian.us
  EnterpriseDB                             https://enterprisedb.com

+ As you are, so once was I.  As I am, so you will be. +
+                      Ancient Roman grave inscription +