Re: where should I stick that backup? - Mailing list pgsql-hackers

From Robert Haas
Subject Re: where should I stick that backup?
Msg-id CA+TgmobU1gqsEuoPeB-TKKiwM1sEqkFJttKYzfxt1rcM765RGw@mail.gmail.com
In response to Re: where should I stick that backup?  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Sun, Apr 12, 2020 at 9:18 PM Stephen Frost <sfrost@snowman.net> wrote:
> There's two different questions we're talking about here and I feel like
> they're being conflated.  To try and clarify:
>
> - Could you implement FDWs with shell scripts, and custom programs?  I'm
>   pretty confident that the answer is yes, but the thrust of that
>   argument is primarily to show that you *can* implement just about
>   anything using a shell script "API", so just saying it's possible to
>   do doesn't make it necessarily a good solution.  The FDW system is
>   complicated, and also good, because we made it so and because it's
>   possible to do more sophisticated things with a C API, but it could
>   have started out with shell scripts that just returned data in much
>   the same way that COPY PROGRAM works today.  What matters is that
>   forward thinking to consider what you're going to want to do tomorrow,
>   not just thinking about how you can solve for the simple cases today
>   with a shell out to an existing command.
>
> - Does providing a C-library interface deter people from implementing
>   solutions that use that interface?  Perhaps it does, but it doesn't
>   have nearly the dampening effect that is being portrayed here, and we
>   can see that pretty clearly from the FDW situation.  Sure, not all of
>   those are good solutions, but lots and lots of archive command shell
>   scripts are also pretty terrible, and there *are* a few good solutions
>   out there, including the ones that we ourselves ship.  At least when
>   it comes to FDWs, there's an option there for us to ship a *good*
>   answer ourselves for certain (and, in particular, the very very
>   common) use-cases.
>
> > - We're only talking about writing a handful of tar files, and that's
> > in the context of a full-database backup, which is a much
> > heavier-weight operation than a query.
>
> This is true for -Ft, but not -Fp, and I don't think there's enough
> thought being put into this when it comes to parallelism and that you
> don't want to be limited to one process per tablespace.
>
> > - There is not really any state that needs to be maintained across calls.
>
> As mentioned elsewhere, this isn't really true.

These are fair points, and my thinking has been somewhat refined by
this discussion, so let me try to clarify my (current) position a bit.
I believe that there are two subtly different questions here.

Question #1 is "Would it be useful to people to be able to pipe the
tar files that they get from pg_basebackup into some other command
rather than writing them to the filesystem, and should we give them
the option to do so?"

Question #2 is "Is piping the tar files that pg_basebackup would
produce into some other program the best possible way of providing
more flexibility about where backups get written?"

I'm prepared to concede that the answer to question #2 is no. I had
earlier assumed that establishing connections was pretty fast and
that, even if not, there were solutions to that problem, like setting
up an SSH tunnel in advance. Several people have said that, no,
establishing connections really is a problem. And as I acknowledged
from the beginning, plain format backups are a problem too, since they
are not just a handful of tar files. So I think a convincing argument
has been made that a shell command won't meet everyone's needs, and a
more complex API is required for some cases.

But I still think the answer to question #1 is yes. I disagree
entirely with any argument to the effect that because some users might
do unsafe things with the option, we ought not to provide it.
Practically speaking, it would work fine for many people even with no
other changes, and if we add something like pgfile, which I'm willing
to do, it would work for more people in more situations. It is a
useful thing to have, full stop.
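
To make question #1 concrete, here is a minimal sketch of what "pipe
the tar file into some other command" amounts to. It is not
pg_basebackup code and does not reflect any existing option; the
function name pipe_archive_to_command and the idea of passing the
archive as an iterable of byte chunks are assumptions made purely for
illustration.

```python
import subprocess
from typing import Iterable


def pipe_archive_to_command(chunks: Iterable[bytes], command: str) -> int:
    """Feed archive bytes to a shell command's stdin and return its exit status.

    Hypothetical helper: 'chunks' stands in for the tar stream that would
    otherwise be written to the filesystem.
    """
    # One process per archive; the command decides where the bytes end up.
    proc = subprocess.Popen(command, shell=True, stdin=subprocess.PIPE)
    try:
        for chunk in chunks:
            proc.stdin.write(chunk)
    except BrokenPipeError:
        # The command exited early; its exit status reports the failure.
        pass
    finally:
        try:
            # Closing stdin flushes buffered data and signals end of archive.
            proc.stdin.close()
        except BrokenPipeError:
            pass
    return proc.wait()
```

A caller would check the return value and treat any nonzero status as
a failed backup. Note that this starts one process per archive, which
is exactly the property that makes it a poor fit for plain format
backups and for targets where connection setup is expensive.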

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


