Re: where should I stick that backup? - Mailing list pgsql-hackers

From Robert Haas
Subject Re: where should I stick that backup?
Date
Msg-id CA+TgmoawpBgHPG7WESFaDHGGd7Ft-+k9GThtKL6+PJ7mn6h1Cg@mail.gmail.com
Whole thread Raw
In response to Re: where should I stick that backup?  (Andres Freund <andres@anarazel.de>)
Responses Re: where should I stick that backup?  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Sun, Apr 12, 2020 at 8:27 PM Andres Freund <andres@anarazel.de> wrote:
> > That's quite appealing. One downside - IMHO significant - is that you
> > have to have a separate process to do *anything*. If you want to add a
> > filter that just logs everything it's asked to do, for example, you've
> > gotta have a whole process for that, which likely adds a lot of
> > overhead even if you can somehow avoid passing all the data through an
> > extra set of pipes. The interface I proposed would allow you to inject
> > very lightweight filters at very low cost. This design really doesn't.
>
> Well, in what you described it'd still be all done inside pg_basebackup,
> or did I misunderstand? Once you fetched it from the server, I can't
> imagine the overhead of filtering it a bit differently would matter.
>
> But even if, the "target" could just reply with "skip" or such, instead
> of providing an fd.
>
> What kind of filtering are you thinking of where this is a problem?
> Besides just logging the filenames?  I just can't imagine how that's a
> relevant overhead compared to having to do things like
> 'shell ssh rhaas@depository pgfile create-exclusive - %f.lz4'

Anything you want to do in the same process. I mean, right now we have
basically one target (filesystem) and one filter (compression).
Neither of those things spawn a process. It seems logical to imagine
that there might be other things that are similar in the future. It
seems to me that there are definitely things where you will want to
spawn a process; that's why I like having shell commands as one
option. But I don't think we should require that you can't have a
filter or a target unless you also spawn a process for it.

> I really think we want the option to eventually do this server-side. And
> I don't quite see it as viable to go for an API that allows to specify
> shell fragments that are going to be executed server side.

The server-side thing is a good point, but I think it adds quite a bit
of complexity, too. I'm worried that this is ballooning to an
unworkable amount of complexity - and not just code complexity, but
bikeshedding complexity, too. Like, I started with a command-line
option that could probably have been implemented in a few hundred
lines of code. Now, we're up to something where you have to build
custom processes that speak a novel protocol and work on both the
client and the server side. That's at least several thousand lines of
code, maybe over ten thousand if the sample binaries that use the new
protocol are more than just simple demonstrations of how to code to
the interface. More importantly, it means agreeing on the nature of
this custom protocol, which seems like something where I could put in
a ton of effort to create something and then have somebody complain
because it's not JSON, or because it is JSON, or because the
capability negotiation system isn't right, or whatever. I'm not
exactly saying that we shouldn't do it; I think it has some appeal.
But I'd sure like to find some way of getting started that doesn't
involve having to do everything in one patch, and then getting told to
change it all again - possibly with different people wanting
contradictory things.

> > Note that you could build this on top of what I proposed, but not the
> > other way around.
>
> Why should it not be possible the other way round?

Because a C function call API lets you decide to spawn a process, but
if the framework inherently spawns a process, you can't decide not to
do so in a particular case.

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Andreas Karlsson
Date:
Subject: Re: Poll: are people okay with function/operator table redesign?
Next
From: Amit Langote
Date:
Subject: Re: index paths and enable_indexscan