Re: where should I stick that backup? - Mailing list pgsql-hackers
From: Robert Haas
Subject: Re: where should I stick that backup?
Date:
Msg-id: CA+TgmoawpBgHPG7WESFaDHGGd7Ft-+k9GThtKL6+PJ7mn6h1Cg@mail.gmail.com
In response to: Re: where should I stick that backup? (Andres Freund <andres@anarazel.de>)
Responses: Re: where should I stick that backup?
List: pgsql-hackers
On Sun, Apr 12, 2020 at 8:27 PM Andres Freund <andres@anarazel.de> wrote:
> > That's quite appealing. One downside - IMHO significant - is that you
> > have to have a separate process to do *anything*. If you want to add a
> > filter that just logs everything it's asked to do, for example, you've
> > gotta have a whole process for that, which likely adds a lot of
> > overhead even if you can somehow avoid passing all the data through an
> > extra set of pipes. The interface I proposed would allow you to inject
> > very lightweight filters at very low cost. This design really doesn't.
>
> Well, in what you described it'd still be all done inside pg_basebackup,
> or did I misunderstand? Once you fetched it from the server, I can't
> imagine the overhead of filtering it a bit differently would matter.
>
> But even if, the "target" could just reply with "skip" or such, instead
> of providing an fd.
>
> What kind of filtering are you thinking of where this is a problem?
> Besides just logging the filenames? I just can't imagine how that's a
> relevant overhead compared to having to do things like
> 'shell ssh rhaas@depository pgfile create-exclusive - %f.lz4'

Anything you want to do in the same process. I mean, right now we have
basically one target (filesystem) and one filter (compression). Neither
of those things spawns a process. It seems logical to imagine that there
might be other things that are similar in the future. It seems to me that
there are definitely things where you will want to spawn a process;
that's why I like having shell commands as one option. But I don't think
we should require that you can't have a filter or a target unless you
also spawn a process for it.

> I really think we want the option to eventually do this server-side. And
> I don't quite see it as viable to go for an API that allows to specify
> shell fragments that are going to be executed server side.
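The "lightweight in-process filter" idea could be sketched roughly as a
chain of callback structs; the names below are invented for illustration
and are not from any actual patch. The point is that a trivial filter,
such as one that just logs what it sees, costs one indirect function
call per chunk - no extra process, no pipes:

```c
#include <stdio.h>

/* Hypothetical sketch: a sink is a pair of a callback and a downstream
 * pointer.  A filter is a sink that transforms (or observes) the data
 * before forwarding it to the next sink in the chain. */
typedef struct bbsink
{
	/* Receive one chunk of backup data; return bytes consumed. */
	size_t		(*write_chunk) (struct bbsink *sink,
								const char *data, size_t len);
	struct bbsink *next;		/* downstream sink, if any */
	void	   *private_data;	/* per-filter state */
} bbsink;

/* A "filter" that logs what it sees and passes the data through. */
static size_t
logging_write_chunk(bbsink *sink, const char *data, size_t len)
{
	fprintf(stderr, "filter saw %zu bytes\n", len);
	return sink->next->write_chunk(sink->next, data, len);
}
```

Stacking such filters is just linking the structs together, which is the
low-cost injection being argued for here.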
The server-side thing is a good point, but I think it adds quite a bit
of complexity, too. I'm worried that this is ballooning to an unworkable
amount of complexity - and not just code complexity, but bikeshedding
complexity, too. Like, I started with a command-line option that could
probably have been implemented in a few hundred lines of code. Now,
we're up to something where you have to build custom processes that
speak a novel protocol and work on both the client and the server side.
That's at least several thousand lines of code, maybe over ten thousand
if the sample binaries that use the new protocol are more than just
simple demonstrations of how to code to the interface. More importantly,
it means agreeing on the nature of this custom protocol, which seems
like something where I could put in a ton of effort to create something
and then have somebody complain because it's not JSON, or because it is
JSON, or because the capability negotiation system isn't right, or
whatever.

I'm not exactly saying that we shouldn't do it; I think it has some
appeal. But I'd sure like to find some way of getting started that
doesn't involve having to do everything in one patch, and then getting
told to change it all again - possibly with different people wanting
contradictory things.

> > Note that you could build this on top of what I proposed, but not the
> > other way around.
>
> Why should it not be possible the other way round?

Because a C function call API lets you decide to spawn a process, but if
the framework inherently spawns a process, you can't decide not to do so
in a particular case.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company