Re: where should I stick that backup? - Mailing list pgsql-hackers

From Stephen Frost
Subject Re: where should I stick that backup?
Msg-id 20200414150825.GI13712@tamriel.snowman.net
In response to Re: where should I stick that backup?  (Robert Haas <robertmhaas@gmail.com>)
Responses Re: where should I stick that backup?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
Greetings,

* Robert Haas (robertmhaas@gmail.com) wrote:
> On Sun, Apr 12, 2020 at 8:27 PM Andres Freund <andres@anarazel.de> wrote:
> > I really think we want the option to eventually do this server-side. And
> > I don't quite see it as viable to go for an API that allows to specify
> > shell fragments that are going to be executed server side.
>
> The server-side thing is a good point, but I think it adds quite a bit
> of complexity, too. I'm worried that this is ballooning to an
> unworkable amount of complexity - and not just code complexity, but
> bikeshedding complexity, too. Like, I started with a command-line
> option that could probably have been implemented in a few hundred
> lines of code. Now, we're up to something where you have to build
> custom processes that speak a novel protocol and work on both the
> client and the server side. That's at least several thousand lines of
> code, maybe over ten thousand if the sample binaries that use the new
> protocol are more than just simple demonstrations of how to code to
> the interface. More importantly, it means agreeing on the nature of
> this custom protocol, which seems like something where I could put in
> a ton of effort to create something and then have somebody complain
> because it's not JSON, or because it is JSON, or because the
> capability negotiation system isn't right, or whatever. I'm not
> exactly saying that we shouldn't do it; I think it has some appeal.
> But I'd sure like to find some way of getting started that doesn't
> involve having to do everything in one patch, and then getting told to
> change it all again - possibly with different people wanting
> contradictory things.

Doing things incrementally, rather than all in one patch, absolutely
makes sense.

Given that we have some idea of what we want it to eventually look like,
though, wouldn't it make sense to make progress in that direction?

That is, I tend to agree with Andres that having this supported
server-side eventually is what we should be thinking about as the
end goal. After all, what is the point of pg_basebackup in all of this
if the goal is to get a backup of PG from the PG server to s3? Why go
through some other program, or through the replication protocol? Having
the server exec out to run shell script fragments to make that happen
looks really awkward and full of potential risks and issues, and there
seems to be agreement that it wouldn't be a good fit.

If, instead, we worked on a C-based interface which includes filters and
storage drivers, and was implemented through libpgcommon, we could start
with that being all done through pg_basebackup and work to hammer out
the complications and issues that we run into there and, once it seems
reasonably stable and works well, we could potentially pull that into
the backend to be run directly without having to have pg_basebackup
involved in the process.
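
To make the shape of that idea concrete, here is a minimal sketch of what
a chain of filters ending in a storage driver might look like. Nothing
here is existing PostgreSQL or libpgcommon code: BackupSink, MemStore,
CountState and the function names are all hypothetical, the "filter" only
counts bytes as a stand-in for compression or encryption, and the
"storage driver" just copies into a fixed in-memory buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical interface, not an existing PostgreSQL API.  A filter and a
 * storage driver share one shape; a filter forwards to "next", while a
 * storage driver is the end of the chain and has next == NULL.
 */
typedef struct BackupSink BackupSink;

struct BackupSink
{
	/* consume len bytes; return 0 on success, -1 on failure */
	int			(*consume) (BackupSink *self, const char *buf, size_t len);
	/* finish the stream; return 0 on success, -1 on failure */
	int			(*finish) (BackupSink *self);
	BackupSink *next;			/* downstream sink, or NULL */
	void	   *state;			/* private data for this sink */
};

/* Toy storage driver: appends everything to a fixed in-memory buffer. */
typedef struct MemStore
{
	char		data[256];
	size_t		used;
} MemStore;

static int
mem_consume(BackupSink *self, const char *buf, size_t len)
{
	MemStore   *st = self->state;

	if (len > sizeof(st->data) - st->used)
		return -1;				/* out of space */
	memcpy(st->data + st->used, buf, len);
	st->used += len;
	return 0;
}

static int
mem_finish(BackupSink *self)
{
	(void) self;				/* nothing to flush for memory */
	return 0;
}

/* Toy filter: counts bytes, then passes them downstream unchanged. */
typedef struct CountState
{
	size_t		bytes;
} CountState;

static int
count_consume(BackupSink *self, const char *buf, size_t len)
{
	CountState *st = self->state;

	assert(self->next != NULL);	/* a filter must have a downstream sink */
	st->bytes += len;
	return self->next->consume(self->next, buf, len);
}

static int
count_finish(BackupSink *self)
{
	return self->next->finish(self->next);
}
```

A caller would wire a filter in front of a storage sink and push data
into the head of the chain. Because the caller only ever sees the head,
the same chain could in principle be driven from pg_basebackup first and
from the backend later.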

There has already been good progress toward having more done by the
backend, and that's thanks to you; it's good work. Specifically, the
backend can now generate a manifest, with checksums included, as the
backup is being run, which is definitely an important piece.

Thanks,

Stephen
