Re: where should I stick that backup? - Mailing list pgsql-hackers

From Robert Haas
Subject Re: where should I stick that backup?
Msg-id CA+TgmoZFsXfwdbL39tO7x4GUeYFsMWw-5aMVysP6DFrybb9Ruw@mail.gmail.com
In response to Re: where should I stick that backup?  (Stephen Frost <sfrost@snowman.net>)
Responses Re: where should I stick that backup?  (Andres Freund <andres@anarazel.de>)
Re: where should I stick that backup?  (Stephen Frost <sfrost@snowman.net>)
List pgsql-hackers
On Fri, Apr 10, 2020 at 10:54 AM Stephen Frost <sfrost@snowman.net> wrote:
> So, this goes to what I was just mentioning to Bruce independently- you
> could have made the same argument about FDWs, but it just doesn't
> actually hold any water.  Sure, some of the FDWs aren't great, but
> there's certainly no shortage of them, and the ones that are
> particularly important (like postgres_fdw) are well written and in core.

That's a fairly different use case. In the case of the FDW interface:

- The number of interface method calls is very high, at least one per
tuple and a bunch of extra ones for each query.
- There is a significant amount of complex state that needs to be
maintained across API calls.
- The return values are often tuples, which are themselves an
in-memory data structure.

But here:

- We're only talking about writing a handful of tar files, and that's
in the context of a full-database backup, which is a much
heavier-weight operation than a query.
- There is not really any state that needs to be maintained across calls.
- The expected result is that a file gets written someplace outside of
PostgreSQL, not that an in-memory data structure gets handed back.
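
To make the contrast concrete, here is an illustrative sketch (in Python rather than PostgreSQL's C, with made-up names; it is not the actual FdwRoutine or any proposed backup-target API, just the two interface shapes described above):

```python
class FdwStyleScan:
    """FDW-style: many calls, state held across them, tuples returned."""

    def __init__(self, rows):
        # Complex state that must survive between API calls.
        self._rows = iter(rows)

    def iterate(self):
        # Called at least once per tuple; the return value is an
        # in-memory data structure that the caller consumes directly.
        return next(self._rows, None)


class BackupStyleTarget:
    """Backup-style: a handful of calls, each handing off a whole file."""

    def write_file(self, name, data, destination):
        # One call per tar file; the result lives outside the server,
        # which is why a shell pipeline is a plausible implementation
        # here but not for the per-tuple case above.
        destination[name] = data
```

The first shape rewards a tightly coupled in-process API; the second is little more than "deliver these bytes somewhere."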

> The concerns about there being too many possibilities and new ones
> coming up all the time could be applied equally to FDWs, but rather than
> ending up with a dearth of options and external solutions there, what
> we've actually seen is an explosion of options and externally written
> libraries for a large variety of options.

Sure, but a lot of those FDWs are relatively low-quality, and it's
often hard to find one that does what you want. And even if you do,
you don't really know how good it is. Unfortunately, in that case
there's no real alternative, because implementing something based on
shell commands couldn't ever have reasonable performance or a halfway
decent feature set. That's not the case here.

> How does this solution give them a good way to do the right thing
> though?  In a way that will work with large databases and complex
> requirements?  The answer seems to be "well, everyone will have to write
> their own tool to do that" and that basically means that, at best, we're
> only providing half of a solution and expecting all of our users to
> provide the other half, and to always do it correctly and in a well
> written way.  Acknowledging that most users aren't going to actually do
> that and instead they'll implement half measures that aren't reliable
> shouldn't be seen as an endorsement of this approach.

I don't acknowledge that. I think it's possible to use tools like the
proposed option in a perfectly reliable way, and I've already given a
bunch of examples of how it could be done. Writing a file is not such
a complex operation that every bit of code that writes one reliably
has to be written by someone associated with the PostgreSQL project. I
strongly suspect that people who use a cloud provider's tools to
upload their backup files will be quite happy with the results, and if
they aren't, I hope they will blame the cloud provider's tool for
eating the data rather than this option for making it easy to give the
data to the thing that ate it.
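
The reliability point mostly comes down to checking the exit status of whatever tool receives the data. A minimal sketch of that kind of hand-off (Python, with a hypothetical helper name; the consumer command is whatever the user chooses, e.g. a cloud provider's upload CLI):

```python
import subprocess


def hand_off_to_command(data: bytes, command: list[str]) -> None:
    """Feed backup data to an external command's stdin and fail loudly
    if that command fails. Checking the exit status is what separates a
    reliable hand-off from an unreliable half measure."""
    proc = subprocess.run(command, input=data)
    if proc.returncode != 0:
        raise RuntimeError(f"{command[0]} exited with {proc.returncode}")
```

Nothing about writing a file this way requires code written by someone associated with the PostgreSQL project; it requires the receiving tool to report failure and the caller to notice.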

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company


