Re: where should I stick that backup? - Mailing list pgsql-hackers
From | Robert Haas |
Subject | Re: where should I stick that backup? |
Date | |
Msg-id | CA+TgmoZ3Ak5V+b6J8T9UOD-c6Jt0mQ1o0tM3cnY14Y3Zyi4EMQ@mail.gmail.com |
In response to | Re: where should I stick that backup? (Andres Freund <andres@anarazel.de>) |
Responses | Re: where should I stick that backup? |
List | pgsql-hackers |
On Tue, Apr 14, 2020 at 9:50 PM Andres Freund <andres@anarazel.de> wrote:
> On 2020-04-14 11:38:03 -0400, Robert Haas wrote:
> > I'm fairly deeply uncomfortable with what Andres is proposing. I see
> > that it's very powerful, and can do a lot of things, and that if
> > you're building something that does sophisticated things with storage,
> > you probably want an API like that. It does a great job making
> > complicated things possible. However, I feel that it does a lousy job
> > making simple things simple.
>
> I think it's pretty much exactly the opposite. Your approach seems to
> move all the complexity to the user, who has to build the entire
> combination of commands themselves. Instead of having one or two
> default commands that do backups in common situations, everyone has to
> assemble them from pieces.

I think we're mostly talking about different things. I was speaking
mostly about the difficulty of developing it. I agree that a project
which is easier to develop is likely to provide fewer benefits to the
end user. On the other hand, it might be more likely to get done, and
projects that don't get done provide few benefits to users. I strongly
believe we need an incremental approach here.

> In general, I think it's good to give expert users the ability to
> customize things like backups and archiving. But defaulting to every
> non-expert user having to do all that expert work (or copying it from
> bad examples) is one of the most user-hostile things in postgres.

I'm not against adding more built-in compression algorithms, but I also
believe (as I have said several times now) that the world moves a lot
faster than PostgreSQL, which has never added a new compression
algorithm to pg_basebackup. We had one compression algorithm in 2011,
and we still have that same one algorithm today. So, either nobody
cares, or adding new algorithms is sufficiently challenging - for
either technical or political reasons - that nobody's managed to get it
done. I think having a simple framework in pg_basebackup for plugging
in new algorithms would make it noticeably simpler to add LZ4 or
whatever your favorite compression algorithm is. And I think having
that framework also be able to use shell commands, so that users don't
have to wait a decade or more for new choices to show up, is also a
good idea.
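(For illustration, here is a minimal client-side sketch of that piping
approach. It assumes pigz is installed and uses pg_basebackup's
tar-to-stdout mode; the exact flags vary by PostgreSQL version, and a
server-side shell hook would be the analogous thing on the other end.)

    # Stream the base backup as a single tar to stdout and hand compression
    # to an external tool chosen by the user, not one built into postgres.
    # -Ft = tar format, -D - = write to stdout (requires a cluster with no
    # additional tablespaces), -X fetch = include the needed WAL in the tar.
    pg_basebackup -Ft -D - -X fetch | pigz -p 4 > base.tar.gz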
I don't disagree that the situation around things like archive_command
is awful, but a good part of that is that every time somebody shows up
and says "hey, let's try to make a small improvement," between two and
forty people show up and start explaining why it's still going to be
terrible. Eventually the pile of requirements gets so large, and/or
there are enough contradictory opinions, that the person who made the
proposal gives up and leaves. So then we still have the documentation
suggesting "cp".

When people - it happens to be me in this case, but the problem is much
more general - show up and propose improvements to difficult areas, we
can and should give them good advice on how to improve their proposals.
But we should not insist that they build something incredibly complex
and grandiose that solves every problem in that area. We should be
happy if we get ANY improvement in a difficult area, not send dozens of
angry emails complaining that the proposal is imperfect.

> I really really don't understand this. Are you suggesting that for
> server side compression etc. we're going to add the ability to specify
> shell commands as an argument to the base backup command? That seems
> so obviously a non-starter? A good default for backup configurations
> should be that the PG user that the backup is done under is only
> allowed to do that, and does not directly have arbitrary remote
> command execution.

I hadn't really considered that aspect, and it's certainly a concern.
But I also don't understand why you think it's such a big deal. My
point is not that clients should have the ability to execute arbitrary
commands on the server. It's that shelling out to an external binary
provided by the operating system is a reasonable thing to do, rather
than having everything done by binaries that we create. Which I think
is what you are also saying right here:

> But the tool speaking the protocol can just allow piping through
> whatever tool? Given that there are likely benefits to doing things
> either on the client side or on the server side, it seems inevitable
> that there are multiple places where it would make sense to have the
> capability?

Unless I am misunderstanding you, this is exactly what I was proposing,
and have been proposing since the first email on the thread.

> > It's possibly not the exact same thing. A special tool might, for
> > example, use multiple threads for parallel compression rather than
> > multiple processes, perhaps gaining a bit of efficiency. But it's
> > doubtful whether all users care about such marginal improvements.
>
> Marginal improvements? Compression scales decently well with the
> number of cores. pg_basebackup's compression is useless because it's
> so slow (and because it's client-side, but that's IME the lesser
> issue). I feel I must be misunderstanding what you mean here.
>
> gzip vs. pigz -p $numcores on my machine: 180MB/s vs 2.5GB/s. The
> latter will still sometimes be a bottleneck (it's a bottleneck in
> pigz, not available compression cycles), but a lot less commonly than
> at 180MB/s.

That's really, really, really not what I was talking about. I'm quite
puzzled by your reading of this email. You seem to have missed my point
entirely; I don't know whether that's because I did a poor job writing
it or because you didn't read it carefully enough. What I'm saying is:
I don't immediately wish to undertake the problem of building a new
wire protocol that the client and server can use to talk to external
binaries. I would prefer to start with a C API, because I think it will
be far less work and still able to meet a number of important needs.
The new wire protocol for talking to external binaries can be added
later.

--
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
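(For anyone who wants to sanity-check the throughput comparison quoted
above, a rough sketch follows; pigz, GNU nproc, and an existing
base.tar to compress are assumptions here, and the absolute numbers
will vary by machine.)

    # Compare single-core gzip against pigz running on every available core.
    time gzip -c base.tar > /dev/null
    time pigz -p "$(nproc)" -c base.tar > /dev/null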