Re: where should I stick that backup? - Mailing list pgsql-hackers

From: Andres Freund
Subject: Re: where should I stick that backup?
Msg-id: 20200415015008.t5jmsoaanve2gavg@alap3.anarazel.de
In response to: Re: where should I stick that backup? (Robert Haas <robertmhaas@gmail.com>)
List: pgsql-hackers

Hi,

On 2020-04-14 11:38:03 -0400, Robert Haas wrote:
> I'm fairly deeply uncomfortable with what Andres is proposing. I see
> that it's very powerful, and can do a lot of things, and that if
> you're building something that does sophisticated things with storage,
> you probably want an API like that. It does a great job making
> complicated things possible. However, I feel that it does a lousy job
> making simple things simple.

I think it's pretty much exactly the opposite. Your approach moves all
the complexity to the user, who has to assemble the entire combination
of commands themselves. Instead of having one or two default commands
that handle backups in common situations, everyone has to build them
from pieces.

Moved from later in your email, since it seems to make more sense to
have it here:

> All they're going to see is that they can use gzip and maybe lz4
> because we provide the necessary special magic tools to integrate with
> those, but for some reason we don't have a special magic tool that
> they can use with their own favorite compressor, and so they can't use
> it. I think people are going to find that fairly unhelpful.

I have no problem with giving people the opportunity to use their
personal favorite compressor, but forcing them to do that, and to make
sure it's installed etc., strikes me as a spectacularly bad default
situation. Most people don't have the time to research which
compression algorithm works best for which precise situation.

How do you imagine a default scripted invocation of the new backup
stuff would look? Having to specify multiple commandline "fragments"
for compression, storing files, ... can't be what we want the common
case to look like. It'll just lead, once again, to everyone copy &
pasting examples that are all wrong in different ways, and that won't
work across platforms (or often not even across OS versions).
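
To make that concrete, here's a sketch of the kind of invocation I'd
worry that design leads to. The --compression-command and
--storage-command options are invented purely for illustration; they
are not actual pg_basebackup options:

    # Hypothetical fragment-based invocation; the two --*-command
    # options below don't exist, they just illustrate the shape.
    pg_basebackup --format=tar \
        --compression-command 'pigz --processes 8' \
        --storage-command 'aws s3 cp - s3://mybucket/base.tar.gz'

Every one of those fragments is something each user would have to get
right on their own, for their particular platform and installed tools.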

In general, I think it's good to give expert users the ability to
customize things like backups and archiving. But defaulting to every
non-expert user having to do all that expert work (or copying it from
bad examples) is one of the most user-hostile things in postgres.

> Also, I don't really see what's wrong with the server forking
> processes that exec("/usr/bin/lz4") or whatever. We do similar things
> in other places and, while it won't work for cases where you want to
> compress a shazillion files, that's not really a problem here anyway.
> At least at the moment, the server-side format is *always* tar, so the
> problem of needing a separate subprocess for every file in the data
> directory does not arise.

I really, really don't understand this. Are you suggesting that for
server-side compression etc. we're going to add the ability to specify
shell commands as arguments to the base backup command? That seems so
obviously a non-starter? A good default for backup configurations
should be that the PG user the backup is done under is allowed to do
only that, not that it directly gets arbitrary remote command
execution.

> Suppose you want to compress using your favorite compression
> program. Well, you can't. Your favorite compression program doesn't
> speak the bespoke PostgreSQL protocol required for backup
> plugins. Neither does your favorite encryption program. Either would
> be perfectly happy to accept a tarfile on stdin and dump out a
> compressed or encrypted version, as the case may be, on stdout, but
> sorry, no such luck. You need a special program that speaks the magic
> PostgreSQL protocol but otherwise does pretty much the exact same
> thing as the standard one.

But the tool speaking the protocol can just allow piping through
whatever tool? Given that there are likely benefits to doing things
either on the client side or on the server side, it seems inevitable
that there are multiple places where that capability would make sense.

> It's possibly not the exact same thing. A special tool might, for
> example, use multiple threads for parallel compression rather than
> multiple processes, perhaps gaining a bit of efficiency. But it's
> doubtful whether all users care about such marginal improvements.

Marginal improvements? Compression scales decently well with the
number of cores. pg_basebackup's compression is useless because it's
so slow (and because it's client-side, but that's IME the lesser
issue). I feel I must be misunderstanding what you mean here.

gzip vs. pigz -p $numcores on my machine: 180MB/s vs 2.5GB/s. The
latter will still sometimes be a bottleneck (a bottleneck within pigz
itself, not a lack of available compression cycles), but a lot less
commonly than at 180MB/s.
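
To put those numbers in context, this is the kind of client-side
pipeline that works today, and that any protocol-speaking tool could
just as well wrap internally. It assumes a cluster without additional
tablespaces (writing tar to stdout requires that) and that pigz and
coreutils' nproc are installed:

    # tar-format base backup, compressed on all cores with pigz
    pg_basebackup -Ft -X fetch -D - | pigz -p "$(nproc)" > base.tar.gz

    # the single-threaded equivalent, bottlenecked around gzip's ~180MB/s
    pg_basebackup -Ft -X fetch -D - | gzip > base.tar.gz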

Greetings,

Andres Freund