Re: where should I stick that backup? - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: where should I stick that backup? |
Date | |
Msg-id | CA+TgmoYNWntsXvL9w=KCr00TR57J+yiZQW_OSvw1Sq4jLygtEg@mail.gmail.com |
In response to | Re: where should I stick that backup? (Bruce Momjian <bruce@momjian.us>) |
Responses |
Re: where should I stick that backup?
Re: where should I stick that backup? |
List | pgsql-hackers |
On Thu, Apr 9, 2020 at 6:44 PM Bruce Momjian <bruce@momjian.us> wrote:
> Good point, but if there are multiple APIs, it makes shell script
> flexibility even more useful.

This is really the key point for me. There are so many existing tools that store a file someplace that we really can't ever hope to support them all in core, or even to have well-written extensions that support them all available on PGXN or wherever. We need to integrate with the tools that other people have created, not try to reinvent them all in PostgreSQL.

Now what I understand Stephen to be saying is that a lot of those tools actually suck, and I think that's a completely valid point. But I also think that it's unwise to decide that such problems are our problems rather than problems with those tools. That's a hole with no bottom.

One thing I do think would be realistic would be to invent a set of tools that perform certain local filesystem operations in a "hardened" way. Maybe a single tool with subcommands and options. So you could say, e.g. 'pgfile cp SOURCE TARGET' and it would create a temporary file in the target directory, write the contents of the source into that file, fsync the file, rename it into place, and do more fsyncs to make sure it's all durable in case of a crash. You could have a variant of this that, instead of using the temporary file and rename in place approach, does the thing where you open the target file with O_CREAT|O_EXCL, write the bytes, and then close and fsync it. And you could have other things too, like 'pgfile mkdir DIR' to create a directory and fsync it for durability.

A toolset like this would probably help people write better archive commands - it would certainly be an improvement over what we have now, anyway, and it could also be used with the feature that I proposed upthread.
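As a rough illustration of the steps described for 'pgfile cp' and 'pgfile mkdir', here is a minimal Python sketch. No such tool exists yet, so the function names (fsync_dir, durable_copy, durable_mkdir) and the temporary-file prefix are hypothetical. It assumes a POSIX system where a directory can be opened read-only and fsynced, and it is not a proposed implementation.

```python
import os
import shutil
import tempfile


def fsync_dir(path):
    """fsync a directory so entries created or renamed in it survive a crash."""
    fd = os.open(path, os.O_RDONLY)
    try:
        os.fsync(fd)
    finally:
        os.close(fd)


def durable_copy(source, target):
    """Hypothetical 'pgfile cp': temp file, write, fsync, rename, fsync dir."""
    target_dir = os.path.dirname(os.path.abspath(target))
    # The temporary file lives in the target directory so that the final
    # rename never crosses a filesystem boundary.
    fd, tmp_path = tempfile.mkstemp(dir=target_dir, prefix=".pgfile-tmp-")
    try:
        with os.fdopen(fd, "wb") as dst, open(source, "rb") as src:
            shutil.copyfileobj(src, dst)
            dst.flush()
            os.fsync(dst.fileno())  # file contents reach stable storage
        os.replace(tmp_path, target)  # atomic rename into place
    except BaseException:
        # Do not leave a partial temporary file behind on failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    fsync_dir(target_dir)  # make the rename itself durable


def durable_mkdir(path):
    """Hypothetical 'pgfile mkdir': create a directory and fsync it."""
    os.mkdir(path)
    fsync_dir(path)
    # The new entry is recorded in the parent directory, so sync that too.
    fsync_dir(os.path.dirname(os.path.abspath(path)))
```

Note that tempfile.mkstemp creates the file with mode 0600, so a real tool would also have to decide what permissions and ownership the final file should get.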
For example, if you're concerned that bzip might overwrite an existing file and that it might not fsync, then instead of saying:

    pg_basebackup -Ft --pipe-output 'bzip > %f.bz2'

you could instead write:

    pg_basebackup -Ft --pipe-output 'bzip | pgfile create-exclusive - %f.bz2'

or whatever we pick for actual syntax. And that provides a kind of hardening that can be used with any other command line tool that can be used as a filter.

If you want to compress with bzip, encrypt, and then copy the file to a remote system, you could do:

    pg_basebackup -Ft --pipe-output 'bzip | gpg -e | ssh someuser@somehost pgfile create-exclusive - /backups/tuesday/%f.bz2'

It is of course not impossible to teach pg_basebackup to do all of that stuff internally, but I have a really difficult time imagining us ever getting it done. There are just too many possibilities, and new ones arise all the time.

A 'pgfile' utility wouldn't help at all for people who are storing to S3 or whatever. They could use 'aws s3' as a target for --pipe-output, but if it turns out that said tool is insufficiently robust in terms of overwriting files or doing fsyncs or whatever, then they might have problems. Now, Stephen or anyone else could choose to provide alternative tools with more robust behavior, and that would be great. But even if he didn't, people could take their chances with what's already out there. To me, that's a good thing. Yeah, maybe they'll do dumb things that don't work, but realistically, they can do dumb stuff without the proposed option too.

> Yes, we certainly know how to do a file system copy, but what about
> copying files to other things like S3? I don't know how we would do
> that and allow users to change things like file paths or URLs.

Right. I think it's key that we provide people with tools that are highly flexible and, ideally, also highly composable.
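The 'pgfile create-exclusive - TARGET' step at the end of the pipelines above could look roughly like the following Python sketch. The function name create_exclusive and its mode parameter are hypothetical, since the syntax and behavior of such a tool are still open. It assumes a POSIX system, and it fsyncs the file before closing it, since the sync needs an open descriptor.

```python
import os
import shutil


def create_exclusive(stream, target, mode=0o600):
    """Write a binary stream to target, refusing to overwrite, then fsync."""
    # O_CREAT | O_EXCL makes the open fail with FileExistsError if the
    # target already exists, so an earlier backup is never clobbered.
    fd = os.open(target, os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode)
    with os.fdopen(fd, "wb") as dst:
        shutil.copyfileobj(stream, dst)
        dst.flush()
        os.fsync(dst.fileno())  # file contents reach stable storage
    # Sync the containing directory so the new entry is durable as well.
    dir_fd = os.open(os.path.dirname(os.path.abspath(target)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)
    finally:
        os.close(dir_fd)
```

Called as create_exclusive(sys.stdin.buffer, path), a function like this could sit at the end of any filter pipeline. One gap in the sketch is that a failure partway through leaves a partial file in place, which a real tool would probably want to remove or flag.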
(Incidentally, pg_basebackup already has an option to output the entire backup as a tarfile on standard output, and a user can already pipe that into any tool they like. However, it doesn't work with tablespaces. So you could think of this proposal as extending the existing functionality to cover that case.)

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company