Re: Add -k/--link option to pg_combinebackup - Mailing list pgsql-hackers
From | Robert Haas |
---|---|
Subject | Re: Add -k/--link option to pg_combinebackup |
Date | |
Msg-id | CA+TgmobZDQp=AJH6YQNJ+bTm9MuEQ2We-QBT=OGO9bx0AErAQw@mail.gmail.com Whole thread Raw |
In response to | Re: Add -k/--link option to pg_combinebackup (Robert Haas <robertmhaas@gmail.com>) |
List | pgsql-hackers |
Some more review: + Use hard links instead of copying files to the synthetic backup. + Reconstruction of the synthetic backup might be faster (no file copying ) + and use less disk space. I think it would be good to add a caveat at the end of this sentence: and use less disk space, but care must be taken when using the output directory, because any modifications to that directory (for example, starting the server) can also affect the input directories. Likewise, changes to the input directories (for example, starting the server on the full backup) could affect the output directory. Thus, this option is best used when the input directories are only copies that will be removed after <application>pg_combienbackup</application> has completed. - which can result in near-instantaneous copying of the data files. + which can result in near-instantaneous copying of the data files, giving + the speed advantages of <option>-k</option>/<option>--link</option> + while leaving the input backups untouched. I don't think we need this hunk. + When <application>pg_combinebackup</application> is run with + <option>-k</option>/<option>--link</option>, the output synthetic backup may + share several files with the input backups because of the hard links it + creates. Any changes performed on files that were linked between any of the + input backups and the synthetic backup, will be reflected on both places. I don't like the wording of this for a couple of reasons. First, "several" means more than 1 and less than all, but we really have no idea: it could be none, one, some, many, or all. Second, we want to be a little careful not to define what a hard link means here. Also, these notes are a bit far from the documentation of --link, so somebody might miss them. I suggest that we can probably just leave out the notes you've added here if we add something like what I suggested above to the documentation of --link itself. + pg_log_warning_hint("If the input directories are not staging " + "directories, it is recommended to move the output" + "directory to another file system or machine " + "before performing changes to the files and/or " + "starting the cluster"); I don't think "staging directories" is a sufficiently well-known and unambiguous term that we should be using it here. Also, "move the output directory" seems like fuzzy wording. You can really only move files around within a filesystem; otherwise, you're making a copy and deleting the original. I think we can just drop this hint entirely. The primary warning message seems sufficient to me. I'm still not a fan of the changes to 010_links.pl; let's drop those. cfbot is fairly unhappy with your patch. See http://cfbot.cputube.org/israel-barth.html or https://cirrus-ci.com/github/postgresql-cfbot/postgresql/cf%2F5508 -- I haven't looked into what is going wrong here, and there may well be more than one thing, but nothing can be committed until this is all green. -- Robert Haas EDB: http://www.enterprisedb.com
pgsql-hackers by date: