Thread: [HACKERS] WIP: Restricting pg_rewind to data/wal dirs
+/* List of directories to synchronize:
+ * base data dirs (and ablespaces)
+ * wal/transaction data
+ * and that is it.
+ *
+ * This array is null-terminated to make
+ * it easy to expand
+ */
+
+const char *rewind_dirs[] = {
+ "base",
+ "global",
+ "pg_commit_ts",
+ "pg_logical",
+ "pg_multixact",
+ "pg_serial",
+ "pg_subtrans",
+ "pg_tblspc",
+ "pg_twophase",
+ "pg_wal",
+ "pg_xact",
+ NULL
+};
From there we iterate over this array for directory-based approaches in copy_fetch.c and with one query per directory in libpq_fetch.c. This also means shifting from the basic interface from PQexec to PQexecParams. It would be possible to move to binary formats too, but this was not done currently in order to simplify code review (that could be a separate independent patch at a later time).
Attachment
On Sat, Oct 28, 2017 at 4:52 PM, Chris Travers <chris.travers@adjust.com> wrote: > There are still some cleanup bits needed here but I wanted to get feedback > on my general direction. > > I hope to submit for commit fest soon if the general feedback is good. I think you should submit to the CommitFest regardless, to increase your chances of getting feedback. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Oct 28, 2017 at 4:22 AM, Chris Travers <chris.travers@adjust.com> wrote: > The Solution: > The solution is a whitelist of directories specified which are the only ones > which are synchronised. The relevant part of this patch is: > > +/* List of directories to synchronize: > + * base data dirs (and ablespaces) > + * wal/transaction data > + * and that is it. > + * > + * This array is null-terminated to make > + * it easy to expand > + */ > > +const char *rewind_dirs[] = { > + "base", > + "global", > + "pg_commit_ts", > + "pg_logical", > + "pg_multixact", > + "pg_serial", > + "pg_subtrans", > + "pg_tblspc", > + "pg_twophase", > + "pg_wal", > + "pg_xact", > + NULL > +}; The problem with a white list, which is one reason why I do not favor it, is in the case where a feature adds in the data folder a new path for its data, particularly in the case where this is critical for a base backup. If this folder is not added in this list, then a rewind may be silently corrupted as well. There are also a set of directories in this list that we do not care about, pg_serial being one. pg_subtrans is a second, as it gets zeroed on startup. And if you think about it, pg_rewind is actually the *same* processing as a base backup taken using the replication protocol. So instead of this patch I would recommend using excludeFiles and excludeDirContents by moving this list to a common header where frontend and backend can refer to. Note that basebackup.h includes replnodes.h, so my thoughts is that you should split the current header with something like basebackup_internal.h which is backend-only, and have pg_rewind fetch the list of directories to automatically ignore as those ones. You can also simplify a bit the code of pg_rewind a bit so as things like postmaster.opts. On top of that, there is visibly no need to tweak the SQL query fetching the directory list (which is useful for debugging as shaped now actually), but just change process_source_file so as its content is not added in the file map. Then there is a second case where you do not want a set of folders to be included, but those can be useful by default, an example here being pg_wal where one might want to have an empty path to begin with. A third case is a set of folders that you do not care about, but depends on the deployment, being here "log" for example. For those you could just add an --exclude-path option which simply piles up paths into a linked list when called multiple times. This could happen with a second patch. > From there we iterate over this array for directory-based approaches in > copy_fetch.c and with one query per directory in libpq_fetch.c. This also > means shifting from the basic interface from PQexec to PQexecParams. It > would be possible to move to binary formats too, but this was not done > currently in order to simplify code review (that could be a separate > independent patch at a later time). - res = PQexec(conn, sql); + for (p = 0; rewind_dirs[p] != NULL; ++p) + { + const char *paths[1]; + paths[0] = rewind_dirs[p]; + res = PQexecParams(conn, sql, 1, NULL, paths, NULL, NULL, 0); Calling multiple times the query to list all directories is messy IMO. And this is N-costly processing if there are many files to look at, say many relation files. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Sat, Oct 28, 2017 at 4:22 AM, Chris Travers <chris.travers@adjust.com> wrote:
> The Solution:
> The solution is a whitelist of directories specified which are the only ones
> which are synchronised. The relevant part of this patch is:
>
> +/* List of directories to synchronize:
> + * base data dirs (and ablespaces)
> + * wal/transaction data
> + * and that is it.
> + *
> + * This array is null-terminated to make
> + * it easy to expand
> + */
>
> +const char *rewind_dirs[] = {
> + "base",
> + "global",
> + "pg_commit_ts",
> + "pg_logical",
> + "pg_multixact",
> + "pg_serial",
> + "pg_subtrans",
> + "pg_tblspc",
> + "pg_twophase",
> + "pg_wal",
> + "pg_xact",
> + NULL
> +};
The problem with a white list, which is one reason why I do not favour
it, is in the case where a feature adds in the data folder a new path
for its data, particularly in the case where this is critical for a
base backup. If this folder is not added in this list, then a rewind
may be silently corrupted as well. There are also a set of directories
in this list that we do not care about, pg_serial being one.
pg_subtrans is a second, as it gets zeroed on startup.
And if you think about it, pg_rewind is actually the *same* processing
as a base backup taken using the replication protocol. So instead of
this patch I would recommend using excludeFiles and excludeDirContents
by moving this list to a common header where frontend and backend can
refer to. Note that basebackup.h includes replnodes.h, so my thoughts
is that you should split the current header with something like
basebackup_internal.h which is backend-only, and have pg_rewind fetch
the list of directories to automatically ignore as those ones. You can
also simplify a bit the code of pg_rewind a bit so as things like
postmaster.opts. On top of that, there is visibly no need to tweak the
SQL query fetching the directory list (which is useful for debugging
as shaped now actually), but just change process_source_file so as its
content is not added in the file map.
Then there is a second case where you do not want a set of folders to
be included, but those can be useful by default, an example here being
pg_wal where one might want to have an empty path to begin with. A
third case is a set of folders that you do not care about, but depends
on the deployment, being here "log" for example. For those you could
just add an --exclude-path option which simply piles up paths into a
linked list when called multiple times. This could happen with a
second patch.
> From there we iterate over this array for directory-based approaches in
> copy_fetch.c and with one query per directory in libpq_fetch.c. This also
> means shifting from the basic interface from PQexec to PQexecParams. It
> would be possible to move to binary formats too, but this was not done
> currently in order to simplify code review (that could be a separate
> independent patch at a later time).
- res = PQexec(conn, sql);
+ for (p = 0; rewind_dirs[p] != NULL; ++p)
+ {
+ const char *paths[1];
+ paths[0] = rewind_dirs[p];
+ res = PQexecParams(conn, sql, 1, NULL, paths, NULL, NULL, 0);
Calling multiple times the query to list all directories is messy IMO.
And this is N-costly processing if there are many files to look at,
say many relation files.
--
Michael
On Mon, Oct 30, 2017 at 9:43 AM, Chris Travers <chris.travers@adjust.com> wrote: > Are there any cases right now where you have features added by extensions that write to directories which are requiredfor a rewind? In some of the stuff I maintain, I actually have one case now of a configuration file included with include_if_exists by postgresql.conf and is expected to be generated by a component that my code doing the rewind has no direct access on... I can control how pg_rewind kicks in, but I think that you would actually break silently the current code, which is scary especially if I don't control the code where pg_rewind is called when Postgres 11 gets integrated into the product I am thinking about, and even more scary if there is no way to include something. > The problem with an exclude list is that we cannot safely exclude > configuration files or logs (because these could be named anything), right? You have the exact same problem with base backups now, and any configuration files created by extensions are a per-case problem... The pattern that base backups now use is an exclude list. In the future I would rather see base backups and pg_rewind using the same infrastructure for both things: - pg_rewind should use the replication protocol with BASE_BACKUP. Having it rely on root access now is crazy. - BASE_BACKUP should have an option where it is possible to exclude custom paths. What you are proposing here would make both diverge more, which is in my opinion not a good approach. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Oct 30, 2017 at 9:43 AM, Chris Travers <chris.travers@adjust.com> wrote:
> Are there any cases right now where you have features added by extensions that write to directories which are required for a rewind?
In some of the stuff I maintain, I actually have one case now of a
configuration file included with include_if_exists by postgresql.conf
and is expected to be generated by a component that my code doing the
rewind has no direct access on... I can control how pg_rewind kicks
in, but I think that you would actually break silently the current
code, which is scary especially if I don't control the code where
pg_rewind is called when Postgres 11 gets integrated into the product
I am thinking about, and even more scary if there is no way to include
something.
> The problem with an exclude list is that we cannot safely exclude
> configuration files or logs (because these could be named anything), right?
You have the exact same problem with base backups now, and any
configuration files created by extensions are a per-case problem...
The pattern that base backups now use is an exclude list. In the
future I would rather see base backups and pg_rewind using the same
infrastructure for both things:
- pg_rewind should use the replication protocol with BASE_BACKUP.
Having it rely on root access now is crazy.
- BASE_BACKUP should have an option where it is possible to exclude
custom paths.
What you are proposing here would make both diverge more, which is in
my opinion not a good approach.
--
Michael
On Mon, Oct 30, 2017 at 10:15 AM, Chris Travers <chris.travers@adjust.com> wrote: > This also brings up a fairly major concern more generally about control by > the way. A lot of cases where pg_rewind is called, the user doesn't > necessarily have much control on how it is called. Moreover in many of > these cases, the user is probably not in a position to understand the > internals well enough to grasp what to check after. Likely they are not. > Right, but there is a use case difference between "I am taking a backup of a > server" and "I need to get the server into rejoin the replication as a > standby." The intersection of the first and second categories is not empty. Some take backups and use them to deploy standbys. > A really good example of where this is a big problem is with replication > slots. On a backup I would expect you want replication slots to be copied > in. I would actually expect the contrary, and please note that replication slots are not taken in a base backup, which is what the documentation says as well: https://www.postgresql.org/docs/10/static/protocol-replication.html "pg_dynshmem, pg_notify, pg_replslot, pg_serial, pg_snapshots, pg_stat_tmp, and pg_subtrans are copied as empty directories (even if they are symbolic links)." Some code I have with 9.6's pg_bsaebackup removes manually replication slots as this logic is new in 10 ;) >> The pattern that base backups now use is an exclude list. In the >> future I would rather see base backups and pg_rewind using the same >> infrastructure for both things: >> - pg_rewind should use the replication protocol with BASE_BACKUP. >> Having it rely on root access now is crazy. >> - BASE_BACKUP should have an option where it is possible to exclude >> custom paths. >> What you are proposing here would make both diverge more, which is in >> my opinion not a good approach. > > How does rep mgr or other programs using pg_rewind know what to exclude? Good question. Answers could come from folks such as David Steele (pgBackRest) or Marco (barman) whom I am attaching in CC. -- Michael -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
On Mon, Oct 30, 2017 at 10:15 AM, Chris Travers
<chris.travers@adjust.com> wrote:
> This also brings up a fairly major concern more generally about control by
> the way. A lot of cases where pg_rewind is called, the user doesn't
> necessarily have much control on how it is called. Moreover in many of
> these cases, the user is probably not in a position to understand the
> internals well enough to grasp what to check after.
Likely they are not.
> Right, but there is a use case difference between "I am taking a backup of a
> server" and "I need to get the server into rejoin the replication as a
> standby."
The intersection of the first and second categories is not empty. Some
take backups and use them to deploy standbys.
> A really good example of where this is a big problem is with replication
> slots. On a backup I would expect you want replication slots to be copied
> in.
I would actually expect the contrary, and please note that replication
slots are not taken in a base backup, which is what the documentation
says as well:
https://www.postgresql.org/docs/10/static/protocol- replication.html
"pg_dynshmem, pg_notify, pg_replslot, pg_serial, pg_snapshots,
pg_stat_tmp, and pg_subtrans are copied as empty directories (even if
they are symbolic links)."
Some code I have with 9.6's pg_bsaebackup removes manually replication
slots as this logic is new in 10 ;)
>> The pattern that base backups now use is an exclude list. In the
>> future I would rather see base backups and pg_rewind using the same
>> infrastructure for both things:
>> - pg_rewind should use the replication protocol with BASE_BACKUP.
>> Having it rely on root access now is crazy.
>> - BASE_BACKUP should have an option where it is possible to exclude
>> custom paths.
>> What you are proposing here would make both diverge more, which is in
>> my opinion not a good approach.
>
> How does rep mgr or other programs using pg_rewind know what to exclude?
Good question. Answers could come from folks such as David Steele
(pgBackRest) or Marco (barman) whom I am attaching in CC.
--
Michael
On 10/30/17 6:36 AM, Michael Paquier wrote: > On Mon, Oct 30, 2017 at 10:15 AM, Chris Travers >> >> How does rep mgr or other programs using pg_rewind know what to exclude? > > Good question. Answers could come from folks such as David Steele > (pgBackRest) or Marco (barman) whom I am attaching in CC. pgBackRest does not use pg_rewind. However, pgBackRest does use the same exclusion list for backups as pg_basebackup. -- -David david@pgmasters.net -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers
Hi;There are still some cleanup bits needed here but I wanted to get feedback on my general direction.I hope to submit for commit fest soon if the general feedback is good. Tests are passing (with adjustments intended for change of behaviour in one test script). I want to note that Crimson Thompson (also with Adjust) has provided some valuable discussion on this code.The Problem:pg_rewind makes no real guarantees regarding the final state of non-data files or directories. It.checks to see if the timeline has incremented (and therefore guarantees that if successful the data directories are on the same timeline) but for non-data files, these are clobbered if we rewind and left intact if not. These other files include postgresql.auto.conf, replication slots, and can include log files.Copying logs over to the new slave is something one probably never wants to do (same with replication slots), and the behaviours here can mask troubleshooting regarding what a particular master failed, cause wal segments to build up, automatic settings changes, and other undesirable behaviours. Together these make pg_rewind very difficult to use properly and push tasks to replication management tooling that the management tools are not in a good position to handle correctly.Designing the Fix:Two proposed fixes have been considered and one selected: Either we whitelist directories and only rewind those. The other was to allow shell globs to be provided that could be used to exclude files. The whitelisting solution was chosen for a number of reasons.When pg_rewind "succeeds" but not quite correctly the result is usually a corrupted installation which requires a base backup to replace it anyway. In a recovery situation, sometimes pressures occur which render human judgment less effective, and therefore glob exclusion strikes me as something which would probably do more harm than good, but maybe I don't understand the use case (comments as to why some people look at the other solution as preferable would be welcome).In going with the whitelisting solution, we chose to include all directories with WAL-related information. This allows more predicable interaction with other parts of the replication chain. Consequently not only do we copy pg_wal and pg_xact but also commit timestamps and so forth.The Solution:The solution is a whitelist of directories specified which are the only ones which are synchronised. The relevant part of this patch is:+/* List of directories to synchronize:
+ * base data dirs (and ablespaces)
+ * wal/transaction data
+ * and that is it.
+ *
+ * This array is null-terminated to make
+ * it easy to expand
+ */
+
+const char *rewind_dirs[] = {
+ "base",
+ "global",
+ "pg_commit_ts",
+ "pg_logical",
+ "pg_multixact",
+ "pg_serial",
+ "pg_subtrans",
+ "pg_tblspc",
+ "pg_twophase",
+ "pg_wal",
+ "pg_xact",
+ NULL
+};
From there we iterate over this array for directory-based approaches in copy_fetch.c and with one query per directory in libpq_fetch.c. This also means shifting from the basic interface from PQexec to PQexecParams. It would be possible to move to binary formats too, but this was not done currently in order to simplify code review (that could be a separate independent patch at a later time).
Testing Done:The extra files tests correctly test this change in behaviour. The tests have been modified, and diagnostics in cases of failures expanded, in this case. The other tests provide good coverage of whether pg_rewind does what it is supposed to do.Cleanup still required:I accidentally left Carp::Always in the PM in this perl module. It will be fixed.I renamed one of the functions used to have a more descriptive name but currently did not remove the old function yet.Feedback is very welcome. pg_rewind is a very nice piece of software. I am hoping that these sorts of changes will help ensure that it is easier to use and provides more predictable results.--Best Regards,Chris TraversDatabase AdministratorSaarbrücker Straße 37a, 10405 Berlin
Attachment
On Wed, Nov 1, 2017 at 5:58 PM, Chris Travers <chris.travers@adjust.com> wrote: > I would also like to address a couple of important points here: > > 1. I think default restrictions plus additional paths is the best, safest > way forward. Excluding shell-globs doesn't solve the "I need this > particular config file" very well particularly if we want to support this > outside of an internal environment. Shell globs also tend to be overly > inclusive and so if you exclude based on them, you run into a chance that > your rewind is corrupt for being overly exclusive. > > 2. I would propose any need for an additional paths be specified using an > --Include-path directive. This could be specified multiple times and could > point to a file or directory which would be added to the paths rewound. I > have a patch for this mostly done but I am concerned that these sorts of > changes result in a combination of changes that are easier to review > separately than together. So if needed, I can add it or in a separate patch > following. > > 3. I think it would be a mistake to tie backup solutions in non-replicated > environments to replication use cases, and vice versa. Things like > replication slots (used for streaming backups) have different considerations > in different environments. Rather than build the same infrastructure first, > I think it is better to support different use cases well and then build > common infrastructure to support the different cases. I am not against > building common infrastructure for pg_rewind and pg_basebackup. I am very > much against having the core guarantees being the exact same. +const char *rewind_dirs[] = { + "base", // Default tablespace + "global", // global tablespace + "pg_commit_ts", // In case we need to do PITR before up to sync + "pg_logical", // WAL related and no good reason to exclude + "pg_multixact", // WAL related and may need for vacuum-related reasons + "pg_tblspc", // Pther tablespaces + "pg_twophase", // mostly to *clear* + "pg_wal", // WAL + "pg_xact", // Commits of transactions + NULL +}; Incorrect comment format here. + in full. The advantage of <application>pg_rewind</> over taking a new base + backup, or tools like <application>rsync</>, is that <application>pg_rewind</> This generates warnings on HEAD when compiling the documentation. Please note that I am still -1 for using a methodology different than what is used for base backups with an inclusive method, and would much prefer an exclusive method by reusing the existing entries in basebackup.c. Still, I am the only one who expressed an opinion about this patch, so moved to next CF with waiting on author as status. -- Michael
On 11/29/17 12:46 AM, Michael Paquier wrote:> On Wed, Nov 1, 2017 at 5:58 PM, Chris Travers <chris.travers@adjust.com> wrote: > > Please note that I am still -1 for using a methodology different than > what is used for base backups with an inclusive method, and would much > prefer an exclusive method by reusing the existing entries in > basebackup.c. Still, I am the only one who expressed an opinion about > this patch, so moved to next CF with waiting on author as status. I'm also -1 on the inclusive methodology. Forgetting something in the exclusion list just makes the process less efficient while forgetting something in the inclusion list may mean breakage. Furthermore, maintaining two lists does not sound like a good idea. I worry that extensions using generic WAL might be writing in places we don't expect and don't think manually adding inclusions is a good solution as the requirement will not be obvious to the user. Regards, -- -David david@pgmasters.net
Chris, * Chris Travers (chris.travers@adjust.com) wrote: > Attached is the patch as submitted for commitfest. This has been stuck in Waiting for Author since before the commitfest started. I'll try to kick-start it but it seems like it's stuck because there's a fundamental disagreement about if we should be using include or exclude for pg_rewind (and, possibly, pg_basebackup, et al). > Please note, I am not adverse to adding an additional --Include-path > directive if that avoids backwards-compatibility problems. However the > patch is complex enough I would really prefer review on the rest of it to > start first. This doesn't strike me as perfectly trivial and I think it is > easier to review pieces separately. I have already started on the > --Include-path directive and could probably get it attached to a later > version of the patch very soon. > > I would also like to address a couple of important points here: > > 1. I think default restrictions plus additional paths is the best, safest > way forward. Excluding shell-globs doesn't solve the "I need this > particular config file" very well particularly if we want to support this > outside of an internal environment. Shell globs also tend to be overly > inclusive and so if you exclude based on them, you run into a chance that > your rewind is corrupt for being overly exclusive. There's been a number of strong points made as to why pg_basebackup uses an exclude list but you seem to keep coming back to a concern around shell globs when considering exclusion lists. Having an exclude list doesn't mean that shell globs have to be used to in the exclude list and, indeed, there are no such globs used in the exclude list for pg_basebackup today- every single file or directory excluded by pg_basebackup is an explicit file or directory (compared using a full strcmp()). Further, the specific concerns raised are about things which are very clearly able to be dealt with using specific paths (postgresql.auto.conf, pg_log) and which an administrator is likely to be familiar with (such as pg_log, in particular). Personally, I'm a big advocate of *not* having PG's logs in the data directory and part of it is because, really, the data directory is PG's purview while logs are for the administrator. I wasn't ever a fan of postgresql.auto.conf and I'm not surprised that we're having this discussion about if it should be pulled over by pg_rewind or not. > 3. I think it would be a mistake to tie backup solutions in non-replicated > environments to replication use cases, and vice versa. Things like > replication slots (used for streaming backups) have different > considerations in different environments. Rather than build the same > infrastructure first, I think it is better to support different use cases > well and then build common infrastructure to support the different cases. > I am not against building common infrastructure for pg_rewind and > pg_basebackup. I am very much against having the core guarantees being the > exact same. Backup solutions are already involved in understanding of replicated environments as they can be used to back up from replicas rather than the primary (or perhaps using both in some cases), so I'm not really sure why you're argueing that backups are somehow different between non-replicated and replicated environments. As it relates to the question about if the core guarantees between pg_basebackup and pg_rewind being the same or not, I've not see much argument for why they'd be different. The intent of pg_rewind is to do what pg_basebackup would do, but in a more efficient manner, as discussed in the documentation for pg_rewind. I would think the next step here, as Michael suggested very early on in this thread, would be to bring the exclude list and perhaps logic for pg_basebackup into the common code and have pg_rewind leverage that instead of having its own code that largely does the same and then adding an option to exclude additional items to that. There's no sense having pg_rewind operate on files that are going to end up getting wiped out when recovery starts anyway. Perhaps there's a use-case for overriding the exclude list with a 'include' option too, but I'm not convinced there is. Thanks! Stephen
Attachment
On Mon, Jan 22, 2018 at 03:12:58PM -0500, Stephen Frost wrote: > I would think the next step here, as Michael suggested very early on in > this thread, would be to bring the exclude list and perhaps logic for > pg_basebackup into the common code and have pg_rewind leverage that > instead of having its own code that largely does the same and then > adding an option to exclude additional items to that. There's no sense > having pg_rewind operate on files that are going to end up getting wiped > out when recovery starts anyway. Perhaps there's a use-case for > overriding the exclude list with a 'include' option too, but I'm not > convinced there is. Me neither. I'll look into getting something for the next commit fest. There have been way too many complaints about how pg_rewind copies too much data for nothing to ignore doing something (the last one about pg_replslot data). A good first step would be what you are writing in the paragraph above, so I intend to do that as I am sure that it would be a good addition. For now I have marked the proposed patches as returned with feedback. -- Michael