Thread: Using base backup exclusion filters to reduce data transferred withpg_rewind
Using base backup exclusion filters to reduce data transferred withpg_rewind
From
Michael Paquier
Date:
Hi all, Many threads have touched $subject: https://www.postgresql.org/message-id/CAN-RpxDPE4baiMMJ6TLd6AiUvrG=YrC05tGxrgp4aUutH9j5TQ@mail.gmail.com https://www.postgresql.org/message-id/7c50423.5ad0.15e8b308b2f.Coremail.chjischj@163.com https://www.postgresql.org/message-id/1516736993.5599.4.camel@cybertec.at Thread [2] is a bit different as it discusses with WAL segments data which is useless at the end, still it aimed at reducing the amount of data transferred during a rewind. I am not tackling this problem in the patch set of this thread. This can be discussed separately. Attached is a patch set which implements what I have mentioned a couple of times in those threads by having pg_rewind reuse the same exclusion filtering rules as for base backups, as after a rewind a node enters in recovery and a bunch of data which is copied during the rewind finishes in the void. There has been as well many complains about the need to remove all the time replication slot data manually after a rewind, so I believe that this makes the user experience much easier with the tool. Something useful is replication slot data getting filtered out. In order to reach this point, I have been hacking the backend code and finished with quite a bit of shuffling around how system paths are hardcoded in many places, like pg_replslot, pg_wal, etc. Well you can think here about all the paths hardcoded in initdb.c. So I have introduced a couple of things: - src/include/pg_paths.h, a new header which gathers the set of system file and directory names. With this facility in place, the backend code loses knowledge of hardcoded system-related paths, including things like "base", "global", "pg_tblspc", "base/pg_control", etc. A fun consequence of that refactoring is that it is possible to just change pg_paths.h and have change the system paths of a PostgreSQL instance with one-liners. For example you could change "base/" to "foo/". This can make PostgreSQL more malleable for forks. It would be more simple to just hardcode more the paths but I think that this would not be manageable in the long-term, especially if similar logics spread more. - src/include/replication/basebackup_paths.h, which extracts the exclude rules now in basebackup.c into a header which can be consumed by both frontends and backends. This is useful for any backup tools. - pg_rewind update to ensure that the filters are working correctly. So the patch set attached is made of the following: - 0001, which refactors all hardcoded system paths into pg_paths.h. This modifies only initdb.c and basebackup.c to ease reviews. - 0002 spreads the path changes and the use of pg_paths.h across the core code. - 0003 moves the last set of definitions with backup_label, tablespace_map and pg_internal.init. - 0004 creates basebackup_paths.h, this can be consumed by pg_rewind. - 0005 makes the changes for pg_rewind. 0001~0003 can be merged together, I have just done a split to ease reviews. I am adding that to the next CF. Thanks, -- Michael
Attachment
- 0001-Refactor-path-definitions-into-a-single-header-file-.patch
- 0002-Replace-all-system-paths-hardcoded-with-data-from-pg.patch
- 0003-Add-backup_label-pg_internal.init-and-tablespace_map.patch
- 0004-Move-base-backup-filter-lists-into-their-own-header-.patch
- 0005-Use-filtering-list-of-base-backups-in-pg_rewind-to-e.patch
- signature.asc
Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind
From
Anastasia Lubennikova
Date:
05.02.2018 10:10, Michael Paquier: > So the patch set attached is made of the following: > - 0001, which refactors all hardcoded system paths into pg_paths.h. > This modifies only initdb.c and basebackup.c to ease reviews. > - 0002 spreads the path changes and the use of pg_paths.h across the > core code. > - 0003 moves the last set of definitions with backup_label, > tablespace_map and pg_internal.init. > - 0004 creates basebackup_paths.h, this can be consumed by pg_rewind. > - 0005 makes the changes for pg_rewind. Thank you for this set of patches. This refactoring makes code way more convenient to read and change. Due to some merge conflicts, patch 0002 was not applying clearly. So I attach the updated version. I also noticed a couple of rows that were not updated, and wrote a patch 0006, which contains just minor changes and can be applied on top of any patch after 0003. Since these patches contain mostly cosmetic changes and do not break anything, I think it's fine to mark this thread as Ready For Committer without long discussion. -- Anastasia Lubennikova Postgres Professional: http://www.postgrespro.com The Russian Postgres Company
Attachment
- 0001-Refactor-path-definitions-into-a-single-header-file-.patch
- 0002-v2-Replace-all-system-paths-hardcoded-with-data-from-pg.patch
- 0003-Add-backup_label-pg_internal.init-and-tablespace_map.patch
- 0004-Move-base-backup-filter-lists-into-their-own-header-.patch
- 0005-Use-filtering-list-of-base-backups-in-pg_rewind-to-e.patch
- 0006-Minor-renames.patch
Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind
From
Michael Paquier
Date:
On Tue, Mar 13, 2018 at 01:16:27PM +0300, Anastasia Lubennikova wrote: > Since these patches contain mostly cosmetic changes and do not break > anything, I think it's fine to mark this thread as Ready For Committer > without long discussion. Thanks Anastasia for the review. The refactoring is quite intuitive I think, still that's perhaps a bit too much intrusive. Extra opinions about that are welcome. As I read again the patch set, please note that I am not much happy yet about the part for the handling of temporary files when applying the filters in pg_rewind and the lack inconsistency for file filters and directory filters... -- Michael
Attachment
Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind
From
Fujii Masao
Date:
On Tue, Mar 13, 2018 at 9:48 PM, Michael Paquier <michael@paquier.xyz> wrote: > On Tue, Mar 13, 2018 at 01:16:27PM +0300, Anastasia Lubennikova wrote: >> Since these patches contain mostly cosmetic changes and do not break >> anything, I think it's fine to mark this thread as Ready For Committer >> without long discussion. > > Thanks Anastasia for the review. The refactoring is quite intuitive I > think, still that's perhaps a bit too much intrusive. Extra opinions > about that are welcome. Personally it looks very intrusive, so I'm feeling inclined to push the changes without that refactoring. The 0005 patch doesn't seem to be right. The patch changes process_source_file() so that it excludes some directories like pg_notify from the filemap. However, after that, process_source_file() is called for the files under those "excluded" directories and then they are not excluded. For example, pg_notify/0000 file is unexpectedly included in the filemap and copied by pg_rewind. This problem happens because recurse_dir() has the following code and ISTM you forgot to take into account it. else if (S_ISDIR(fst.st_mode)) { callback(path, FILE_TYPE_DIRECTORY, 0, NULL); /* recurse to handle subdirectories */ recurse_dir(datadir, path, callback); } Regards, -- Fujii Masao
Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind
From
Michael Paquier
Date:
On Fri, Mar 23, 2018 at 01:38:38AM +0900, Fujii Masao wrote: > Personally it looks very intrusive, so I'm feeling inclined to push > the changes without that refactoring. Okay. Just moving the list of items from basebackup.c to a dedicated header is not sufficient though as things like RELCACHE_INIT_FILENAME as declared in headers which are backend-only. The same applies to LOG_METAINFO_DATAFILE_TMP, PG_AUTOCONF_FILENAME, PG_STAT_TMP_DIR and PG_DYNSHMEM_DIR. BACKUP_LABEL_FILE and TABLESPACE_MAP can be included though via xlog.h. So what are you looking for? I see a couple of options: 1) The inclusive refactoring, which you are discarding. 2) A dedicated header, but some of the now-not-hardcoded values will need to be so. That's the list I am giving above. 3) A copy of the list from basebackup.c to src/bin/pg_rewind/. I would guess that you are looking for 2), but I am not sure if you imply that 3) would be acceptable or not. -- Michael
Attachment
Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind
From
Stephen Frost
Date:
Greetings, * Michael Paquier (michael@paquier.xyz) wrote: > On Fri, Mar 23, 2018 at 01:38:38AM +0900, Fujii Masao wrote: > > Personally it looks very intrusive, so I'm feeling inclined to push > > the changes without that refactoring. I've been reading over the first couple of posted patches and mulling over the changes proposed. They certainly touch a lot of places but they're pretty straight-forward changes and so I'm not really sure I'd call them all that intrusive. I don't completely buy off on the argument that having these #define's would make it easier for forks (we've had quite a few folks fork PG, but how many of them have actually changed "base"?) and I'm on the fence about if these will make our lives simpler down the road when it comes to changing the directory names (if we changed "pg_multixact/members" to be "pg_multixact/processes" or some such, I imagine we'd also go through and change PG_MULTIXACT_MEMBERS_DIR to be PG_MULTIXACT_PROCESSES_DIR anyway, so this doesn't result in much improvement there), but as it relates to new tool development and such, there's some value here because the compiler is going to complain if you say PG_COMMIT_TX_DIR and there is no such #define, whereas a similar mistake with a string literal of "pg_commit_tx" might end up getting missed and that would be unfortunate. Therefore, on the whole, I'm +1 on these changes, but I'd argue a bit about some of the choices made: - Let's have them all be PG_WHATEVER_DIR/FILE - Use WAL instead of XLOG (I thought we agreed new code would..?) - Remove extraneous #define's (most of these were, but DIRECTORY_LOCK_FILE was kept..? Let's use PG_POSTMASTER_PID_FILE throughout instead, with a comment that it's also used as a lock file). Might be nice to go back and modify other pre-existing #define's to use that form also, but perhaps not that big of a deal. I would have that be independent from this, in any case. > Okay. Just moving the list of items from basebackup.c to a dedicated > header is not sufficient though as things like RELCACHE_INIT_FILENAME as > declared in headers which are backend-only. The same applies to > LOG_METAINFO_DATAFILE_TMP, PG_AUTOCONF_FILENAME, PG_STAT_TMP_DIR and > PG_DYNSHMEM_DIR. > > BACKUP_LABEL_FILE and TABLESPACE_MAP can be included though via xlog.h. > So what are you looking for? I see a couple of options: > 1) The inclusive refactoring, which you are discarding. > 2) A dedicated header, but some of the now-not-hardcoded values will > need to be so. That's the list I am giving above. > 3) A copy of the list from basebackup.c to src/bin/pg_rewind/. > > I would guess that you are looking for 2), but I am not sure if you > imply that 3) would be acceptable or not. Yeah, neither 2 or 3 really appeals to me. Option 1 does touch a number of places but in a pretty straight-forward way- and if there's a typo there, the compiler is likely to complain, so it seems like the risk is relatively low. Thanks! Stephen
Attachment
Re: Using base backup exclusion filters to reduce data transferred with pg_rewind
From
Tom Lane
Date:
Stephen Frost <sfrost@snowman.net> writes: > I don't completely buy off on the argument that having these #define's > would make it easier for forks (we've had quite a few folks fork PG, but > how many of them have actually changed "base"?) and I'm on the fence > about if these will make our lives simpler down the road when it comes > to changing the directory names I am distressed that nobody, apparently, is putting any weight on the back-patching pain that will result from widespread replacement of path names with macros. I don't buy that either we or anyone else will need to change these names in future, so I see pain and effectively no gain. Furthermore, I think it's completely silly to claim that this sort of thing is a gain in readability or understandability: - path = psprintf("base/%u/t%d_%u", - dbNode, backendId, relNode); + path = psprintf("%s/%u/t%d_%u", + PG_BASE_DIR, dbNode, backendId, relNode); For my money it's a loss on both points. The extra level of indirection is just obscuring what's actually happening and putting extra cognitive load on the reader. We have better things to spend our time on. regards, tom lane
Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind
From
Michael Paquier
Date:
On Sat, Mar 24, 2018 at 11:14:34PM -0400, Tom Lane wrote: > Stephen Frost <sfrost@snowman.net> writes: >> I don't completely buy off on the argument that having these #define's >> would make it easier for forks (we've had quite a few folks fork PG, but >> how many of them have actually changed "base"?) and I'm on the fence >> about if these will make our lives simpler down the road when it comes >> to changing the directory names > > I am distressed that nobody, apparently, is putting any weight on the > back-patching pain that will result from widespread replacement of path > names with macros. I don't buy that either we or anyone else will need > to change these names in future, so I see pain and effectively no > gain. That's actually something I worry about as well (as the author!), which is why I qualify the changes as intrusive. At the end, I think that I would be tempted to just do #3, aka to keep a copy of the filter list in pg_rewind code while hardcoding a minimum of names and mention in both basebackup.c and pg_rewind code to not forget to update the filter list if necessary. New paths in the data folder are not added on a monthly basis either, and not all of them can be filtered out so that's easy to maintain. -- Michael
Attachment
Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind
From
Stephen Frost
Date:
Greetings, * Michael Paquier (michael@paquier.xyz) wrote: > On Sat, Mar 24, 2018 at 11:14:34PM -0400, Tom Lane wrote: > > Stephen Frost <sfrost@snowman.net> writes: > >> I don't completely buy off on the argument that having these #define's > >> would make it easier for forks (we've had quite a few folks fork PG, but > >> how many of them have actually changed "base"?) and I'm on the fence > >> about if these will make our lives simpler down the road when it comes > >> to changing the directory names > > > > I am distressed that nobody, apparently, is putting any weight on the > > back-patching pain that will result from widespread replacement of path > > names with macros. I don't buy that either we or anyone else will need > > to change these names in future, so I see pain and effectively no > > gain. While I agree that some consideration should be given to the impact a change has on back-patching, I continue to be of the opinion that this would be a good change, for the reasons which I outlined up-thread. That said, I don't hold that position very strongly. > That's actually something I worry about as well (as the author!), which > is why I qualify the changes as intrusive. At the end, I think that I > would be tempted to just do #3, aka to keep a copy of the filter list in > pg_rewind code while hardcoding a minimum of names and mention in both > basebackup.c and pg_rewind code to not forget to update the filter list > if necessary. New paths in the data folder are not added on a monthly > basis either, and not all of them can be filtered out so that's easy to > maintain. Intrusive, at least from my viewpoint, is more about how the code is changed around and/or refactored than about the impact it has on back-patching. These are pretty mechanical changes, after all. I'm not particularly thrilled with the idea of having two independent lists of hard-coded paths to maintain, even with comments in both places saying to update the other list. Thanks! Stephen
Attachment
Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind
From
Fujii Masao
Date:
On Sun, Mar 25, 2018 at 5:06 PM, Michael Paquier <michael@paquier.xyz> wrote: > On Sat, Mar 24, 2018 at 11:14:34PM -0400, Tom Lane wrote: >> Stephen Frost <sfrost@snowman.net> writes: >>> I don't completely buy off on the argument that having these #define's >>> would make it easier for forks (we've had quite a few folks fork PG, but >>> how many of them have actually changed "base"?) and I'm on the fence >>> about if these will make our lives simpler down the road when it comes >>> to changing the directory names >> >> I am distressed that nobody, apparently, is putting any weight on the >> back-patching pain that will result from widespread replacement of path >> names with macros. I don't buy that either we or anyone else will need >> to change these names in future, so I see pain and effectively no >> gain. > > That's actually something I worry about as well (as the author!), which > is why I qualify the changes as intrusive. At the end, I think that I > would be tempted to just do #3, aka to keep a copy of the filter list in > pg_rewind code while hardcoding a minimum of names and mention in both > basebackup.c and pg_rewind code to not forget to update the filter list > if necessary. +1. It's better for us to focus on the code change of the fillter on pg_rewind rather than such "refactoring". As I told upthread, the previous patch has the problem where the files which should be skipped are not skipped. ISTM that, in pg_rewind, the fillter should be triggered in recurse_dir() not process_source_file(). BTW what should pg_rewind do when it finds the directory which should be skipped, in the source directory? In your patch, pg_rewind just tries to skip that directory at all. But isn't this right behavior? If that directory doesn't exist in the target directory(though I'm not sure if this situation is really possible), I'm thinking that pg_rewind should create that "empty" directory in the target. No? Regards, -- Fujii Masao
Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind
From
Michael Paquier
Date:
On Tue, Mar 27, 2018 at 01:32:41AM +0900, Fujii Masao wrote: > +1. It's better for us to focus on the code change of the fillter on pg_rewind > rather than such "refactoring". (filter takes one 'l', not two) Okay. I had my mind mostly focused on how to shape the exclusion list and get it shared between the base backup and pg_rewind, so let's move on with the duplicated list for now. I did not put much efforts into the pg_rewind portion to be honest. > As I told upthread, the previous patch has the > problem where the files which should be skipped are not skipped. ISTM that, > in pg_rewind, the filter should be triggered in recurse_dir() not > process_source_file(). If you put that into recurse_dir you completely ignore the case where changes are fetched by libpq. Doing the filtering when processing the file map has the advantage to take care of both the local and remote cases, which is why I am doing it there. So you would just get half of the cake and not the whole of it. > BTW what should pg_rewind do when it finds the directory which should be > skipped, in the source directory? In your patch, pg_rewind just tries to skip > that directory at all. But isn't this right behavior? If that directory doesn't > exist in the target directory (though I'm not sure if this situation is really > possible), I'm thinking that pg_rewind should create that "empty" directory > in the target. No? I am not exactly sure what you are coming up with here. The target server should have the same basic directory mapping as the source as the target has been initialized normally with initdb or a base backup from another node, so checking for the *contents* of directories is enough and keeps the code more simple, as the exclude filter entries are based on elements inherent to PostgreSQL internals. Please note as well that if a non-system directory is present on the source but not the target then it would get created on the target. At the end I have finished with the attached. I have taken the decision to not include as well xlog.h in pg_rewind to avoid having to drag a lot of backend-only headers like pg_resetwal does, which I prefer avoid as that's only hardcoding values for "backup_label" and "tablespace_map". This applies filters based on directory contents, so by running the regression tests you can see entries like the following ones: entry "postmaster.opts" excluded from source file list entry "pg_subtrans/0000" excluded from source file list entry "pg_notify/0000" excluded from source file list entry "base/12360/pg_internal.init" excluded from source file list entry "backup_label.old" excluded from source file list entry "global/pg_internal.init" excluded from source file list entry "postmaster.opts" excluded from target file list entry "pg_subtrans/0000" excluded from target file list entry "pg_notify/0000" excluded from target file list entry "base/12360/pg_internal.init" excluded from target file list entry "global/pg_internal.init" excluded from target file list Processing the filemap list on the target also matters in my opinion. When at recovery, all the previous files will be wiped out, and we should not remove either things like postmaster.pid as those are around to prevent corruption problems. Thanks, -- Michael
Attachment
Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind
From
Michael Paquier
Date:
On Sat, Mar 24, 2018 at 09:12:09PM -0400, Stephen Frost wrote: > Yeah, neither 2 or 3 really appeals to me. Option 1 does touch a number > of places but in a pretty straight-forward way- and if there's a typo > there, the compiler is likely to complain, so it seems like the risk is > relatively low. One example of place which can be easily consolidated is pg_wal whose definition is in xlog_internal.h. And there are a couple of other places which could be consolidated without major refactoring like what I proposed initially on this thread. I would suggest to focus on this effort on a separate thread later on. -- Michael
Attachment
Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind
From
Fujii Masao
Date:
On Tue, Mar 27, 2018 at 10:55 AM, Michael Paquier <michael@paquier.xyz> wrote: > On Tue, Mar 27, 2018 at 01:32:41AM +0900, Fujii Masao wrote: >> +1. It's better for us to focus on the code change of the fillter on pg_rewind >> rather than such "refactoring". > > (filter takes one 'l', not two) > > Okay. I had my mind mostly focused on how to shape the exclusion list > and get it shared between the base backup and pg_rewind, so let's move > on with the duplicated list for now. I did not put much efforts into > the pg_rewind portion to be honest. > >> As I told upthread, the previous patch has the >> problem where the files which should be skipped are not skipped. ISTM that, >> in pg_rewind, the filter should be triggered in recurse_dir() not >> process_source_file(). > > If you put that into recurse_dir you completely ignore the case where > changes are fetched by libpq. Doing the filtering when processing the > file map has the advantage to take care of both the local and remote > cases, which is why I am doing it there. OK. >> BTW what should pg_rewind do when it finds the directory which should be >> skipped, in the source directory? In your patch, pg_rewind just tries to skip >> that directory at all. But isn't this right behavior? If that directory doesn't >> exist in the target directory (though I'm not sure if this situation is really >> possible), I'm thinking that pg_rewind should create that "empty" directory >> in the target. No? > > I am not exactly sure what you are coming up with here. The target > server should have the same basic directory mapping as the source as the > target has been initialized normally with initdb or a base backup from > another node, so checking for the *contents* of directories is enough > and keeps the code more simple, as the exclude filter entries are based > on elements inherent to PostgreSQL internals. Please note as well that > if a non-system directory is present on the source but not the target > then it would get created on the target. > > At the end I have finished with the attached. Thanks for the patch! + snprintf(localpath, sizeof(localpath), "%s/", + excludeDirContents[excludeIdx]); + if (strstr(path, localpath) != NULL) This code is almost ok in practice, but using the check of "strstr(path, localpath) == path" is more robust here? + for (excludeIdx = 0; excludeFiles[excludeIdx] != NULL; excludeIdx++) + { + if (strstr(path, excludeFiles[excludeIdx]) != NULL) Using the following code instead is more robust? This original code is almost ok in practice, though. filename = last_dir_separator(path); if (filename == NULL) filename = path; else filename++; if (strcmp(filename, excludeFiles[excludeIdx]) == 0) + (everything except the relation files). Similarly to base backups, + the contents of the directories <filename>pg_dynshmem/</filename>, + <filename>pg_notify/</filename>, <filename>pg_replslot/</filename>, + <filename>pg_serial/</filename>, <filename>pg_snapshots/</filename>, + <filename>pg_stat_tmp/</filename>, and + <filename>pg_subtrans/</filename> are omitted from the data copied + from the source cluster. Any file or directory beginning with + <filename>pgsql_tmp</filename> is omitted, as well as are + <filename>backup_label</filename>, + <filename>tablespace_map</filename>, + <filename>pg_internal.init</filename>, + <filename>postmaster.opts</filename> and + <filename>postmaster.pid</filename>. I don't think this description is necessary. The doc for pg_basebackup also doesn't seem to have this kind of description. So attached is the patch that I updated based on your patch and am thinking to apply. Regards, -- Fujii Masao
Attachment
Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind
From
Michael Paquier
Date:
On Wed, Mar 28, 2018 at 04:13:25AM +0900, Fujii Masao wrote: > This code is almost ok in practice, but using the check of > "strstr(path, localpath) == path" is more robust here? No problems with that either. > Using the following code instead is more robust? > This original code is almost ok in practice, though. > > filename = last_dir_separator(path); > if (filename == NULL) > filename = path; > else > filename++; > if (strcmp(filename, excludeFiles[excludeIdx]) == 0) Indeed, using last_dir_separator is a better idea for files. I was looking for something like that actually. > + (everything except the relation files). Similarly to base backups, > + the contents of the directories <filename>pg_dynshmem/</filename>, > + <filename>pg_notify/</filename>, <filename>pg_replslot/</filename>, > + <filename>pg_serial/</filename>, <filename>pg_snapshots/</filename>, > + <filename>pg_stat_tmp/</filename>, and > + <filename>pg_subtrans/</filename> are omitted from the data copied > + from the source cluster. Any file or directory beginning with > + <filename>pgsql_tmp</filename> is omitted, as well as are > + <filename>backup_label</filename>, > + <filename>tablespace_map</filename>, > + <filename>pg_internal.init</filename>, > + <filename>postmaster.opts</filename> and > + <filename>postmaster.pid</filename>. > > I don't think this description is necessary. The doc for pg_basebackup > also doesn't seem to have this kind of description. Those are listed in backup.sgml. And I really think that we should at least document that the same type of exclusion filters as base backups are used in pg_rewind. If you don't want to include the whole list, then let's use a reference to backup-lowlevel-base-backup-data. > So attached is the patch that I updated based on your patch and > am thinking to apply. Thanks. Except for the documentation part, I am OK for the changes proposed. -- Michael
Attachment
Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind
From
Fujii Masao
Date:
On Wed, Mar 28, 2018 at 7:54 AM, Michael Paquier <michael@paquier.xyz> wrote: > On Wed, Mar 28, 2018 at 04:13:25AM +0900, Fujii Masao wrote: >> This code is almost ok in practice, but using the check of >> "strstr(path, localpath) == path" is more robust here? > > No problems with that either. > >> Using the following code instead is more robust? >> This original code is almost ok in practice, though. >> >> filename = last_dir_separator(path); >> if (filename == NULL) >> filename = path; >> else >> filename++; >> if (strcmp(filename, excludeFiles[excludeIdx]) == 0) > > Indeed, using last_dir_separator is a better idea for files. I was > looking for something like that actually. > >> + (everything except the relation files). Similarly to base backups, >> + the contents of the directories <filename>pg_dynshmem/</filename>, >> + <filename>pg_notify/</filename>, <filename>pg_replslot/</filename>, >> + <filename>pg_serial/</filename>, <filename>pg_snapshots/</filename>, >> + <filename>pg_stat_tmp/</filename>, and >> + <filename>pg_subtrans/</filename> are omitted from the data copied >> + from the source cluster. Any file or directory beginning with >> + <filename>pgsql_tmp</filename> is omitted, as well as are >> + <filename>backup_label</filename>, >> + <filename>tablespace_map</filename>, >> + <filename>pg_internal.init</filename>, >> + <filename>postmaster.opts</filename> and >> + <filename>postmaster.pid</filename>. >> >> I don't think this description is necessary. The doc for pg_basebackup >> also doesn't seem to have this kind of description. > > Those are listed in backup.sgml. And I really think that we should at > least document that the same type of exclusion filters as base backups > are used in pg_rewind. Okay, I revived that change in the doc. >> So attached is the patch that I updated based on your patch and >> am thinking to apply. > > Thanks. Except for the documentation part, I am OK for the changes > proposed. Committed. Thanks! Regards, -- Fujii Masao
Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind
From
Michael Paquier
Date:
On Thu, Mar 29, 2018 at 05:01:40AM +0900, Fujii Masao wrote: > Committed. Thanks! Thanks for including as well the documentation changes and committing it. The result looks good to me. -- Michael