Thread: Using base backup exclusion filters to reduce data transferred withpg_rewind

Using base backup exclusion filters to reduce data transferred withpg_rewind

From
Michael Paquier
Date:
Hi all,

Many threads have touched $subject:
https://www.postgresql.org/message-id/CAN-RpxDPE4baiMMJ6TLd6AiUvrG=YrC05tGxrgp4aUutH9j5TQ@mail.gmail.com
https://www.postgresql.org/message-id/7c50423.5ad0.15e8b308b2f.Coremail.chjischj@163.com
https://www.postgresql.org/message-id/1516736993.5599.4.camel@cybertec.at

Thread [2] is a bit different as it discusses with WAL segments data
which is useless at the end, still it aimed at reducing the amount of
data transferred during a rewind.  I am not tackling this problem in the
patch set of this thread.  This can be discussed separately.

Attached is a patch set which implements what I have mentioned a couple
of times in those threads by having pg_rewind reuse the same exclusion
filtering rules as for base backups, as after a rewind a node enters in
recovery and a bunch of data which is copied during the rewind finishes
in the void.  There has been as well many complains about the need to
remove all the time replication slot data manually after a rewind, so I
believe that this makes the user experience much easier with the tool.
Something useful is replication slot data getting filtered out.

In order to reach this point, I have been hacking the backend code and
finished with quite a bit of shuffling around how system paths are
hardcoded in many places, like pg_replslot, pg_wal, etc.  Well you can
think here about all the paths hardcoded in initdb.c.  So I have
introduced a couple of things:
- src/include/pg_paths.h, a new header which gathers the set of system
file and directory names.  With this facility in place, the backend code
loses knowledge of hardcoded system-related paths, including things like
"base", "global", "pg_tblspc", "base/pg_control", etc.  A fun
consequence of that refactoring is that it is possible to just change
pg_paths.h and have change the system paths of a PostgreSQL instance
with one-liners.  For example you could change "base/" to "foo/".  This
can make PostgreSQL more malleable for forks.  It would be more simple
to just hardcode more the paths but I think that this would not be
manageable in the long-term, especially if similar logics spread more.
- src/include/replication/basebackup_paths.h, which extracts the exclude
rules now in basebackup.c into a header which can be consumed by both
frontends and backends.  This is useful for any backup tools.
- pg_rewind update to ensure that the filters are working correctly.

So the patch set attached is made of the following:
- 0001, which refactors all hardcoded system paths into pg_paths.h.
This modifies only initdb.c and basebackup.c to ease reviews.
- 0002 spreads the path changes and the use of pg_paths.h across the
core code.
- 0003 moves the last set of definitions with backup_label,
tablespace_map and pg_internal.init.
- 0004 creates basebackup_paths.h, this can be consumed by pg_rewind.
- 0005 makes the changes for pg_rewind.

0001~0003 can be merged together, I have just done a split to ease
reviews.

I am adding that to the next CF.
Thanks,
--
Michael

Attachment

Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind

From
Anastasia Lubennikova
Date:
05.02.2018 10:10, Michael Paquier:
> So the patch set attached is made of the following:
> - 0001, which refactors all hardcoded system paths into pg_paths.h.
> This modifies only initdb.c and basebackup.c to ease reviews.
> - 0002 spreads the path changes and the use of pg_paths.h across the
> core code.
> - 0003 moves the last set of definitions with backup_label,
> tablespace_map and pg_internal.init.
> - 0004 creates basebackup_paths.h, this can be consumed by pg_rewind.
> - 0005 makes the changes for pg_rewind.
Thank you for this set of patches.
This refactoring makes code way more convenient to read and change.

Due to some merge conflicts, patch 0002 was not applying clearly.
So I attach the updated version.
I also noticed a couple of rows that were not updated, and wrote a patch 
0006,
which contains just minor changes and can be applied on top of any patch 
after 0003.

Since these patches contain mostly cosmetic changes and do not break 
anything,
I think it's fine to mark this thread as Ready For Committer without 
long discussion.

-- 
Anastasia Lubennikova
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company


Attachment

Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind

From
Michael Paquier
Date:
On Tue, Mar 13, 2018 at 01:16:27PM +0300, Anastasia Lubennikova wrote:
> Since these patches contain mostly cosmetic changes and do not break
> anything, I think it's fine to mark this thread as Ready For Committer
> without long discussion.

Thanks Anastasia for the review.  The refactoring is quite intuitive I
think, still that's perhaps a bit too much intrusive.  Extra opinions
about that are welcome.

As I read again the patch set, please note that I am not much happy yet
about the part for the handling of temporary files when applying the
filters in pg_rewind and the lack inconsistency for file filters and
directory filters...
--
Michael

Attachment
On Tue, Mar 13, 2018 at 9:48 PM, Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Mar 13, 2018 at 01:16:27PM +0300, Anastasia Lubennikova wrote:
>> Since these patches contain mostly cosmetic changes and do not break
>> anything, I think it's fine to mark this thread as Ready For Committer
>> without long discussion.
>
> Thanks Anastasia for the review.  The refactoring is quite intuitive I
> think, still that's perhaps a bit too much intrusive.  Extra opinions
> about that are welcome.

Personally it looks very intrusive, so I'm feeling inclined to push
the changes without that refactoring.

The 0005 patch doesn't seem to be right. The patch changes process_source_file()
so that it excludes some directories like pg_notify from the filemap. However,
after that, process_source_file() is called for the files under those "excluded"
directories and then they are not excluded. For example, pg_notify/0000 file is
unexpectedly included in the filemap and copied by pg_rewind.

This problem happens because recurse_dir() has the following code and
ISTM you forgot to take into account it.

    else if (S_ISDIR(fst.st_mode))
    {
        callback(path, FILE_TYPE_DIRECTORY, 0, NULL);
        /* recurse to handle subdirectories */
        recurse_dir(datadir, path, callback);
    }

Regards,

-- 
Fujii Masao


Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind

From
Michael Paquier
Date:
On Fri, Mar 23, 2018 at 01:38:38AM +0900, Fujii Masao wrote:
> Personally it looks very intrusive, so I'm feeling inclined to push
> the changes without that refactoring.

Okay.  Just moving the list of items from basebackup.c to a dedicated
header is not sufficient though as things like RELCACHE_INIT_FILENAME as
declared in headers which are backend-only.  The same applies to
LOG_METAINFO_DATAFILE_TMP, PG_AUTOCONF_FILENAME, PG_STAT_TMP_DIR and
PG_DYNSHMEM_DIR.

BACKUP_LABEL_FILE and TABLESPACE_MAP can be included though via xlog.h.
So what are you looking for?  I see a couple of options:
1) The inclusive refactoring, which you are discarding.
2) A dedicated header, but some of the now-not-hardcoded values will
need to be so.  That's the list I am giving above.
3) A copy of the list from basebackup.c to src/bin/pg_rewind/.

I would guess that you are looking for 2), but I am not sure if you
imply that 3) would be acceptable or not.
--
Michael

Attachment
Greetings,

* Michael Paquier (michael@paquier.xyz) wrote:
> On Fri, Mar 23, 2018 at 01:38:38AM +0900, Fujii Masao wrote:
> > Personally it looks very intrusive, so I'm feeling inclined to push
> > the changes without that refactoring.

I've been reading over the first couple of posted patches and mulling
over the changes proposed.  They certainly touch a lot of places but
they're pretty straight-forward changes and so I'm not really sure I'd
call them all that intrusive.

I don't completely buy off on the argument that having these #define's
would make it easier for forks (we've had quite a few folks fork PG, but
how many of them have actually changed "base"?) and I'm on the fence
about if these will make our lives simpler down the road when it comes
to changing the directory names (if we changed "pg_multixact/members" to
be "pg_multixact/processes" or some such, I imagine we'd also go through
and change PG_MULTIXACT_MEMBERS_DIR to be PG_MULTIXACT_PROCESSES_DIR
anyway, so this doesn't result in much improvement there), but as it
relates to new tool development and such, there's some value here
because the compiler is going to complain if you say
PG_COMMIT_TX_DIR and there is no such #define, whereas a similar mistake
with a string literal of "pg_commit_tx" might end up getting missed and
that would be unfortunate.

Therefore, on the whole, I'm +1 on these changes, but I'd argue a bit
about some of the choices made:

- Let's have them all be PG_WHATEVER_DIR/FILE
- Use WAL instead of XLOG (I thought we agreed new code would..?)
- Remove extraneous #define's (most of these were, but
  DIRECTORY_LOCK_FILE was kept..?  Let's use PG_POSTMASTER_PID_FILE
  throughout instead, with a comment that it's also used as a lock
  file).

Might be nice to go back and modify other pre-existing #define's to use
that form also, but perhaps not that big of a deal.  I would have that
be independent from this, in any case.

> Okay.  Just moving the list of items from basebackup.c to a dedicated
> header is not sufficient though as things like RELCACHE_INIT_FILENAME as
> declared in headers which are backend-only.  The same applies to
> LOG_METAINFO_DATAFILE_TMP, PG_AUTOCONF_FILENAME, PG_STAT_TMP_DIR and
> PG_DYNSHMEM_DIR.
>
> BACKUP_LABEL_FILE and TABLESPACE_MAP can be included though via xlog.h.
> So what are you looking for?  I see a couple of options:
> 1) The inclusive refactoring, which you are discarding.
> 2) A dedicated header, but some of the now-not-hardcoded values will
> need to be so.  That's the list I am giving above.
> 3) A copy of the list from basebackup.c to src/bin/pg_rewind/.
>
> I would guess that you are looking for 2), but I am not sure if you
> imply that 3) would be acceptable or not.

Yeah, neither 2 or 3 really appeals to me.  Option 1 does touch a number
of places but in a pretty straight-forward way- and if there's a typo
there, the compiler is likely to complain, so it seems like the risk is
relatively low.

Thanks!

Stephen

Attachment
Stephen Frost <sfrost@snowman.net> writes:
> I don't completely buy off on the argument that having these #define's
> would make it easier for forks (we've had quite a few folks fork PG, but
> how many of them have actually changed "base"?) and I'm on the fence
> about if these will make our lives simpler down the road when it comes
> to changing the directory names

I am distressed that nobody, apparently, is putting any weight on the
back-patching pain that will result from widespread replacement of path
names with macros.  I don't buy that either we or anyone else will need
to change these names in future, so I see pain and effectively no gain.

Furthermore, I think it's completely silly to claim that this sort of
thing is a gain in readability or understandability:

-                path = psprintf("base/%u/t%d_%u",
-                                dbNode, backendId, relNode);
+                path = psprintf("%s/%u/t%d_%u",
+                                PG_BASE_DIR, dbNode, backendId, relNode);

For my money it's a loss on both points.  The extra level of indirection
is just obscuring what's actually happening and putting extra cognitive
load on the reader.

We have better things to spend our time on.

            regards, tom lane


Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind

From
Michael Paquier
Date:
On Sat, Mar 24, 2018 at 11:14:34PM -0400, Tom Lane wrote:
> Stephen Frost <sfrost@snowman.net> writes:
>> I don't completely buy off on the argument that having these #define's
>> would make it easier for forks (we've had quite a few folks fork PG, but
>> how many of them have actually changed "base"?) and I'm on the fence
>> about if these will make our lives simpler down the road when it comes
>> to changing the directory names
>
> I am distressed that nobody, apparently, is putting any weight on the
> back-patching pain that will result from widespread replacement of path
> names with macros.  I don't buy that either we or anyone else will need
> to change these names in future, so I see pain and effectively no
> gain.

That's actually something I worry about as well (as the author!), which
is why I qualify the changes as intrusive.  At the end, I think that I
would be tempted to just do #3, aka to keep a copy of the filter list in
pg_rewind code while hardcoding a minimum of names and mention in both
basebackup.c and pg_rewind code to not forget to update the filter list
if necessary.  New paths in the data folder are not added on a monthly
basis either, and not all of them can be filtered out so that's easy to
maintain.
--
Michael

Attachment
Greetings,

* Michael Paquier (michael@paquier.xyz) wrote:
> On Sat, Mar 24, 2018 at 11:14:34PM -0400, Tom Lane wrote:
> > Stephen Frost <sfrost@snowman.net> writes:
> >> I don't completely buy off on the argument that having these #define's
> >> would make it easier for forks (we've had quite a few folks fork PG, but
> >> how many of them have actually changed "base"?) and I'm on the fence
> >> about if these will make our lives simpler down the road when it comes
> >> to changing the directory names
> >
> > I am distressed that nobody, apparently, is putting any weight on the
> > back-patching pain that will result from widespread replacement of path
> > names with macros.  I don't buy that either we or anyone else will need
> > to change these names in future, so I see pain and effectively no
> > gain.

While I agree that some consideration should be given to the impact a
change has on back-patching, I continue to be of the opinion that this
would be a good change, for the reasons which I outlined up-thread.
That said, I don't hold that position very strongly.

> That's actually something I worry about as well (as the author!), which
> is why I qualify the changes as intrusive.  At the end, I think that I
> would be tempted to just do #3, aka to keep a copy of the filter list in
> pg_rewind code while hardcoding a minimum of names and mention in both
> basebackup.c and pg_rewind code to not forget to update the filter list
> if necessary.  New paths in the data folder are not added on a monthly
> basis either, and not all of them can be filtered out so that's easy to
> maintain.

Intrusive, at least from my viewpoint, is more about how the code is
changed around and/or refactored than about the impact it has on
back-patching.  These are pretty mechanical changes, after all.

I'm not particularly thrilled with the idea of having two independent
lists of hard-coded paths to maintain, even with comments in both places
saying to update the other list.

Thanks!

Stephen

Attachment
On Sun, Mar 25, 2018 at 5:06 PM, Michael Paquier <michael@paquier.xyz> wrote:
> On Sat, Mar 24, 2018 at 11:14:34PM -0400, Tom Lane wrote:
>> Stephen Frost <sfrost@snowman.net> writes:
>>> I don't completely buy off on the argument that having these #define's
>>> would make it easier for forks (we've had quite a few folks fork PG, but
>>> how many of them have actually changed "base"?) and I'm on the fence
>>> about if these will make our lives simpler down the road when it comes
>>> to changing the directory names
>>
>> I am distressed that nobody, apparently, is putting any weight on the
>> back-patching pain that will result from widespread replacement of path
>> names with macros.  I don't buy that either we or anyone else will need
>> to change these names in future, so I see pain and effectively no
>> gain.
>
> That's actually something I worry about as well (as the author!), which
> is why I qualify the changes as intrusive.  At the end, I think that I
> would be tempted to just do #3, aka to keep a copy of the filter list in
> pg_rewind code while hardcoding a minimum of names and mention in both
> basebackup.c and pg_rewind code to not forget to update the filter list
> if necessary.

+1. It's better for us to focus on the code change of the fillter on pg_rewind
rather than such "refactoring". As I told upthread, the previous patch has the
problem where the files which should be skipped are not skipped. ISTM that,
in pg_rewind, the fillter should be triggered in recurse_dir() not
process_source_file().

BTW what should pg_rewind do when it finds the directory which should be
skipped, in the source directory? In your patch, pg_rewind just tries to skip
that directory at all. But isn't this right behavior? If that directory doesn't
exist in the target directory(though I'm not sure if this situation is really
possible), I'm thinking that pg_rewind should create that "empty" directory
 in the target. No?

Regards,

-- 
Fujii Masao


Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind

From
Michael Paquier
Date:
On Tue, Mar 27, 2018 at 01:32:41AM +0900, Fujii Masao wrote:
> +1. It's better for us to focus on the code change of the fillter on pg_rewind
> rather than such "refactoring".

(filter takes one 'l', not two)

Okay.  I had my mind mostly focused on how to shape the exclusion list
and get it shared between the base backup and pg_rewind, so let's move
on with the duplicated list for now.  I did not put much efforts into
the pg_rewind portion to be honest.

> As I told upthread, the previous patch has the
> problem where the files which should be skipped are not skipped. ISTM that,
> in pg_rewind, the filter should be triggered in recurse_dir() not
> process_source_file().

If you put that into recurse_dir you completely ignore the case where
changes are fetched by libpq.  Doing the filtering when processing the
file map has the advantage to take care of both the local and remote
cases, which is why I am doing it there.  So you would just get half of
the cake and not the whole of it.

> BTW what should pg_rewind do when it finds the directory which should be
> skipped, in the source directory? In your patch, pg_rewind just tries to skip
> that directory at all. But isn't this right behavior? If that directory doesn't
> exist in the target directory (though I'm not sure if this situation is really
> possible), I'm thinking that pg_rewind should create that "empty" directory
> in the target. No?

I am not exactly sure what you are coming up with here.  The target
server should have the same basic directory mapping as the source as the
target has been initialized normally with initdb or a base backup from
another node, so checking for the *contents* of directories is enough
and keeps the code more simple, as the exclude filter entries are based
on elements inherent to PostgreSQL internals.  Please note as well that
if a non-system directory is present on the source but not the target
then it would get created on the target.

At the end I have finished with the attached.  I have taken the decision
to not include as well xlog.h in pg_rewind to avoid having to drag a lot
of backend-only headers like pg_resetwal does, which I prefer avoid as
that's only hardcoding values for "backup_label" and "tablespace_map".
This applies filters based on directory contents, so by running the
regression tests you can see entries like the following ones:
entry "postmaster.opts" excluded from source file list
entry "pg_subtrans/0000" excluded from source file list
entry "pg_notify/0000" excluded from source file list
entry "base/12360/pg_internal.init" excluded from source file list
entry "backup_label.old" excluded from source file list
entry "global/pg_internal.init" excluded from source file list
entry "postmaster.opts" excluded from target file list
entry "pg_subtrans/0000" excluded from target file list
entry "pg_notify/0000" excluded from target file list
entry "base/12360/pg_internal.init" excluded from target file list
entry "global/pg_internal.init" excluded from target file list

Processing the filemap list on the target also matters in my opinion.
When at recovery, all the previous files will be wiped out, and we
should not remove either things like postmaster.pid as those are around
to prevent corruption problems.

Thanks,
--
Michael

Attachment

Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind

From
Michael Paquier
Date:
On Sat, Mar 24, 2018 at 09:12:09PM -0400, Stephen Frost wrote:
> Yeah, neither 2 or 3 really appeals to me.  Option 1 does touch a number
> of places but in a pretty straight-forward way- and if there's a typo
> there, the compiler is likely to complain, so it seems like the risk is
> relatively low.

One example of place which can be easily consolidated is pg_wal whose
definition is in xlog_internal.h.  And there are a couple of other
places which could be consolidated without major refactoring like what I
proposed initially on this thread.  I would suggest to focus on this
effort on a separate thread later on.
--
Michael

Attachment
On Tue, Mar 27, 2018 at 10:55 AM, Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Mar 27, 2018 at 01:32:41AM +0900, Fujii Masao wrote:
>> +1. It's better for us to focus on the code change of the fillter on pg_rewind
>> rather than such "refactoring".
>
> (filter takes one 'l', not two)
>
> Okay.  I had my mind mostly focused on how to shape the exclusion list
> and get it shared between the base backup and pg_rewind, so let's move
> on with the duplicated list for now.  I did not put much efforts into
> the pg_rewind portion to be honest.
>
>> As I told upthread, the previous patch has the
>> problem where the files which should be skipped are not skipped. ISTM that,
>> in pg_rewind, the filter should be triggered in recurse_dir() not
>> process_source_file().
>
> If you put that into recurse_dir you completely ignore the case where
> changes are fetched by libpq.  Doing the filtering when processing the
> file map has the advantage to take care of both the local and remote
> cases, which is why I am doing it there.

OK.

>> BTW what should pg_rewind do when it finds the directory which should be
>> skipped, in the source directory? In your patch, pg_rewind just tries to skip
>> that directory at all. But isn't this right behavior? If that directory doesn't
>> exist in the target directory (though I'm not sure if this situation is really
>> possible), I'm thinking that pg_rewind should create that "empty" directory
>> in the target. No?
>
> I am not exactly sure what you are coming up with here.  The target
> server should have the same basic directory mapping as the source as the
> target has been initialized normally with initdb or a base backup from
> another node, so checking for the *contents* of directories is enough
> and keeps the code more simple, as the exclude filter entries are based
> on elements inherent to PostgreSQL internals.  Please note as well that
> if a non-system directory is present on the source but not the target
> then it would get created on the target.
>
> At the end I have finished with the attached.

Thanks for the patch!

+ snprintf(localpath, sizeof(localpath), "%s/",
+ excludeDirContents[excludeIdx]);
+ if (strstr(path, localpath) != NULL)

This code is almost ok in practice, but using the check of
"strstr(path, localpath) == path" is more robust here?

+ for (excludeIdx = 0; excludeFiles[excludeIdx] != NULL; excludeIdx++)
+ {
+ if (strstr(path, excludeFiles[excludeIdx]) != NULL)

Using the following code instead is more robust?
This original code is almost ok in practice, though.

    filename = last_dir_separator(path);
    if (filename == NULL)
        filename = path;
    else
        filename++;
    if (strcmp(filename, excludeFiles[excludeIdx]) == 0)

+      (everything except the relation files). Similarly to base backups,
+      the contents of the directories <filename>pg_dynshmem/</filename>,
+      <filename>pg_notify/</filename>, <filename>pg_replslot/</filename>,
+      <filename>pg_serial/</filename>, <filename>pg_snapshots/</filename>,
+      <filename>pg_stat_tmp/</filename>, and
+      <filename>pg_subtrans/</filename> are omitted from the data copied
+      from the source cluster. Any file or directory beginning with
+      <filename>pgsql_tmp</filename> is omitted, as well as are
+      <filename>backup_label</filename>,
+      <filename>tablespace_map</filename>,
+      <filename>pg_internal.init</filename>,
+      <filename>postmaster.opts</filename> and
+      <filename>postmaster.pid</filename>.

I don't think this description is necessary. The doc for pg_basebackup
also doesn't seem to have this kind of description.

So attached is the patch that I updated based on your patch and
am thinking to apply.

Regards,

-- 
Fujii Masao

Attachment

Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind

From
Michael Paquier
Date:
On Wed, Mar 28, 2018 at 04:13:25AM +0900, Fujii Masao wrote:
> This code is almost ok in practice, but using the check of
> "strstr(path, localpath) == path" is more robust here?

No problems with that either.

> Using the following code instead is more robust?
> This original code is almost ok in practice, though.
>
>     filename = last_dir_separator(path);
>     if (filename == NULL)
>         filename = path;
>     else
>         filename++;
>     if (strcmp(filename, excludeFiles[excludeIdx]) == 0)

Indeed, using last_dir_separator is a better idea for files.  I was
looking for something like that actually.

> +      (everything except the relation files). Similarly to base backups,
> +      the contents of the directories <filename>pg_dynshmem/</filename>,
> +      <filename>pg_notify/</filename>, <filename>pg_replslot/</filename>,
> +      <filename>pg_serial/</filename>, <filename>pg_snapshots/</filename>,
> +      <filename>pg_stat_tmp/</filename>, and
> +      <filename>pg_subtrans/</filename> are omitted from the data copied
> +      from the source cluster. Any file or directory beginning with
> +      <filename>pgsql_tmp</filename> is omitted, as well as are
> +      <filename>backup_label</filename>,
> +      <filename>tablespace_map</filename>,
> +      <filename>pg_internal.init</filename>,
> +      <filename>postmaster.opts</filename> and
> +      <filename>postmaster.pid</filename>.
>
> I don't think this description is necessary. The doc for pg_basebackup
> also doesn't seem to have this kind of description.

Those are listed in backup.sgml.  And I really think that we should at
least document that the same type of exclusion filters as base backups
are used in pg_rewind.  If you don't want to include the whole list,
then let's use a reference to backup-lowlevel-base-backup-data.

> So attached is the patch that I updated based on your patch and
> am thinking to apply.

Thanks.  Except for the documentation part, I am OK for the changes
proposed.
--
Michael

Attachment
On Wed, Mar 28, 2018 at 7:54 AM, Michael Paquier <michael@paquier.xyz> wrote:
> On Wed, Mar 28, 2018 at 04:13:25AM +0900, Fujii Masao wrote:
>> This code is almost ok in practice, but using the check of
>> "strstr(path, localpath) == path" is more robust here?
>
> No problems with that either.
>
>> Using the following code instead is more robust?
>> This original code is almost ok in practice, though.
>>
>>     filename = last_dir_separator(path);
>>     if (filename == NULL)
>>         filename = path;
>>     else
>>         filename++;
>>     if (strcmp(filename, excludeFiles[excludeIdx]) == 0)
>
> Indeed, using last_dir_separator is a better idea for files.  I was
> looking for something like that actually.
>
>> +      (everything except the relation files). Similarly to base backups,
>> +      the contents of the directories <filename>pg_dynshmem/</filename>,
>> +      <filename>pg_notify/</filename>, <filename>pg_replslot/</filename>,
>> +      <filename>pg_serial/</filename>, <filename>pg_snapshots/</filename>,
>> +      <filename>pg_stat_tmp/</filename>, and
>> +      <filename>pg_subtrans/</filename> are omitted from the data copied
>> +      from the source cluster. Any file or directory beginning with
>> +      <filename>pgsql_tmp</filename> is omitted, as well as are
>> +      <filename>backup_label</filename>,
>> +      <filename>tablespace_map</filename>,
>> +      <filename>pg_internal.init</filename>,
>> +      <filename>postmaster.opts</filename> and
>> +      <filename>postmaster.pid</filename>.
>>
>> I don't think this description is necessary. The doc for pg_basebackup
>> also doesn't seem to have this kind of description.
>
> Those are listed in backup.sgml.  And I really think that we should at
> least document that the same type of exclusion filters as base backups
> are used in pg_rewind.

Okay, I revived that change in the doc.

>> So attached is the patch that I updated based on your patch and
>> am thinking to apply.
>
> Thanks.  Except for the documentation part, I am OK for the changes
> proposed.

Committed. Thanks!

Regards,

-- 
Fujii Masao


Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind

From
Michael Paquier
Date:
On Thu, Mar 29, 2018 at 05:01:40AM +0900, Fujii Masao wrote:
> Committed. Thanks!

Thanks for including as well the documentation changes and committing
it.  The result looks good to me.
--
Michael

Attachment