Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind - Mailing list pgsql-hackers

From Fujii Masao
Subject Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind
Date
Msg-id CAHGQGwGNF7VUoUC-L3797ziMqCzxaxrgtLNf1=b1BNyMmOF6ag@mail.gmail.com
Whole thread Raw
In response to Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind  (Michael Paquier <michael@paquier.xyz>)
Responses Re: Using base backup exclusion filters to reduce data transferredwith pg_rewind  (Michael Paquier <michael@paquier.xyz>)
List pgsql-hackers
On Tue, Mar 27, 2018 at 10:55 AM, Michael Paquier <michael@paquier.xyz> wrote:
> On Tue, Mar 27, 2018 at 01:32:41AM +0900, Fujii Masao wrote:
>> +1. It's better for us to focus on the code change of the fillter on pg_rewind
>> rather than such "refactoring".
>
> (filter takes one 'l', not two)
>
> Okay.  I had my mind mostly focused on how to shape the exclusion list
> and get it shared between the base backup and pg_rewind, so let's move
> on with the duplicated list for now.  I did not put much efforts into
> the pg_rewind portion to be honest.
>
>> As I told upthread, the previous patch has the
>> problem where the files which should be skipped are not skipped. ISTM that,
>> in pg_rewind, the filter should be triggered in recurse_dir() not
>> process_source_file().
>
> If you put that into recurse_dir you completely ignore the case where
> changes are fetched by libpq.  Doing the filtering when processing the
> file map has the advantage to take care of both the local and remote
> cases, which is why I am doing it there.

OK.

>> BTW what should pg_rewind do when it finds the directory which should be
>> skipped, in the source directory? In your patch, pg_rewind just tries to skip
>> that directory at all. But isn't this right behavior? If that directory doesn't
>> exist in the target directory (though I'm not sure if this situation is really
>> possible), I'm thinking that pg_rewind should create that "empty" directory
>> in the target. No?
>
> I am not exactly sure what you are coming up with here.  The target
> server should have the same basic directory mapping as the source as the
> target has been initialized normally with initdb or a base backup from
> another node, so checking for the *contents* of directories is enough
> and keeps the code more simple, as the exclude filter entries are based
> on elements inherent to PostgreSQL internals.  Please note as well that
> if a non-system directory is present on the source but not the target
> then it would get created on the target.
>
> At the end I have finished with the attached.

Thanks for the patch!

+ snprintf(localpath, sizeof(localpath), "%s/",
+ excludeDirContents[excludeIdx]);
+ if (strstr(path, localpath) != NULL)

This code is almost ok in practice, but using the check of
"strstr(path, localpath) == path" is more robust here?

+ for (excludeIdx = 0; excludeFiles[excludeIdx] != NULL; excludeIdx++)
+ {
+ if (strstr(path, excludeFiles[excludeIdx]) != NULL)

Using the following code instead is more robust?
This original code is almost ok in practice, though.

    filename = last_dir_separator(path);
    if (filename == NULL)
        filename = path;
    else
        filename++;
    if (strcmp(filename, excludeFiles[excludeIdx]) == 0)

+      (everything except the relation files). Similarly to base backups,
+      the contents of the directories <filename>pg_dynshmem/</filename>,
+      <filename>pg_notify/</filename>, <filename>pg_replslot/</filename>,
+      <filename>pg_serial/</filename>, <filename>pg_snapshots/</filename>,
+      <filename>pg_stat_tmp/</filename>, and
+      <filename>pg_subtrans/</filename> are omitted from the data copied
+      from the source cluster. Any file or directory beginning with
+      <filename>pgsql_tmp</filename> is omitted, as well as are
+      <filename>backup_label</filename>,
+      <filename>tablespace_map</filename>,
+      <filename>pg_internal.init</filename>,
+      <filename>postmaster.opts</filename> and
+      <filename>postmaster.pid</filename>.

I don't think this description is necessary. The doc for pg_basebackup
also doesn't seem to have this kind of description.

So attached is the patch that I updated based on your patch and
am thinking to apply.

Regards,

-- 
Fujii Masao

Attachment

pgsql-hackers by date:

Previous
From: Simon Riggs
Date:
Subject: Re: [HACKERS] Surjective functional indexes
Next
From: Fujii Masao
Date:
Subject: Re: [bug fix] pg_rewind creates corrupt WAL files, and the standbycannot catch up the primary