Thread: pg_rewind : feature to rewind promoted standby is broken!

pg_rewind : feature to rewind promoted standby is broken!

From
Mithun Cy
Date:
I think pg_rewind's feature to rewind the promoted standby as a new
standby is broken in 11

STEPS:
1. create master standby setup.
Use below script for same.

2. Promote the standby
[mithuncy@localhost pgrewmasterbin]$ ./bin/pg_ctl -D standby promote
waiting for server to promote.... done
server promoted

3. In promoted standby create a database and a table in the new database.
[mithuncy@localhost pgrewmasterbin]$ ./bin/psql -p 5433 postgres
postgres=# create database db1;
CREATE DATABASE
postgres=# \c db1
You are now connected to database "db1" as user "mithuncy".
db1=# create table t1 (t int);
CREATE TABLE

4. try to rewind the newly promoted standby (with old master as source)
[mithuncy@localhost pgrewmasterbin]$ ./bin/pg_ctl -D standby stop
waiting for server to shut down....... done
server stopped
[mithuncy@localhost pgrewmasterbin]$ ./bin/pg_rewind -D standby
--source-server="host=127.0.0.1 port=5432 user=mithuncy
dbname=postgres"
servers diverged at WAL location 0/3000060 on timeline 1
rewinding from last common checkpoint at 0/2000060 on timeline 1
could not remove directory "standby/base/16384": Directory not empty
Failure, exiting

Note: dry run was successful!
[mithuncy@localhost pgrewmasterbin]$ ./bin/pg_rewind -D standby
--source-server="host=127.0.0.1 port=5432 user=mithuncy
dbname=postgres" -n
servers diverged at WAL location 0/3000060 on timeline 1
rewinding from last common checkpoint at 0/2000060 on timeline 1
Done!

Also I have tested same in version 10 it works fine there.

Did below commit has broken this feature? (Thanks to kuntal for
identifying same)
commit 266b6acb312fc440c1c1a2036aa9da94916beac6
Author: Fujii Masao <fujii@postgresql.org>
Date:   Thu Mar 29 04:56:52 2018 +0900
Make pg_rewind skip files and directories that are removed during server start.

-- 
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com

Attachment

Re: pg_rewind : feature to rewind promoted standby is broken!

From
Michael Paquier
Date:
On Tue, Mar 12, 2019 at 01:23:51PM +0530, Mithun Cy wrote:
> I think pg_rewind's feature to rewind the promoted standby as a new
> standby is broken in 11

Confirmed, it is.

> Also I have tested same in version 10 it works fine there.
>
> Did below commit has broken this feature? (Thanks to kuntal for
> identifying same)
> commit 266b6acb312fc440c1c1a2036aa9da94916beac6
> Author: Fujii Masao <fujii@postgresql.org>
> Date:   Thu Mar 29 04:56:52 2018 +0900
> Make pg_rewind skip files and directories that are removed during server start.

And you are pointing out to the correct commit.  The issue is that
process_target_file() has added a call to check_file_excluded(), and
this skips all the folders which it thinks can be skipped.  One
problem though is that we also filter out pg_internal.init, which is
present in each database folder, and remains in the target directory
marked for deletion.  Then, when the deletion happens, the failure
happens as the directory is not fully empty.

We could consider using rmtree() instead, but it is a nice sanity
check to make sure that all the entries in a path have been marked for
deletion.  Just removing the filter check on the target is fine I
think, as we only should have the filters anyway to avoid copying
unnecessary files from the source.  Attached is a patch.  What do you
think?

(check_file_excluded() could be simplified further but it's also nice
to keep some mirroring in this API if we finish by using it for target
files at some point.)
--
Michael

Attachment

Re: pg_rewind : feature to rewind promoted standby is broken!

From
Michael Paquier
Date:
On Tue, Mar 12, 2019 at 01:23:51PM +0530, Mithun Cy wrote:
> I think pg_rewind's feature to rewind the promoted standby as a new
> standby is broken in 11

Confirmed, it is.

> Also I have tested same in version 10 it works fine there.
>
> Did below commit has broken this feature? (Thanks to kuntal for
> identifying same)
> commit 266b6acb312fc440c1c1a2036aa9da94916beac6
> Author: Fujii Masao <fujii@postgresql.org>
> Date:   Thu Mar 29 04:56:52 2018 +0900
> Make pg_rewind skip files and directories that are removed during server start.

And you are pointing out to the correct commit.  The issue is that
process_target_file() has added a call to check_file_excluded(), and
this skips all the folders which it thinks can be skipped.  One
problem though is that we also filter out pg_internal.init, which is
present in each database folder, and remains in the target directory
marked for deletion.  Then, when the deletion happens, the failure
happens as the directory is not fully empty.

We could consider using rmtree() instead, but it is a nice sanity
check to make sure that all the entries in a path have been marked for
deletion.  Just removing the filter check on the target is fine I
think, as we only should have the filters anyway to avoid copying
unnecessary files from the source.  Attached is a patch.  What do you
think?

(check_file_excluded() could be simplified further but it's also nice
to keep some mirroring in this API if we finish by using it for target
files at some point.)
--
Michael

Re: pg_rewind : feature to rewind promoted standby is broken!

From
Michael Paquier
Date:
On Tue, Mar 12, 2019 at 06:23:01PM +0900, Michael Paquier wrote:
> And you are pointing out to the correct commit.  The issue is that
> process_target_file() has added a call to check_file_excluded(), and
> this skips all the folders which it thinks can be skipped.  One
> problem though is that we also filter out pg_internal.init, which is
> present in each database folder, and remains in the target directory
> marked for deletion.  Then, when the deletion happens, the failure
> happens as the directory is not fully empty.

Okay, here is a refined patch with better comments, the addition of a
test case (creating tables in the new databases in 002_databases.pl is
enough to trigger the problem).

Could you check that it fixes the issue on your side?
--
Michael

Re: pg_rewind : feature to rewind promoted standby is broken!

From
Michael Paquier
Date:
On Tue, Mar 12, 2019 at 06:23:01PM +0900, Michael Paquier wrote:
> And you are pointing out to the correct commit.  The issue is that
> process_target_file() has added a call to check_file_excluded(), and
> this skips all the folders which it thinks can be skipped.  One
> problem though is that we also filter out pg_internal.init, which is
> present in each database folder, and remains in the target directory
> marked for deletion.  Then, when the deletion happens, the failure
> happens as the directory is not fully empty.

Okay, here is a refined patch with better comments, the addition of a
test case (creating tables in the new databases in 002_databases.pl is
enough to trigger the problem).

Could you check that it fixes the issue on your side?
--
Michael

Attachment

Re: pg_rewind : feature to rewind promoted standby is broken!

From
Mithun Cy
Date:
On Wed, Mar 13, 2019 at 1:38 PM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Mar 12, 2019 at 06:23:01PM +0900, Michael Paquier wrote:
> And you are pointing out to the correct commit.  The issue is that
> process_target_file() has added a call to check_file_excluded(), and
> this skips all the folders which it thinks can be skipped.  One
> problem though is that we also filter out pg_internal.init, which is
> present in each database folder, and remains in the target directory
> marked for deletion.  Then, when the deletion happens, the failure
> happens as the directory is not fully empty.

Okay, here is a refined patch with better comments, the addition of a
test case (creating tables in the new databases in 002_databases.pl is
enough to trigger the problem).

I have not looked into the patch but quick test show it has fixed the above issue.

[mithuncy@localhost pgrewindbin]$ ./bin/pg_rewind -D standby --source-server="host=127.0.0.1 port=5432 user=mithuncy dbname=postgres" -n
servers diverged at WAL location 0/3000000 on timeline 1
rewinding from last common checkpoint at 0/2000060 on timeline 1
Done!
[mithuncy@localhost pgrewindbin]$ ./bin/pg_rewind -D standby --source-server="host=127.0.0.1 port=5432 user=mithuncy dbname=postgres"
servers diverged at WAL location 0/3000000 on timeline 1
rewinding from last common checkpoint at 0/2000060 on timeline 1
Done!

--
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com

Re: pg_rewind : feature to rewind promoted standby is broken!

From
Mithun Cy
Date:
On Wed, Mar 13, 2019 at 1:38 PM Michael Paquier <michael@paquier.xyz> wrote:
On Tue, Mar 12, 2019 at 06:23:01PM +0900, Michael Paquier wrote:
> And you are pointing out to the correct commit.  The issue is that
> process_target_file() has added a call to check_file_excluded(), and
> this skips all the folders which it thinks can be skipped.  One
> problem though is that we also filter out pg_internal.init, which is
> present in each database folder, and remains in the target directory
> marked for deletion.  Then, when the deletion happens, the failure
> happens as the directory is not fully empty.

Okay, here is a refined patch with better comments, the addition of a
test case (creating tables in the new databases in 002_databases.pl is
enough to trigger the problem).

I have not looked into the patch but quick test show it has fixed the above issue.

[mithuncy@localhost pgrewindbin]$ ./bin/pg_rewind -D standby --source-server="host=127.0.0.1 port=5432 user=mithuncy dbname=postgres" -n
servers diverged at WAL location 0/3000000 on timeline 1
rewinding from last common checkpoint at 0/2000060 on timeline 1
Done!
[mithuncy@localhost pgrewindbin]$ ./bin/pg_rewind -D standby --source-server="host=127.0.0.1 port=5432 user=mithuncy dbname=postgres"
servers diverged at WAL location 0/3000000 on timeline 1
rewinding from last common checkpoint at 0/2000060 on timeline 1
Done!

--
Thanks and Regards
Mithun Chicklore Yogendra
EnterpriseDB: http://www.enterprisedb.com

Re: pg_rewind : feature to rewind promoted standby is broken!

From
Michael Paquier
Date:
On Thu, Mar 14, 2019 at 12:15:57AM +0530, Mithun Cy wrote:
> I have not looked into the patch but quick test show it has fixed the above
> issue.

Thanks for confirming, Mythun.  I'll think about the logic of this
patch for a couple of days in the background, then I'll try to commit
it likely at the beginning of next week.
--
Michael

Attachment

Re: pg_rewind : feature to rewind promoted standby is broken!

From
Michael Paquier
Date:
On Thu, Mar 14, 2019 at 12:15:57AM +0530, Mithun Cy wrote:
> I have not looked into the patch but quick test show it has fixed the above
> issue.

Thanks for confirming, Mythun.  I'll think about the logic of this
patch for a couple of days in the background, then I'll try to commit
it likely at the beginning of next week.
--
Michael

Re: pg_rewind : feature to rewind promoted standby is broken!

From
Michael Paquier
Date:
On Thu, Mar 14, 2019 at 09:04:41AM +0900, Michael Paquier wrote:
> Thanks for confirming, Mythun.  I'll think about the logic of this
> patch for a couple of days in the background, then I'll try to commit
> it likely at the beginning of next week.

Committed.  I have spent extra time polishing the comments to make the
filtering rules clearer when processing the source and target files,
particularly why they are useful the way they are.
--
Michael

Attachment

Re: pg_rewind : feature to rewind promoted standby is broken!

From
Michael Paquier
Date:
On Thu, Mar 14, 2019 at 09:04:41AM +0900, Michael Paquier wrote:
> Thanks for confirming, Mythun.  I'll think about the logic of this
> patch for a couple of days in the background, then I'll try to commit
> it likely at the beginning of next week.

Committed.  I have spent extra time polishing the comments to make the
filtering rules clearer when processing the source and target files,
particularly why they are useful the way they are.
--
Michael