Re: Review of pg_rewind - Mailing list pgsql-hackers

From Michael Paquier
Subject Re: Review of pg_rewind
Msg-id CAB7nPqSw9DpqOdwnmLYb2cm+j8RERt=AU-cg=P_Jty1r7ifpNw@mail.gmail.com
In response to Review of pg_rewind  (Samrat Revagade <revagade.samrat@gmail.com>)
Responses Re: Review of pg_rewind  (Samrat Revagade <revagade.samrat@gmail.com>)
List pgsql-hackers
Hi,

Thanks for the feedback. Btw, pg_rewind is not a project included in
Postgres core as a contrib module or anything, so could you send your
feedback and the issues you find directly on github instead? The URL
of the project is https://github.com/vmware/pg_rewind.

Either way, here are some comments below...

On Wed, Oct 23, 2013 at 6:07 PM, Samrat Revagade
<revagade.samrat@gmail.com> wrote:
> While testing pg_rewind I encountered the following problems.
> I used the following process to do the testing. Please correct me if I am doing
> it the wrong way.
>
> Problem-1:
> pg_rewind  gives error (target master must be shut down cleanly.) when
> master crashed unexpectedly.
>
> 1. Setup Streaming Replication (stand alone machine : master server port
> -5432, standby server port-5433 )
> 2. Do some operation on master server:
>           postgres=# create table test(id int);
> 3. Crash the Postgres process of master:
>           kill -9 [pid of postgres process of master server]
> 4. Promote standby server
> 5. Run pg_rewind:
>          $ /samrat/postgresql/contrib/pg_rewind/pg_rewind -D
> /samrat/master-data/ --source-server='host=localhost port=5433
> dbname=postgres' -v
>          connected to remote server
>          fetched file "global/pg_control", length 8192
>          target master must be shut down cleanly.
> 6. Check masters control information:
>          $ /samrat/postgresql/install/bin/pg_controldata
> /samrat/master-data/ | grep "Database cluster state"
>             Database cluster state:               in production
>
> IIUC this is because pg_rewind does some checks before resynchronizing the
> PostgreSQL data directories.
> But in real-world scenarios, for example if the master crashed due to a
> hardware failure and its controldata shows the state "in production", then
> pg_rewind will fail this check.
Yeah, you could call that a limitation of this module. When I looked
at its code some time ago, I had in mind adding an option along the
lines of --force that would attempt to resynchronize a master even if
it had not been shut down cleanly.
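
To check up front whether a target data directory would pass the clean-shutdown test, the "Database cluster state" line of pg_controldata can be inspected before invoking pg_rewind. The following is only a minimal sketch: the function names cluster_state and can_rewind are hypothetical, and it assumes that "shut down" is the only state the check accepts.

```shell
#!/bin/sh
# Hypothetical helper: print the value of the "Database cluster state"
# line from pg_controldata output supplied on stdin.
cluster_state() {
    sed -n 's/^Database cluster state:[[:space:]]*//p'
}

# Hypothetical pre-check: succeed only if the reported state is "shut down".
# Assumption: this is the only state the clean-shutdown check accepts.
can_rewind() {
    [ "$(cluster_state)" = "shut down" ]
}
```

For example, `pg_controldata /samrat/master-data/ | can_rewind || echo "target was not shut down cleanly"` would print the warning for the crashed master in Problem-1, whose state is "in production".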

>
> Problem-2:
> For a zero-length WAL record, pg_rewind gives an error.
>
> 1. Setup Streaming Replication (stand alone machine : master server port
> -5432, standby server port-5433 )
> 2. Cleanly shutdown master (Do not add any data on master)
> 3. Promote standby server
> 4. Create table on new master (promoted standby)
>     postgres=# create table test(id int);
> 5. Run pg_rewind:
>          $ /samrat/postgresql/contrib/pg_rewind/pg_rewind -D
> /samrat/master-data/ --source-server='host=localhost port=5433
> connected to remote server
>          connected to remote server
>          fetched file "global/pg_control", length 8192
>          fetched file "pg_xlog/00000002.history", length 41
>          Last common WAL position: 0/4000090 on timeline 1
>          could not previous WAL record at 0/4000090: record with zero length
> at 0/4000090
This is rather interesting. When I tested it I did not find this error.

> Also, as you already listed in the README of pg_rewind, it has a problem
> with tablespace support.
>
> I will continue with testing it further to help in improving it :)
Thanks!
-- 
Michael


