Re: Switching timeline over streaming replication - Mailing list pgsql-hackers

From Amit Kapila
Subject Re: Switching timeline over streaming replication
Date
Msg-id 00bd01cdc402$dd1c4ad0$9754e070$@kapila@huawei.com
In response to Re: Switching timeline over streaming replication  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: Switching timeline over streaming replication
List pgsql-hackers
On Thursday, November 15, 2012 6:05 PM Heikki Linnakangas wrote:
> On 15.11.2012 12:44, Heikki Linnakangas wrote:
> > Here's an updated version of this patch, rebased with master,
> > including the recent replication timeout changes, and some other
> > cleanup.
> >
> > On 12.10.2012 09:34, Amit Kapila wrote:
> >> The test is finished from myside.
> >>
> >> one more issue:
> >  > ...
> >> ./pg_basebackup -P -D ../../data_sub -X fetch -p 2303
> >> pg_basebackup: COPY stream ended before last file was finished
> >
> > Fixed this.
> >
> > However, the test scenario you point to here:
> > http://archives.postgresql.org/message-id/00a801cda6f3$4aba27b0$e02e7710$@kapila@huawei.com
> > still seems to be broken, although I get a different error message now.
> > I'll dig into this..
> 
> Ok, here's an updated patch again, with that bug fixed.

First, I started by testing this patch.

Basic stuff: 
------------ 
- Patch applies OK 
- Compiles cleanly with no warnings 
- Regression tests pass, except for "standbycheck".

At a glance, the "standbycheck" regression failures seem to be caused by the
SQL scripts and expected outputs being a little out of date.

I observed the following problems while testing the patch.

Defect-1:
     1. Start primary A.
     2. Start standby B following A.
     3. Start cascade standby C following B.
     4. Promote standby B.
     5. After the timeline switch succeeds in cascade standby C, stop C.
     6. Restart C. Startup fails with the following error:
 
       LOG:  database system was shut down in recovery at 2012-11-16 16:26:29 IST
       FATAL:  requested timeline 2 does not contain minimum recovery point 0/30143A0 on timeline 1
       LOG:  startup process (PID 415) exited with exit code 1
       LOG:  aborting startup due to startup process failure
 

This defect was already discussed in the following thread:
http://archives.postgresql.org/message-id/00a801cda6f3$4aba27b0$e02e7710$@kapila@huawei.com
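Under a simplified reading, the FATAL message says that the history of timeline 2 does not place C's minimum recovery point (0/30143A0) on timeline 1, i.e. in timeline 2's history, timeline 1 ends before that point. The Python sketch below models only that consistency check, as a rough aid for reading the log. It is not PostgreSQL code: parse_lsn, timeline_of_point, contains_min_recovery_point and the switch point 0/3000000 are hypothetical and not taken from this test. It also does not explain how C got into that state.

```python
from __future__ import annotations

# Hypothetical, simplified model of the check named in the FATAL message above.
# It is not PostgreSQL source code; the switch point in the example is made up.


def parse_lsn(text: str) -> int:
    """Parse a WAL position printed as 'HI/LO' (hex halves) into one integer."""
    hi, lo = text.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def timeline_of_point(ptr: int, history: list[tuple[int, int | None]]) -> int | None:
    """Return the timeline that owns WAL position ptr.

    history lists (tli, end_lsn) from oldest to newest. Each timeline begins
    where the previous one ended; the newest timeline has end_lsn None.
    """
    begin = 0
    for tli, end in history:
        if begin <= ptr and (end is None or ptr < end):
            return tli
        if end is None:
            break
        begin = end
    return None


def contains_min_recovery_point(history, min_recovery_point: int, min_recovery_tli: int) -> bool:
    """True if the requested timeline's history puts the minimum recovery point
    on min_recovery_tli. The byte just before the point is tested, so a point
    exactly at a switch still counts as the old timeline (a model assumption)."""
    return timeline_of_point(min_recovery_point - 1, history) == min_recovery_tli


# Made-up history for timeline 2: timeline 1 ends at 0/3000000.
history_tl2 = [(1, parse_lsn("0/3000000")), (2, None)]
print(contains_min_recovery_point(history_tl2, parse_lsn("0/30143A0"), 1))  # False
```

In this model a False result corresponds to startup refusing to continue, as in the log above.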



Defect-2:
     1. Start primary A.
     2. Start standby B following A.
     3. Start cascade standby C following B, with the 'recovery_target_timeline'
        option disabled in recovery.conf.
     4. Promote standby B.
     5. Cascade standby C is not able to follow the new master B because of the
        timeline difference.
     6. Try to stop cascade standby C. This fails: the server does not stop,
        the WAL receiver process is still running, and clients are not allowed
        to connect.
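Step 5 is consistent with how recovery_target_timeline behaves: when it is not set, recovery stays on the timeline that was current when the base backup was taken, whereas 'latest' follows the newest timeline found. As a rough illustration only, the Python sketch below models that choice. target_timeline and its inputs are hypothetical, not PostgreSQL code, and the sketch does not model the shutdown problem in step 6.

```python
from __future__ import annotations

# Hypothetical, simplified model of how a standby chooses the timeline to
# recover along, based on recovery_target_timeline in recovery.conf.
# It is not PostgreSQL source code.


def target_timeline(current_tli: int, setting: str | None, known_tlis: set[int]) -> int:
    """current_tli: timeline the standby is currently recovering on.
    setting: recovery_target_timeline value, or None when it is not set.
    known_tlis: timelines whose history files the standby can see (assumed input).
    """
    if setting is None:
        # Not set: stay on the current timeline, so a later switch is not followed.
        return current_tli
    if setting == "latest":
        # Follow the newest timeline the standby knows about.
        return max(known_tlis | {current_tli})
    # Otherwise the setting names a specific timeline ID.
    return int(setting)


# Cascade standby C started on timeline 1; B's promotion created timeline 2.
print(target_timeline(1, None, {1, 2}))      # 1
print(target_timeline(1, "latest", {1, 2}))  # 2
```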

Defect-2 has happened only once in my test environment; I will try to
reproduce it.

With Regards,
Amit Kapila.



