Issue observed in cascade standby setup and analysis for same - Mailing list pgsql-hackers

From Amit Kapila
Subject Issue observed in cascade standby setup and analysis for same
Date
Msg-id 008101cd8c38$c6d2bf90$54783eb0$@kapila@huawei.com
Whole thread Raw
Responses Re: Issue observed in cascade standby setup and analysis for same  (Josh Berkus <josh@agliodbs.com>)
List pgsql-hackers
<div class="WordSection1"><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">Below
mentionedissue might be current behavior of PostgreSQL, but I am not able to find that in documentation so I have done
briefanalysis as well.</span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif""><br
/><b><u>Issue</u></b>:cascade standby server raise errors as “<u>FATAL:  the database system is starting up</u>” when
tryingto connect a client. Cascaded standby server is started on database which is received from Standby node using
pg_basebackuptool. </span><p><b><u><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">Steps to
reproduce</span></u></b><spanstyle="font-size:10.0pt;font-family:"Arial","sans-serif"">: </span><p><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Mhost is primary, S host is standby and CS host is cascaded
standby.<br /><br />1.        Set up postgresql-9.2beta2 on  all hosts. <br />2.        Execute command initdb on host
Mto create fresh database. <br />3.        Modify the configure file postgresql.conf on host M like this</span><span
lang="JA"style="font-size:10.0pt;font-family:SimSun">:</span><span
style="font-size:10.0pt;font-family:"Arial","sans-serif""><br/>           listen_addresses = 'M' <br />           port
=15210 <br />           wal_level = hot_standby   <br />           max_wal_senders = 4 <br />           hot_standby =
on<br />4.        modify the configure file pg_hba.conf on host M  like this</span><span lang="JA"
style="font-size:10.0pt;font-family:SimSun">:</span><spanstyle="font-size:10.0pt;font-family:"Arial","sans-serif""><br
/>           host    replication     repl             M/24            md5 <br />5.        Start the server on host M as
primary.<br />6.        Connect one client to primary server and create a user ‘repl’ <br />                   create
userrepl superuser password '123'; <br />7.        Use the command pg_basebackup on the host S to retrieve database of
primaryhost <br />                    pg_basebackup  -D /opt/t38917/data -F p -x fetch -c fast -l repl_backup -P -v -h
M-p 15210 -U repl –W <br />8.        Copy one recovery.conf.sample from share folder of package to database folder of
thehost S. Then rename this file to recovery.conf <br />9.        Modify the file recovery.conf on host S as below: <br
/>            standby_mode = on <br />             primary_conninfo = 'host=M port=15210 user=repl password=123' <br
/>10.       Modify the file postgresql.conf on host S as follow: <br />             listen_addresses = 'S' <br />11.  
    Start the server on host S as standby server. <br />12.        Use the command pg_basebackup on the host CS to
retrievedatabase of standby host <br />             pg_basebackup  -D /opt/t38917/data -F p -x fetch -c fast -l
repl_backup-P -v -h M -p 15210 -U repl –W <br />13.        Modify the file recovery.conf on host CS as below: <br
/>                 standby_mode = on <br />                  primary_conninfo = 'host=S port=15210 user=repl
password=123'<br />14.        Modify the file postgresql.conf on host S as follow: <br
/>                  listen_addresses= 'CS' <br />15.        Start the server on host CS as Cascaded standby server
node.<br />16.        Try to connect a client to host CS but it gives error as: <br />                        FATAL:
 thedatabase system is starting up <br /><br /><b><u>Observation based on code analysis:</u></b><br />In the above
scenarioit is observed that Start-up process has read all data (in our defect scenario minRecoveryPoint is 5016220)
tillthe position 5016220 and then it goes and check for recovery consistency by following condition in function
CheckRecoveryConsistency:<br />        if (!reachedConsistency && <br />               
XLByteLE(minRecoveryPoint,EndRecPtr) && <br />               
XLogRecPtrIsInvalid(ControlFile->backupStartPoint))<br /><br />At this point first two conditions are true but last
conditionis not true because still redo has not been applied and hence backupStartPoint has not been reset. So it does
notsignal postmaster regarding consistent stage. After this it goes and applies the redo and then reset
backupStartPointand then it goes to read next set of record. Since all records have been already read, so it starts
waitingfor the new record from the Standby node. But since there is no new record from Standby node coming so it keeps
waitingfor that and it does not get chance to recheck the recovery consistent level. And hence client connection does
notget allowed.</span><p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">Suggestions?</span><pclass="MsoNormal"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif""> </span><pclass="MsoNormal"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">WithRegards,</span><p class="MsoNormal"><span
style="font-size:10.0pt;font-family:"Arial","sans-serif"">AmitKapila.</span></div> 

pgsql-hackers by date:

Previous
From: Alvaro Herrera
Date:
Subject: Re: Draft release notes complete
Next
From: Alvaro Herrera
Date:
Subject: Re: embedded list v2