Issue observed in cascade standby setup and analysis for same - Mailing list pgsql-hackers
From | Amit Kapila |
---|---|
Subject | Issue observed in cascade standby setup and analysis for same |
Date | |
Msg-id | 008101cd8c38$c6d2bf90$54783eb0$@kapila@huawei.com Whole thread Raw |
Responses |
Re: Issue observed in cascade standby setup and analysis
for same
|
List | pgsql-hackers |
<div class="WordSection1"><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">Below mentionedissue might be current behavior of PostgreSQL, but I am not able to find that in documentation so I have done briefanalysis as well.</span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif""><br /><b><u>Issue</u></b>:cascade standby server raise errors as “<u>FATAL: the database system is starting up</u>” when tryingto connect a client. Cascaded standby server is started on database which is received from Standby node using pg_basebackuptool. </span><p><b><u><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">Steps to reproduce</span></u></b><spanstyle="font-size:10.0pt;font-family:"Arial","sans-serif"">: </span><p><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">Mhost is primary, S host is standby and CS host is cascaded standby.<br /><br />1. Set up postgresql-9.2beta2 on all hosts. <br />2. Execute command initdb on host Mto create fresh database. <br />3. Modify the configure file postgresql.conf on host M like this</span><span lang="JA"style="font-size:10.0pt;font-family:SimSun">:</span><span style="font-size:10.0pt;font-family:"Arial","sans-serif""><br/> listen_addresses = 'M' <br /> port =15210 <br /> wal_level = hot_standby <br /> max_wal_senders = 4 <br /> hot_standby = on<br />4. modify the configure file pg_hba.conf on host M like this</span><span lang="JA" style="font-size:10.0pt;font-family:SimSun">:</span><spanstyle="font-size:10.0pt;font-family:"Arial","sans-serif""><br /> host replication repl M/24 md5 <br />5. Start the server on host M as primary.<br />6. Connect one client to primary server and create a user ‘repl’ <br /> create userrepl superuser password '123'; <br />7. Use the command pg_basebackup on the host S to retrieve database of primaryhost <br /> pg_basebackup -D /opt/t38917/data -F p -x fetch -c fast -l repl_backup -P -v -h M-p 15210 -U repl –W <br />8. Copy one recovery.conf.sample from share folder of package to database folder of thehost S. Then rename this file to recovery.conf <br />9. Modify the file recovery.conf on host S as below: <br /> standby_mode = on <br /> primary_conninfo = 'host=M port=15210 user=repl password=123' <br />10. Modify the file postgresql.conf on host S as follow: <br /> listen_addresses = 'S' <br />11. Start the server on host S as standby server. <br />12. Use the command pg_basebackup on the host CS to retrievedatabase of standby host <br /> pg_basebackup -D /opt/t38917/data -F p -x fetch -c fast -l repl_backup-P -v -h M -p 15210 -U repl –W <br />13. Modify the file recovery.conf on host CS as below: <br /> standby_mode = on <br /> primary_conninfo = 'host=S port=15210 user=repl password=123'<br />14. Modify the file postgresql.conf on host S as follow: <br /> listen_addresses= 'CS' <br />15. Start the server on host CS as Cascaded standby server node.<br />16. Try to connect a client to host CS but it gives error as: <br /> FATAL: thedatabase system is starting up <br /><br /><b><u>Observation based on code analysis:</u></b><br />In the above scenarioit is observed that Start-up process has read all data (in our defect scenario minRecoveryPoint is 5016220) tillthe position 5016220 and then it goes and check for recovery consistency by following condition in function CheckRecoveryConsistency:<br /> if (!reachedConsistency && <br /> XLByteLE(minRecoveryPoint,EndRecPtr) && <br /> XLogRecPtrIsInvalid(ControlFile->backupStartPoint))<br /><br />At this point first two conditions are true but last conditionis not true because still redo has not been applied and hence backupStartPoint has not been reset. So it does notsignal postmaster regarding consistent stage. After this it goes and applies the redo and then reset backupStartPointand then it goes to read next set of record. Since all records have been already read, so it starts waitingfor the new record from the Standby node. But since there is no new record from Standby node coming so it keeps waitingfor that and it does not get chance to recheck the recovery consistent level. And hence client connection does notget allowed.</span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">Suggestions?</span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif""> </span><pclass="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">WithRegards,</span><p class="MsoNormal"><span style="font-size:10.0pt;font-family:"Arial","sans-serif"">AmitKapila.</span></div>
pgsql-hackers by date: