Re: BUG #7533: Client is not able to connect cascade standby incase basebackup is taken from hot standby - Mailing list pgsql-bugs
From: Fujii Masao
Subject: Re: BUG #7533: Client is not able to connect cascade standby incase basebackup is taken from hot standby
Msg-id: CAHGQGwG4VXyvgHtiepiJ=e89szESOva0k+SC-WE5Wnj3NoO7Pw@mail.gmail.com
In response to: Re: BUG #7533: Client is not able to connect cascade standby incase basebackup is taken from hot standby (Heikki Linnakangas <hlinnaka@iki.fi>)
List: pgsql-bugs
On Thu, Sep 13, 2012 at 9:21 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 12.09.2012 22:03, Fujii Masao wrote:
>>
>> On Wed, Sep 12, 2012 at 8:47 PM, <amit.kapila@huawei.com> wrote:
>>>
>>> The following bug has been logged on the website:
>>>
>>> Bug reference:      7533
>>> Logged by:          Amit Kapila
>>> Email address:      amit.kapila@huawei.com
>>> PostgreSQL version: 9.2.0
>>> Operating system:   Suse
>>> Description:
>>>
>>> M host is primary, S host is standby and CS host is cascaded standby.
>>>
>>> 1. Set up postgresql-9.2beta2/RC1 on all hosts.
>>> 2. Execute command initdb on host M to create a fresh database.
>>> 3. Modify the configuration file postgresql.conf on host M like this:
>>>        listen_addresses = 'M'
>>>        port = 15210
>>>        wal_level = hot_standby
>>>        max_wal_senders = 4
>>>        hot_standby = on
>>> 4. Modify the configuration file pg_hba.conf on host M like this:
>>>        host  replication  repl  M/24  md5
>>> 5. Start the server on host M as primary.
>>> 6. Connect one client to the primary server and create a user 'repl':
>>>        create user repl superuser password '123';
>>> 7. Use the command pg_basebackup on host S to retrieve the database of
>>>    the primary host:
>>>        pg_basebackup -D /opt/t38917/data -F p -x fetch -c fast -l repl_backup -P -v -h M -p 15210 -U repl -W
>>> 8. Copy one recovery.conf.sample from the share folder of the package to
>>>    the database folder of host S. Then rename this file to recovery.conf.
>>> 9. Modify the file recovery.conf on host S as below:
>>>        standby_mode = on
>>>        primary_conninfo = 'host=M port=15210 user=repl password=123'
>>> 10. Modify the file postgresql.conf on host S as follows:
>>>        listen_addresses = 'S'
>>> 11. Start the server on host S as standby server.
>>> 12. Use the command pg_basebackup on host CS to retrieve the database of
>>>     the standby host:
>>>        pg_basebackup -D /opt/t38917/data -F p -x fetch -c fast -l repl_backup -P -v -h M -p 15210 -U repl -W
>>> 13. Modify the file recovery.conf on host CS as below:
>>>        standby_mode = on
>>>        primary_conninfo = 'host=S port=15210 user=repl password=123'
>>> 14. Modify the file postgresql.conf on host S as follows:
>>>        listen_addresses = 'CS'
>>> 15. Start the server on host CS as cascaded standby server node.
>>> 16. Try to connect a client to host CS, but it gives the error:
>>>        FATAL: the database system is starting up
>>
>> This procedure didn't reproduce the problem in HEAD. But when I restarted
>> the master server between steps 11 and 12, I was able to reproduce the
>> problem.
>>
>>> Observations related to bug
>>> ------------------------------
>>> In the above scenario it is observed that the startup process has read
>>> all data (in our defect scenario minRecoveryPoint is 5016220) up to the
>>> position 5016220, and then it checks for recovery consistency with the
>>> following condition in the function CheckRecoveryConsistency:
>>>
>>>     if (!reachedConsistency &&
>>>         XLByteLE(minRecoveryPoint, EndRecPtr) &&
>>>         XLogRecPtrIsInvalid(ControlFile->backupStartPoint))
>>>
>>> At this point the first two conditions are true, but the last condition
>>> is not, because redo has not yet been applied and hence backupStartPoint
>>> has not been reset. So it does not signal the postmaster about the
>>> consistent state. After this it applies the redo, resets backupStartPoint,
>>> and then goes to read the next set of records. Since all records have
>>> already been read, it starts waiting for a new record from the standby
>>> node. But since no new record arrives from the standby node, it keeps
>>> waiting and never gets a chance to recheck the recovery consistency
>>> level. And hence the client connection does not get allowed.
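As a minimal sketch of the ordering described in these observations -- the
consistency check runs before the current record is replayed, and only the
replay resets backupStartPoint -- the following compilable C mock-up may help.
It is not the real xlog.c code: the XLogRecPtr typedef, the literal values,
and the plain comparisons are illustrative stand-ins only.

    /* Mock-up of the check-before-redo ordering; NOT the real xlog.c. */
    #include <stdbool.h>
    #include <stdio.h>

    typedef unsigned long XLogRecPtr;         /* stand-in for the real type */

    static XLogRecPtr EndRecPtr;              /* end of the last record read */
    static XLogRecPtr minRecoveryPoint = 5016220;
    static XLogRecPtr backupStartPoint = 1;   /* nonzero: end of backup not yet seen */
    static bool reachedConsistency = false;

    static void
    CheckRecoveryConsistency(void)
    {
        /* the three conditions quoted above, with plain comparisons */
        if (!reachedConsistency &&
            minRecoveryPoint <= EndRecPtr &&
            backupStartPoint == 0)
        {
            reachedConsistency = true;
            printf("consistent; hot-standby connections can be allowed\n");
        }
    }

    int
    main(void)
    {
        /* "ReadRecord": the record ending at minRecoveryPoint was fetched */
        EndRecPtr = 5016220;

        /* the check runs BEFORE redo: backupStartPoint is still set, no signal */
        CheckRecoveryConsistency();

        /* "rm_redo": replaying this record is what resets backupStartPoint */
        backupStartPoint = 0;

        /*
         * The next check only runs after the next ReadRecord.  If no further
         * WAL arrives from upstream, recovery waits there and the standby
         * keeps answering "FATAL: the database system is starting up".
         */
        return 0;
    }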
>>
>> If the cascaded standby starts recovery at a normal checkpoint record,
>> this problem will not happen, because if wal_level is set to hot_standby,
>> an XLOG_RUNNING_XACTS WAL record always follows the normal checkpoint
>> record. So while the XLOG_RUNNING_XACTS record is being replayed,
>> ControlFile->backupStartPoint can be reset, and then the cascaded standby
>> can pass the consistency test.
>>
>> The problem happens when the cascaded standby starts recovery at a
>> shutdown checkpoint record. In this case, no WAL record might follow
>> the checkpoint one yet. So, after replaying the shutdown checkpoint
>> record, the cascaded standby needs to wait for a new WAL record to appear
>> before reaching the code block that resets ControlFile->backupStartPoint.
>> The cascaded standby cannot reach a consistent state, and a client cannot
>> connect to the cascaded standby, until new WAL has arrived.
>>
>> The attached patch fixes the problem. With this patch, if recovery begins
>> at a shutdown checkpoint record, the ControlFile fields (like
>> backupStartPoint) required for checking whether an end-of-backup has been
>> reached are not set in the first place. IOW, the cascaded standby treats
>> the database as consistent from the beginning. This is safe because a
>> shutdown checkpoint record means that there is no running database
>> activity at that point and the database is in a consistent state.
>
> Hmm, I think the CheckRecoveryConsistency() call in the redo loop is
> misplaced. It's called after we got a record from ReadRecord, but *before*
> replaying it (rm_redo). Even if replaying record X makes the system
> consistent, we won't check and notice that until we have fetched record
> X+1. In this particular test case, record X is a shutdown checkpoint
> record, but it could as well be a running-xacts record, or the record that
> reaches minRecoveryPoint.
>
> Does the problem go away if you just move the CheckRecoveryConsistency()
> call *after* rm_redo (attached)?

No, at least not in my case. When recovery starts at a shutdown checkpoint
record and there is no record following the shutdown checkpoint, recovery
gets into a wait state before entering the main redo apply loop. That is,
recovery starts waiting for a new WAL record to arrive in ReadRecord, just
before the redo loop. So moving the CheckRecoveryConsistency() call after
rm_redo cannot fix the problem I reported. To fix it, we need to make
recovery reach the consistent point before the redo loop, i.e., in the
CheckRecoveryConsistency() call just before the redo loop.

Regards,

--
Fujii Masao
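To make that control flow concrete, here is a small compilable C mock-up of
the sequence discussed in this thread: a pre-loop consistency check, a
ReadRecord that blocks when nothing follows the shutdown checkpoint, and a
redo loop that is therefore never entered. It is not the actual StartupXLOG
code; the structure, the stub functions, and the startedAtShutdownCheckpoint
flag are assumptions made only to model the outcome described above.

    #include <stdbool.h>
    #include <stddef.h>
    #include <stdio.h>

    typedef struct XLogRecord XLogRecord;             /* opaque stand-in */

    static bool reachedConsistency = false;
    static bool startedAtShutdownCheckpoint = true;   /* the scenario in this bug */

    static void
    CheckRecoveryConsistency(void)
    {
        /*
         * Stand-in for the real check.  Under the proposed patch the
         * end-of-backup fields are simply never set when recovery begins at a
         * shutdown checkpoint, so the existing conditions pass here; the flag
         * below only models that outcome.
         */
        if (!reachedConsistency && startedAtShutdownCheckpoint)
        {
            reachedConsistency = true;
            printf("consistent; postmaster can start accepting connections\n");
        }
    }

    static XLogRecord *
    ReadRecord(void)
    {
        /*
         * In the failing scenario nothing follows the shutdown checkpoint, so
         * the real ReadRecord() would block here waiting for new WAL.  The
         * mock simply returns NULL so the program terminates.
         */
        return NULL;
    }

    static void
    rm_redo(XLogRecord *record)
    {
        (void) record;            /* replay of the record would happen here */
    }

    int
    main(void)
    {
        XLogRecord *record;

        /* pre-loop check: the only one guaranteed to run in this scenario */
        CheckRecoveryConsistency();

        /* would block waiting for WAL; the loop below is never entered */
        record = ReadRecord();
        while (record != NULL)
        {
            rm_redo(record);
            CheckRecoveryConsistency();   /* even moved after rm_redo, unreached */
            record = ReadRecord();
        }
        return 0;
    }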