Re: BUG #7533: Client is not able to connect cascade standby incase basebackup is taken from hot standby - Mailing list pgsql-bugs

From: Fujii Masao
Subject: Re: BUG #7533: Client is not able to connect cascade standby incase basebackup is taken from hot standby
Msg-id: CAHGQGwG4VXyvgHtiepiJ=e89szESOva0k+SC-WE5Wnj3NoO7Pw@mail.gmail.com
In response to: Re: BUG #7533: Client is not able to connect cascade standby incase basebackup is taken from hot standby (Heikki Linnakangas <hlinnaka@iki.fi>)
List: pgsql-bugs
On Thu, Sep 13, 2012 at 9:21 PM, Heikki Linnakangas <hlinnaka@iki.fi> wrote:
> On 12.09.2012 22:03, Fujii Masao wrote:
>>
>> On Wed, Sep 12, 2012 at 8:47 PM, <amit.kapila@huawei.com> wrote:
>>>
>>> The following bug has been logged on the website:
>>>
>>> Bug reference:      7533
>>> Logged by:          Amit Kapila
>>> Email address:      amit.kapila@huawei.com
>>> PostgreSQL version: 9.2.0
>>> Operating system:   Suse
>>> Description:
>>>
>>> M host is primary, S host is standby and CS host is cascaded standby.
>>>
>>> 1. Set up postgresql-9.2beta2/RC1 on all hosts.
>>> 2. Execute the command initdb on host M to create a fresh database.
>>> 3. Modify the configuration file postgresql.conf on host M like this:
>>>     listen_addresses = 'M'
>>>     port = 15210
>>>     wal_level = hot_standby
>>>     max_wal_senders = 4
>>>     hot_standby = on
>>> 4. Modify the configuration file pg_hba.conf on host M like this:
>>> host     replication     repl             M/24            md5
>>> 5. Start the server on host M as primary.
>>> 6. Connect one client to the primary server and create a user 'repl':
>>>    Create user repl superuser password '123';
>>> 7. Use the command pg_basebackup on the host S to retrieve the database of
>>> the primary host:
>>> pg_basebackup  -D /opt/t38917/data -F p -x fetch -c fast -l repl_backup -P
>>> -v -h M -p 15210 -U repl -W
>>> 8. Copy one recovery.conf.sample from the share folder of the package to the
>>> database folder of the host S. Then rename this file to recovery.conf.
>>> 9. Modify the file recovery.conf on host S as below:
>>>               standby_mode = on
>>>               primary_conninfo = 'host=M port=15210 user=repl password=123'
>>> 10. Modify the file postgresql.conf on host S as follows:
>>>         listen_addresses = 'S'
>>> 11. Start the server on host S as a standby server.
>>> 12. Use the command pg_basebackup on the host CS to retrieve the database of
>>> the standby host:
>>> pg_basebackup  -D /opt/t38917/data -F p -x fetch -c fast -l repl_backup -P
>>> -v -h M -p 15210 -U repl -W
>>> 13. Modify the file recovery.conf on host CS as below:
>>>     standby_mode = on
>>>     primary_conninfo = 'host=S port=15210 user=repl password=123'
>>> 14. Modify the file postgresql.conf on host CS as follows:
>>>       listen_addresses = 'CS'
>>> 15. Start the server on host CS as a cascaded standby server node.
>>> 16. Try to connect a client to host CS, but it gives the error:
>>>      FATAL:  the database system is starting up
>>
>>
>> This procedure didn't reproduce the problem in HEAD. But when I restarted
>> the master server between steps 11 and 12, I was able to reproduce the
>> problem.
>>
>>> Observations related to bug
>>> ------------------------------
>>> In the above scenario it is observed that the startup process has read all
>>> data (in our defect scenario minRecoveryPoint is 5016220) up to the position
>>> 5016220, and then it checks for recovery consistency with the following
>>> condition in the function CheckRecoveryConsistency:
>>>          if (!reachedConsistency &&
>>>                  XLByteLE(minRecoveryPoint, EndRecPtr) &&
>>>                  XLogRecPtrIsInvalid(ControlFile->backupStartPoint))
>>>
>>> At this point the first two conditions are true, but the last condition is
>>> not, because redo has not been applied yet and hence backupStartPoint has
>>> not been reset. So it does not signal the postmaster about the consistent
>>> state. After this it applies the redo, resets backupStartPoint, and then
>>> goes to read the next set of records. Since all records have already been
>>> read, it starts waiting for a new record from the standby node. But since
>>> no new record arrives from the standby node, it keeps waiting and never
>>> gets a chance to recheck the recovery consistency level. Hence client
>>> connections are not allowed.
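(For reference, the ordering described above corresponds roughly to the
following shape of the main redo apply loop in the 9.2-era StartupXLOG();
this is a simplified sketch from memory, not the exact source.)

    do
    {
        /* The consistency check runs BEFORE the fetched record is replayed. */
        CheckRecoveryConsistency();

        /* Replay the record. */
        RmgrTable[record->xl_rmid].rm_redo(EndRecPtr, record);

        /*
         * Only after rm_redo are backupStartPoint/backupEndPoint reset once
         * the end of the base backup has been passed, so the reset is not
         * seen by CheckRecoveryConsistency() until the next iteration.
         */

        /* Fetch the next record; this can block waiting for new WAL. */
        record = ReadRecord(NULL, LOG, false);
    } while (record != NULL);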
>>
>>
>> If the cascaded standby starts recovery at a normal checkpoint record,
>> this problem does not happen, because if wal_level is set to hot_standby,
>> an XLOG_RUNNING_XACTS WAL record always follows the normal
>> checkpoint record. So while the XLOG_RUNNING_XACTS record is being replayed,
>> ControlFile->backupStartPoint can be reset, and then the cascaded standby
>> can pass the consistency test.
>>
>> The problem happens when the cascaded standby starts recovery at a
>> shutdown checkpoint record. In this case, no WAL record might follow
>> the checkpoint one yet. So, after replaying the shutdown checkpoint
>> record, the cascaded standby needs to wait for a new WAL record to appear
>> before reaching the code block that resets
>> ControlFile->backupStartPoint.
>> The cascaded standby cannot reach a consistent state and a client cannot
>> connect to the cascaded standby until new WAL has arrived.
>>
>> The attached patch fixes the problem. In this patch, if recovery begins
>> at a shutdown checkpoint record, the ControlFile fields
>> (like backupStartPoint) required for checking whether the end of backup
>> has been reached are not set at first. IOW, the cascaded standby considers
>> the database consistent from the beginning. This is safe because
>> a shutdown checkpoint record means that there was no running database
>> activity at that point and the database is in a consistent state.
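(Schematically, the approach described above amounts to something like the
following in StartupXLOG(). This is a hypothetical sketch paraphrasing the
description, not the text of the attached patch; it reuses the 9.2-era names
wasShutdown, checkPoint and ControlFile.)

    if (!wasShutdown)
    {
        /* Usual case: arm the end-of-backup bookkeeping. */
        ControlFile->backupStartPoint = checkPoint.redo;
    }
    else
    {
        /*
         * Recovery begins at a shutdown checkpoint: nothing was running
         * when it was written, so the database is consistent from the
         * start.  Leave ControlFile->backupStartPoint (and related fields)
         * invalid so that CheckRecoveryConsistency() does not wait for an
         * end-of-backup point that would require further WAL to be replayed.
         */
    }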
>
>
> Hmm, I think the CheckRecoveryConsistency() call in the redo loop is
> misplaced. It's called after we got a record from ReadRecord, but *before*
> replaying it (rm_redo). Even if replaying record X makes the system
> consistent, we won't check and notice that until we have fetched record X+1.
> In this particular test case, record X is a shutdown checkpoint record, but
> it could as well be a running-xacts record, or the record that reaches
> minRecoveryPoint.
>
> Does the problem go away if you just move the CheckRecoveryConsistency()
> call *after* rm_redo (attached)?

No, at least not in my case. When recovery starts at a shutdown checkpoint
record and there is no record following the shutdown checkpoint, recovery
enters a wait state before reaching the main redo apply loop. That is, recovery
starts waiting for a new WAL record to arrive in the ReadRecord call just
before the redo loop. So moving the CheckRecoveryConsistency() call after
rm_redo cannot fix the problem which I reported. To fix the problem, we need to
make recovery reach the consistent point before the redo loop, i.e., in the
CheckRecoveryConsistency() call just before the redo loop.
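(To illustrate, the relevant pre-loop sequence in the 9.2-era StartupXLOG()
looks roughly like this; a simplified sketch from memory, not the exact
source.)

    /* A consistency check is reached here, before any record is replayed. */
    CheckRecoveryConsistency();

    /*
     * Find the first record that logically follows the checkpoint.  When
     * recovery begins at a shutdown checkpoint and no WAL follows it yet,
     * this ReadRecord() is where the cascaded standby waits, before the
     * redo loop (and its CheckRecoveryConsistency() calls) is ever entered.
     * Hence consistency has to be reachable in the call above.
     */
    if (XLByteLT(checkPoint.redo, RecPtr))
        record = ReadRecord(&(checkPoint.redo), PANIC, false);
    else
        record = ReadRecord(NULL, LOG, false);

    if (record != NULL)
    {
        /* ... main redo apply loop (see the sketch earlier in the thread) ... */
    }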

Regards,

--
Fujii Masao
