Re: BUG #7533: Client is not able to connect cascade standby incase basebackup is taken from hot standby - Mailing list pgsql-bugs

From Heikki Linnakangas
Subject Re: BUG #7533: Client is not able to connect cascade standby incase basebackup is taken from hot standby
Msg-id 5051CFD2.60103@iki.fi
In response to Re: BUG #7533: Client is not able to connect cascade standby incase basebackup is taken from hot standby  (Fujii Masao <masao.fujii@gmail.com>)
Responses Re: BUG #7533: Client is not able to connect cascade standby incase basebackup is taken from hot standby  (Amit Kapila <amit.kapila@huawei.com>)
Re: BUG #7533: Client is not able to connect cascade standby incase basebackup is taken from hot standby  (Fujii Masao <masao.fujii@gmail.com>)
List pgsql-bugs
On 12.09.2012 22:03, Fujii Masao wrote:
> On Wed, Sep 12, 2012 at 8:47 PM, <amit.kapila@huawei.com> wrote:
>> The following bug has been logged on the website:
>>
>> Bug reference:      7533
>> Logged by:          Amit Kapila
>> Email address:      amit.kapila@huawei.com
>> PostgreSQL version: 9.2.0
>> Operating system:   Suse
>> Description:
>>
>> M host is primary, S host is standby and CS host is cascaded standby.
>>
>> 1. Set up postgresql-9.2beta2/RC1 on all hosts.
>> 2. Run initdb on host M to create a fresh database cluster.
>> 3. Modify the configuration file postgresql.conf on host M like this:
>>        listen_addresses = 'M'
>>        port = 15210
>>        wal_level = hot_standby
>>        max_wal_senders = 4
>>        hot_standby = on
>> 4. Modify the configuration file pg_hba.conf on host M like this:
>>        host    replication    repl    M/24    md5
>> 5. Start the server on host M as the primary.
>> 6. Connect a client to the primary server and create a user 'repl':
>>        CREATE USER repl SUPERUSER PASSWORD '123';
>> 7. Use pg_basebackup on host S to retrieve the database from the primary
>> host:
>>        pg_basebackup -D /opt/t38917/data -F p -x fetch -c fast -l repl_backup -P
>>        -v -h M -p 15210 -U repl -W
>> 8. Copy recovery.conf.sample from the package's share folder to the data
>> directory on host S, then rename it to recovery.conf.
>> 9. Modify the file recovery.conf on host S as below:
>>        standby_mode = on
>>        primary_conninfo = 'host=M port=15210 user=repl password=123'
>> 10. Modify the file postgresql.conf on host S as follows:
>>        listen_addresses = 'S'
>> 11. Start the server on host S as a standby server.
>> 12. Use pg_basebackup on host CS to retrieve the database from the standby
>> host:
>>        pg_basebackup -D /opt/t38917/data -F p -x fetch -c fast -l repl_backup -P
>>        -v -h M -p 15210 -U repl -W
>> 13. Modify the file recovery.conf on host CS as below:
>>        standby_mode = on
>>        primary_conninfo = 'host=S port=15210 user=repl password=123'
>> 14. Modify the file postgresql.conf on host CS as follows:
>>        listen_addresses = 'CS'
>> 15. Start the server on host CS as a cascaded standby server.
>> 16. Try to connect a client to host CS; it fails with the error:
>>        FATAL:  the database system is starting up
>
> This procedure didn't reproduce the problem in HEAD. But when I restarted
> the master server between steps 11 and 12, I was able to reproduce the
> problem.
>
>> Observations related to bug
>> ------------------------------
>> In the above scenario it is observed that the startup process has read all
>> the WAL (in our defect scenario minRecoveryPoint is 5016220) up to position
>> 5016220, and then checks for recovery consistency with the following
>> condition in the function CheckRecoveryConsistency:
>>
>>         if (!reachedConsistency &&
>>             XLByteLE(minRecoveryPoint, EndRecPtr) &&
>>             XLogRecPtrIsInvalid(ControlFile->backupStartPoint))
>>
>> At this point the first two conditions are true, but the last one is not,
>> because redo has not yet been applied and hence backupStartPoint has not
>> been reset. So it does not signal the postmaster that a consistent state
>> has been reached. After this it applies the redo, resets backupStartPoint,
>> and goes on to read the next record. Since all records have already been
>> read, it starts waiting for a new record from the standby node. But since
>> no new record arrives from the standby node, it keeps waiting and never
>> gets a chance to recheck the recovery consistency, and hence client
>> connections are not allowed.
>
> If the cascaded standby starts recovery at a normal checkpoint record,
> this problem will not happen, because when wal_level is set to hot_standby
> an XLOG_RUNNING_XACTS WAL record always follows the normal
> checkpoint record. So while the XLOG_RUNNING_XACTS record is being replayed,
> ControlFile->backupStartPoint can be reset, and then the cascaded standby
> can pass the consistency test.
>
> The problem happens when the cascaded standby starts recovery at a
> shutdown checkpoint record. In this case, no WAL record might follow
> the checkpoint yet. So, after replaying the shutdown checkpoint
> record, the cascaded standby needs to wait for a new WAL record to appear
> before reaching the code block that resets ControlFile->backupStartPoint.
> The cascaded standby cannot reach a consistent state, and a client cannot
> connect to it until new WAL has arrived.
>
> The attached patch fixes the problem. In this patch, if recovery begins
> at a shutdown checkpoint record, the ControlFile fields
> (like backupStartPoint) required for checking whether the end of backup
> has been reached are not set at first. IOW, the cascaded standby treats the
> database as consistent from the beginning. This is safe because
> a shutdown checkpoint record means that there was no database activity
> running at that point and the database was in a consistent state.
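
In code terms, the idea is presumably something like this (just a sketch of
the description above, not the actual patch), using the haveBackupLabel and
wasShutdown locals in StartupXLOG():

    if (haveBackupLabel && !wasShutdown)
    {
        /*
         * Remember where the backup started, so that we only consider
         * ourselves consistent once the end of the backup has been
         * replayed.  If recovery starts at a shutdown checkpoint, the
         * cluster was already consistent at that point, so leave these
         * fields unset.
         */
        ControlFile->backupStartPoint = checkPoint.redo;
        ControlFile->backupEndRequired = backupEndRequired;
    }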

Hmm, I think the CheckRecoveryConsistency() call in the redo loop is
misplaced. It's called after we got a record from ReadRecord, but
*before* replaying it (rm_redo). Even if replaying record X makes the
system consistent, we won't check and notice that until we have fetched
record X+1. In this particular test case, record X is a shutdown
checkpoint record, but it could as well be a running-xacts record, or
the record that reaches minRecoveryPoint.
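
For reference, the main redo loop in StartupXLOG() currently has roughly
this shape (heavily abridged):

    do
    {
        /* Allow read-only connections if we've reached consistency */
        CheckRecoveryConsistency();

        /* ... recovery target checks, error context setup, etc. ... */

        /* Now apply the record */
        RmgrTable[record->xl_rmid].rm_redo(EndRecPtr, record);

        /* ... */

        record = ReadRecord(NULL, LOG, false);
    } while (record != NULL);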

Does the problem go away if you just move the CheckRecoveryConsistency()
call *after* rm_redo (attached)?
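
That is, something like this (the attached patch is the authoritative
version; this is only the outline):

    do
    {
        /* ... recovery target checks, etc., as before ... */

        /* Apply the record first */
        RmgrTable[record->xl_rmid].rm_redo(EndRecPtr, record);

        /*
         * Only now, with the record actually applied, check whether we
         * have become consistent and can let the postmaster accept
         * read-only connections.
         */
        CheckRecoveryConsistency();

        record = ReadRecord(NULL, LOG, false);
    } while (record != NULL);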

- Heikki

