Re: 9.2.3 crashes during archive recovery - Mailing list pgsql-hackers

From KONDO Mitsumasa
Subject Re: 9.2.3 crashes during archive recovery
Date
Msg-id 51384A56.1010906@lab.ntt.co.jp
In response to Re: 9.2.3 crashes during archive recovery  (Heikki Linnakangas <hlinnakangas@vmware.com>)
Responses Re: 9.2.3 crashes during archive recovery  (Heikki Linnakangas <hlinnakangas@vmware.com>)
List pgsql-hackers
(2013/03/06 16:50), Heikki Linnakangas wrote:
>> Hi,
>>
>> Horiguchi's patch does not seem to record minRecoveryPoint in ReadRecord();
>> the attached patch records minRecoveryPoint.
>> [crash recovery -> record minRecoveryPoint in control file -> archive
>> recovery]
>> I think that this is an original intention of Heikki's patch.
>
> Yeah. That fix isn't right, though; XLogPageRead() is supposed to return true on success, and false on error, and the
> patch makes it return 'true' on error, if archive recovery was requested but we're still in crash recovery. The real
> issue here is that I missed the two "return NULL;"s in ReadRecord(), so the code that I put in the
> next_record_is_invalid codepath isn't run if XLogPageRead() doesn't find the file at all. Attached patch is the proper
> fix for this.
>
Thanks for creating the patch! I tested your patch on 9.2_STABLE, but it does not work with the promote command...
When XLogPageRead() returns false, it means the end of the standby loop, the crash recovery loop, and the archive
recovery loop.
Your patch does not handle promoting a standby to master: the server never leaves the standby loop.
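
To illustrate the behavior I expect, here is a toy sketch. It is not PostgreSQL source: the type and function names
(RecoveryState, on_read_failure, and so on) are hypothetical, and it only models the decision taken when a page read
finds no more WAL, under the assumption that a standby should retry until promotion is requested.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical recovery modes; these names are illustrative only. */
typedef enum { MODE_CRASH, MODE_ARCHIVE, MODE_STANDBY } RecoveryMode;

typedef struct {
    RecoveryMode mode;
    bool archive_requested;  /* archive recovery was requested */
    bool standby_requested;  /* standby_mode is enabled */
    bool promote_triggered;  /* promotion has been requested */
} RecoveryState;

typedef enum { ACTION_RETRY, ACTION_END_RECOVERY } NextAction;

/*
 * Decide what to do after a page read fails because no more WAL was found.
 * Toy decision table, not the actual PostgreSQL logic.
 */
static NextAction
on_read_failure(RecoveryState *st)
{
    assert(st != NULL);

    /* Crash recovery ran out of WAL but archive recovery was requested:
     * switch modes and try again from the archive. */
    if (st->mode == MODE_CRASH && st->archive_requested) {
        st->mode = st->standby_requested ? MODE_STANDBY : MODE_ARCHIVE;
        return ACTION_RETRY;
    }

    /* A standby keeps waiting for WAL until promotion is requested. */
    if (st->mode == MODE_STANDBY && !st->promote_triggered)
        return ACTION_RETRY;

    /* Otherwise a failed read means recovery is over. */
    return ACTION_END_RECOVERY;
}
```

The point is the last branch: once promotion is requested, a failed read must end the standby loop instead of retrying
forever.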

So I made a new patch based on Heikki's and Horiguchi's patches.
I attach a test script, modified from Horiguchi's script. This script does not depend on the shell environment; you only
need to fix PGPATH.
Please run this test script.


>> I also found a bug in latest 9.2_stable. It does not get latest timeline
>> and
>> recovery history file in archive recovery when master and standby
>> timeline is different.
>
> Works for me.. Can you create a test script for that? Remember to set "recovery_target_timeline='latest'".
I did set recovery_target_timeline='latest'. Hmm...

Here is my recovery.conf.
> [mitsu-ko@localhost postgresql]$ cat Standby/recovery.conf
> standby_mode = 'yes'
> recovery_target_timeline='latest'
> primary_conninfo='host=localhost port=65432'
> restore_command='cp ../arc/%f %p'
And my system's log message is here.
> waiting for server to start....[Standby] LOG:  database system was shut down in recovery at 2013-03-07 02:56:05 EST
> [Standby] LOG:  restored log file "00000002.history" from archive
> cp: cannot stat `../arc/00000003.history': No such file or directory
> [Standby] FATAL:  requested timeline 2 is not a child of database system timeline 1
> [Standby] LOG:  startup process (PID 20941) exited with exit code 1
> [Standby] LOG:  aborting startup due to startup process failure
It can be reproduced in my test script, too.
The final master start command in my test script may look unusual in general,
but it is a common step in a PostgreSQL system managed by Pacemaker.
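
As I understand it, the FATAL message above comes from a check that the database system's current timeline appears in
the history of the requested target timeline. Here is a minimal sketch of that kind of check, assuming the history has
already been read into a flat array of timeline IDs. The function name timeline_is_ancestor and the array layout are
hypothetical, not the actual code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef unsigned int TimeLineID;

/*
 * Return true if `current` appears in `history`, a hypothetical array that
 * holds the requested timeline and all of its ancestors.
 */
static bool
timeline_is_ancestor(const TimeLineID *history, size_t n, TimeLineID current)
{
    assert(history != NULL || n == 0);

    for (size_t i = 0; i < n; i++) {
        if (history[i] == current)
            return true;
    }
    return false;
}
```

With a history of {2, 1}, timeline 1 is accepted as an ancestor of timeline 2. If the history contains only {2}, the
check fails for timeline 1, which corresponds to the "is not a child of" error.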


Best regards,
--
Mitsumasa KONDO
NTT OSS Center

Attachment
