Thread: Re: Timeline issue if StartupXLOG() is interrupted right before end-of-recovery record is done


> On 21 Jan 2025, at 16:47, Roman Eskin <r.eskin@arenadata.io> wrote:
>
>>
>> Persisting recovery signal file for some _timeout_ seems super dangerous to me. In distributed systems every extra
>> _timeout_ is a source of complexity, uncertainty and despair.
>
> The approach is not about persisting the signal files for some timeout. Currently the files are removed in
> StartupXLOG() before writeTimeLineHistory() and PerformRecoveryXLogAction() are called. The suggestion is to move the
> file removal after PerformRecoveryXLogAction() inside StartupXLOG().

Sending a node into a repeated promote-fail cycle without resolving the root cause seems like an even less appealing idea.
If something prevented promotion, why should we retry by this particular method?

Even in the case of a transient failure like the one you described (power loss), it does not sound like a very good idea
to retry promotion after the node comes back online. The user will get an unexpected split-brain.


Best regards, Andrey Borodin.