On 4/6/20 9:17 PM, David Steele wrote:
> Hi Grigory,
Hello!
>
> On 4/5/20 8:02 PM, Grigory Smolkin wrote:
>> Hello, hackers!
>>
>> I'm investigating complaints from our clients about archive recovery
>> being very slow, and I've noticed a really strange and, I think,
>> very dangerous recovery behavior.
>>
>> When running multi-timeline archive recovery, for every requested
>> segno the startup process iterates through every timeline in the
>> restore target timeline history, starting from the highest timeline
>> and ending at the current one, and tries to fetch the segno in
>> question from each timeline.
>
> <snip>
>
>> Is there a reason behind this behavior?
>>
>> Also, I've attached a patch which fixed this issue for me, but I'm
>> not sure that the chosen approach is sound and doesn't break anything.
>
> This sure looks like [1] which has a completed patch nearly ready to
> commit. Can you confirm and see if the proposed patch looks good?
Well, I've been testing it all day and so far nothing is broken.
But this foreach(xlog.c:3777) loop looks very strange to me. It is not
robust: we are blindly going over timelines and feeding recovery some
files, hoping they are the right ones. I think we can do better, because:
1. we know whether or not we are running multi-timeline recovery
2. we know the next timeline ID and can calculate the switchpoint segment
3. so we can make an informed decision about which timeline we must
request files from now.
I will work on it.
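
The idea in points 1-3 can be sketched roughly as follows. This is only
an illustration under simplifying assumptions, not the actual xlog.c
code: HistoryEntry, lsn_to_segno() and timeline_for_segment() are
hypothetical names, the history is assumed to be ordered newest
timeline first, and each entry only records the LSN at which its
timeline begins.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Simplified stand-ins for the server's types (assumption). */
typedef uint32_t TimeLineID;
typedef uint64_t XLogRecPtr;
typedef uint64_t XLogSegNo;

/* Hypothetical history entry: timeline `tli` starts at LSN `begin`. */
typedef struct
{
    TimeLineID  tli;
    XLogRecPtr  begin;      /* 0 for the oldest timeline */
} HistoryEntry;

/* Number of the WAL segment that contains `lsn`. */
static XLogSegNo
lsn_to_segno(XLogRecPtr lsn, uint64_t seg_size)
{
    return lsn / seg_size;
}

/*
 * Pick the timeline to request `segno` from. `history` is ordered from
 * the newest timeline to the oldest. A timeline cannot contain segments
 * that lie entirely before its switchpoint segment, so such timelines
 * are skipped instead of being probed in the archive.
 */
static TimeLineID
timeline_for_segment(const HistoryEntry *history, size_t n,
                     XLogSegNo segno, uint64_t seg_size)
{
    assert(n > 0 && seg_size > 0);

    for (size_t i = 0; i < n; i++)
    {
        XLogSegNo   begin_seg = lsn_to_segno(history[i].begin, seg_size);

        if (segno >= begin_seg)
            return history[i].tli;
    }

    /* Not reached when the oldest entry begins at 0; fall back to it. */
    return history[n - 1].tli;
}
```

With a history of timeline 3 starting in segment 5, timeline 2 starting
in segment 2 and timeline 1 starting at 0, segment 4 is requested from
timeline 2 only, without first probing timeline 3. A real
implementation would still need to fall back to an older timeline when
the switchpoint segment is missing on the newer one.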
--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company