Re: "invalid contrecord" error on replica - Mailing list pgsql-general

From Adrien Nayrat
Subject Re: "invalid contrecord" error on replica
Msg-id d3374925-79dc-fd0d-be9f-47fb4f967804@anayrat.info
In response to Re: "invalid contrecord" error on replica  (Kyotaro Horiguchi <horikyota.ntt@gmail.com>)
List pgsql-general
On 5/6/21 7:37 AM, Kyotaro Horiguchi wrote:
> At Sun, 2 May 2021 22:43:44 +0200, Adrien Nayrat <adrien.nayrat@anayrat.info> wrote in
>> I also dumped 00000001000000AA000000A1 on the secondary and it
>> contains all the records until AA/A1004018.
>>
>> It is really weird, I don't understand how the secondary can miss the
>> last 2 records of A0. It seems it did not receive the
>> CHECKPOINT_SHUTDOWN record?
>>
>> Any idea?
> 
> This looks like the same issue as [1]. In short, the secondary
> received an incomplete record, but the primary forgot about that
> record after restart.
> 
> Specifically, the primary was writing a WAL record that starts at
> A0FFFB70 and continues into the A1xxxxxx segment. The secondary
> successfully received the first half of the record, but the primary
> failed to write (and then send) the second half because the disk was full.
> 
> At this point it seems that the primary's last completed record ended
> at A0FFFB70. Then, during restart, the CHECKPOINT_SHUTDOWN record
> overwrote the already half-sent record up to A0FFFBE8.
> 
> On the secondary side, there is only the first half of the record,
> which the primary had forgotten, and the second half, starting at
> LSN A1000000, was still in the future of the primary's new history.
> 
> After some time the primary reaches A1000000, but the first record in
> that segment of course disagrees with the history of the secondary.
> 
> 1: https://www.postgresql.org/message-id/CBDDFA01-6E40-46BB-9F98-9340F4379505%40amazon.com
> 
> regards.
> 

Hello,

Thanks for your reply and your explanation! Now I understand; it's good to know
it is a known issue.
I'll follow this thread and I hope we will find a solution. It's annoying that your
secondary breaks when your primary crashes, and that the only solution is either to
fetch an archived WAL file and replace it on the secondary or to completely
rebuild your secondary.
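
To illustrate the LSN arithmetic in the explanation quoted above, here is a
minimal sketch in Python. It assumes the default 16 MB WAL segment size and
timeline 1, and it assumes the full LSN of the broken record is AA/A0FFFB70
(the "AA" high part is inferred from the segment name 00000001000000AA000000A1
mentioned earlier). The helper names are hypothetical and not part of any
PostgreSQL tool.

```python
# Assumption: default WAL segment size of 16 MB (configurable at initdb time).
WAL_SEGMENT_SIZE = 16 * 1024 * 1024


def parse_lsn(lsn: str) -> int:
    """Convert an LSN written as 'HI/LO' (hex) into a 64-bit byte position."""
    hi, lo = lsn.split("/")
    return (int(hi, 16) << 32) | int(lo, 16)


def wal_file_name(timeline: int, lsn: str, seg_size: int = WAL_SEGMENT_SIZE) -> str:
    """Name of the WAL segment file that contains the given LSN."""
    segs_per_xlogid = 0x100000000 // seg_size  # 256 for 16 MB segments
    segno = parse_lsn(lsn) // seg_size
    return "%08X%08X%08X" % (
        timeline,
        segno // segs_per_xlogid,
        segno % segs_per_xlogid,
    )


def bytes_left_in_segment(lsn: str, seg_size: int = WAL_SEGMENT_SIZE) -> int:
    """Bytes between the LSN and the end of its segment."""
    return seg_size - parse_lsn(lsn) % seg_size


if __name__ == "__main__":
    start = "AA/A0FFFB70"  # assumed full LSN of the record that was cut in half
    print(wal_file_name(1, start))       # segment holding the first half
    print(bytes_left_in_segment(start))  # room left before the A1 segment begins
```

Under these assumptions the record starts 0x490 bytes before the end of
segment 00000001000000AA000000A0, so anything longer than that has to continue
into 00000001000000AA000000A1, which is where the secondary's copy and the
primary's new history diverge.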

Thanks





-- 
Adrien NAYRAT
