Re: 'replication checkpoint has wrong magic' on the newly cloned replicas - Mailing list pgsql-admin

From Alex Kliukin
Subject Re: 'replication checkpoint has wrong magic' on the newly cloned replicas
Date
Msg-id 1512033808.1345812.1189147312.07BDA54D@webmail.messagingengine.com
In response to Re: 'replication checkpoint has wrong magic' on the newly cloned replicas  (Alvaro Herrera <alvherre@alvh.no-ip.org>)
List pgsql-admin
On Thu, Nov 30, 2017, at 00:22, Alvaro Herrera wrote:
> Alex Kliukin wrote:
> 
> > 2017-11-15 13:15:46.673 CET,,,15154,,5a0c2ff1.3b32,5,,2017-11-15
> > 13:15:45 CET,,0,PANIC,XX000,"replication checkpoint has wrong magic
> > 5714534 instead of 307747550",,,,,,,,,""
> 
> Uhh ... I had never heard of this "replication checkpoint" thing.  It is
> part of replication origins feature, which is fairly new stuff (see
> src/backend/replication/logical/origin.c).  I'd bet this problem is
> related to a bug in logical replication "origins" code rather than any
> procedural problems in your base-backup taking setup ...

We are not using logical replication or logical decoding on those hosts.
On the master, both pg_replication_origin and pg_replication_slots are
empty.

Those masters were fairly recently (around 2 months ago) upgraded from
9.3.

> 
> I wonder if there is some truncation of the 0x1257DADE value that
> produces the 5714534 value you're seeing -- something related to a
> pg_logical/replorigin_checkpoint file being written partially while the
> backup is being taken.

307747550 = 0x1257DADE
0001 0010 0101 0111 1101 1010 1101 1110

5714534 = 0x573266 = "W2f" in ASCII
0000 0000 0101 0111 0011 0010 0110 0110

I see no patterns here.
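
The arithmetic above can be reproduced with a short Python sketch. The
helper names (describe, truncations) are hypothetical and exist only for
this illustration. The truncation check covers only the simplest case,
keeping the low or high 1 to 3 bytes of the expected magic, so it is not
an exhaustive search for a relationship between the two values.

```python
# Magic values taken from the PANIC message quoted earlier in this mail.
EXPECTED = 307747550  # magic the server expected
FOUND = 5714534       # magic it actually read


def describe(value):
    """Return (hex, binary grouped in nibbles, printable ASCII) for a 32-bit value."""
    raw = value.to_bytes(4, "big")
    bits = format(value, "032b")
    nibbles = " ".join(bits[i:i + 4] for i in range(0, 32, 4))
    # Non-printable bytes are shown as '.'
    text = "".join(chr(b) if 32 <= b < 127 else "." for b in raw)
    return hex(value), nibbles, text


def truncations(value):
    """Values obtained by keeping only the low or high 1-3 bytes of a 32-bit value."""
    out = set()
    for n in (1, 2, 3):
        out.add(value & ((1 << (8 * n)) - 1))  # low n bytes
        out.add(value >> (8 * (4 - n)))        # high n bytes
    return out


for name, value in (("expected", EXPECTED), ("found", FOUND)):
    print(name, *describe(value))

print("found is a simple truncation of expected:", FOUND in truncations(EXPECTED))
```

Under these assumptions the last line reports False, which matches the
manual comparison of the two bit patterns.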

What is interesting is that 0x573266 is actually mentioned in relcache.c:

#define RELCACHE_INIT_FILENAME  "pg_internal.init"
#define RELCACHE_INIT_FILEMAGIC         0x573266        /* version ID value */

It is the file magic for the relcache init files, but given that the
copy is made simply by compressing and decompressing the original files,
I don't see how software could confuse the two.
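
One way to check which kind of content a copied file really holds is to
read its first four bytes and compare them with the two magic values from
this thread. The following is a minimal sketch, not a tested tool: it
assumes that both file types start with their magic as a 4-byte integer
in the server's native byte order (little-endian by default here), and
the function names are hypothetical.

```python
import struct

# Magic values quoted in this thread. Assumption: each file begins with
# its magic stored as a 4-byte unsigned integer in native byte order.
KNOWN_MAGICS = {
    0x1257DADE: "replication origin checkpoint (pg_logical/replorigin_checkpoint)",
    0x573266: "relcache init file (pg_internal.init)",
}


def classify_magic(header, byteorder="little"):
    """Return (magic, description) for the first 4 bytes of a file."""
    if len(header) < 4:
        return None, "file shorter than 4 bytes"
    fmt = "<I" if byteorder == "little" else ">I"
    (magic,) = struct.unpack(fmt, header[:4])
    return magic, KNOWN_MAGICS.get(magic, "unknown")


def classify_file(path, byteorder="little"):
    """Read the leading 4 bytes of `path` and classify them."""
    with open(path, "rb") as f:
        return classify_magic(f.read(4), byteorder)
```

Calling classify_file() on pg_logical/replorigin_checkpoint, both on the
master and on a freshly cloned replica, would show whether the file
already carries the relcache magic at the source or only after the copy.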

> 
> Another point towards not including pg_logical/ contents when taking a
> base backup, I guess ...

In our case, wouldn't that just mask the real issue?
-- 
Sincerely,
Alex

