We found that this problem also appears on shards with checksums enabled.
This shard is still on its first timeline, which means there has been no switchover since the upgrade to 9.6.
xdb11f(master)=# select pg_current_xlog_location(),
       pg_xlogfile_name(pg_current_xlog_location());
 pg_current_xlog_location |     pg_xlogfile_name
--------------------------+--------------------------
 30BA/5966AD38            | 00000001000030BA00000059
(1 row)
xdb11f(master)=# select * from page_header(get_raw_page('mytable', 1787));
      lsn      | checksum | flags | lower | upper | special | pagesize | version | prune_xid
---------------+----------+-------+-------+-------+---------+----------+---------+-----------
 1F43/8C432C60 |    -3337 |     5 |   256 |   304 |    8192 |     8192 |       4 |         0
(1 row)
xdb11h(replica)=# select * from page_header(get_raw_page('mytable', 1787));
      lsn      | checksum | flags | lower | upper | special | pagesize | version | prune_xid
---------------+----------+-------+-------+-------+---------+----------+---------+-----------
 1B28/45819C28 |   -17617 |     5 |   256 |   304 |    8192 |     8192 |       4 |         0
(1 row)
xdb11e(replica)=# select * from page_header(get_raw_page('mytable', 1787));
      lsn      | checksum | flags | lower | upper | special | pagesize | version | prune_xid
---------------+----------+-------+-------+-------+---------+----------+---------+-----------
 1B28/45819C28 |   -17617 |     5 |   256 |   304 |    8192 |     8192 |       4 |         0
(1 row)
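To make the difference between the page LSNs explicit, they can be compared directly as pg_lsn values. This is only a sketch: the two literals are the lsn values copied from the page_header() output above, and the column aliases are made up for illustration.

```sql
-- Page LSNs copied from the page_header() output of the master and the replicas.
-- The column aliases are illustrative only.
select '1F43/8C432C60'::pg_lsn > '1B28/45819C28'::pg_lsn as master_page_is_newer,
       '1F43/8C432C60'::pg_lsn - '1B28/45819C28'::pg_lsn as lsn_gap_bytes;
```

The first column should come out true, since the master's page LSN is the higher of the two. The second column is the distance between the two LSNs in bytes of WAL.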
The master has a newer version of the page (higher LSN), and the freeze bits are set on the tuple there.
xdb11f(master)=# select t_xmin, t_infomask::bit(32) & X'0300'::int::bit(32)
       from heap_page_items(get_raw_page('mytable', 1787)) where lp = 42;
  t_xmin   |             ?column?
-----------+----------------------------------
 516651778 | 00000000000000000000001100000000
(1 row)
xdb11h(replica)=# select t_xmin, t_infomask::bit(32) & X'0300'::int::bit(32)
       from heap_page_items(get_raw_page('mytable', 1787)) where lp = 42;
  t_xmin   |             ?column?
-----------+----------------------------------
 516651778 | 00000000000000000000000000000000
(1 row)
xdb11e(replica)=# select t_xmin, t_infomask::bit(32) & X'0300'::int::bit(32)
       from heap_page_items(get_raw_page('mytable', 1787)) where lp = 42;
  t_xmin   |             ?column?
-----------+----------------------------------
 516651778 | 00000000000000000000000000000000
(1 row)
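For readability, the same check can be written with the individual hint bits spelled out. This is a sketch under the assumption that the X'0300' mask used above corresponds to HEAP_XMIN_COMMITTED (0x0100) plus HEAP_XMIN_INVALID (0x0200), which together are treated as HEAP_XMIN_FROZEN; the column aliases are illustrative.

```sql
-- Requires the pageinspect extension, as do the queries above.
-- 256 = 0x0100 (HEAP_XMIN_COMMITTED), 512 = 0x0200 (HEAP_XMIN_INVALID);
-- both bits set together (768 = 0x0300) mean the tuple's xmin is frozen.
select lp,
       t_xmin,
       (t_infomask & 256) <> 0  as xmin_committed,
       (t_infomask & 512) <> 0  as xmin_invalid,
       (t_infomask & 768) = 768 as xmin_frozen
from heap_page_items(get_raw_page('mytable', 1787))
where lp = 42;
```

Run on the master and on each replica, this shows at a glance which copies of the tuple carry the frozen flag.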
It seems that the replicas did not replay the corresponding WAL records.
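One way to rule out plain replication lag is to check, on each replica, whether replay has already passed the LSN stamped on the master's copy of the page. A minimal sketch, assuming the 9.6 function names and using the master's page LSN from the page_header() output above as a literal; the column aliases are made up for illustration.

```sql
-- Run on a replica. '1F43/8C432C60' is the page LSN seen on the master.
select pg_is_in_recovery()                                as is_replica,
       pg_last_xlog_replay_location()                     as replay_location,
       pg_last_xlog_replay_location() >= '1F43/8C432C60'::pg_lsn
                                                          as replayed_past_master_page_lsn;
```

If the last column is true while the replica's page still shows the older LSN, then lag alone does not explain the difference.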
Any thoughts?
Regards,
Dmitriy Sarafannikov