[HACKERS] Broken hint bits (freeze) - Mailing list pgsql-hackers

From Dmitriy Sarafannikov
Subject [HACKERS] Broken hint bits (freeze)
Date
Msg-id DA18C5E1-A115-4C1C-9F7C-E7B9A5F3EBC5@yandex.ru
Whole thread Raw
Responses Re: [HACKERS] Broken hint bits (freeze)  (Dmitriy Sarafannikov <dsarafannikov@yandex.ru>)
Re: [HACKERS] Broken hint bits (freeze)  (Amit Kapila <amit.kapila16@gmail.com>)
List pgsql-hackers
Hi hackers,

We have some problems on our production with hint bits and frozen tuples.
More and more following errors began to appear on master after switchover:
ERROR:  58P01: could not access status of transaction 1952523525
DETAIL:  Could not open file "pg_clog/0746": No such file or directory.
LOCATION:  SlruReportIOError, slru.c:896

We investigated the problem with pageinspect and found the tuples that are the cause:

xdb311g(master)=# select * from mytable where ctid = '(4,21)';
ERROR:  58P01: could not access status of transaction 1951521353
DETAIL:  Could not open file "pg_clog/0745": No such file or directory.
LOCATION:  SlruReportIOError, slru.c:896

But the same query successfully executed on replica.

We found some difference in hint bits between master and replica:

xdb311g(master)=# SELECT t_xmin, t_infomask::bit(32) & X'0300'::int::bit(32) FROM
heap_page_items(get_raw_page(‘mytable',4))where lp=21; 
-[ RECORD 1 ]------------------------------
t_xmin   | 1951521353
?column? | 00000000000000000000000000000000

old master, now replica:
xdb311e(replica)=# SELECT t_xmin, t_infomask::bit(32) & X'0300'::int::bit(32) FROM
heap_page_items(get_raw_page(‘mytable',4))where lp=21; 
-[ RECORD 1 ]------------------------------
t_xmin   | 1951521353
?column? | 00000000000000000000001100000000

X’0300’ = HEAP_XMIN_FROZEN according to

#define HEAP_XMIN_COMMITTED     0x0100  /* t_xmin committed */
#define HEAP_XMIN_INVALID       0x0200  /* t_xmin invalid/aborted */
#define HEAP_XMIN_FROZEN        (HEAP_XMIN_COMMITTED|HEAP_XMIN_INVALID)

xdb311g(master)=# select relfrozenxid from pg_class where relname = ‘mytable';
relfrozenxid
--------------
2266835605
(1 row)

This tuple must be frozen but there are no set bits HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID on master

Another interesting thing that LSN of this page on master and replica are not the same:
xdb311g(master)=# select lsn from page_header(get_raw_page(‘mytable',4));  lsn
---------------
8092/6A26DD08
(1 row)

xdb311e(replica)=# select lsn from page_header(get_raw_page(‘mytable',4));  lsn
---------------
838D/C4A0D280
(1 row)

And LSN on replica is greater that LSN on master (838D/C4A0D280 > 8092/6A26DD08)
How can this be possible?

We wrote a query which returns ctid of frozen tuples, which must be frozen but not actually frozen.

xdb311e(replica)=# select t_ctid from generate_series(0, pg_relation_size(‘mytable')/8192 - 1 ) s(i) left join lateral
heap_page_items(get_raw_page(‘mytable',s.i::int))on true where t_xmin::text::bigint < (select
relfrozenxid::text::bigintfrom pg_class where relname = ‘mytable') and t_infomask & X'0300'::int < 1; 
t_ctid
-----------
(400,16)
(2837,71)
(2837,72)
(2837,73)
(2837,75)
(2837,76)
(3042,40)
(4750,80)
(4750,81)
(5214,60)
(5214,65)
(6812,31)
(6912,63)
(7329,8)
(7374,26)
(7374,27)
(16 rows)
Same query on master returns 317 rows.

Our thoughts:
1) We think that it is related to switchover.
2) Any WAL-logged modification of this page on master will replace this page on replica due to full page writes.And all
replicaswill have broken hint bits too. It’s dangerous. 

Where to dig further?

RHEL6, PostgreSQL 9.6.3, wal_log_hints=off, full_page_writes=on, fsync=on, checksums disabled.
We don’t think that it is any hardware-related problems because this databases started from 9.4
and they survived 2 upgrades with pg_upgrade. And any hardware-related problems was not detected.
Problem appears not only in this shard.
Size of each shard is around 5TB and we can’t provide data.

Regards
Dmitriy Sarafannikov


pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Re: [HACKERS] Getting server crash after running sqlsmith
Next
From: Petr Jelinek
Date:
Subject: Re: [HACKERS] walsender & parallelism