clog segment truncation - Mailing list pgsql-bugs

From Smolkin Grigory
Subject clog segment truncation
Date
Msg-id CAMp+ueZkG+HTWd0tajG4nDU_Vk51+wkngtuq7XjOjEBEW7t8Gw@mail.gmail.com
Whole thread Raw
Responses Re: clog segment truncation
Re: clog segment truncation
List pgsql-bugs
Hello, hackers!

I`m currently investigating the case of one of our PostgreSQL instance having started issuing error messages about missing clog segment:
ERROR:  could not access status of transaction 1550558894
DETAIL:  Could not open file "pg_xact/05C6": No such file or directory.


After examining the WAL records just prior the first sighing of these errors, it became apparent, that clog segment was truncated by autovacuum
rmgr: CLOG        len (rec/tot):     38/    38, tx:          0, lsn: 81A/87C899E0, prev 81A/87C895A0, desc: TRUNCATE page 48264; oldestXact 1581542039


After looking closely at the affected heap pages (there are scores of them) with tuples, created by 1550558894 transaction, and comparing them with pg_visibility state and pg_class.relfrozenxig, it became apparent, that there is may be something awry with either setting all_frozen bit in _vm, or oldestXact calculation by vacuum.

According to pg_visibility, all_frozen bit is set:
wms-svc-inspecting=# select * from pg_visibility('affected_relation', 5456224)                  
wms-svc-inspecting-# ;
 all_visible | all_frozen | pd_all_visible
-------------+------------+----------------
 t           | t          | f
(1 row)

So all tuples on this page should be marked with HEAP_XMIN_FROZEN, but that is not the case

SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, t_hoff, t_bits, heap_tuple_infomask_flags(t_infomask, t_infomask2) FROM heap_page_items(get_raw_page('affected_relation', 5456224)) ;

-[ RECORD 1 ]-------------+-----------------------------------------------------------------------------
lp                        | 1
lp_off                    | 7976
lp_flags                  | 1
lp_len                    | 216
t_xmin                    | 1550558894
t_xmax                    | 0
t_field3                  | 0
t_ctid                    | (5456224,1)
t_infomask2               | 12
t_infomask                | 2051
t_hoff                    | 32
t_bits                    | 1111111011100000
heap_tuple_infomask_flags | ("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{})
-[ RECORD 2 ]-------------+-----------------------------------------------------------------------------
lp                        | 2
lp_off                    | 7768
lp_flags                  | 1
lp_len                    | 208
t_xmin                    | 1550558894
t_xmax                    | 0
t_field3                  | 0
t_ctid                    | (5456224,2)
t_infomask2               | 12
t_infomask                | 2307
t_hoff                    | 32
t_bits                    | 1111111111100000
heap_tuple_infomask_flags | ("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID}",{})
-[ RECORD 3 ]-------------+-----------------------------------------------------------------------------
lp                        | 3
lp_off                    | 7520
lp_flags                  | 1
lp_len                    | 248
t_xmin                    | 1550558894
t_xmax                    | 0
t_field3                  | 0
t_ctid                    | (5456224,3)
t_infomask2               | 12
t_infomask                | 2051
t_hoff                    | 32
t_bits                    | 1111111111100000
heap_tuple_infomask_flags | ("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{})
-[ RECORD 4 ]-------------+-----------------------------------------------------------------------------
....
-[ RECORD 23 ]------------+-----------------------------------------------------------------------------
lp                        | 23
lp_off                    | 272
lp_flags                  | 1
lp_len                    | 312
t_xmin                    | 1550558894
t_xmax                    | 0
t_field3                  | 0
t_ctid                    | (5456224,23)
t_infomask2               | 12
t_infomask                | 2051
t_hoff                    | 32
t_bits                    | 1111111111100000
heap_tuple_infomask_flags | ("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{})


Also pg_class.relfrozenxig contains xid which exceed 1550558894:
# select relname, relfrozenxid from pg_class where oid = 'affected_relation'::regclass::oid;
-[ RECORD 1 ]+-----------
relname      | affected_relation
relfrozenxid | 1609822074

So is it possible that either all_frozen bit was set incorrectly or wansn't unset when tuples were created?
I will be happy to provide any additional information if required.
PostgreSQL version: 15.6, data_checksums: on

pgsql-bugs by date:

Previous
From: Heikki Linnakangas
Date:
Subject: Re: ecpg Fetch issue
Next
From: Heikki Linnakangas
Date:
Subject: Re: BUG #19108: Stack overflow duting query parse