clog segment truncation - Mailing list pgsql-bugs
| From | Smolkin Grigory |
|---|---|
| Subject | clog segment truncation |
| Date | |
| Msg-id | CAMp+ueZkG+HTWd0tajG4nDU_Vk51+wkngtuq7XjOjEBEW7t8Gw@mail.gmail.com Whole thread Raw |
| Responses |
Re: clog segment truncation
Re: clog segment truncation |
| List | pgsql-bugs |
Hello, hackers!
I`m currently investigating the case of one of our PostgreSQL instance having started issuing error messages about missing clog segment:
ERROR: could not access status of transaction 1550558894
DETAIL: Could not open file "pg_xact/05C6": No such file or directory.
After examining the WAL records just prior the first sighing of these errors, it became apparent, that clog segment was truncated by autovacuum
rmgr: CLOG len (rec/tot): 38/ 38, tx: 0, lsn: 81A/87C899E0, prev 81A/87C895A0, desc: TRUNCATE page 48264; oldestXact 1581542039
After looking closely at the affected heap pages (there are scores of them) with tuples, created by 1550558894 transaction, and comparing them with pg_visibility state and pg_class.relfrozenxig, it became apparent, that there is may be something awry with either setting all_frozen bit in _vm, or oldestXact calculation by vacuum.
According to pg_visibility, all_frozen bit is set:
wms-svc-inspecting=# select * from pg_visibility('affected_relation', 5456224)
wms-svc-inspecting-# ;
all_visible | all_frozen | pd_all_visible
-------------+------------+----------------
t | t | f
(1 row)
So all tuples on this page should be marked with HEAP_XMIN_FROZEN, but that is not the case
SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, t_hoff, t_bits, heap_tuple_infomask_flags(t_infomask, t_infomask2) FROM heap_page_items(get_raw_page('affected_relation', 5456224)) ;
-[ RECORD 1 ]-------------+-----------------------------------------------------------------------------
lp | 1
lp_off | 7976
lp_flags | 1
lp_len | 216
t_xmin | 1550558894
t_xmax | 0
t_field3 | 0
t_ctid | (5456224,1)
t_infomask2 | 12
t_infomask | 2051
t_hoff | 32
t_bits | 1111111011100000
heap_tuple_infomask_flags | ("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{})
-[ RECORD 2 ]-------------+-----------------------------------------------------------------------------
lp | 2
lp_off | 7768
lp_flags | 1
lp_len | 208
t_xmin | 1550558894
t_xmax | 0
t_field3 | 0
t_ctid | (5456224,2)
t_infomask2 | 12
t_infomask | 2307
t_hoff | 32
t_bits | 1111111111100000
heap_tuple_infomask_flags | ("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID}",{})
-[ RECORD 3 ]-------------+-----------------------------------------------------------------------------
lp | 3
lp_off | 7520
lp_flags | 1
lp_len | 248
t_xmin | 1550558894
t_xmax | 0
t_field3 | 0
t_ctid | (5456224,3)
t_infomask2 | 12
t_infomask | 2051
t_hoff | 32
t_bits | 1111111111100000
heap_tuple_infomask_flags | ("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{})
-[ RECORD 4 ]-------------+-----------------------------------------------------------------------------
....
-[ RECORD 23 ]------------+-----------------------------------------------------------------------------
lp | 23
lp_off | 272
lp_flags | 1
lp_len | 312
t_xmin | 1550558894
t_xmax | 0
t_field3 | 0
t_ctid | (5456224,23)
t_infomask2 | 12
t_infomask | 2051
t_hoff | 32
t_bits | 1111111111100000
heap_tuple_infomask_flags | ("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{})
Also pg_class.relfrozenxig contains xid which exceed 1550558894:
# select relname, relfrozenxid from pg_class where oid = 'affected_relation'::regclass::oid;
-[ RECORD 1 ]+-----------
relname | affected_relation
relfrozenxid | 1609822074
So is it possible that either all_frozen bit was set incorrectly or wansn't unset when tuples were created?
I will be happy to provide any additional information if required.
PostgreSQL version: 15.6, data_checksums: on
I`m currently investigating the case of one of our PostgreSQL instance having started issuing error messages about missing clog segment:
ERROR: could not access status of transaction 1550558894
DETAIL: Could not open file "pg_xact/05C6": No such file or directory.
After examining the WAL records just prior the first sighing of these errors, it became apparent, that clog segment was truncated by autovacuum
rmgr: CLOG len (rec/tot): 38/ 38, tx: 0, lsn: 81A/87C899E0, prev 81A/87C895A0, desc: TRUNCATE page 48264; oldestXact 1581542039
After looking closely at the affected heap pages (there are scores of them) with tuples, created by 1550558894 transaction, and comparing them with pg_visibility state and pg_class.relfrozenxig, it became apparent, that there is may be something awry with either setting all_frozen bit in _vm, or oldestXact calculation by vacuum.
According to pg_visibility, all_frozen bit is set:
wms-svc-inspecting=# select * from pg_visibility('affected_relation', 5456224)
wms-svc-inspecting-# ;
all_visible | all_frozen | pd_all_visible
-------------+------------+----------------
t | t | f
(1 row)
So all tuples on this page should be marked with HEAP_XMIN_FROZEN, but that is not the case
SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, t_hoff, t_bits, heap_tuple_infomask_flags(t_infomask, t_infomask2) FROM heap_page_items(get_raw_page('affected_relation', 5456224)) ;
-[ RECORD 1 ]-------------+-----------------------------------------------------------------------------
lp | 1
lp_off | 7976
lp_flags | 1
lp_len | 216
t_xmin | 1550558894
t_xmax | 0
t_field3 | 0
t_ctid | (5456224,1)
t_infomask2 | 12
t_infomask | 2051
t_hoff | 32
t_bits | 1111111011100000
heap_tuple_infomask_flags | ("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{})
-[ RECORD 2 ]-------------+-----------------------------------------------------------------------------
lp | 2
lp_off | 7768
lp_flags | 1
lp_len | 208
t_xmin | 1550558894
t_xmax | 0
t_field3 | 0
t_ctid | (5456224,2)
t_infomask2 | 12
t_infomask | 2307
t_hoff | 32
t_bits | 1111111111100000
heap_tuple_infomask_flags | ("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMIN_COMMITTED,HEAP_XMAX_INVALID}",{})
-[ RECORD 3 ]-------------+-----------------------------------------------------------------------------
lp | 3
lp_off | 7520
lp_flags | 1
lp_len | 248
t_xmin | 1550558894
t_xmax | 0
t_field3 | 0
t_ctid | (5456224,3)
t_infomask2 | 12
t_infomask | 2051
t_hoff | 32
t_bits | 1111111111100000
heap_tuple_infomask_flags | ("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{})
-[ RECORD 4 ]-------------+-----------------------------------------------------------------------------
....
-[ RECORD 23 ]------------+-----------------------------------------------------------------------------
lp | 23
lp_off | 272
lp_flags | 1
lp_len | 312
t_xmin | 1550558894
t_xmax | 0
t_field3 | 0
t_ctid | (5456224,23)
t_infomask2 | 12
t_infomask | 2051
t_hoff | 32
t_bits | 1111111111100000
heap_tuple_infomask_flags | ("{HEAP_HASNULL,HEAP_HASVARWIDTH,HEAP_XMAX_INVALID}",{})
Also pg_class.relfrozenxig contains xid which exceed 1550558894:
# select relname, relfrozenxid from pg_class where oid = 'affected_relation'::regclass::oid;
-[ RECORD 1 ]+-----------
relname | affected_relation
relfrozenxid | 1609822074
So is it possible that either all_frozen bit was set incorrectly or wansn't unset when tuples were created?
I will be happy to provide any additional information if required.
PostgreSQL version: 15.6, data_checksums: on
pgsql-bugs by date: