visibility map corruption - Mailing list pgsql-hackers

From Floris Van Nee
Subject visibility map corruption
Date
Msg-id 87da83168c644fd9aae38f546cc70295@opammb0562.comp.optiver.com
List pgsql-hackers
Hi hackers,

We recently ran into an issue where the visibility map of a relation was corrupt, running Postgres 12.4. The error we'd
get when running a SELECT * from this table is:
 

could not access status of transaction 3704450152
DETAIL:  Could not open file "pg_xact/0DCC": No such file or directory.
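
As a sanity check on the error message itself (a sketch, assuming the default 8 kB block size): each pg_xact segment
file covers 32 SLRU pages of 32768 transaction status entries, i.e. 1048576 XIDs, so the segment for a given XID is
simply xid / 1048576 in hex:

=# select upper(lpad(to_hex(3704450152 / 1048576), 4, '0')) as pg_xact_segment;
-- returns 0DCC, matching the missing file above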

On the lists I could find several similar reports, but corruption like this could obviously have a very wide range of
root causes; see [1] [2] [3] for example - not all of them have their root cause known.
 

This particular case was similar to the cases reported above, but also has some differences.

The following query returns ~21,000 rows, which indicates something inconsistent between the visibility map and the
pd_all_visible flag on the page:
 

select * from pg_check_frozen('tbl');
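
To see which pages are affected, the tids returned by pg_check_frozen() can be grouped by block number, e.g. (a sketch;
the text/point casts are just a convenient way to pull the block number out of the tid):

select ((t_ctid::text::point)[0])::bigint as blkno, count(*) as bad_tuples
from pg_check_frozen('tbl')
group by 1
order by 2 desc;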

Looking at one of the affected pages with pageinspect:

=# SELECT lp,lp_off,lp_flags,lp_len,t_xmin,t_xmax,t_field3,t_ctid,t_infomask2,t_infomask,t_hoff,t_oid FROM
heap_page_items(get_raw_page('tbl',726127));
 

┌────┬────────┬──────────┬────────┬────────────┬────────────┬──────────┬────────────┬─────────────┬────────────┬────────┬───────┐
│ lp │ lp_off │ lp_flags │ lp_len │   t_xmin   │   t_xmax   │ t_field3 │   t_ctid   │ t_infomask2 │ t_infomask │ t_hoff │ t_oid │
├────┼────────┼──────────┼────────┼────────────┼────────────┼──────────┼────────────┼─────────────┼────────────┼────────┼───────┤
│  1 │   6328 │        1 │   1864 │ 3704450155 │ 3704450155 │        1 │ (726127,1) │         249 │       8339 │     56 │     ∅ │
│  2 │   4464 │        1 │   1864 │ 3704450155 │ 3704450155 │        1 │ (726127,2) │         249 │       8339 │     56 │     ∅ │
│  3 │   2600 │        1 │   1864 │ 3704450155 │ 3704450155 │        1 │ (726127,3) │         249 │       8339 │     56 │     ∅ │
│  4 │    680 │        1 │   1920 │ 3704450155 │ 3704450155 │        1 │ (726127,4) │         249 │       8339 │     56 │     ∅ │
└────┴────────┴──────────┴────────┴────────────┴────────────┴──────────┴────────────┴─────────────┴────────────┴────────┴───────┘

t_infomask shows that HEAP_XMIN_COMMITTED and HEAP_XMIN_INVALID bits are both unset.
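For reference, this can be double-checked by decoding t_infomask = 8339 (0x2093) against the flag bits from
src/include/access/htup_details.h (a sketch that only lists the bits of interest here):

=# select flag, (8339 & bit) <> 0 as is_set
   from (values ('HEAP_HASNULL',          1),    -- 0x0001
                ('HEAP_HASVARWIDTH',      2),    -- 0x0002
                ('HEAP_XMAX_KEYSHR_LOCK', 16),   -- 0x0010
                ('HEAP_XMAX_LOCK_ONLY',   128),  -- 0x0080
                ('HEAP_XMIN_COMMITTED',   256),  -- 0x0100
                ('HEAP_XMIN_INVALID',     512),  -- 0x0200
                ('HEAP_UPDATED',          8192)  -- 0x2000
        ) as f(flag, bit);

The xmin hint bits indeed come out as unset, while the xmax bits indicate a lock-only key-share lock.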
The following pg_visibility() call shows the inconsistency between the VM and the page, with PD_ALL_VISIBLE=false:

=# select * from pg_visibility('tbl', 726127);
┌─────────────┬────────────┬────────────────┐
│ all_visible │ all_frozen │ pd_all_visible │
├─────────────┼────────────┼────────────────┤
│ t           │ t          │ f              │
└─────────────┴────────────┴────────────────┘
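
The same mismatch can be listed for the whole relation by filtering pg_visibility() on the disagreement between the VM
bit and the page bit, e.g.:

=# select blkno from pg_visibility('tbl') where all_visible and not pd_all_visible;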

Looking at other pages shows the same information.
What's interesting is that out of the affected tuples returned by pg_check_frozen, over 99% belong to one transaction
(3704450155 as seen above) and the remaining few are from one other transaction that occurred at roughly the same time.
 
I find it hard to believe that this is due to some random bit flipping, because many pages are affected, but the
"incorrect" ones are in total only from two specific transactions which occurred at roughly the same time. There were
also no server crashes or other known failures around the time of these transactions. I'm not ruling out any other kind
of failure still, but I also cannot really explain how this could have happened.
 

The server runs PG 12.4 with full_page_writes=on and data_checksums=off. It's a large analytics database. The VM
inconsistencies also occur on the streaming replicas.
 

I realize these cases are pretty rare and hard to debug, but I wanted to share the information I found so far here for
reference. Maybe someone has an idea what occurred, or maybe someone in the future finds it useful when they run into
something similar.
 

I have no idea how the inconsistency between VM and PD_ALL_VISIBLE started - from looking through the code I can't
really find any way this could occur. However, for it to lead to the problem described here, I believe there should be
*no* SELECT that touches that particular page after the insert/update transaction and before the transaction log gets
truncated. If the page is read before the transaction log gets truncated, then the hint bit HEAP_XMIN_COMMITTED will get
set and future reads will succeed regardless of tx log truncation. One of the replicas had this happen to it: the
affected pages are identical to the primary except that the HEAP_XMIN_COMMITTED flag is set (note that the VM
inconsistency is still there on the replica though: PD_ALL_VISIBLE=false even though the VM shows
all_frozen=all_visible=true). But I can query these rows on the replica without issues, because it doesn't check the tx
log when it sees that HEAP_XMIN_COMMITTED is set.
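
For completeness, the hint-bit state that distinguishes the replica from the primary can also be inspected directly
with pageinspect, e.g. by testing the HEAP_XMIN_COMMITTED bit on the page shown above:

=# select lp, (t_infomask & 256) <> 0 as xmin_committed   -- 256 = 0x0100 = HEAP_XMIN_COMMITTED
   from heap_page_items(get_raw_page('tbl', 726127));

On the primary this returns false for all four tuples; on that replica it returns true.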
 

-Floris

[1] https://postgrespro.com/list/thread-id/2422376
[2] https://postgrespro.com/list/thread-id/2501800
[3] https://postgrespro.com/list/thread-id/2321949

