Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic - Mailing list pgsql-hackers

From Peter Geoghegan
Subject Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
Msg-id CAH2-WznvaKsC6-Z_jf3Y9CbNyk-rOY6Lfx+sJPmqebFg41nT2A@mail.gmail.com
In response to Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic  (Justin Pryzby <pryzby@telsasoft.com>)
List pgsql-hackers
On Tue, Jun 8, 2021 at 4:03 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> postgres=# SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, t_hoff, t_bits, t_oid FROM heap_page_items(pg_read_binary_file('/tmp/dump_block.page'));
>  lp | lp_off | lp_flags | lp_len |  t_xmin   |  t_xmax   | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff |              t_bits              | t_oid
> ----+--------+----------+--------+-----------+-----------+----------+--------+-------------+------------+--------+----------------------------------+-------
>   1 |   1320 |        1 |    259 | 926025112 |         0 |        0 | (1,1)  |       32799 |      10499 |     32 | 11111111111111111111111000100000 |

*** SNIP ***

>   6 |   7464 |        1 |    259 | 926014884 | 926025112 |        0 | (1,1)  |       49183 |       9475 |     32 | 11111111111111111111111000100000 |

As I understand it from your remarks + gdb output from earlier [1],
the tuple at offset number 6 is the tuple that triggers the suspicious
"goto restart" here. There was a regular UPDATE (not a HOT UPDATE)
that produced a successor version on the same heap page -- which is lp
1. Here are the t_infomask details for both tuples:

lp 6: HEAP_HASNULL|HEAP_HASVARWIDTH|HEAP_XMIN_COMMITTED|HEAP_XMAX_COMMITTED|HEAP_UPDATED  <-- points to (1,1)
lp 1: HEAP_HASNULL|HEAP_HASVARWIDTH|HEAP_XMIN_COMMITTED|HEAP_XMAX_INVALID|HEAP_UPDATED    <-- This is (1,1)
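
For anyone who wants to check the decoding above, here is a minimal sketch that expands a raw t_infomask value into flag names. The bit values are the ones defined in src/include/access/htup_details.h; the table and function names (INFOMASK_FLAGS, decode_infomask) are made up for this illustration and are not part of PostgreSQL or pageinspect.

```python
# Bit values for t_infomask, as defined in PostgreSQL's
# src/include/access/htup_details.h. The names INFOMASK_FLAGS and
# decode_infomask are hypothetical helpers for this illustration.
INFOMASK_FLAGS = [
    (0x0001, "HEAP_HASNULL"),
    (0x0002, "HEAP_HASVARWIDTH"),
    (0x0004, "HEAP_HASEXTERNAL"),
    (0x0008, "HEAP_HASOID_OLD"),
    (0x0010, "HEAP_XMAX_KEYSHR_LOCK"),
    (0x0020, "HEAP_COMBOCID"),
    (0x0040, "HEAP_XMAX_EXCL_LOCK"),
    (0x0080, "HEAP_XMAX_LOCK_ONLY"),
    (0x0100, "HEAP_XMIN_COMMITTED"),
    (0x0200, "HEAP_XMIN_INVALID"),
    (0x0400, "HEAP_XMAX_COMMITTED"),
    (0x0800, "HEAP_XMAX_INVALID"),
    (0x1000, "HEAP_XMAX_IS_MULTI"),
    (0x2000, "HEAP_UPDATED"),
    (0x4000, "HEAP_MOVED_OFF"),
    (0x8000, "HEAP_MOVED_IN"),
]


def decode_infomask(mask):
    """Return the names of all t_infomask bits set in mask, low bit first."""
    return [name for bit, name in INFOMASK_FLAGS if mask & bit]


if __name__ == "__main__":
    # t_infomask values taken from the heap_page_items() output above.
    for lp, mask in ((6, 9475), (1, 10499)):
        print(f"lp {lp}: " + "|".join(decode_infomask(mask)))
```

Run against the two t_infomask values from the dump (9475 for lp 6, 10499 for lp 1), this prints the same flag lists as given above.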

So if the transaction that is both lp 1's xmin and lp 6's xmax
committed (i.e., if XID 926025112 committed), why shouldn't
HeapTupleSatisfiesVacuum() think that lp 6 is DEAD (and not
RECENTLY_DEAD)? You also say that vacuumlazy.c's OldestXmin is
926025113, which is later than lp 6's xmax, so it is hard to fault
HTSV here. The only way it could be wrong is if the hint bits were
somehow spuriously set, which seems unlikely to me.
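
To spell out that reasoning, here is a deliberately simplified model of the decision, not the real HeapTupleSatisfiesVacuum(). It assumes the hint bits can be trusted, handles only tuples whose xmin is already hinted as committed, and ignores lock-only and multixact xmax values. The function names (xid_precedes, classify) are hypothetical; xid_precedes only mimics the modulo-2^32 comparison that TransactionIdPrecedes() applies to normal XIDs.

```python
# Simplified model only; see the caveats in the paragraph above.
# Flag values are from src/include/access/htup_details.h.
HEAP_XMIN_COMMITTED = 0x0100
HEAP_XMAX_COMMITTED = 0x0400
HEAP_XMAX_INVALID = 0x0800


def xid_precedes(a, b):
    """Circular comparison of two normal XIDs (a logically before b)."""
    # Equivalent to checking that the signed 32-bit difference a - b is negative.
    return ((a - b) & 0xFFFFFFFF) >= 0x80000000


def classify(t_infomask, t_xmax, oldest_xmin):
    """Classify a tuple as LIVE, RECENTLY_DEAD or DEAD from its hint bits."""
    if not (t_infomask & HEAP_XMIN_COMMITTED):
        raise ValueError("sketch only handles xmin hinted as committed")
    if t_infomask & HEAP_XMAX_INVALID:
        return "LIVE"  # no valid deleter/updater
    if not (t_infomask & HEAP_XMAX_COMMITTED):
        raise ValueError("sketch only handles xmax hinted as committed")
    # Deleter committed: removable only if it precedes OldestXmin.
    if not xid_precedes(t_xmax, oldest_xmin):
        return "RECENTLY_DEAD"
    return "DEAD"


if __name__ == "__main__":
    oldest_xmin = 926025113  # vacuumlazy.c's OldestXmin, per the report
    print("lp 6:", classify(9475, 926025112, oldest_xmin))
    print("lp 1:", classify(10499, 0, oldest_xmin))
```

Under these assumptions lp 6 comes out DEAD, because its xmax (926025112) precedes OldestXmin (926025113), and lp 1 comes out LIVE.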

[1] https://postgr.es/m/20210608113333.GC16435@telsasoft.com
-- 
Peter Geoghegan


