Thread: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
An internal instance was rejecting connections with "too many clients". I found
a bunch of processes waiting on a futex and I was going to upgrade the kernel
(3.10.0-514) and dismiss the issue.

However, I also found an autovacuum chewing 100% CPU, and it appears the
problem is actually that autovacuum has locked a page of pg_statistic, and
every other process then gets stuck waiting in the planner. I checked a few
and found these:

#13 0x0000000000961908 in SearchSysCache3 (cacheId=cacheId@entry=59, key1=key1@entry=2610, key2=key2@entry=2, key3=key3@entry=0) at syscache.c:1156

As for the autovacuum:

$ ps -wwf 18950
UID        PID  PPID  C STIME TTY STAT TIME    CMD
postgres 18950  7179 93 Jun04 ?   ts   2049:20 postgres: autovacuum worker ts

(gdb) bt
#0  0x00000000004f995c in heap_prune_satisfies_vacuum (prstate=prstate@entry=0x7ffe7a0cd0c0, tup=tup@entry=0x7ffe7a0cce50, buffer=buffer@entry=14138) at pruneheap.c:423
#1  0x00000000004fa154 in heap_prune_chain (prstate=0x7ffe7a0cd0c0, rootoffnum=11, buffer=14138) at pruneheap.c:644
#2  heap_page_prune (relation=relation@entry=0x7f0349466d28, buffer=buffer@entry=14138, vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, old_snap_xmin=old_snap_xmin@entry=0, old_snap_ts=old_snap_ts@entry=0, report_stats=report_stats@entry=false, off_loc=<optimized out>, off_loc@entry=0x1d1b3fc) at pruneheap.c:278
#3  0x00000000004fd9bf in lazy_scan_prune (vacrel=vacrel@entry=0x1d1b390, buf=buf@entry=14138, blkno=blkno@entry=75, page=page@entry=0x2aaab2089e00 "G\f", vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, prunestate=prunestate@entry=0x7ffe7a0ced80) at vacuumlazy.c:1712
#4  0x0000000000500263 in lazy_scan_heap (aggressive=<optimized out>, params=0x1c77b7c, vacrel=<optimized out>) at vacuumlazy.c:1347
#5  heap_vacuum_rel (rel=0x7f0349466d28, params=0x1c77b7c, bstrategy=<optimized out>) at vacuumlazy.c:612
#6  0x000000000067418a in table_relation_vacuum (bstrategy=<optimized out>, params=0x1c77b7c, rel=0x7f0349466d28) at ../../../src/include/access/tableam.h:1678
#7  vacuum_rel (relid=2619, relation=<optimized out>, params=params@entry=0x1c77b7c) at vacuum.c:2001
#8  0x000000000067556e in vacuum (relations=0x1cc5008, params=params@entry=0x1c77b7c, bstrategy=<optimized out>, bstrategy@entry=0x1c77400, isTopLevel=isTopLevel@entry=true) at vacuum.c:461
#9  0x0000000000783c13 in autovacuum_do_vac_analyze (bstrategy=0x1c77400, tab=0x1c77b78) at autovacuum.c:3284
#10 do_autovacuum () at autovacuum.c:2537
#11 0x0000000000784073 in AutoVacWorkerMain (argv=0x0, argc=0) at autovacuum.c:1715
#12 0x00000000007841c9 in StartAutoVacWorker () at autovacuum.c:1500
#13 0x0000000000792b9c in StartAutovacuumWorker () at postmaster.c:5547
#14 sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5251
#15 <signal handler called>
#16 0x00007f0346c56783 in __select_nocancel () from /lib64/libc.so.6
#17 0x000000000048ee7d in ServerLoop () at postmaster.c:1709
#18 0x0000000000793e98 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1bed280) at postmaster.c:1417
#19 0x0000000000491272 in main (argc=3, argv=0x1bed280) at main.c:209

heap_page_prune() is being called repeatedly, with (I think) the same arguments:

(gdb) c
Continuing.

Breakpoint 3, heap_page_prune (relation=relation@entry=0x7f0349466d28, buffer=buffer@entry=14138, vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, old_snap_xmin=old_snap_xmin@entry=0, old_snap_ts=old_snap_ts@entry=0, report_stats=report_stats@entry=false, off_loc=off_loc@entry=0x1d1b3fc) at pruneheap.c:225
225     in pruneheap.c
(gdb) c
Continuing.

Breakpoint 3, heap_page_prune (relation=relation@entry=0x7f0349466d28, buffer=buffer@entry=14138, vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, old_snap_xmin=old_snap_xmin@entry=0, old_snap_ts=old_snap_ts@entry=0, report_stats=report_stats@entry=false, off_loc=off_loc@entry=0x1d1b3fc) at pruneheap.c:225
225     in pruneheap.c

(gdb) p *vacrel
$3 = {rel = 0x7f0349466d28, indrels = 0x1d1b500, nindexes = 1, do_index_vacuuming = true, do_index_cleanup = true, do_failsafe = false, bstrategy = 0x1c77400, lps = 0x0, old_rel_pages = 80, old_live_tuples = 1101, relfrozenxid = 909081649, relminmxid = 53341561, OldestXmin = 913730329, FreezeLimit = 863730329, MultiXactCutoff = 48553302, relnamespace = 0x1d1b520 "pg_catalog", relname = 0x1d1b548 "pg_statistic", indname = 0x0, blkno = 75, offnum = 15, phase = VACUUM_ERRCB_PHASE_SCAN_HEAP, dead_tuples = 0x1ccef10, rel_pages = 85, scanned_pages = 76, pinskipped_pages = 0, frozenskipped_pages = 0, tupcount_pages = 76, pages_removed = 0, lpdead_item_pages = 65, nonempty_pages = 75, lock_waiter_detected = false, new_rel_tuples = 0, new_live_tuples = 0, indstats = 0x1d1b590, num_index_scans = 0, tuples_deleted = 757, lpdead_items = 1103, new_dead_tuples = 0, num_tuples = 973, live_tuples = 973}

(gdb) p *rel
$2 = {rd_node = {spcNode = 1663, dbNode = 16886, relNode = 107230415}, rd_smgr = 0x1d0a670, rd_refcnt = 1, rd_backend = -1, rd_islocaltemp = false, rd_isnailed = false, rd_isvalid = true, rd_indexvalid = true, rd_statvalid = false, rd_createSubid = 0, rd_newRelfilenodeSubid = 0, rd_firstRelfilenodeSubid = 0, rd_droppedSubid = 0, rd_rel = 0x7f0349466f40, rd_att = 0x7f0349467058, rd_id = 2619, rd_lockInfo = {lockRelId = {relId = 2619, dbId = 16886}}, rd_rules = 0x0, rd_rulescxt = 0x0, trigdesc = 0x0, rd_rsdesc = 0x0, rd_fkeylist = 0x0, rd_fkeyvalid = false, rd_partkey = 0x0, rd_partkeycxt = 0x0, rd_partdesc = 0x0, rd_pdcxt = 0x0, rd_partdesc_nodetached = 0x0, rd_pddcxt = 0x0, rd_partdesc_nodetached_xmin = 0, rd_partcheck = 0x0, rd_partcheckvalid = false, rd_partcheckcxt = 0x0, rd_indexlist = 0x7f0349498e40, rd_pkindex = 2696, rd_replidindex = 0, rd_statlist = 0x0, rd_indexattr = 0x7f0349498ee8, rd_keyattr = 0x7f0349498e98, rd_pkattr = 0x7f0349498ec0, rd_idattr = 0x0, rd_pubactions = 0x0, rd_options = 0x0, rd_amhandler = 3, rd_tableam = 0x9ccfc0 <heapam_methods>, rd_index = 0x0, rd_indextuple = 0x0, rd_indexcxt = 0x0, rd_indam = 0x0, rd_opfamily = 0x0, rd_opcintype = 0x0, rd_support = 0x0, rd_supportinfo = 0x0, rd_indoption = 0x0, rd_indexprs = 0x0, rd_indpred = 0x0, rd_exclops = 0x0, rd_exclprocs = 0x0, rd_exclstrats = 0x0, rd_indcollation = 0x0, rd_opcoptions = 0x0, rd_amcache = 0x0, rd_fdwroutine = 0x0, rd_toastoid = 0, pgstat_info = 0x1c88940}

(gdb) p *relation->rd_rel
$8 = {oid = 2619, relname = {data = "pg_statistic", '\000' <repeats 51 times>}, relnamespace = 11, reltype = 13029, reloftype = 0, relowner = 10, relam = 2, relfilenode = 107230415, reltablespace = 0, relpages = 80, reltuples = 1101, relallvisible = 11, reltoastrelid = 2840, relhasindex = true, relisshared = false, relpersistence = 112 'p', relkind = 114 'r', relnatts = 31, relchecks = 0, relhasrules = false, relhastriggers = false, relhassubclass = false, relrowsecurity = false, relforcerowsecurity = false, relispopulated = true, relreplident = 110 'n', relispartition = false, relrewrite = 0, relfrozenxid = 909081649, relminmxid = 53341561}

(gdb) info locals
rel = 0x7f0349466d28
offnum = 4
maxoff = 21
itemid = 0x2aaab2089e24
tuple = {t_len = 564, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 75}, ip_posid = 4}, t_tableOid = 2619, t_data = 0x2aaab208bbc8}

(gdb) p *itemid
$2 = {lp_off = 7624, lp_flags = 1, lp_len = 564}

I'll leave the instance running for a little bit before restarting (or kill -9) in case someone requests more info.

See also: https://www.postgresql.org/message-id/2591376.1621196582@sss.pgh.pa.us

These commits may be relevant.

commit 3c3b8a4b26891892bccf3d220580a7f413c0b9ca
Author: Peter Geoghegan <pg@bowt.ie>
Date:   Wed Apr 7 08:47:15 2021 -0700

    Truncate line pointer array during VACUUM.

commit 7ab96cf6b312cfcd79cdc1a69c6bdb75de0ed30f
Author: Peter Geoghegan <pg@bowt.ie>
Date:   Tue Apr 6 07:49:39 2021 -0700

    Refactor lazy_scan_heap() loop.

commit 8523492d4e349c4714aa2ab0291be175a88cb4fc
Author: Peter Geoghegan <pg@bowt.ie>
Date:   Tue Apr 6 08:49:22 2021 -0700

    Remove tupgone special case from vacuumlazy.c.

commit dc7420c2c9274a283779ec19718d2d16323640c0
Author: Andres Freund <andres@anarazel.de>
Date:   Wed Aug 12 16:03:49 2020 -0700

    snapshot scalability: Don't compute global horizons while building snapshots.
Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

From: Matthias van de Meent
On Sun, 6 Jun 2021 at 18:35, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> An internal instance was rejecting connections with "too many clients".
> I found a bunch of processes waiting on a futex and I was going to upgrade the
> kernel (3.10.0-514) and dismiss the issue.
>
> However, I also found an autovacuum chewing 100% CPU, and it appears the
> problem is actually because autovacuum has locked a page of pg_statistic, and
> every other process then gets stuck waiting in the planner. I checked a few
> and found these:

My suspicion is that, for some tuple on that page, HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DEAD for a tuple that it thinks should have been cleaned up by heap_page_prune, but isn't. This would result in an infinite loop in lazy_scan_prune, where the condition on vacuumlazy.c:1800 will always be true but the retry will not do the job it's expected to do.

Apart from reporting this suspicion, I sadly can't help you much further, as my knowledge and experience of vacuum and snapshot horizons is limited and probably won't help you here.

I think it would be helpful for further debugging if we had the state of all the tuples on that page (well, the tuple headers with their transaction ids, and their line pointers), as that would help determine whether my suspicion could be correct.

With regards,

Matthias van de Meent
Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
> On Sun, 6 Jun 2021 at 18:35, Justin Pryzby <pryzby@telsasoft.com> wrote:
>> However, I also found an autovacuum chewing 100% CPU, and it appears the
>> problem is actually because autovacuum has locked a page of pg_statistic, and
>> every other process then gets stuck waiting in the planner. I checked a few
>> and found these:

> My suspicion is that for some tuple on that page
> HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DEAD for a tuple that it
> thinks should have been cleaned up by heap_page_prune, but isn't. This
> would result in an infinite loop in lazy_scan_prune where the
> condition on vacuumlazy.c:1800 will always be true, but the retry will
> not do the job it's expected to do.

Since Justin's got a debugger on the process already, it probably wouldn't be too hard to confirm or disprove that theory by stepping through the code.

			regards, tom lane
On Sun, Jun 6, 2021 at 9:35 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> I'll leave the instance running for a little bit before restarting (or kill-9)
> in case someone requests more info.

How about dumping the page image out and sharing it with the list? This procedure should work fine from gdb:

https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Dumping_a_page_image_from_within_GDB

I suggest that you dump the "page" pointer inside lazy_scan_prune(). I imagine that you have the instance already stuck in an infinite loop, so what we'll probably see from the page image is the page after the first prune and another no-progress prune.

--
Peter Geoghegan
Hi,

On Sun, Jun 6, 2021, at 10:59, Tom Lane wrote:
> Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
> > My suspicion is that for some tuple on that page
> > HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DEAD for a tuple that it
> > thinks should have been cleaned up by heap_page_prune, but isn't. This
> > would result in an infinite loop in lazy_scan_prune where the
> > condition on vacuumlazy.c:1800 will always be true, but the retry will
> > not do the job it's expected to do.
>
> Since Justin's got a debugger on the process already, it probably
> wouldn't be too hard to confirm or disprove that theory by stepping
> through the code.

If that turns out to be the issue, it'd be good to check what prevents the tuple from being considered fully dead, by stepping through the visibility test...

Andres
On Sun, Jun 06, 2021 at 11:00:38AM -0700, Peter Geoghegan wrote:
> On Sun, Jun 6, 2021 at 9:35 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > I'll leave the instance running for a little bit before restarting (or kill-9)
> > in case someone requests more info.
>
> How about dumping the page image out, and sharing it with the list?
> This procedure should work fine from gdb:

Sorry, but I already killed the process to try to follow Matthias' suggestion. I have a core file from "gcore", but it looks like it's incomplete and the address is now "out of bounds"...

#2 0x00000000004fd9bf in lazy_scan_prune (vacrel=vacrel@entry=0x1d1b390, buf=buf@entry=14138, blkno=blkno@entry=75, page=page@entry=0x2aaab2089e00 <Address 0x2aaab2089e00 out of bounds>,

I saved a copy of the datadir, but a manual "vacuum" doesn't trigger the problem. So if Matthias' theory is right, it seems like there may be a race condition. Maybe that goes without saying.

--
Justin
On Sun, Jun 6, 2021 at 11:43 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> Sorry, but I already killed the process to try to follow Matthias' suggestion.
> I have a core file from "gcore" but it looks like it's incomplete and the
> address is now "out of bounds"...

Based on what you said about ending up back in lazy_scan_prune() alone, I think he's right. That is, I agree that it's very likely that the stuck VACUUM would not have become stuck had the "goto retry on HEAPTUPLE_DEAD inside lazy_scan_prune" thing not been added by commit 8523492d4e3. But that in itself doesn't necessarily implicate commit 8523492d4e3.

The interesting question is: why doesn't heap_page_prune() ever agree with the HeapTupleSatisfiesVacuum() calls made from lazy_scan_prune(), no matter how many times the call to heap_page_prune() is repeated? (It's repeated to try to resolve the disagreement that aborted xacts can sometimes cause.)

If I had to guess I'd say that the underlying problem has something to do with heap_prune_satisfies_vacuum() not agreeing with HeapTupleSatisfiesVacuum(), perhaps only when GlobalVisCatalogRels is used. But that's a pretty wild guess at this point.

--
Peter Geoghegan
On Sun, Jun 06, 2021 at 07:26:22PM +0200, Matthias van de Meent wrote:
> I think it would be helpful for further debugging if we would have the
> state of the all tuples on that page (well, the tuple headers with
> their transactionids and their line pointers), as that would help with
> determining if my suspicion could be correct.

This is the state of the page after I killed the cluster and started a postmaster on a copy of its datadir, with autovacuum=off:

SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, t_hoff, t_bits, t_oid
FROM heap_page_items(get_raw_page('pg_statistic', 75));

 lp | lp_off | lp_flags | lp_len |  t_xmin   |  t_xmax   | t_field3 | t_ctid  | t_infomask2 | t_infomask | t_hoff | t_bits                           | t_oid
----+--------+----------+--------+-----------+-----------+----------+---------+-------------+------------+--------+----------------------------------+-------
  1 |      0 |        3 |      0 |           |           |          |         |             |            |        |                                  |
  2 |      0 |        3 |      0 |           |           |          |         |             |            |        |                                  |
  3 |      0 |        3 |      0 |           |           |          |         |             |            |        |                                  |
  4 |   7624 |        1 |    564 | 913726913 | 913730328 |        0 | (83,19) |          31 |       8451 |     32 | 11111111111111111111101000100000 |
  5 |   6432 |        1 |   1188 | 913726913 | 913730328 |        0 | (83,20) |          31 |       8451 |     32 | 11111111111111111111110100110000 |
  6 |   6232 |        1 |    195 | 913726913 | 913730328 |        0 | (83,21) |          31 |       8451 |     32 | 11111111111111111111111000100000 |
  7 |   6032 |        1 |    195 | 913726913 | 913730328 |        0 | (83,22) |          31 |       8451 |     32 | 11111111111111111111111000100000 |
  8 |   5848 |        1 |    181 | 913726913 | 913730328 |        0 | (83,23) |          31 |       8451 |     32 | 11111111111111111111111000100000 |
  9 |   5664 |        1 |    181 | 913726913 | 913730328 |        0 | (81,13) |          31 |       8451 |     32 | 11111111111111111111111000100000 |
 10 |   5488 |        1 |    176 | 913726913 | 913730328 |        0 | (81,14) |          31 |       8451 |     32 | 11111111111111111111111000100000 |
 11 |   5312 |        1 |    176 | 913726913 | 913730328 |        0 | (82,13) |          31 |       8451 |     32 | 11111111111111111111111000100000 |
 12 |   5128 |        1 |    181 | 913726913 | 913730328 |        0 | (79,57) |          31 |       8451 |     32 | 11111111111111111111111000100000 |
 13 |   4952 |        1 |    176 | 913726913 | 913730328 |        0 | (80,25) |          31 |       8451 |     32 | 11111111111111111111111000100000 |
 14 |   4776 |        1 |    176 | 913726913 | 913730328 |        0 | (80,26) |          31 |       8451 |     32 | 11111111111111111111111000100000 |
 15 |   4600 |        1 |    176 | 913726913 | 913730328 |        0 | (84,1)  |          31 |       8451 |     32 | 11111111111111111111111000100000 |
 16 |   4424 |        1 |    176 | 913726913 | 913730328 |        0 | (84,2)  |          31 |       8451 |     32 | 11111111111111111111111000100000 |
 17 |   4248 |        1 |    176 | 913726913 | 913730328 |        0 | (84,3)  |          31 |       8451 |     32 | 11111111111111111111111000100000 |
 18 |   2880 |        1 |   1364 | 913726913 | 913730328 |        0 | (84,4)  |          31 |       8451 |     32 | 11111111111111111111110100110000 |
 19 |   2696 |        1 |    179 | 913726914 |         0 |        0 | (75,19) |          31 |      10499 |     32 | 11111111111111111111111000100000 |
 20 |   2520 |        1 |    176 | 913726914 |         0 |        0 | (75,20) |          31 |      10499 |     32 | 11111111111111111111111000100000 |
 21 |   2336 |        1 |    179 | 913726914 |         0 |        0 | (75,21) |          31 |      10499 |     32 | 11111111111111111111111000100000 |
(21 rows)

(In the interest of full disclosure, this was a dumb cp -a of the live datadir where the processes had been stuck for a day, and where I was unable to open a client session).

8451 = HEAP_KEYS_UPDATED + 259 atts?

Note that after I vacuum pg_statistic, it looks like this:

ts=# SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, t_hoff, t_bits, t_oid
FROM heap_page_items(get_raw_page('pg_statistic', 75));

 lp | lp_off | lp_flags | lp_len |  t_xmin   | t_xmax | t_field3 | t_ctid  | t_infomask2 | t_infomask | t_hoff | t_bits                           | t_oid
----+--------+----------+--------+-----------+--------+----------+---------+-------------+------------+--------+----------------------------------+-------
  1 |      0 |        0 |      0 |           |        |          |         |             |            |        |                                  |
...
 18 |      0 |        0 |      0 |           |        |          |         |             |            |        |                                  |
 19 |   8008 |        1 |    179 | 913726914 |      0 |        0 | (75,19) |          31 |      10499 |     32 | 11111111111111111111111000100000 |
 20 |   7832 |        1 |    176 | 913726914 |      0 |        0 | (75,20) |          31 |      10499 |     32 | 11111111111111111111111000100000 |
 21 |   7648 |        1 |    179 | 913726914 |      0 |        0 | (75,21) |          31 |      10499 |     32 | 11111111111111111111111000100000 |

ts=# VACUUM VERBOSE pg_statistic;
|INFO: vacuuming "pg_catalog.pg_statistic"
|INFO: scanned index "pg_statistic_relid_att_inh_index" to remove 278403 row versions
|DETAIL: CPU: user: 0.10 s, system: 0.00 s, elapsed: 0.14 s
|INFO: "pg_statistic": removed 278403 dead item identifiers in 4747 pages
|DETAIL: CPU: user: 0.12 s, system: 0.07 s, elapsed: 1.99 s
|INFO: index "pg_statistic_relid_att_inh_index" now contains 1101 row versions in 758 pages
|DETAIL: 277271 index row versions were removed.
|747 index pages were newly deleted.
|747 index pages are currently deleted, of which 0 are currently reusable.
|CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
|INFO: "pg_statistic": found 277216 removable, 1101 nonremovable row versions in 4758 out of 4758 pages
|DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 920282115
|0 pages removed.
|Skipped 0 pages due to buffer pins, 0 frozen pages.
|CPU: user: 1.75 s, system: 0.18 s, elapsed: 6.52 s.
|INFO: "pg_statistic": truncated 4758 to 96 pages
|DETAIL: CPU: user: 0.02 s, system: 0.02 s, elapsed: 0.06 s
|INFO: vacuuming "pg_toast.pg_toast_2619"
|INFO: scanned index "pg_toast_2619_index" to remove 30 row versions
|DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
|INFO: "pg_toast_2619": removed 30 dead item identifiers in 11 pages
|DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
|INFO: index "pg_toast_2619_index" now contains 115 row versions in 2 pages
|DETAIL: 3 index row versions were removed.
|0 index pages were newly deleted.
|0 index pages are currently deleted, of which 0 are currently reusable.
|CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
|INFO: "pg_toast_2619": found 29 removable, 115 nonremovable row versions in 43 out of 43 pages
|DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 920282115
|0 pages removed.
|Skipped 0 pages due to buffer pins, 0 frozen pages.
|CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.03 s.
|VACUUM

Before: pg_catalog | pg_statistic | table | postgres | permanent | heap | 38 MB   |
After:  pg_catalog | pg_statistic | table | postgres | permanent | heap | 1192 kB |

I also have nearly-intact bt f from the partial core:

#0 0x00000000004fa063 in heap_prune_chain (prstate=0x7ffe7a0cd0c0, rootoffnum=4, buffer=14138) at pruneheap.c:592
        lp = <optimized out>
        tupdead = <optimized out>
        recent_dead = <optimized out>
        rootlp = 0x2aaab2089e24
        nchain = 0
        tup = {t_len = 564, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 75}, ip_posid = 4}, t_tableOid = 2619, t_data = 0x2aaab208bbc8}
        ndeleted = 0
        priorXmax = 0
        htup = <optimized out>
        maxoff = 21
        offnum = 4
        ...
#1 heap_page_prune (relation=relation@entry=0x7f0349466d28, buffer=buffer@entry=14138, vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, old_snap_xmin=old_snap_xmin@entry=0, old_snap_ts=old_snap_ts@entry=0, report_stats=report_stats@entry=false, off_loc=<optimized out>, off_loc@entry=0x1d1b3fc) at pruneheap.c:278
        itemid = 0x2aaab2089e24
        ndeleted = 0
        page = 0x2aaab2089e00 <Address 0x2aaab2089e00 out of bounds>
        offnum = 4
        maxoff = 21
        ...
#2 0x00000000004fd9bf in lazy_scan_prune (vacrel=vacrel@entry=0x1d1b390, buf=buf@entry=14138, blkno=blkno@entry=75, page=page@entry=0x2aaab2089e00 <Address 0x2aaab2089e00 out of bounds>, vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, prunestate=prunestate@entry=0x7ffe7a0ced80) at vacuumlazy.c:1712
        rel = 0x7f0349466d28
        offnum = 4
        maxoff = 21
        itemid = 0x2aaab2089e24
        tuple = {t_len = 564, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 75}, ip_posid = 4}, t_tableOid = 2619, t_data = 0x2aaab208bbc8}
        res = <optimized out>
        tuples_deleted = 0
        lpdead_items = 0
        new_dead_tuples = 0
        num_tuples = 0
        live_tuples = 0
        nfrozen = 0

--
Justin
On Sun, Jun 06, 2021 at 11:00:38AM -0700, Peter Geoghegan wrote:
> How about dumping the page image out, and sharing it with the list?
> This procedure should work fine from gdb:
>
> https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Dumping_a_page_image_from_within_GDB
>
> I suggest that you dump the "page" pointer inside lazy_scan_prune(). I
> imagine that you have the instance already stuck in an infinite loop,
> so what we'll probably see from the page image is the page after the
> first prune and another no-progress prune.

The cluster was again rejecting connections with "too many clients already".

I was able to open a shell this time, but it immediately froze when I tried to tab-complete "pg_stat_acti"...

I was able to dump the page image, though - attached. I can send you its "data" privately, if desirable. I'll also try to step through this.
postgres=# SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, t_hoff, t_bits, t_oid
FROM heap_page_items(pg_read_binary_file('/tmp/dump_block.page'));

 lp | lp_off | lp_flags | lp_len |  t_xmin   |  t_xmax   | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits                           | t_oid
----+--------+----------+--------+-----------+-----------+----------+--------+-------------+------------+--------+----------------------------------+-------
  1 |   1320 |        1 |    259 | 926025112 |         0 |        0 | (1,1)  |       32799 |      10499 |     32 | 11111111111111111111111000100000 |
  2 |   8088 |        1 |    104 | 926018702 |         0 |        0 | (1,2)  |       32799 |      10497 |     32 | 11111111111111111111100000000000 |
  3 |      0 |        0 |      0 |           |           |          |        |             |            |        |                                  |
  4 |   7904 |        1 |    179 | 926018702 |         0 |        0 | (1,4)  |       32799 |      10499 |     32 | 11111111111111111111111000100000 |
  5 |   7728 |        1 |    176 | 926018702 |         0 |        0 | (1,5)  |       32799 |      10499 |     32 | 11111111111111111111111000100000 |
  6 |   7464 |        1 |    259 | 926014884 | 926025112 |        0 | (1,1)  |       49183 |       9475 |     32 | 11111111111111111111111000100000 |
  7 |      2 |        2 |      0 |           |           |          |        |             |            |        |                                  |
  8 |      4 |        2 |      0 |           |           |          |        |             |            |        |                                  |
  9 |     19 |        2 |      0 |           |           |          |        |             |            |        |                                  |
 10 |      0 |        0 |      0 |           |           |          |        |             |            |        |                                  |
 11 |     20 |        2 |      0 |           |           |          |        |             |            |        |                                  |
 12 |   5792 |        1 |   1666 | 926014887 |         0 |        0 | (1,12) |          31 |      10499 |     32 | 11111111111111111111101000100000 |
 13 |      5 |        2 |      0 |           |           |          |        |             |            |        |                                  |
 14 |   3912 |        1 |   1880 | 925994211 |         0 |        0 | (1,14) |          31 |      10499 |     32 | 11111111111111111111110100110000 |
 15 |      0 |        3 |      0 |           |           |          |        |             |            |        |                                  |
 16 |     18 |        2 |      0 |           |           |          |        |             |            |        |                                  |
 17 |      0 |        3 |      0 |           |           |          |        |             |            |        |                                  |
 18 |   1936 |        1 |   1976 | 926013259 |         0 |        0 | (1,18) |       32799 |      10499 |     32 | 11111111111111111111110100110000 |
 19 |   1760 |        1 |    176 | 926014887 |         0 |        0 | (1,19) |       32799 |      10499 |     32 | 11111111111111111111111000100000 |
 20 |   1584 |        1 |    176 | 926014889 |         0 |        0 | (1,20) |       32799 |      10499 |     32 | 11111111111111111111111000100000 |
 21 |      6 |        2 |      0 |           |           |          |        |             |            |        |                                  |
 22 |      0 |        3 |      0 |           |           |          |        |             |            |        |                                  |
 23 |      0 |        3 |      0 |           |           |          |        |             |            |        |                                  |
 24 |      0 |        3 |      0 |           |           |          |        |             |            |        |                                  |
 25 |      0 |        3 |      0 |           |           |          |        |             |            |        |                                  |
 26 |      0 |        3 |      0 |           |           |          |        |             |            |        |                                  |
 27 |      0 |        3 |      0 |           |           |          |        |             |            |        |                                  |
 28 |      0 |        3 |      0 |           |           |          |        |             |            |        |                                  |
(28 rows)

No great surprise that it's again in pg_statistic.

#0 GetPrivateRefCountEntry (buffer=buffer@entry=411, do_move=do_move@entry=false) at bufmgr.c:313
#1 0x00000000007ecb4f in GetPrivateRefCount (buffer=411) at bufmgr.c:398
#2 BufferGetBlockNumber (buffer=buffer@entry=411) at bufmgr.c:2762
#3 0x00000000004fa0f3 in heap_prune_chain (prstate=0x7fff7e4a9180, rootoffnum=7, buffer=411) at pruneheap.c:625
#4 heap_page_prune (relation=relation@entry=0x7fe636faed28, buffer=buffer@entry=411, vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, old_snap_xmin=old_snap_xmin@entry=0, old_snap_ts=old_snap_ts@entry=0, report_stats=report_stats@entry=false, off_loc=<optimized out>, off_loc@entry=0x12b433c) at pruneheap.c:278
#5 0x00000000004fd9bf in lazy_scan_prune (vacrel=vacrel@entry=0x12b42d0, buf=buf@entry=411, blkno=blkno@entry=1, page=page@entry=0x2aaaab54be00 "J\f", vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, prunestate=prunestate@entry=0x7fff7e4aae40) at vacuumlazy.c:1712
#6 0x0000000000500263 in lazy_scan_heap (aggressive=<optimized out>, params=0x12ce89c, vacrel=<optimized out>) at vacuumlazy.c:1347
#7 heap_vacuum_rel (rel=0x7fe636faed28, params=0x12ce89c, bstrategy=<optimized out>) at vacuumlazy.c:612
#8 0x000000000067418a in table_relation_vacuum (bstrategy=<optimized out>, params=0x12ce89c, rel=0x7fe636faed28) at ../../../src/include/access/tableam.h:1678
#9 vacuum_rel (relid=2619, relation=<optimized out>, params=params@entry=0x12ce89c) at vacuum.c:2001

--
Justin
Attachment
On Sun, Jun 06, 2021 at 01:59:10PM -0400, Tom Lane wrote:
> Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
> > My suspicion is that for some tuple on that page
> > HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DEAD for a tuple that it
> > thinks should have been cleaned up by heap_page_prune, but isn't. This
> > would result in an infinite loop in lazy_scan_prune where the
> > condition on vacuumlazy.c:1800 will always be true, but the retry will
> > not do the job it's expected to do.
>
> Since Justin's got a debugger on the process already, it probably
> wouldn't be too hard to confirm or disprove that theory by stepping
> through the code.

Breakpoint 2, HeapTupleSatisfiesVacuum (htup=htup@entry=0x7fff7e4a9ca0, OldestXmin=926025113, buffer=buffer@entry=411) at heapam_visibility.c:1163
1163    heapam_visibility.c: No such file or directory.
(gdb) fin
Run till exit from #0 HeapTupleSatisfiesVacuum (htup=htup@entry=0x7fff7e4a9ca0, OldestXmin=926025113, buffer=buffer@entry=411) at heapam_visibility.c:1163
lazy_scan_prune (vacrel=vacrel@entry=0x12b42d0, buf=buf@entry=411, blkno=blkno@entry=1, page=page@entry=0x2aaaab54be00 "J\f", vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, prunestate=prunestate@entry=0x7fff7e4aae40) at vacuumlazy.c:1790
1790    vacuumlazy.c: No such file or directory.
Value returned is $23 = HEAPTUPLE_DEAD

offnum = 6
tuple = {t_len = 259, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 1}, ip_posid = 6}, t_tableOid = 2619, t_data = 0x2aaaab54db28}
res = HEAPTUPLE_DEAD

(gdb) p *itemid
$51 = {lp_off = 7464, lp_flags = 1, lp_len = 259}
(gdb) p *tuple->t_data
$54 = {t_choice = {t_heap = {t_xmin = 926014884, t_xmax = 926025112, t_field3 = {t_cid = 0, t_xvac = 0}}, t_datum = {datum_len_ = 926014884, datum_typmod = 926025112, datum_typeid = 0}}, t_ctid = {ip_blkid = {bi_hi = 0, bi_lo = 1}, ip_posid = 1}, t_infomask2 = 49183, t_infomask = 9475, t_hoff = 32 ' ', t_bits = 0x2aaaab54db3f "\377\377\177\004"}

lp_flags = LP_NORMAL (1) ??
t_infomask2 = HEAP_ONLY_TUPLE | HEAP_HOT_UPDATED + 31 atts
t_infomask = HEAP_UPDATED | HEAP_XMAX_COMMITTED | HEAP_XMIN_COMMITTED | HEAP_HASVARWIDTH | HEAP_HASNULL

--
Justin
Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

From: Matthias van de Meent
On Tue, 8 Jun 2021 at 13:03, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> The cluster was again rejecting with "too many clients already".
>
> I was able to open a shell this time, but it immediately froze when I tried to
> tab complete "pg_stat_acti"...
>
> I was able to dump the page image, though - attached. I can send you its
> "data" privately, if desirable. I'll also try to step through this.

Could you attach a dump of lazy_scan_prune's vacrel, all the global visibility states (GlobalVisCatalogRels, and possibly GlobalVisSharedRels, GlobalVisDataRels, and GlobalVisTempRels), and heap_page_prune's PruneState? Additionally, the locals of lazy_scan_prune (more specifically, the 'offnum' when it enters heap_page_prune) would also be appreciated, as it helps identify the tuple.

I've been looking into what might have done this, and I'm currently stuck for lack of information about GlobalVisCatalogRels and the PruneState.

One curiosity I did notice is that the t_xmax of the problematic tuples is exactly one lower than the OldestXmin. Not weird, but a curiosity.

With regards,

Matthias van de Meent.

PS. Attached a few of my current research notes, which are mainly comparisons between heap_prune_satisfies_vacuum and HeapTupleSatisfiesVacuum.
Attachment
On Tue, Jun 08, 2021 at 01:54:41PM +0200, Matthias van de Meent wrote:
> On Tue, 8 Jun 2021 at 13:03, Justin Pryzby <pryzby@telsasoft.com> wrote:
> > [...]
> > I was able to dump the page image, though - attached. I can send you its
> > "data" privately, if desirable. I'll also try to step through this.
>
> Could you attach a dump of lazy_scan_prune's vacrel, all the global
> visibility states (GlobalVisCatalogRels, and possibly
> GlobalVisSharedRels, GlobalVisDataRels, and GlobalVisTempRels), and
> heap_page_prune's PruneState?
(gdb) p *vacrel
$56 = {rel = 0x7fe636faed28, indrels = 0x12b4440, nindexes = 1, do_index_vacuuming = true, do_index_cleanup = true,
  do_failsafe = false, bstrategy = 0x1210340, lps = 0x0, old_rel_pages = 81, old_live_tuples = 1100,
  relfrozenxid = 921613998, relminmxid = 53878631, OldestXmin = 926025113, FreezeLimit = 876025113,
  MultiXactCutoff = 49085856, relnamespace = 0x12b4460 "pg_catalog", relname = 0x12b4488 "pg_statistic",
  indname = 0x0, blkno = 1, offnum = 6, phase = VACUUM_ERRCB_PHASE_SCAN_HEAP, dead_tuples = 0x127a980,
  rel_pages = 81, scanned_pages = 2, pinskipped_pages = 0, frozenskipped_pages = 0, tupcount_pages = 2,
  pages_removed = 0, lpdead_item_pages = 1, nonempty_pages = 1, lock_waiter_detected = false,
  new_rel_tuples = 0, new_live_tuples = 0, indstats = 0x12b4568, num_index_scans = 0, tuples_deleted = 0,
  lpdead_items = 3, new_dead_tuples = 0, num_tuples = 14, live_tuples = 14}
(gdb) p GlobalVisCatalogRels
$57 = {definitely_needed = {value = 926025113}, maybe_needed = {value = 926025112}}
(gdb) p GlobalVisSharedRels
$58 = {definitely_needed = {value = 926025113}, maybe_needed = {value = 926025112}}
(gdb) p GlobalVisDataRels
$59 = {definitely_needed = {value = 926025113}, maybe_needed = {value = 926025113}}
(gdb) p GlobalVisTempRels
$60 = {definitely_needed = {value = 926025113}, maybe_needed = {value = 926025113}}

I don't know when you want prstate from, but here it is at some point:

(gdb) p *prstate
$77 = {rel = 0x7fe636faed28, vistest = 0xe7bcc0 <GlobalVisCatalogRels>, old_snap_ts = 0, old_snap_xmin = 0,
  old_snap_used = false, new_prune_xid = 0, latestRemovedXid = 0, nredirected = 0, ndead = 0, nunused = 0,

> Additionally, the locals of lazy_scan_prune (more specifically, the
> 'offnum' when it enters heap_page_prune) would also be appreciated, as
> it helps indicate the tuple.
Breakpoint 1, heap_page_prune (relation=relation@entry=0x7fe636faed28, buffer=buffer@entry=411,
    vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, old_snap_xmin=old_snap_xmin@entry=0,
    old_snap_ts=old_snap_ts@entry=0, report_stats=report_stats@entry=false,
    off_loc=off_loc@entry=0x12b433c) at pruneheap.c:225
225     pruneheap.c: No such file or directory.
(gdb) up
#1  0x00000000004fd9bf in lazy_scan_prune (vacrel=vacrel@entry=0x12b42d0, buf=buf@entry=411, blkno=blkno@entry=1,
    page=page@entry=0x2aaaab54be00 "J\f", vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>,
    prunestate=prunestate@entry=0x7fff7e4aae40) at vacuumlazy.c:1712
1712    vacuumlazy.c: No such file or directory.
(gdb) info locals
rel = 0x7fe636faed28
offnum = 6
maxoff = 28
itemid = 0x2aaaab54be2c
tuple = {t_len = 259, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 1}, ip_posid = 6}, t_tableOid = 2619, t_data = 0x2aaaab54db28}
res = <optimized out>
tuples_deleted = 0
lpdead_items = 0
new_dead_tuples = 0
num_tuples = 0
live_tuples = 0
nfrozen = 0

Maybe you need to know that this is also returning RECENTLY_DEAD.
Breakpoint 4, heap_prune_satisfies_vacuum (prstate=prstate@entry=0x7fff7e4a9180, tup=tup@entry=0x7fff7e4a8f10,
    buffer=buffer@entry=411) at pruneheap.c:423
423     in pruneheap.c
(gdb)
Run till exit from #0  heap_prune_satisfies_vacuum (prstate=prstate@entry=0x7fff7e4a9180,
    tup=tup@entry=0x7fff7e4a8f10, buffer=buffer@entry=411) at pruneheap.c:423
0x00000000004fa887 in heap_prune_chain (prstate=0x7fff7e4a9180, rootoffnum=6, buffer=411) at pruneheap.c:560
560     in pruneheap.c
Value returned is $72 = HEAPTUPLE_RECENTLY_DEAD

tup = {t_len = 259, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 1}, ip_posid = 6}, t_tableOid = 2619, t_data = 0x2aaaab54db28}

(gdb) p *htup
$82 = {t_choice = {t_heap = {t_xmin = 926014884, t_xmax = 926025112, t_field3 = {t_cid = 0, t_xvac = 0}},
    t_datum = {datum_len_ = 926014884, datum_typmod = 926025112, datum_typeid = 0}},
  t_ctid = {ip_blkid = {bi_hi = 0, bi_lo = 1}, ip_posid = 1}, t_infomask2 = 49183, t_infomask = 9475,
  t_hoff = 32 ' ', t_bits = 0x2aaaab54db3f "\377\377\177\004"}

--
Justin
System Administrator
Telsasoft
+1-952-707-8581
Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
From: Matthias van de Meent
On Tue, 8 Jun 2021 at 14:11, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Tue, Jun 08, 2021 at 01:54:41PM +0200, Matthias van de Meent wrote:
> > [...]
> > Could you attach a dump of lazy_scan_prune's vacrel, all the global
> > visibility states (GlobalVisCatalogRels, and possibly
> > GlobalVisSharedRels, GlobalVisDataRels, and GlobalVisTempRels), and
> > heap_page_prune's PruneState?
>
> (gdb) p *vacrel
> $56 = {... OldestXmin = 926025113, ...}
>
> (gdb) p GlobalVisCatalogRels
> $57 = {definitely_needed = {value = 926025113}, maybe_needed = {value = 926025112}}

This maybe_needed is older than the OldestXmin, which indeed gives us
this problematic behaviour: heap_prune_satisfies_vacuum considers one
more transaction to be unvacuumable, and thus indeed won't vacuum that
tuple that HeapTupleSatisfiesVacuum does want to be vacuumed.

The new open question is now: Why is
GlobalVisCatalogRels->maybe_needed < OldestXmin? IIRC
GlobalVisCatalogRels->maybe_needed is constructed from the same
ComputeXidHorizonsResult->catalog_oldest_nonremovable which later is
returned to be used in vacrel->OldestXmin.

> Maybe you need to know that this is also returning RECENTLY_DEAD.

I had expected that, but good to have confirmation.

Thanks for the information!

With regards,

Matthias van de Meent.
On Tue, Jun 08, 2021 at 02:27:14PM +0200, Matthias van de Meent wrote:
> Thanks for the information!

I created an apparently-complete core file by first doing this:
| echo 127 |sudo tee /proc/22591/coredump_filter
*and updated wiki:Developer_FAQ to work with huge pages

I'm planning to kill the process shortly if nobody asks for anything else.
On 2021-Jun-06, Justin Pryzby wrote:
> However, I also found an autovacuum chewing 100% CPU, and it appears the
> problem is actually because autovacuum has locked a page of pg-statistic, and
> every other process then gets stuck waiting in the planner. I checked a few
> and found these:
> [...]

Hmm ... I wonder if this could be related to commits d9d076222f5b,
c98763bf51bf, etc.  I don't have any connecting thoughts other than the
tuple visibility code being involved.  Do you see any procs with the
PROC_IN_SAFE_IC flag set?

--
Álvaro Herrera               39°49'30"S 73°17'W
Hi,

On 2021-06-08 14:27:14 +0200, Matthias van de Meent wrote:
> heap_prune_satisfies_vacuum considers 1 more transaction to be
> unvacuumable, and thus indeed won't vacuum that tuple that
> HeapTupleSatisfiesVacuum does want to be vacuumed.
>
> The new open question is now: Why is
> GlobalVisCatalogRels->maybe_needed < OldestXmin? IIRC
> GlobalVisCatalogRels->maybe_needed is constructed from the same
> ComputeXidHorizonsResult->catalog_oldest_nonremovable which later is
> returned to be used in vacrel->OldestXmin.

The horizon used by pruning is only updated once per transaction (well,
approximately). What presumably is happening is that the retry loop is
retrying, without updating the horizon, therefore the same thing is
happening over and over again?

Greetings,

Andres Freund
On Tue, Jun 08, 2021 at 12:06:02PM -0400, Alvaro Herrera wrote:
> On 2021-Jun-06, Justin Pryzby wrote:
> > However, I also found an autovacuum chewing 100% CPU, and it appears the
> > problem is actually because autovacuum has locked a page of pg-statistic, and
> > every other process then gets stuck waiting in the planner. I checked a few
> > and found these:
>
> Hmm ... I wonder if this could be related to commits d9d076222f5b,
> c98763bf51bf, etc.  I don't have any connecting thoughts other than the
> tuple visibility code being involved.  Do you see any procs with the
> PROC_IN_SAFE_IC flag set?

Can you give me a hint how to do that from a corefile ?  I got this far:

(gdb) p MyProc->links
$13 = {prev = 0x0, next = 0x0}

I do have a reindex script which runs CIC, and it does look suspicious.

<<Mon Jun 7 22:00:54 CDT 2021: starting db=ts
...
Mon Jun 7 22:01:16 CDT 2021: ts: pg_subscription_rel: pg_subscription_rel_srrelid_srsubid_index(reindex system)...
Mon Jun 7 22:01:16 CDT 2021: ts: pg_subscription: pg_subscription_oid_index(reindex system)...
Mon Jun 7 22:01:16 CDT 2021: ts: pg_subscription: pg_subscription_subname_index(reindex system)...
Mon Jun 7 22:01:16 CDT 2021: ts: pg_subscription: pg_toast.pg_toast_6100_index(reindex system)...
Mon Jun 7 22:01:17 CDT 2021: ts: pg_statistic_ext_data: pg_statistic_ext_data_stxoid_index(reindex system)...
Mon Jun 7 22:01:17 CDT 2021: ts: pg_statistic_ext_data: pg_toast.pg_toast_3429_index(reindex system)...
Mon Jun 7 22:01:17 CDT 2021: ts: pg_statistic_ext: pg_statistic_ext_name_index(reindex system)...
Mon Jun 7 22:01:17 CDT 2021: ts: pg_statistic_ext: pg_statistic_ext_oid_index(reindex system)...
Mon Jun 7 22:01:17 CDT 2021: ts: pg_statistic_ext: pg_statistic_ext_relid_index(reindex system)...
Mon Jun 7 22:01:17 CDT 2021: ts: pg_statistic_ext: pg_toast.pg_toast_3381_index(reindex system)...
Mon Jun 7 22:01:17 CDT 2021: ts: pg_statistic: pg_statistic_relid_att_inh_index(reindex system)...
Mon Jun 7 22:02:56 CDT 2021: ts: pg_statistic: pg_toast.pg_toast_2619_index(reindex system)...
Mon Jun 7 22:02:57 CDT 2021: ts: pg_statio_all_tables_snap: pg_statio_all_tables_snap_t_idx(reindex non-partitioned)...
ERROR: canceling statement due to statement timeout
reindex: warning, dropping invalid/unswapped index: pg_statio_all_tables_snap_t_idx_ccnew
Mon Jun 7 23:02:57 CDT 2021: ts: pg_statio_all_tables_snap: pg_toast.pg_toast_33011_index(reindex system)...
Mon Jun 7 23:02:57 CDT 2021: ts: pg_statio_all_indexes_snap: pg_statio_all_indexes_snap_t_idx(reindex non-partitioned)...
ERROR: canceling statement due to statement timeout
reindex: warning, dropping invalid/unswapped index: pg_statio_all_indexes_snap_t_idx_ccnew
Tue Jun 8 00:02:57 CDT 2021: ts: pg_statio_all_indexes_snap: pg_toast.pg_toast_33020_index(reindex system)...
Tue Jun 8 00:02:57 CDT 2021: ts: pg_shseclabel: pg_shseclabel_object_index(reindex system)...
Tue Jun 8 00:02:58 CDT 2021: ts: pg_shseclabel: pg_toast.pg_toast_3592_index(reindex system)...
Tue Jun 8 00:02:58 CDT 2021: ts: pg_shdescription: pg_shdescription_o_c_index(reindex system)...
Tue Jun 8 00:02:58 CDT 2021: ts: pg_shdescription: pg_toast.pg_toast_2396_index(reindex system)...
...
Tue Jun 8 01:21:20 CDT 2021: ts: pg_aggregate: pg_aggregate_fnoid_index(reindex system)...
Tue Jun 8 01:21:20 CDT 2021: ts: pg_aggregate: pg_toast.pg_toast_2600_index(reindex system)...
Tue Jun 8 01:21:20 CDT 2021: ts: permissions: perm_group_idx(reindex non-partitioned)...
ERROR: canceling statement due to statement timeout
reindex: warning, dropping invalid/unswapped index: perm_group_idx_ccnew
Tue Jun 8 02:21:20 CDT 2021: ts: permissions: perm_user_idx(reindex non-partitioned)...
ERROR: canceling statement due to statement timeout
reindex: warning, dropping invalid/unswapped index: perm_user_idx_ccnew
Tue Jun 8 03:21:20 CDT 2021: ts: permissions: pg_toast.pg_toast_33577_index(reindex system)...
Tue Jun 8 03:21:20 CDT 2021: ts: patchfiles: patchfiles_filename_idx(reindex non-partitioned)...
ERROR: canceling statement due to statement timeout
reindex: warning, dropping invalid/unswapped index: patchfiles_filename_idx_ccnew
Tue Jun 8 04:21:21 CDT 2021: ts: patchfiles: patchfiles_pkey(reindex non-partitioned)...
ERROR: canceling statement due to statement timeout
reindex: warning, dropping invalid/unswapped index: patchfiles_pkey_ccnew

=> It's strange that these timed out, since the statio tables are less than
15MB, and permissions and patchfiles are both under 100kB.
And it seems to say that it timed out after less than 1sec.

They're running this:
| PGOPTIONS="--deadlock_timeout=333ms -cstatement-timeout=3600s" psql -c "REINDEX INDEX CONCURRENTLY $i"
And if it times out, it then runs: $PSQL "DROP INDEX CONCURRENTLY $bad"

I've killed it a little bit ago, but I should've saved the start time of the
autovacuum.
I found this:

#5  heap_vacuum_rel (rel=0x7fe636faed28, params=0x12ce89c, bstrategy=<optimized out>) at vacuumlazy.c:612
        starttime = 676436464463888
        ru0 = {tv = {tv_sec = 1623121264, tv_usec = 463887}, ru = {ru_utime = {tv_sec = 0, tv_usec = 77418},
            ru_stime = {tv_sec = 0, tv_usec = 13866}, {ru_maxrss = 7440, __ru_maxrss_word = 7440},
            {ru_ixrss = 0, __ru_ixrss_word = 0}, {ru_idrss = 0, __ru_idrss_word = 0},
            {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 1826, __ru_minflt_word = 1826},
            {ru_majflt = 1, __ru_majflt_word = 1}, {ru_nswap = 0, __ru_nswap_word = 0},
            {ru_inblock = 2008, __ru_inblock_word = 2008}, {ru_oublock = 192, __ru_oublock_word = 192},
            {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0, __ru_msgrcv_word = 0},
            {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 29, __ru_nvcsw_word = 29},
            {ru_nivcsw = 9, __ru_nivcsw_word = 9}}}

$ date -d @1623121264
Mon Jun 7 22:01:04 CDT 2021
$ date -d '2000-01-01 UTC + 676436464seconds'
Mon Jun 7 22:01:04 CDT 2021

--
Justin
On 2021-Jun-08, Justin Pryzby wrote:
> On Tue, Jun 08, 2021 at 12:06:02PM -0400, Alvaro Herrera wrote:
> > [...] Do you see any procs with the PROC_IN_SAFE_IC flag set?
>
> Can you give me a hint how to do that from a corefile ?

(gdb) set $i=0
(gdb) set $total = ProcGlobal->allProcCount
(gdb) while($i<$total)
 >print ProcGlobal->allProcs[$i++]->statusFlags
 >end

--
Álvaro Herrera               Valdivia, Chile
"In fact, the basic problem with Perl 5's subroutines is that they're not
crufty enough, so the cruft leaks out into user-defined code instead, by
the Conservation of Cruft Principle."  (Larry Wall, Apocalypse 6)
Reminds me of the other bug that you also reported about a year ago,
Justin - which was never fixed. The one with the duplicate tids from a cic
(see pg 14 open item).

Is this essentially the same environment as the one that led to your other
bug report?

--
Peter Geoghegan
(Sent from my phone)
On Tue, Jun 08, 2021 at 02:01:51PM -0400, Alvaro Herrera wrote:
> On 2021-Jun-08, Justin Pryzby wrote:
> > Can you give me a hint how to do that from a corefile ?
>
> (gdb) set $i=0
> (gdb) set $total = ProcGlobal->allProcCount
> (gdb) while($i<$total)
>  >print ProcGlobal->allProcs[$i++]->statusFlags
>  >end

They're all zero except for:

$201 = 1 '\001'
$202 = 3 '\003'
$203 = 1 '\001'

src/include/storage/proc.h-#define PROC_IS_AUTOVACUUM	0x01	/* is it an autovac worker? */
src/include/storage/proc.h-#define PROC_IN_VACUUM	0x02	/* currently running lazy vacuum */
src/include/storage/proc.h:#define PROC_IN_SAFE_IC	0x04	/* currently running CREATE INDEX

--
Justin
On Tue, Jun 08, 2021 at 12:34:04PM -0500, Justin Pryzby wrote:
> On Tue, Jun 08, 2021 at 12:06:02PM -0400, Alvaro Herrera wrote:
> > Hmm ... I wonder if this could be related to commits d9d076222f5b,
> > c98763bf51bf, etc.  I don't have any connecting thoughts other than the
> > tuple visibility code being involved.  Do you see any procs with the
> > PROC_IN_SAFE_IC flag set?
>
> I do have a reindex script which runs CIC, and it does look suspicious.
>
> [... reindex log trimmed ...]
>
> => It's strange that these timed out, since the statio tables are less than
> 15MB, and permissions and patchfiles are both under 100kB.
>
> And it seems to say that it timed out after less than 1sec.

Oops, no: it timed out after 3600sec, as requested.

> They're running this:
> | PGOPTIONS="--deadlock_timeout=333ms -cstatement-timeout=3600s" psql -c "REINDEX INDEX CONCURRENTLY $i"
> And if it times out, it then runs: $PSQL "DROP INDEX CONCURRENTLY $bad"
...
> $ date -d @1623121264
> Mon Jun 7 22:01:04 CDT 2021

Which is probably because the reindex was waiting for the vacuum process
to finish (or maybe waiting on the page that vacuum had locked?).

--
Justin
On 2021-Jun-08, Justin Pryzby wrote:
> They're all zero except for:
>
> $201 = 1 '\001'
> $202 = 3 '\003'
> $203 = 1 '\001'
>
> src/include/storage/proc.h-#define PROC_IS_AUTOVACUUM	0x01	/* is it an autovac worker? */
> src/include/storage/proc.h-#define PROC_IN_VACUUM	0x02	/* currently running lazy vacuum */
> src/include/storage/proc.h:#define PROC_IN_SAFE_IC	0x04	/* currently running CREATE INDEX

Ah okay, not related then.  Thanks for checking.

--
Álvaro Herrera               39°49'30"S 73°17'W
On Tue, Jun 8, 2021 at 12:27 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > They're running this:
> > | PGOPTIONS="--deadlock_timeout=333ms -cstatement-timeout=3600s" psql -c "REINDEX INDEX CONCURRENTLY $i"
> > And if it times out, it then runs: $PSQL "DROP INDEX CONCURRENTLY $bad"
> ...
> > $ date -d @1623121264
> > Mon Jun 7 22:01:04 CDT 2021

Perhaps reindex was waiting on the VACUUM process to finish, while VACUUM
was (in effect) busy waiting on the REINDEX to finish. If the bug is hard
to reproduce then it might just be that the circumstances that lead to
livelock require that things line up exactly at the heap page + XID level
-- which I'd expect to be tricky to reproduce.

As I said upthread, I'm almost certain that the "goto retry" added by
commit 8523492d is a factor here -- that is what I mean by busy waiting
inside VACUUM. It's possible that busy waiting like this happens much more
often than an actual undetected deadlock/livelock. We only expect to "goto
retry" in the event of a concurrently aborting transaction.

The other bug that you reported back in July of last year [1] (which
involved a "REINDEX INDEX pg_class_tblspc_relfilenode_index") was pretty
easy to recreate, just by running the REINDEX in a tight loop. Could you
describe how tricky it is to repro this issue now?

If you instrument the "goto retry" code added to lazy_scan_prune() by
commit 8523492d, then you might notice that it is hit in contexts that it
was never intended to work with. If you can reduce reproducing the problem
to reproducing hitting that goto in the absence of an aborted transaction,
then it might be a lot easier to produce a simple repro. The
livelock/deadlock is probably nothing more than the worst consequence of
the same issue, and so may not need to be reproduced directly to fix the
issue.

[1] https://www.postgresql.org/message-id/CAH2-WzkjjCoq5Y4LeeHJcjYJVxGm3M3SAWZ0%3D6J8K1FPSC9K0w%40mail.gmail.com

--
Peter Geoghegan
On Tue, Jun 08, 2021 at 01:52:40PM -0700, Peter Geoghegan wrote:
> Perhaps reindex was waiting on the VACUUM process to finish, while
> VACUUM was (in effect) busy waiting on the REINDEX to finish.

But when the reindex exited, the vacuum kept spinning until I sent SIGABRT
12 hours later.

> The other bug that you reported back in July of last year [1] (which
> involved a "REINDEX INDEX pg_class_tblspc_relfilenode_index") was
> pretty easy to recreate, just by running the REINDEX in a tight loop.
> Could you describe how tricky it is to repro this issue now?

I didn't try to reproduce it, but now hit it twice in 3 days.
(Actually, I did try to reproduce it, by running tight loops around
vacuum/analyze pg_statistic, which didn't work. Maybe because reindex is
what's important.)

I mentioned that we've been running pg14b1 since 2021-05-20. So it ran
fine for 13 days before breaking in an obvious way.

OH - in the first instance, I recorded the stuck process, but not its
timestamp. It looks like that autovacuum process *also* started right
after 10pm, which is when the reindex job starts. So it seems like
REINDEX may trigger this pretty consistently:

(gdb) frame 4
#4  heap_vacuum_rel (rel=0x7f0349466d28, params=0x1c77b7c, bstrategy=<optimized out>) at vacuumlazy.c:612
612     vacuumlazy.c: No such file or directory.
(gdb) info locals
starttime = 676177375524485

$ date -d '2000-01-01 UTC + 676177375seconds'
Fri Jun 4 22:02:55 CDT 2021

> If you instrument the "goto retry" code added to lazy_scan_prune() by
> commit 8523492d, then you might notice that it is hit in contexts that
> it was never intended to work with. If you can reduce reproducing the
> problem to reproducing hitting that goto in the absence of an aborted
> transaction, then it might be a lot easier to produce a simple repro.

I'm not sure what you're suggesting ?  Maybe I should add some NOTICES there.

I'm not sure why/if pg_statistic is special, but I guess when analyze
happens, it gets updated, and eventually processed by autovacuum. The main
table here is a partitioned table which receives UPDATEs which move tuples
into a different partition (probably more often than what's recommended).

autovacuum_analyze_threshold    | 2
autovacuum_analyze_scale_factor | 0.005
autovacuum_vacuum_scale_factor  | 0.005
log_autovacuum_min_duration     | 9000
checkpoint_timeout              | 60
wal_level                       | minimal

In pg14, the parent table is auto-analyzed.

--
Justin
On Tue, Jun 8, 2021 at 2:23 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> I'm not sure what you're suggesting ? Maybe I should add some NOTICES there.

Here is one approach that might work: Can you check if the assertion added
by the attached patch fails very quickly with your test case?

This does nothing more than trigger an assertion failure in the event of
retrying a second time for any given heap page. Theoretically that could
happen without there being any bug -- in principle we might have to retry
several times for the same page. In practice the chances of it happening
even once are vanishingly low, though -- so two times strongly signals a
bug. It was quite hard to hit the "goto restart" even once during my
testing. There is still no test coverage for the line of code because it's
so hard to hit.

If you find that the assertion is hit pretty quickly with the same
workload then you've all but reproduced the issue, probably in far less
time. And, if you know that there were no concurrently aborting
transactions then you can be 100% sure that you have reproduced the issue
-- this goto is only supposed to be executed when a transaction that was
in progress during the heap_page_prune() aborts after it returns, but
before we call HeapTupleSatisfiesVacuum() for one of the aborted-xact
tuples. It's supposed to be a super narrow thing.

> I'm not sure why/if pg_statistic is special, but I guess when analyze
> happens, it gets updated, and eventually processed by autovacuum.

pg_statistic is probably special, though only in a superficial way: it is
the system catalog that tends to be the most frequently vacuumed in
practice.

> In pg14, the parent table is auto-analyzed.

I wouldn't expect that to matter. The "ANALYZE portion" of the VACUUM
ANALYZE won't have started at the point that we get stuck.

--
Peter Geoghegan
Attachment
On Tue, Jun 8, 2021 at 4:03 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> postgres=# SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, t_hoff, t_bits, t_oid
>            FROM heap_page_items(pg_read_binary_file('/tmp/dump_block.page'));
>  lp | lp_off | lp_flags | lp_len |  t_xmin   |  t_xmax   | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff |              t_bits              | t_oid
> ----+--------+----------+--------+-----------+-----------+----------+--------+-------------+------------+--------+----------------------------------+-------
>   1 |   1320 |        1 |    259 | 926025112 |         0 |        0 | (1,1)  |       32799 |      10499 |     32 | 11111111111111111111111000100000 |
> *** SNIP ***
>   6 |   7464 |        1 |    259 | 926014884 | 926025112 |        0 | (1,1)  |       49183 |       9475 |     32 | 11111111111111111111111000100000 |

As I understand it from your remarks + gdb output from earlier [1], the
tuple at offset number 6 is the tuple that triggers the suspicious "goto
restart" here. There was a regular UPDATE (not a HOT UPDATE) that produced
a successor version on the same heap page -- which is lp 1. Here are the
t_infomask details for both tuples:

lp 6: HEAP_HASNULL|HEAP_HASVARWIDTH|HEAP_XMIN_COMMITTED|HEAP_XMAX_COMMITTED|HEAP_UPDATED <-- points to (1,1)
lp 1: HEAP_HASNULL|HEAP_HASVARWIDTH|HEAP_XMIN_COMMITTED|HEAP_XMAX_INVALID|HEAP_UPDATED <-- This is (1,1)

So if lp 1's xmin and lp 6's xmax XID/Xact committed (i.e., if XID
926025112 committed), why shouldn't HeapTupleSatisfiesVacuum() think that
lp 6 is DEAD (and not RECENTLY_DEAD)? You also say that vacuumlazy.c's
OldestXmin is 926025113, so it is hard to fault HTSV here. The only way it
could be wrong is if the hint bits were somehow spuriously set, which
seems unlikely to me.

[1] https://postgr.es/m/20210608113333.GC16435@telsasoft.com

--
Peter Geoghegan
On Tue, Jun 8, 2021 at 5:27 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> > (gdb) p *vacrel
> > $56 = {... OldestXmin = 926025113, ...}
> >
> > (gdb) p GlobalVisCatalogRels
> > $57 = {definitely_needed = {value = 926025113}, maybe_needed = {value = 926025112}}
>
> This maybe_needed is older than the OldestXmin, which indeed gives us
> this problematic behaviour:

Good catch.

> heap_prune_satisfies_vacuum considers 1 more transaction to be
> unvacuumable, and thus indeed won't vacuum that tuple that
> HeapTupleSatisfiesVacuum does want to be vacuumed.

Following up from my email from an hour ago here. Since I have no reason
to suspect HeapTupleSatisfiesVacuum (per the earlier analysis), this is
very much starting to look like a heap_prune_satisfies_vacuum() problem.
And therefore likely a problem in the snapshot scalability work.

Note that GlobalVisCatalogRels.maybe_needed is 926025112, which doesn't
match OldestXmin in VACUUM (that's 926025113). Though both
GlobalVisDataRels.definitely_needed and GlobalVisDataRels.maybe_needed
*are* 926025113, and therefore agree with VACUUM's OldestXmin. But this is
pg_statistic we're vacuuming, and so GlobalVisCatalogRels is what matters.

> The new open question is now: Why is
> GlobalVisCatalogRels->maybe_needed < OldestXmin? IIRC
> GlobalVisCatalogRels->maybe_needed is constructed from the same
> ComputeXidHorizonsResult->catalog_oldest_nonremovable which later is
> returned to be used in vacrel->OldestXmin.

Exactly.

--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes:
> On Tue, Jun 8, 2021 at 5:27 AM Matthias van de Meent wrote:
>>> (gdb) p GlobalVisCatalogRels
>>> $57 = {definitely_needed = {value = 926025113}, maybe_needed = {value = 926025112}}

>> This maybe_needed is older than the OldestXmin, which indeed gives us
>> this problematic behaviour:

> Good catch.

I wonder if this is a variant of the problem shown at

https://www.postgresql.org/message-id/2591376.1621196582%40sss.pgh.pa.us

where maybe_needed was visibly quite insane. This value is less visibly insane, but it's still wrong. It might be interesting to try running this test case with the extra assertions I proposed there, to try to narrow down where it's going off the rails.

			regards, tom lane
On Tue, Jun 08, 2021 at 02:38:37PM -0700, Peter Geoghegan wrote:
> On Tue, Jun 8, 2021 at 2:23 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > I'm not sure what you're suggesting ? Maybe I should add some NOTICES there.
>
> Here is one approach that might work: Can you check if the assertion
> added by the attached patch fails very quickly with your test case?

I reproduced the issue on a new/fresh cluster like this:

./postgres -D data -c autovacuum_naptime=1 -c autovacuum_analyze_scale_factor=0.005 -c log_autovacuum_min_duration=-1
psql -h /tmp postgres -c "CREATE TABLE t(i int); INSERT INTO t SELECT generate_series(1,99999); CREATE INDEX ON t(i);"
time while psql -h /tmp postgres -qc 'REINDEX (CONCURRENTLY) INDEX t_i_idx'; do :; done&
time while psql -h /tmp postgres -qc 'ANALYZE pg_attribute'; do :; done&

TRAP: FailedAssertion("restarts == 0", File: "vacuumlazy.c", Line: 1803, PID: 10367)
postgres: autovacuum worker postgres(ExceptionalCondition+0x99)[0x5633f3ad6b09]
postgres: autovacuum worker postgres(+0x1c0a37)[0x5633f36cca37]
postgres: autovacuum worker postgres(heap_vacuum_rel+0xfca)[0x5633f36cf75a]
postgres: autovacuum worker postgres(+0x305fed)[0x5633f3811fed]
postgres: autovacuum worker postgres(vacuum+0x61a)[0x5633f38137ea]
postgres: autovacuum worker postgres(+0x409dd3)[0x5633f3915dd3]
postgres: autovacuum worker postgres(+0x40ae46)[0x5633f3916e46]
postgres: autovacuum worker postgres(AutoVacuumUpdateDelay+0x0)[0x5633f3916f50]
postgres: autovacuum worker postgres(+0x41985b)[0x5633f392585b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f085c591890]
/lib/x86_64-linux-gnu/libc.so.6(__select+0x17)[0x7f085bafaff7]
postgres: autovacuum worker postgres(+0x419d06)[0x5633f3925d06]
postgres: autovacuum worker postgres(PostmasterMain+0xcbb)[0x5633f39277bb]
postgres: autovacuum worker postgres(main+0x4d4)[0x5633f3660a14]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f085ba05b97]
postgres: autovacuum worker postgres(_start+0x2a)[0x5633f3660aba]
2021-06-08 19:10:36.875 CDT postmaster[13483] LOG: server process (PID 10367) was terminated by signal 6: Aborted
2021-06-08 19:10:36.875 CDT postmaster[13483] DETAIL: Failed process was running: autovacuum: VACUUM pg_toast.pg_toast_2619
2021-06-08 19:10:36.875 CDT postmaster[13483] LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
	This probably means the server terminated abnormally before or while processing the request.
connection to server was lost

real	0m14.477s
On Tue, Jun 8, 2021 at 5:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wonder if this is a variant of the problem shown at
>
> https://www.postgresql.org/message-id/2591376.1621196582%40sss.pgh.pa.us
>
> where maybe_needed was visibly quite insane. This value is
> less visibly insane, but it's still wrong. It might be
> interesting to try running this test case with the extra
> assertions I proposed there, to try to narrow down where
> it's going off the rails.

Oh yeah. Justin didn't say anything about upgrading using pg_upgrade (just something about upgrading the kernel). Did you use pg_upgrade here, Justin?

I'm going to see Andres in person in 20 minutes time (for the first time in over a year!). I'll discuss this issue with him.

--
Peter Geoghegan
On Tue, Jun 8, 2021 at 5:18 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> I reproduced the issue on a new/fresh cluster like this:
>
> ./postgres -D data -c autovacuum_naptime=1 -c autovacuum_analyze_scale_factor=0.005 -c log_autovacuum_min_duration=-1
> psql -h /tmp postgres -c "CREATE TABLE t(i int); INSERT INTO t SELECT generate_series(1,99999); CREATE INDEX ON t(i);"
> time while psql -h /tmp postgres -qc 'REINDEX (CONCURRENTLY) INDEX t_i_idx'; do :; done&
> time while psql -h /tmp postgres -qc 'ANALYZE pg_attribute'; do :; done&

I don't have time to try this out myself today, but offhand I'm pretty confident that this is sufficient to reproduce the underlying bug itself. And if that's true then I guess it can't have anything to do with the pg_upgrade/pg_resetwal issue Tom just referenced, despite the apparent similarity.

Thanks

--
Peter Geoghegan
On Wed, Jun 9, 2021 at 2:17 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2021-06-08 14:27:14 +0200, Matthias van de Meent wrote:
> > heap_prune_satisfies_vacuum considers 1 more transaction to be
> > unvacuumable, and thus indeed won't vacuum that tuple that
> > HeapTupleSatisfiesVacuum does want to be vacuumed.
> >
> > The new open question is now: Why is
> > GlobalVisCatalogRels->maybe_needed < OldestXmin? IIRC
> > GlobalVisCatalogRels->maybe_needed is constructed from the same
> > ComputeXidHorizonsResult->catalog_oldest_nonremovable which later is
> > returned to be used in vacrel->OldestXmin.
>
> The horizon used by pruning is only updated once per transaction (well,
> approximately). What presumably is happening is that the retry loop is
> retrying, without updating the horizon, therefore the same thing is
> happening over and over again?

When we calculated vacrel->OldestXmin in vacuum_set_xid_limits(), vacrel->OldestXmin and GlobalVisCatalogRels->maybe_needed must have been the same value. That is, both were 926025113. After that, vacrel->OldestXmin is not changed throughout lazy vacuum, whereas GlobalVisCatalogRels->maybe_needed could be updated (right?).

Is there any chance that GlobalVisCatalogRels->maybe_needed goes backward? For example, a case where, when re-calculating catalog_oldest_nonremovable (i.e. updating GlobalVisCatalogRels->maybe_needed), we take into account a process that has an old XID but was ignored last time for some reason (e.g., its statusFlags).

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Tue, Jun 08, 2021 at 05:47:28PM -0700, Peter Geoghegan wrote:
> I don't have time to try this out myself today, but offhand I'm pretty
> confident that this is sufficient to reproduce the underlying bug
> itself. And if that's true then I guess it can't have anything to do
> with the pg_upgrade/pg_resetwal issue Tom just referenced, despite the
> apparent similarity.

Agreed. It took me a couple of minutes to get autovacuum to run in an infinite loop with a standalone instance. Nice catch, Justin!

--
Michael
Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
From
Matthias van de Meent
Date:
On Wed, 9 Jun 2021 at 04:42, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Jun 08, 2021 at 05:47:28PM -0700, Peter Geoghegan wrote:
> > I don't have time to try this out myself today, but offhand I'm pretty
> > confident that this is sufficient to reproduce the underlying bug
> > itself. And if that's true then I guess it can't have anything to do
> > with the pg_upgrade/pg_resetwal issue Tom just referenced, despite the
> > apparent similarity.
>
> Agreed. It took me a couple of minutes to get autovacuum to run in an
> infinite loop with a standalone instance. Nice catch, Justin!

I believe that I've found the culprit: GetOldestNonRemovableTransactionId(rel) does not use the exact same conditions for returning OldestXmin as GlobalVisTestFor(rel) does. This results in different minimal XIDs, and subsequently this failure.

The attached patch fixes this inconsistency, and adds a set of asserts to ensure that GetOldestNonRemovableTransactionId is equal to the maybe_needed of the GlobalVisTest of that relation, plus some at GlobalVisUpdateApply such that it will fail whenever it is called with arguments that would move the horizons in the wrong direction. Note that there was no problem in GlobalVisUpdateApply, but it helped me determine that that part was not the source of the problem, and I think that having this safeguard is a net positive.

Another approach might be changing GlobalVisTestFor(rel) instead to reflect the conditions in GetOldestNonRemovableTransactionId.

With the attached prototype patch, I was unable to reproduce the problematic case in 10 minutes. Without it, I got the problematic behaviour in seconds.

With regards,

Matthias
Hi,

Good find!

On 2021-06-09 17:42:34 +0200, Matthias van de Meent wrote:
> I believe that I've found the culprit:
> GetOldestNonRemovableTransactionId(rel) does not use the exact same
> conditions for returning OldestXmin as GlobalVisTestFor(rel) does.
> This results in different minimal XIDs, and subsequently this failure.

Specifically, the issue is that it uses the innocuous looking

	else if (RelationIsAccessibleInLogicalDecoding(rel))
		return horizons.catalog_oldest_nonremovable;

but that's not sufficient, because

#define RelationIsAccessibleInLogicalDecoding(relation) \
	(XLogLogicalInfoActive() && \
	 RelationNeedsWAL(relation) && \
	 (IsCatalogRelation(relation) || RelationIsUsedAsCatalogTable(relation)))

is never true if wal_level < logical. So what it is missing is the IsCatalogRelation(rel) || bit.

> The attached patch fixes this inconsistency

I think I prefer applying the fix and the larger changes separately.

> Another approach might be changing GlobalVisTestFor(rel) instead to
> reflect the conditions in GetOldestNonRemovableTransactionId.

No, that'd not be correct, afaict.

Greetings,

Andres Freund
On Wed, Jun 9, 2021 at 11:45 AM Andres Freund <andres@anarazel.de> wrote:
> Good find!

+1

> > The attached patch fixes this inconsistency
>
> I think I prefer applying the fix and the larger changes separately.

I wonder if it's worth making the goto inside lazy_scan_prune verify that the heap tuple matches what we expect. I'm sure that we would have found this issue far sooner if that had been in place already. Though I'm less sure of how much value adding such a check has now.

--
Peter Geoghegan
On Wed, 9 Jun 2021 at 20:45, Andres Freund <andres@anarazel.de> wrote:
>
> Specifically, the issue is that it uses the innocuous looking
>
> 	else if (RelationIsAccessibleInLogicalDecoding(rel))
> 		return horizons.catalog_oldest_nonremovable;
>
> but that's not sufficient, because
>
> #define RelationIsAccessibleInLogicalDecoding(relation) \
> 	(XLogLogicalInfoActive() && \
> 	 RelationNeedsWAL(relation) && \
> 	 (IsCatalogRelation(relation) || RelationIsUsedAsCatalogTable(relation)))
>
> it is never true if wal_level < logical. So what it is missing is the
> IsCatalogRelation(rel) || bit.

Correct.

> > The attached patch fixes this inconsistency
>
> I think I prefer applying the fix and the larger changes separately.

Feel free to change anything in that patch, it was a prototype; or give me a notice if you want me to split the patch.

> > Another approach might be changing GlobalVisTestFor(rel) instead to
> > reflect the conditions in GetOldestNonRemovableTransactionId.
>
> No, that'd not be correct, afaict.

Alright, I wasn't sure of that myself.

With regards,

Matthias van de Meent.
On Wed, 9 Jun 2021 at 22:45, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Wed, Jun 9, 2021 at 11:45 AM Andres Freund <andres@anarazel.de> wrote:
> > Good find!
>
> +1
>
> > > The attached patch fixes this inconsistency
> >
> > I think I prefer applying the fix and the larger changes separately.
>
> I wonder if it's worth making the goto inside lazy_scan_prune verify
> that the heap tuple matches what we expect. I'm sure that we would
> have found this issue far sooner if that had been in place already.
> Though I'm less sure of how much value adding such a check has now.

Could you elaborate on what this "matches what we expect" entails?

Apart from this, I'm also quite certain that the goto branch that created this infinite loop should have been dead code: in a correctly working system, the GlobalVis*Rels should always be at least as strict as vacrel->OldestXmin, but at the same time only the GlobalVis*Rels can be updated (i.e. move their horizon forward) during the vacuum. As such, heap_prune_satisfies_vacuum should never fail to vacuum a tuple that also satisfies the condition of HeapTupleSatisfiesVacuum. That is, unless we're also going to change code to update / move forward vacrel->OldestXmin in lazy_scan_prune between the HPSV call and the loop with HTSV.

With regards,

Matthias van de Meent
On Thu, Jun 10, 2021 at 8:49 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
> Could you elaborate on what this "matches what we expect" entails?
>
> Apart from this, I'm also quite certain that the goto-branch that
> created this infinite loop should have been dead code: In a correctly
> working system, the GlobalVis*Rels should always be at least as strict
> as the vacrel->OldestXmin, but at the same time only GlobalVis*Rels
> can be updated (i.e. move their horizon forward) during the vacuum. As
> such, heap_prune_satisfies_vacuum should never fail to vacuum a tuple
> that also satisfies the condition of HeapTupleSatisfiesVacuum.

It's true that these two similar functions should be in perfect agreement in general (given the same OldestXmin). That in itself doesn't mean that they must always agree about a tuple in practice, when they're called in turn inside lazy_scan_prune(). In particular, nothing stops a transaction that was still in progress when heap_prune_satisfies_vacuum ran (and saw some tuples it inserted) from concurrently aborting. That will render the same tuples fully DEAD inside HeapTupleSatisfiesVacuum(). So we need to restart using the goto purely to cover that case. See the commit message of commit 8523492d4e3.

By "matches what we expect", I meant "involves a just-aborted transaction". We could defensively verify that the inserting transaction concurrently aborted at the point of retrying/calling heap_page_prune() a second time. If there is no aborted transaction involved (as was the case with this bug), then we can be confident that something is seriously broken.

--
Peter Geoghegan
On Thu, 10 Jun 2021 at 18:03, Peter Geoghegan <pg@bowt.ie> wrote:
>
> It's true that these two similar functions should be in perfect
> agreement in general (given the same OldestXmin). That in itself
> doesn't mean that they must always agree about a tuple in practice,
> when they're called in turn inside lazy_scan_prune(). In particular,
> nothing stops a transaction that was in progress to
> heap_prune_satisfies_vacuum (when it saw some tuples it inserted)
> concurrently aborting. That will render the same tuples fully DEAD
> inside HeapTupleSatisfiesVacuum(). So we need to restart using the
> goto purely to cover that case. See the commit message of commit
> 8523492d4e3.

I totally overlooked that HeapTupleSatisfiesVacuumHorizon does the heavyweight XID validation and does return HEAPTUPLE_DEAD in those recently rolled back cases. Thank you for reminding me.

> By "matches what we expect", I meant "involves a just-aborted
> transaction". We could defensively verify that the inserting
> transaction concurrently aborted at the point of retrying/calling
> heap_page_prune() a second time. If there is no aborted transaction
> involved (as was the case with this bug), then we can be confident
> that something is seriously broken.

I believe there are more cases than only the rolled back case, but checking for those cases would potentially help, yes.

With regards,

Matthias van de Meent.
On Thu, Jun 10, 2021 at 9:57 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
> > By "matches what we expect", I meant "involves a just-aborted
> > transaction". We could defensively verify that the inserting
> > transaction concurrently aborted at the point of retrying/calling
> > heap_page_prune() a second time. If there is no aborted transaction
> > involved (as was the case with this bug), then we can be confident
> > that something is seriously broken.
>
> I believe there are more cases than only the rolled back case, but
> checking for those cases would potentially help, yes.

Why do you believe that there are other cases?

I'm not aware of any case that causes lazy_scan_prune() to retry using the goto, other than the aborted transaction case I described (excluding the bug that you diagnosed, which was of course never supposed to happen). If it really is possible to observe a retry for any other reason then I'd very much like to know all the details - it might well signal a distinct bug of the same general variety.

--
Peter Geoghegan
Hi,

On 2021-06-10 17:49:05 +0200, Matthias van de Meent wrote:
> Apart from this, I'm also quite certain that the goto-branch that
> created this infinite loop should have been dead code: In a correctly
> working system, the GlobalVis*Rels should always be at least as strict
> as the vacrel->OldestXmin, but at the same time only GlobalVis*Rels
> can be updated (i.e. move their horizon forward) during the vacuum. As
> such, heap_prune_satisfies_vacuum should never fail to vacuum a tuple
> that also satisfies the condition of HeapTupleSatisfiesVacuum. That
> is, unless we're also going to change code to update / move forward
> vacrel->OldestXmin in lazy_scan_prune between the HPSV call and the
> loop with HTSV.

Consider the case of a transaction that inserted a row aborting. That tuple will be "fully dead" regardless of any xid horizons.

Greetings,

Andres Freund
On Thu, 10 Jun 2021 at 19:07, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, Jun 10, 2021 at 9:57 AM Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > I believe there are more cases than only the rolled back case, but
> > checking for those cases would potentially help, yes.
>
> Why do you believe that there are other cases?
>
> I'm not aware of any case that causes lazy_scan_prune() to retry using
> the goto, other than the aborted transaction case I described
> (excluding the bug that you diagnosed, which was of course never
> supposed to happen). If it really is possible to observe a retry for
> any other reason then I'd very much like to know all the details - it
> might well signal a distinct bug of the same general variety.

I see one exit for HEAPTUPLE_DEAD on a potentially recently committed xvac (?), and we might also check against recently committed transactions if xmin == xmax, although apparently that is not implemented right now.

With regards,

Matthias van de Meent
On Thu, Jun 10, 2021 at 10:29 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
> I see one exit for HEAPTUPLE_DEAD on a potentially recently committed
> xvac (?), and we might also check against recently committed
> transactions if xmin == xmax, although apparently that is not
> implemented right now.

I don't follow. Perhaps you can produce a test case?

--
Peter Geoghegan
On 2021-Jun-10, Peter Geoghegan wrote:

> On Thu, Jun 10, 2021 at 10:29 AM Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > I see one exit for HEAPTUPLE_DEAD on a potentially recently committed
> > xvac (?), and we might also check against recently committed
> > transactions if xmin == xmax, although apparently that is not
> > implemented right now.

xvac was used by the pre-9.0 VACUUM FULL, so I don't think it's possible to see a recently committed one. (I think you'd have to find a table that was pg_upgraded from 8.4 or older, with leftover tuples from an aborted VACUUM FULL, and never vacuumed after that.) A scenario with such a tuple on disk is not impossible [in theory], but if it does exist, then the VACUUM FULL would not be in the possibly-visible horizon.

--
Álvaro Herrera               Valdivia, Chile
"Linux transformed my computer from a 'machine for doing things' into a
genuinely entertaining device, about which I learn something new every
day" (Jaime Salinas)
Hi,

On 2021-06-08 19:18:18 -0500, Justin Pryzby wrote:
> I reproduced the issue on a new/fresh cluster like this:
>
> ./postgres -D data -c autovacuum_naptime=1 -c autovacuum_analyze_scale_factor=0.005 -c log_autovacuum_min_duration=-1
> psql -h /tmp postgres -c "CREATE TABLE t(i int); INSERT INTO t SELECT generate_series(1,99999); CREATE INDEX ON t(i);"
> time while psql -h /tmp postgres -qc 'REINDEX (CONCURRENTLY) INDEX t_i_idx'; do :; done&
> time while psql -h /tmp postgres -qc 'ANALYZE pg_attribute'; do :; done&
>
> TRAP: FailedAssertion("restarts == 0", File: "vacuumlazy.c", Line: 1803, PID: 10367)

Has anybody looked at getting test coverage for the retry path? Not with the goal of triggering an assertion, just to have at least basic coverage.

The problem with writing a test is likely to find a way to halfway reliably schedule a transaction abort after pruning, but before the tuple-removal loop. Does anybody see a trick to do so?

Greetings,

Andres Freund
On Thu, Jun 10, 2021 at 5:58 PM Andres Freund <andres@anarazel.de> wrote:
> The problem with writing a test is likely to find a way to halfway
> reliably schedule a transaction abort after pruning, but before the
> tuple-removal loop? Does anybody see a trick to do so?

I asked Alexander about using his pending stop events infrastructure patch to test this code, back when it did the tupgone stuff rather than loop:

https://postgr.es/m/CAH2-Wz=Tb7bAgCFt0VFA0YJ5Vd1RxJqZRc

I can't see any better way.

ISTM that it would be much more useful to focus on adding an assertion (or maybe even a "can't happen" error) that fails when the DEAD/goto path is reached with a tuple whose xmin wasn't aborted. If that was in place then we would have caught the bug in GetOldestNonRemovableTransactionId() far sooner. That might actually catch other bugs in the future.

--
Peter Geoghegan
Hi,

On 2021-06-10 18:49:50 -0700, Peter Geoghegan wrote:
> ISTM that it would be much more useful to focus on adding an assertion
> (or maybe even a "can't happen" error) that fails when the DEAD/goto
> path is reached with a tuple whose xmin wasn't aborted. If that was in
> place then we would have caught the bug in
> GetOldestNonRemovableTransactionId() far sooner. That might actually
> catch other bugs in the future.

I'm not convinced - right now we don't exercise this path in tests at all. More assertions won't change that - stuff that can be triggered in production-ish loads doesn't help during development. I do think that that makes it far too easy to have state management bugs (e.g. a wrong pincount in retry cases or such).

Greetings,

Andres Freund
Peter Geoghegan <pg@bowt.ie> writes:
> ISTM that it would be much more useful to focus on adding an assertion
> (or maybe even a "can't happen" error) that fails when the DEAD/goto
> path is reached with a tuple whose xmin wasn't aborted. If that was in
> place then we would have caught the bug in
> GetOldestNonRemovableTransactionId() far sooner. That might actually
> catch other bugs in the future.

Sounds like a good idea. If we expect that path to be taken only rarely, then a test-and-elog would be worth its keep. Otherwise maybe it should just be an Assert.

			regards, tom lane
On Thu, Jun 10, 2021 at 7:00 PM Andres Freund <andres@anarazel.de> wrote:
> I'm not convinced - right now we don't exercise this path in tests at
> all. More assertions won't change that - stuff that can be triggered in
> production-ish loads doesn't help during development. I do think that
> that makes it far too easy to have state management bugs (e.g. a wrong
> pincount in retry cases or such).

The code in lazy_scan_prune() led to our detecting this bug (albeit in a fairly nasty way). The problematic VACUUM operations never actually exercised the goto as originally designed, for the purpose it was intended for. Perhaps we should add test coverage for the intended behavior too, but that doesn't seem particularly relevant right now.

--
Peter Geoghegan
Hi,

On 2021-06-10 19:15:59 -0700, Peter Geoghegan wrote:
> The code in lazy_scan_prune() led to our detecting this bug (albeit in
> a fairly nasty way). The problematic VACUUM operations never actually
> exercised the goto as originally designed, for the purpose it was
> intended for. Perhaps we should add test coverage for the intended
> behavior too, but that doesn't seem particularly relevant right now.

Well, I'd like to add assertions ensuring the retry path is only entered when correct - but I feel hesitant about doing so when I can't exercise that path reliably in at least some of the situations.

Greetings,

Andres Freund
On Thu, Jun 10, 2021 at 7:38 PM Andres Freund <andres@anarazel.de> wrote:
> Well, I'd like to add assertions ensuring the retry path is only entered
> when correct - but I feel hesitant about doing so when I can't exercise
> that path reliably in at least some of the situations.

I originally tested the lazy_scan_prune() goto in the obvious way: by adding a pg_usleep() just after its heap_page_prune() call. I'm not too worried about the restart corrupting state or something, because the state is pretty trivial. In any case the infrastructure to exercise the goto inside the tests doesn't exist yet -- I don't see any way around that on HEAD.

OTOH I *am* concerned about the goto doing the wrong thing due to bugs in distant code. I cannot imagine any possible downside to at least asserting HeapTupleHeaderXminInvalid() against the "concurrently inserted then aborted" tuple. That simple measure would have been enough to at least catch this particular bug far sooner.

--
Peter Geoghegan
On Thu, 10 Jun 2021 at 19:43, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, Jun 10, 2021 at 10:29 AM Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > I see one exit for HEAPTUPLE_DEAD on a potentially recently committed
> > xvac (?), and we might also check against recently committed
> > transactions if xmin == xmax, although apparently that is not
> > implemented right now.
>
> I don't follow. Perhaps you can produce a test case?

If you were to delete a tuple in the same transaction that you create it (without checkpoints / subtransactions), I would assume that this would allow us to vacuum the tuple, as the only snapshot that could see the tuple must commit or roll back. In any case, inside the transaction the tuple is not visible anymore, and outside the transaction the tuple will never be seen. That being the case, any such tuple that has xmin == xmax should be vacuumable at any time, except that you might want to wait for the transaction to have committed/rolled back to prevent any race conditions with (delayed) index insertions.

Example:

BEGIN;
INSERT INTO tab VALUES (1);
DELETE FROM tab;
-- At this point, the tuple inserted cannot be seen in any
-- current or future snapshot, and could thus be vacuumed.
COMMIT;

Because I am not quite yet well versed with the xid assignment and heapam deletion subsystems, it could very well be that either this case is impossible to reach, or that the heapam tuple delete logic already applies this at tuple delete time.

With regards,

Matthias van de Meent
Hi,

On 2021-06-14 11:53:47 +0200, Matthias van de Meent wrote:
> If you were to delete a tuple in the same transaction that you create
> it (without checkpoints / subtransactions), I would assume that this
> would allow us to vacuum the tuple, as the only snapshot that could
> see the tuple must commit or roll back.

Right now we do not do so, but I think we talked about adding such logic a couple times.

I think a more robust assertion than aborted-ness could be to assert that repeated retries are not allowed to have the same "oldest xid" as a previous retry. With oldest xid being the older of xmin/xmax?

Greetings,

Andres Freund
Hi,

> @@ -4032,6 +4039,24 @@ GlobalVisTestShouldUpdate(GlobalVisState *state)
>  static void
>  GlobalVisUpdateApply(ComputeXidHorizonsResult *horizons)
>  {
> +	/* assert non-decreasing nature of horizons */
> +	Assert(FullTransactionIdFollowsOrEquals(
> +			   FullXidRelativeTo(horizons->latest_completed,
> +								 horizons->shared_oldest_nonremovable),
> +			   GlobalVisSharedRels.maybe_needed));
> +	Assert(FullTransactionIdFollowsOrEquals(
> +			   FullXidRelativeTo(horizons->latest_completed,
> +								 horizons->catalog_oldest_nonremovable),
> +			   GlobalVisCatalogRels.maybe_needed));
> +	Assert(FullTransactionIdFollowsOrEquals(
> +			   FullXidRelativeTo(horizons->latest_completed,
> +								 horizons->data_oldest_nonremovable),
> +			   GlobalVisDataRels.maybe_needed));
> +	Assert(FullTransactionIdFollowsOrEquals(
> +			   FullXidRelativeTo(horizons->latest_completed,
> +								 horizons->temp_oldest_nonremovable),
> +			   GlobalVisTempRels.maybe_needed));
> +
>  	GlobalVisSharedRels.maybe_needed =
>  		FullXidRelativeTo(horizons->latest_completed,
>  						  horizons->shared_oldest_nonremovable);

Thinking more about it, I don't think these are correct. See the following comment in procarray.c:

 * Note: despite the above, it's possible for the calculated values to move
 * backwards on repeated calls. The calculated values are conservative, so
 * that anything older is definitely not considered as running by anyone
 * anymore, but the exact values calculated depend on a number of things. For
 * example, if there are no transactions running in the current database, the
 * horizon for normal tables will be latestCompletedXid. If a transaction
 * begins after that, its xmin will include in-progress transactions in other
 * databases that started earlier, so another call will return a lower value.
 * Nonetheless it is safe to vacuum a table in the current database with the
 * first result.
There are also replication-related effects: a walsender * process can set its xmin based on transactions that are no longer running * on the primary but are still being replayed on the standby, thus possibly * making the values go backwards. In this case there is a possibility that * we lose data that the standby would like to have, but unless the standby * uses a replication slot to make its xmin persistent there is little we can * do about that --- data is only protected if the walsender runs continuously * while queries are executed on the standby. (The Hot Standby code deals * with such cases by failing standby queries that needed to access * already-removed data, so there's no integrity bug.) The computed values * are also adjusted with vacuum_defer_cleanup_age, so increasing that setting * on the fly is another easy way to make horizons move backwards, with no * consequences for data integrity. Greetings, Andres Freund
Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
From: Matthias van de Meent
On Tue, 15 Jun 2021 at 03:22, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> > @@ -4032,6 +4039,24 @@ GlobalVisTestShouldUpdate(GlobalVisState *state)
> > static void
> > GlobalVisUpdateApply(ComputeXidHorizonsResult *horizons)
> > {
> > +	/* assert non-decreasing nature of horizons */
>
> Thinking more about it, I don't think these are correct. See the
> following comment in procarray.c:
>
> * Note: despite the above, it's possible for the calculated values to move
> * backwards on repeated calls.

So the implicit assumption in heap_page_prune that HeapTupleSatisfiesVacuum(OldestXmin) is always consistent with heap_prune_satisfies_vacuum(vacrel) has never been true. In that case, we'll need to redo the condition in heap_page_prune as well.

PFA my adapted patch that fixes this new-ish issue, and does not include the (incorrect) assertions in GlobalVisUpdateApply. I've tested this against the reproducing case, both with and without the fix in GetOldestNonRemovableTransactionId, and it fails fall into an infinite loop.

I would appreciate it if someone could validate the new logic in the HEAPTUPLE_DEAD case. Although I believe it correctly handles the case where the vistest non-removable horizon moved backwards, a second pair of eyes would be appreciated.

With regards,

Matthias van de Meent
On Wed, Jun 16, 2021 at 3:59 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
> On Tue, 15 Jun 2021 at 03:22, Andres Freund <andres@anarazel.de> wrote:
> > > @@ -4032,6 +4039,24 @@ GlobalVisTestShouldUpdate(GlobalVisState *state)
> > > static void
> > > GlobalVisUpdateApply(ComputeXidHorizonsResult *horizons)
> > > {
> > > +	/* assert non-decreasing nature of horizons */
> >
> > Thinking more about it, I don't think these are correct. See the
> > following comment in procarray.c:
> >
> > * Note: despite the above, it's possible for the calculated values to move
> > * backwards on repeated calls.
>
> So the implicit assumption in heap_page_prune that
> HeapTupleSatisfiesVacuum(OldestXmin) is always consistent with
> heap_prune_satisfies_vacuum(vacrel) has never been true. In that case,
> we'll need to redo the condition in heap_page_prune as well.

I don't think that this shows that the assumption within lazy_scan_prune() (the assumption that both "satisfies vacuum" functions agree) is wrong, with the obvious exception of cases involving the bug that Justin reported. GlobalVis*.maybe_needed is supposed to be conservative.

> PFA my adapted patch that fixes this new-ish issue, and does not
> include the (incorrect) assertions in GlobalVisUpdateApply. I've
> tested this against the reproducing case, both with and without the
> fix in GetOldestNonRemovableTransactionId, and it fails fall into an
> infinite loop.
>
> I would appreciate it if someone could validate the new logic in the
> HEAPTUPLE_DEAD case. Although I believe it correctly handles the case
> where the vistest non-removable horizon moved backwards, a second pair
> of eyes would be appreciated.

If you look at the lazy_scan_prune() logic immediately prior to commit 8523492d4e3, you'll see that it used to have a HEAPTUPLE_DEAD case that didn't involve a restart -- this was the "tupgone" mechanism. Back then we actually had to remove any corresponding index tuples from indexes in this rare case. Plus there was a huge amount of complicated mechanism to handle a very rare case, all of which was removed by commit 8523492d4e3. Things like extra recovery conflict code just for this rare case, or needing to acquire a super-exclusive lock on pages during VACUUM's second heap pass. This is all cruft that I was happy to get rid of.

This is a good discussion of the tupgone stuff and the problems it caused, which is good background information:

https://www.postgresql.org/message-id/20200724165514.dnu5hr4vvgkssf5p%40alap3.anarazel.de

Even if it were true that heap_prune_satisfies_vacuum() won't agree with HeapTupleSatisfiesVacuum() after repeated retries within lazy_scan_prune(), it would probably be best if we then made code outside vacuumlazy.c agree with the lazy_scan_prune() assumption, rather than the other way around.

Have you actually been able to demonstrate a problem involving lazy_scan_prune()'s goto, other than the main GetOldestNonRemovableTransactionId() bug reported by Justin?

--
Peter Geoghegan
On Wed, Jun 16, 2021 at 9:03 AM Peter Geoghegan <pg@bowt.ie> wrote:
> On Wed, Jun 16, 2021 at 3:59 AM Matthias van de Meent
> > So the implicit assumption in heap_page_prune that
> > HeapTupleSatisfiesVacuum(OldestXmin) is always consistent with
> > heap_prune_satisfies_vacuum(vacrel) has never been true. In that case,
> > we'll need to redo the condition in heap_page_prune as well.
>
> I don't think that this shows that the assumption within
> lazy_scan_prune() (the assumption that both "satisfies vacuum"
> functions agree) is wrong, with the obvious exception of cases
> involving the bug that Justin reported. GlobalVis*.maybe_needed is
> supposed to be conservative.

I suppose it's true that they can disagree because we call vacuum_set_xid_limits() to get an OldestXmin inside vacuumlazy.c before calling GlobalVisTestFor() inside vacuumlazy.c to get a vistest. But that only implies that a tuple that would have been considered RECENTLY_DEAD inside lazy_scan_prune() (it just missed being considered DEAD according to OldestXmin) is seen as an LP_DEAD stub line pointer. Which really means it's DEAD to lazy_scan_prune() anyway. These days the only way that lazy_scan_prune() can consider a tuple fully DEAD is if it's no longer a tuple -- it has to actually be an LP_DEAD stub line pointer.

It's really no different from an opportunistic prune that concurrently prunes tuples that VACUUM would have seen as RECENTLY_DEAD if it were going solely on the OldestXmin cutoff. There are certain kinds of tables where non-HOT updates and opportunistic pruning constantly leave behind loads of LP_DEAD items; pruning inside VACUUM won't do much of the total required pruning at all. That means that some DEAD/LP_DEAD items will become dead long after a VACUUM starts, while nevertheless being removed by the same VACUUM. Of course there is no way for lazy_scan_prune() to distinguish one LP_DEAD item from another -- they're all stubs without tuple storage, and without a tuple header with XIDs.
-- Peter Geoghegan
Hi,

On 2021-06-16 12:59:33 +0200, Matthias van de Meent wrote:
> PFA my adapted patch that fixes this new-ish issue, and does not
> include the (incorrect) assertions in GlobalVisUpdateApply. I've
> tested this against the reproducing case, both with and without the
> fix in GetOldestNonRemovableTransactionId, and it fails fall into an
> infinite loop.

Could you share your testcase? I've been working on a series of patches to address this (I'll share in a bit), and I've run quite a few tests, and didn't hit any infinite loops.

> diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
> index 4b600e951a..f4320d5a34 100644
> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -1675,6 +1675,12 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
>   * that any items that make it into the dead_tuples array are simple LP_DEAD
>   * line pointers, and that every remaining item with tuple storage is
>   * considered as a candidate for freezing.
> + *
> + * Note: It is possible that vistest's window moves back from the
> + * vacrel->OldestXmin (see ComputeXidHorizons). To prevent an infinite
> + * loop where we bounce between HeapTupleSatisfiesVacuum and
> + * heap_prune_satisfies_vacuum who disagree on the [almost]deadness of
> + * a tuple, we only retry when we know HTSV agrees with HPSV.
>   */

HTSV is quite widely used because HeapTupleSatisfiesVacuum is quite widely used. HPSV isn't, so it's a bit confusing to use this.

Greetings,

Andres Freund
Hi,

On 2021-06-16 09:46:07 -0700, Peter Geoghegan wrote:
> On Wed, Jun 16, 2021 at 9:03 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > I don't think that this shows that the assumption within
> > lazy_scan_prune() (the assumption that both "satisfies vacuum"
> > functions agree) is wrong, with the obvious exception of cases
> > involving the bug that Justin reported. GlobalVis*.maybe_needed is
> > supposed to be conservative.
>
> I suppose it's true that they can disagree because we call
> vacuum_set_xid_limits() to get an OldestXmin inside vacuumlazy.c
> before calling GlobalVisTestFor() inside vacuumlazy.c to get a
> vistest. But that only implies that a tuple that would have been
> considered RECENTLY_DEAD inside lazy_scan_prune() (it just missed
> being considered DEAD according to OldestXmin) is seen as an LP_DEAD
> stub line pointer. Which really means it's DEAD to lazy_scan_prune()
> anyway. These days the only way that lazy_scan_prune() can consider a
> tuple fully DEAD is if it's no longer a tuple -- it has to actually be
> an LP_DEAD stub line pointer.

I think it's more complicated than that - "before" isn't a guarantee when the horizon can go backwards. Consider the case where a hot_standby_feedback=on replica without a slot connects - that can result in the xid horizon going backwards.

I think a good way to address this might be to have GlobalVisUpdateApply() ensure that maybe_needed does not go backwards within one backend. This is *nearly* already guaranteed within vacuum, except for the case where a catalog access between vacuum_set_xid_limits() and GlobalVisTestFor() could lead to an attempt at pruning, which could theoretically move maybe_needed backwards if in between those two steps a replica connected that caused the horizon to go backwards.

Greetings,

Andres Freund
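[Editor's note] The per-backend clamp Andres describes can be sketched as follows. The struct and function names below are simplified stand-ins for the real GlobalVisState / GlobalVisUpdateApply() in procarray.c, not PostgreSQL code: a freshly computed horizon is only applied when it is newer than what the backend already has, so maybe_needed is monotonic within one backend even when ComputeXidHorizons() returns an older value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * FullTransactionId never wraps in practice (64-bit), so a plain
 * integer comparison is a valid "follows or equals" test here.
 */
typedef uint64_t FullTransactionId;

typedef struct
{
    FullTransactionId maybe_needed;     /* oldest xid we may still need */
} VisState;

/*
 * Apply a freshly computed horizon, clamped so that maybe_needed never
 * moves backwards within this backend.  A stale (older) computed value
 * -- e.g. after a hot_standby_feedback replica connects -- is simply
 * ignored, which is safe because the previous, newer horizon was
 * already proven conservative.
 */
static void
vis_update_apply(VisState *state, FullTransactionId computed)
{
    if (computed > state->maybe_needed)
        state->maybe_needed = computed;
}
```

Under this scheme the infinite-loop scenario disappears: pruning can never see a horizon older than the one used for the vacuum's OldestXmin earlier in the same backend.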
On Wed, 16 Jun 2021 at 21:12, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2021-06-16 12:59:33 +0200, Matthias van de Meent wrote:
> > PFA my adapted patch that fixes this new-ish issue, and does not
> > include the (incorrect) assertions in GlobalVisUpdateApply. I've
> > tested this against the reproducing case, both with and without the
> > fix in GetOldestNonRemovableTransactionId, and it fails fall into an
> > infinite loop.

* Fails _to_ fall into an infinite loop. Sorry, failed to add a "to"; it passes the tests.

> Could you share your testcase? I've been working on a series of patches
> to address this (I'll share in a bit), and I've run quite a few tests,
> and didn't hit any infinite loops.

Basically, I've tested using the test case shared earlier: two sessions spamming connections with "reindex concurrently some_index" and "analyze pg_attribute" against the same database.

> > diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
> > index 4b600e951a..f4320d5a34 100644
> > --- a/src/backend/access/heap/vacuumlazy.c
> > +++ b/src/backend/access/heap/vacuumlazy.c
> > @@ -1675,6 +1675,12 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
> >   * that any items that make it into the dead_tuples array are simple LP_DEAD
> >   * line pointers, and that every remaining item with tuple storage is
> >   * considered as a candidate for freezing.
> > + *
> > + * Note: It is possible that vistest's window moves back from the
> > + * vacrel->OldestXmin (see ComputeXidHorizons). To prevent an infinite
> > + * loop where we bounce between HeapTupleSatisfiesVacuum and
> > + * heap_prune_satisfies_vacuum who disagree on the [almost]deadness of
> > + * a tuple, we only retry when we know HTSV agrees with HPSV.
> >   */
>
> HTSV is quite widely used because HeapTupleSatisfiesVacuum is quite
> widely used. HPSV isn't, so it's a bit confusing to use this.

Sure. I thought it was fine to shorten, as the full function name was named just one line above and it's a long name, but I'm fine with either.

Kind regards,

Matthias
On Wed, 16 Jun 2021 at 21:22, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2021-06-16 09:46:07 -0700, Peter Geoghegan wrote:
> > I suppose it's true that they can disagree because we call
> > vacuum_set_xid_limits() to get an OldestXmin inside vacuumlazy.c
> > before calling GlobalVisTestFor() inside vacuumlazy.c to get a
> > vistest. But that only implies that a tuple that would have been
> > considered RECENTLY_DEAD inside lazy_scan_prune() (it just missed
> > being considered DEAD according to OldestXmin) is seen as an LP_DEAD
> > stub line pointer. Which really means it's DEAD to lazy_scan_prune()
> > anyway. These days the only way that lazy_scan_prune() can consider a
> > tuple fully DEAD is if it's no longer a tuple -- it has to actually be
> > an LP_DEAD stub line pointer.
>
> I think it's more complicated than that - "before" isn't a guarantee when the
> horizon can go backwards. Consider the case where a hot_standby_feedback=on
> replica without a slot connects - that can result in the xid horizon going
> backwards.
>
> I think a good way to address this might be to have GlobalVisUpdateApply()
> ensure that maybe_needed does not go backwards within one backend.
>
> This is *nearly* already guaranteed within vacuum, except for the case where a
> catalog access between vacuum_set_xid_limits() and GlobalVisTestFor() could
> lead to an attempt at pruning, which could move maybe_needed to go backwards
> theoretically if inbetween those two steps a replica connected that causes the
> horizon to go backwards.

I'm tempted to suggest "update one of GlobalVisUpdateApply / ComputeXidHorizons to be non-decreasing". We already have the information that any previous GlobalVis*->maybe_needed is correct, and that if maybe_needed has been higher, that higher value is still correct, so we might just as well update the code to cover that case.

There are some cases where this might be dangerous: new transactions after a time with no active backends (in this case it should be fine to guarantee a non-decreasing GlobalVisTestNonRemovableHorizon), and walsender. I'm uncertain whether or not it's dangerous to _not_ roll back maybe_needed for a new walsender backend (e.g. the backend might want to construct a snapshot of (then) before GlobalVisTestNonRemovableHorizon), especially when considering the comment in ProcessStandbyHSFeedbackMessage:

 * There is a small window for a race condition here: although we just
 * checked that feedbackXmin precedes nextXid, the nextXid could have
 * gotten advanced between our fetching it and applying the xmin below,
 * perhaps far enough to make feedbackXmin wrap around. In that case the
 * xmin we set here would be "in the future" and have no effect. No point
 * in worrying about this since it's too late to save the desired data
 * anyway. Assuming that the standby sends us an increasing sequence of
 * xmins, this could only happen during the first reply cycle, else our
 * own xmin would prevent nextXid from advancing so far.

At the very least, changing GlobalVisUpdateApply/ComputeXidHorizons would increase the potential amount of data lost in such race conditions, if any.

As a further note, my suggested changes in vacuumlazy (specifically, the 'continue' path added in lazy_scan_prune in my recent v2 patchset) are likely incorrect because of a potential undocumented requirement of heap_page_prune: leave no dead tuples with xmax < vacrel->OldestXmin. I realised that with my patch we would allow some of these tuples to continue to exist IFF the GlobalVisTestNonRemovableHorizon moved back during the vacuum, which would violate such a requirement.

Kind regards,

Matthias van de Meent.
On Wed, Jun 16, 2021 at 12:22 PM Andres Freund <andres@anarazel.de> wrote:
> I think it's more complicated than that - "before" isn't a guarantee when the
> horizon can go backwards. Consider the case where a hot_standby_feedback=on
> replica without a slot connects - that can result in the xid horizon going
> backwards.

Oh yeah, I think that I get it now. Tell me if this sounds right to you:

It's not so much that HeapTupleSatisfiesVacuum() "disagrees" with heap_prune_satisfies_vacuum() in a way that actually matters to VACUUM. While there does seem to be a fairly mundane bug in GetOldestNonRemovableTransactionId() that really is a matter of disagreement between the two functions, the fundamental issue is deeper than that. The fundamental issue is that it's not okay to assume that the XID horizon won't go backwards. This probably matters for lots of reasons. The most obvious reason is that in theory it could cause lazy_scan_prune() to get stuck in about the same way as Justin reported with the GetOldestNonRemovableTransactionId() bug.

This isn't an issue in the backbranches because we're using the same OldestXmin value directly when calling heap_page_prune(). We only ever have one xid horizon cutoff like that per VACUUM (we only have OldestXmin, no vistest), so clearly it's not a problem.

> I think a good way to address this might be to have GlobalVisUpdateApply()
> ensure that maybe_needed does not go backwards within one backend.
>
> This is *nearly* already guaranteed within vacuum, except for the case where a
> catalog access between vacuum_set_xid_limits() and GlobalVisTestFor() could
> lead to an attempt at pruning, which could move maybe_needed to go backwards
> theoretically if inbetween those two steps a replica connected that causes the
> horizon to go backwards.

This would at least be easy to test. I like the idea of adding invariants.

--
Peter Geoghegan
Hi,

On 2021-06-16 12:12:23 -0700, Andres Freund wrote:
> Could you share your testcase? I've been working on a series of patches
> to address this (I'll share in a bit), and I've run quite a few tests,
> and didn't hit any infinite loops.

Sorry for not yet doing that. Unfortunately I have an ongoing family health issue (& associated travel) claiming time and energy :(.

I've pushed the minimal fix due to beta 2.

Beyond beta 2 I am thinking of the below to unify the horizon determination:

static inline GlobalVisHorizonKind
GlobalVisHorizonKindForRel(Relation rel)
{
	if (!rel)
		return VISHORIZON_SHARED;

	/*
	 * Other relkinds currently don't contain xids, nor always the necessary
	 * logical decoding markers.
	 */
	Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
		   rel->rd_rel->relkind == RELKIND_MATVIEW ||
		   rel->rd_rel->relkind == RELKIND_TOASTVALUE);

	if (rel->rd_rel->relisshared || RecoveryInProgress())
		return VISHORIZON_SHARED;
	else if (IsCatalogRelation(rel) ||
			 RelationIsAccessibleInLogicalDecoding(rel))
		return VISHORIZON_CATALOG;
	else if (!RELATION_IS_LOCAL(rel))
		return VISHORIZON_DATA;
	else
		return VISHORIZON_TEMP;
}

That's then used in GetOldestNonRemovableTransactionId() and GlobalVisTestFor(). Makes sense?

Regards,

Andres
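[Editor's note] The point of the function above is that the relation-to-horizon mapping lives in one place, so GetOldestNonRemovableTransactionId() and GlobalVisTestFor() can no longer disagree. A self-contained sketch of that dispatch, with stand-in types (`RelStub` and its boolean fields model just the properties the real code reads via Relation and IsCatalogRelation() etc.):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum
{
    VISHORIZON_SHARED,
    VISHORIZON_CATALOG,
    VISHORIZON_DATA,
    VISHORIZON_TEMP
} GlobalVisHorizonKind;

/* Stand-in for the handful of Relation properties the dispatch reads. */
typedef struct
{
    bool is_shared;     /* relisshared */
    bool is_catalog;    /* IsCatalogRelation() or logical decoding */
    bool is_local;      /* RELATION_IS_LOCAL(): a temp relation */
} RelStub;

static bool in_recovery = false;    /* RecoveryInProgress() stand-in */

/*
 * Single source of truth for which horizon applies to a relation;
 * every caller that needs a removal horizon goes through this, so two
 * code paths cannot classify the same relation differently.
 */
static GlobalVisHorizonKind
horizon_kind_for_rel(const RelStub *rel)
{
    if (rel == NULL || rel->is_shared || in_recovery)
        return VISHORIZON_SHARED;
    else if (rel->is_catalog)
        return VISHORIZON_CATALOG;
    else if (!rel->is_local)
        return VISHORIZON_DATA;
    else
        return VISHORIZON_TEMP;
}
```

With this shape, a bug like the one Justin hit (pg_statistic classified under one horizon when computing OldestXmin and under another when building the vistest) becomes structurally impossible, since both callers share the classification.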
On Wed, Jun 16, 2021 at 1:21 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Oh yeah, I think that I get it now. Tell me if this sounds right to you:
>
> It's not so much that HeapTupleSatisfiesVacuum() "disagrees" with
> heap_prune_satisfies_vacuum() in a way that actually matters to
> VACUUM. While there does seem to be a fairly mundane bug in
> GetOldestNonRemovableTransactionId() that really is a matter of
> disagreement between the two functions, the fundamental issue is
> deeper than that. The fundamental issue is that it's not okay to
> assume that the XID horizon won't go backwards. This probably matters
> for lots of reasons. The most obvious reason is that in theory it
> could cause lazy_scan_prune() to get stuck in about the same way as
> Justin reported, with the GetOldestNonRemovableTransactionId() bug.

Any update on this, Andres?

--
Peter Geoghegan
Hi,

On 2021-06-21 05:29:19 -0700, Andres Freund wrote:
> On 2021-06-16 12:12:23 -0700, Andres Freund wrote:
> > Could you share your testcase? I've been working on a series of patches
> > to address this (I'll share in a bit), and I've run quite a few tests,
> > and didn't hit any infinite loops.
>
> Sorry for not yet doing that. Unfortunately I have an ongoing family
> health issue (& associated travel) claiming time and energy :(.
>
> I've pushed the minimal fix due to beta 2.
>
> Beyond beta 2 I am thinking of the below to unify the horizon
> determination:
>
> static inline GlobalVisHorizonKind
> GlobalVisHorizonKindForRel(Relation rel)

I finally pushed this cleanup.

Greetings,

Andres Freund