Thread: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
An internal instance was rejecting connections with "too many clients". I found
a bunch of processes waiting on a futex and I was going to upgrade the kernel
(3.10.0-514) and dismiss the issue.

However, I also found an autovacuum chewing 100% CPU, and it appears the
problem is actually that autovacuum has locked a page of pg_statistic, and
every other process then gets stuck waiting in the planner. I checked a few
and found these:

#13 0x0000000000961908 in SearchSysCache3 (cacheId=cacheId@entry=59, key1=key1@entry=2610, key2=key2@entry=2, key3=key3@entry=0) at syscache.c:1156

As for the autovacuum:

$ ps -wwf 18950
UID        PID  PPID  C STIME TTY STAT TIME    CMD
postgres 18950  7179 93 Jun04 ?   ts   2049:20 postgres: autovacuum worker ts

(gdb) bt
#0  0x00000000004f995c in heap_prune_satisfies_vacuum (prstate=prstate@entry=0x7ffe7a0cd0c0, tup=tup@entry=0x7ffe7a0cce50, buffer=buffer@entry=14138) at pruneheap.c:423
#1  0x00000000004fa154 in heap_prune_chain (prstate=0x7ffe7a0cd0c0, rootoffnum=11, buffer=14138) at pruneheap.c:644
#2  heap_page_prune (relation=relation@entry=0x7f0349466d28, buffer=buffer@entry=14138, vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, old_snap_xmin=old_snap_xmin@entry=0, old_snap_ts=old_snap_ts@entry=0, report_stats=report_stats@entry=false, off_loc=<optimized out>, off_loc@entry=0x1d1b3fc) at pruneheap.c:278
#3  0x00000000004fd9bf in lazy_scan_prune (vacrel=vacrel@entry=0x1d1b390, buf=buf@entry=14138, blkno=blkno@entry=75, page=page@entry=0x2aaab2089e00 "G\f", vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, prunestate=prunestate@entry=0x7ffe7a0ced80) at vacuumlazy.c:1712
#4  0x0000000000500263 in lazy_scan_heap (aggressive=<optimized out>, params=0x1c77b7c, vacrel=<optimized out>) at vacuumlazy.c:1347
#5  heap_vacuum_rel (rel=0x7f0349466d28, params=0x1c77b7c, bstrategy=<optimized out>) at vacuumlazy.c:612
#6  0x000000000067418a in table_relation_vacuum (bstrategy=<optimized out>, params=0x1c77b7c, rel=0x7f0349466d28) at ../../../src/include/access/tableam.h:1678
#7  vacuum_rel (relid=2619, relation=<optimized out>, params=params@entry=0x1c77b7c) at vacuum.c:2001
#8  0x000000000067556e in vacuum (relations=0x1cc5008, params=params@entry=0x1c77b7c, bstrategy=<optimized out>, bstrategy@entry=0x1c77400, isTopLevel=isTopLevel@entry=true) at vacuum.c:461
#9  0x0000000000783c13 in autovacuum_do_vac_analyze (bstrategy=0x1c77400, tab=0x1c77b78) at autovacuum.c:3284
#10 do_autovacuum () at autovacuum.c:2537
#11 0x0000000000784073 in AutoVacWorkerMain (argv=0x0, argc=0) at autovacuum.c:1715
#12 0x00000000007841c9 in StartAutoVacWorker () at autovacuum.c:1500
#13 0x0000000000792b9c in StartAutovacuumWorker () at postmaster.c:5547
#14 sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5251
#15 <signal handler called>
#16 0x00007f0346c56783 in __select_nocancel () from /lib64/libc.so.6
#17 0x000000000048ee7d in ServerLoop () at postmaster.c:1709
#18 0x0000000000793e98 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x1bed280) at postmaster.c:1417
#19 0x0000000000491272 in main (argc=3, argv=0x1bed280) at main.c:209

heap_page_prune() is being called repeatedly, with (I think) the same arguments:

(gdb) c
Continuing.

Breakpoint 3, heap_page_prune (relation=relation@entry=0x7f0349466d28, buffer=buffer@entry=14138, vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, old_snap_xmin=old_snap_xmin@entry=0, old_snap_ts=old_snap_ts@entry=0, report_stats=report_stats@entry=false, off_loc=off_loc@entry=0x1d1b3fc) at pruneheap.c:225
225     in pruneheap.c
(gdb) c
Continuing.

Breakpoint 3, heap_page_prune (relation=relation@entry=0x7f0349466d28, buffer=buffer@entry=14138, vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, old_snap_xmin=old_snap_xmin@entry=0, old_snap_ts=old_snap_ts@entry=0, report_stats=report_stats@entry=false, off_loc=off_loc@entry=0x1d1b3fc) at pruneheap.c:225
225     in pruneheap.c

(gdb) p *vacrel
$3 = {rel = 0x7f0349466d28, indrels = 0x1d1b500, nindexes = 1, do_index_vacuuming = true, do_index_cleanup = true, do_failsafe = false, bstrategy = 0x1c77400, lps = 0x0, old_rel_pages = 80, old_live_tuples = 1101, relfrozenxid = 909081649, relminmxid = 53341561, OldestXmin = 913730329, FreezeLimit = 863730329, MultiXactCutoff = 48553302, relnamespace = 0x1d1b520 "pg_catalog", relname = 0x1d1b548 "pg_statistic", indname = 0x0, blkno = 75, offnum = 15, phase = VACUUM_ERRCB_PHASE_SCAN_HEAP, dead_tuples = 0x1ccef10, rel_pages = 85, scanned_pages = 76, pinskipped_pages = 0, frozenskipped_pages = 0, tupcount_pages = 76, pages_removed = 0, lpdead_item_pages = 65, nonempty_pages = 75, lock_waiter_detected = false, new_rel_tuples = 0, new_live_tuples = 0, indstats = 0x1d1b590, num_index_scans = 0, tuples_deleted = 757, lpdead_items = 1103, new_dead_tuples = 0, num_tuples = 973, live_tuples = 973}

(gdb) p *rel
$2 = {rd_node = {spcNode = 1663, dbNode = 16886, relNode = 107230415}, rd_smgr = 0x1d0a670, rd_refcnt = 1, rd_backend = -1, rd_islocaltemp = false, rd_isnailed = false, rd_isvalid = true, rd_indexvalid = true, rd_statvalid = false, rd_createSubid = 0, rd_newRelfilenodeSubid = 0, rd_firstRelfilenodeSubid = 0, rd_droppedSubid = 0, rd_rel = 0x7f0349466f40, rd_att = 0x7f0349467058, rd_id = 2619, rd_lockInfo = {lockRelId = {relId = 2619, dbId = 16886}}, rd_rules = 0x0, rd_rulescxt = 0x0, trigdesc = 0x0, rd_rsdesc = 0x0, rd_fkeylist = 0x0, rd_fkeyvalid = false, rd_partkey = 0x0, rd_partkeycxt = 0x0, rd_partdesc = 0x0, rd_pdcxt = 0x0, rd_partdesc_nodetached = 0x0, rd_pddcxt = 0x0, rd_partdesc_nodetached_xmin = 0, rd_partcheck = 0x0, rd_partcheckvalid = false, rd_partcheckcxt = 0x0, rd_indexlist = 0x7f0349498e40, rd_pkindex = 2696, rd_replidindex = 0, rd_statlist = 0x0, rd_indexattr = 0x7f0349498ee8, rd_keyattr = 0x7f0349498e98, rd_pkattr = 0x7f0349498ec0, rd_idattr = 0x0, rd_pubactions = 0x0, rd_options = 0x0, rd_amhandler = 3, rd_tableam = 0x9ccfc0 <heapam_methods>, rd_index = 0x0, rd_indextuple = 0x0, rd_indexcxt = 0x0, rd_indam = 0x0, rd_opfamily = 0x0, rd_opcintype = 0x0, rd_support = 0x0, rd_supportinfo = 0x0, rd_indoption = 0x0, rd_indexprs = 0x0, rd_indpred = 0x0, rd_exclops = 0x0, rd_exclprocs = 0x0, rd_exclstrats = 0x0, rd_indcollation = 0x0, rd_opcoptions = 0x0, rd_amcache = 0x0, rd_fdwroutine = 0x0, rd_toastoid = 0, pgstat_info = 0x1c88940}

(gdb) p *relation->rd_rel
$8 = {oid = 2619, relname = {data = "pg_statistic", '\000' <repeats 51 times>}, relnamespace = 11, reltype = 13029, reloftype = 0, relowner = 10, relam = 2, relfilenode = 107230415, reltablespace = 0, relpages = 80, reltuples = 1101, relallvisible = 11, reltoastrelid = 2840, relhasindex = true, relisshared = false, relpersistence = 112 'p', relkind = 114 'r', relnatts = 31, relchecks = 0, relhasrules = false, relhastriggers = false, relhassubclass = false, relrowsecurity = false, relforcerowsecurity = false, relispopulated = true, relreplident = 110 'n', relispartition = false, relrewrite = 0, relfrozenxid = 909081649, relminmxid = 53341561}

(gdb) info locals
rel = 0x7f0349466d28
offnum = 4
maxoff = 21
itemid = 0x2aaab2089e24
tuple = {t_len = 564, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 75}, ip_posid = 4}, t_tableOid = 2619, t_data = 0x2aaab208bbc8}

(gdb) p *itemid
$2 = {lp_off = 7624, lp_flags = 1, lp_len = 564}

I'll leave the instance running for a little bit before restarting (or kill -9) in case someone requests more info.

See also: https://www.postgresql.org/message-id/2591376.1621196582@sss.pgh.pa.us

These commits may be relevant.

commit 3c3b8a4b26891892bccf3d220580a7f413c0b9ca
Author: Peter Geoghegan <pg@bowt.ie>
Date:   Wed Apr 7 08:47:15 2021 -0700

    Truncate line pointer array during VACUUM.

commit 7ab96cf6b312cfcd79cdc1a69c6bdb75de0ed30f
Author: Peter Geoghegan <pg@bowt.ie>
Date:   Tue Apr 6 07:49:39 2021 -0700

    Refactor lazy_scan_heap() loop.

commit 8523492d4e349c4714aa2ab0291be175a88cb4fc
Author: Peter Geoghegan <pg@bowt.ie>
Date:   Tue Apr 6 08:49:22 2021 -0700

    Remove tupgone special case from vacuumlazy.c.

commit dc7420c2c9274a283779ec19718d2d16323640c0
Author: Andres Freund <andres@anarazel.de>
Date:   Wed Aug 12 16:03:49 2020 -0700

    snapshot scalability: Don't compute global horizons while building snapshots.
Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

From: Matthias van de Meent
On Sun, 6 Jun 2021 at 18:35, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> An internal instance was rejecting connections with "too many clients".
> I found a bunch of processes waiting on a futex and I was going to upgrade the
> kernel (3.10.0-514) and dismiss the issue.
>
> However, I also found an autovacuum chewing 100% CPU, and it appears the
> problem is actually because autovacuum has locked a page of pg_statistic, and
> every other process then gets stuck waiting in the planner. I checked a few
> and found these:

My suspicion is that, for some tuple on that page, HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DEAD for a tuple that it thinks should have been cleaned up by heap_page_prune, but isn't. This would result in an infinite loop in lazy_scan_prune, where the condition on vacuumlazy.c:1800 will always be true but the retry will not do the job it's expected to do.

Apart from reporting this suspicion, I sadly can't help you much further, as my knowledge and experience of vacuum and snapshot horizons is limited and probably won't help you here.

I think it would be helpful for further debugging if we had the state of all the tuples on that page (well, the tuple headers with their transaction ids, and their line pointers), as that would help determine whether my suspicion could be correct.

With regards,

Matthias van de Meent
Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
> On Sun, 6 Jun 2021 at 18:35, Justin Pryzby <pryzby@telsasoft.com> wrote:
>> However, I also found an autovacuum chewing 100% CPU, and it appears the
>> problem is actually because autovacuum has locked a page of pg_statistic, and
>> every other process then gets stuck waiting in the planner. I checked a few
>> and found these:

> My suspicion is that for some tuple on that page
> HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DEAD for a tuple that it
> thinks should have been cleaned up by heap_page_prune, but isn't. This
> would result in an infinite loop in lazy_scan_prune where the
> condition on vacuumlazy.c:1800 will always be true, but the retry will
> not do the job it's expected to do.

Since Justin's got a debugger on the process already, it probably wouldn't be too hard to confirm or disprove that theory by stepping through the code.

			regards, tom lane
On Sun, Jun 6, 2021 at 9:35 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> I'll leave the instance running for a little bit before restarting (or kill-9)
> in case someone requests more info.

How about dumping the page image out and sharing it with the list? This procedure should work fine from gdb:

https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Dumping_a_page_image_from_within_GDB

I suggest that you dump the "page" pointer inside lazy_scan_prune(). I imagine that you have the instance already stuck in an infinite loop, so what we'll probably see from the page image is the page after the first prune and another no-progress prune.

--
Peter Geoghegan
Hi,

On Sun, Jun 6, 2021, at 10:59, Tom Lane wrote:
> Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
> > My suspicion is that for some tuple on that page
> > HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DEAD for a tuple that it
> > thinks should have been cleaned up by heap_page_prune, but isn't. This
> > would result in an infinite loop in lazy_scan_prune where the
> > condition on vacuumlazy.c:1800 will always be true, but the retry will
> > not do the job it's expected to do.
>
> Since Justin's got a debugger on the process already, it probably
> wouldn't be too hard to confirm or disprove that theory by stepping
> through the code.

If that turns out to be the issue, it'd be good to check what prevents the tuple from being considered fully dead, by stepping through the visibility test...

Andres
On Sun, Jun 06, 2021 at 11:00:38AM -0700, Peter Geoghegan wrote:
> On Sun, Jun 6, 2021 at 9:35 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > I'll leave the instance running for a little bit before restarting (or kill-9)
> > in case someone requests more info.
>
> How about dumping the page image out, and sharing it with the list?
> This procedure should work fine from gdb:

Sorry, but I already killed the process to try to follow Matthias' suggestion. I have a core file from "gcore", but it looks like it's incomplete and the address is now "out of bounds"...

#2 0x00000000004fd9bf in lazy_scan_prune (vacrel=vacrel@entry=0x1d1b390, buf=buf@entry=14138, blkno=blkno@entry=75, page=page@entry=0x2aaab2089e00 <Address 0x2aaab2089e00 out of bounds>,

I saved a copy of the datadir, but a manual "vacuum" doesn't trigger the problem. So if Matthias' theory is right, it seems like there may be a race condition. Maybe that goes without saying.

--
Justin
On Sun, Jun 6, 2021 at 11:43 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> Sorry, but I already killed the process to try to follow Matthias' suggestion.
> I have a core file from "gcore" but it looks like it's incomplete and the
> address is now "out of bounds"...

Based on what you said about ending up back in lazy_scan_prune() alone, I think he's right. That is, I agree that it's very likely that the stuck VACUUM would not have become stuck had the "goto retry on HEAPTUPLE_DEAD inside lazy_scan_prune" thing not been added by commit 8523492d4e3. But that in itself doesn't necessarily implicate commit 8523492d4e3.

The interesting question is: why doesn't heap_page_prune() ever agree with the HeapTupleSatisfiesVacuum() calls made from lazy_scan_prune(), no matter how many times the call to heap_page_prune() is repeated? (It's repeated to try to resolve the disagreement that aborted xacts can sometimes cause.)

If I had to guess I'd say that the underlying problem has something to do with heap_prune_satisfies_vacuum() not agreeing with HeapTupleSatisfiesVacuum(), perhaps only when GlobalVisCatalogRels is used. But that's a pretty wild guess at this point.

--
Peter Geoghegan
On Sun, Jun 06, 2021 at 07:26:22PM +0200, Matthias van de Meent wrote:
> I think it would be helpful for further debugging if we would have the
> state of the all tuples on that page (well, the tuple headers with
> their transactionids and their line pointers), as that would help with
> determining if my suspicion could be correct.

This is the state of the page after I killed the cluster and started a postmaster on a copy of its datadir, with autovacuum=off:

SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, t_hoff, t_bits, t_oid
FROM heap_page_items(get_raw_page('pg_statistic', 75));

 lp | lp_off | lp_flags | lp_len |  t_xmin   |  t_xmax   | t_field3 | t_ctid  | t_infomask2 | t_infomask | t_hoff | t_bits                           | t_oid
----+--------+----------+--------+-----------+-----------+----------+---------+-------------+------------+--------+----------------------------------+-------
  1 |      0 |        3 |      0 |           |           |          |         |             |            |        |                                  |
  2 |      0 |        3 |      0 |           |           |          |         |             |            |        |                                  |
  3 |      0 |        3 |      0 |           |           |          |         |             |            |        |                                  |
  4 |   7624 |        1 |    564 | 913726913 | 913730328 |        0 | (83,19) |          31 |       8451 |     32 | 11111111111111111111101000100000 |
  5 |   6432 |        1 |   1188 | 913726913 | 913730328 |        0 | (83,20) |          31 |       8451 |     32 | 11111111111111111111110100110000 |
  6 |   6232 |        1 |    195 | 913726913 | 913730328 |        0 | (83,21) |          31 |       8451 |     32 | 11111111111111111111111000100000 |
  7 |   6032 |        1 |    195 | 913726913 | 913730328 |        0 | (83,22) |          31 |       8451 |     32 | 11111111111111111111111000100000 |
  8 |   5848 |        1 |    181 | 913726913 | 913730328 |        0 | (83,23) |          31 |       8451 |     32 | 11111111111111111111111000100000 |
  9 |   5664 |        1 |    181 | 913726913 | 913730328 |        0 | (81,13) |          31 |       8451 |     32 | 11111111111111111111111000100000 |
 10 |   5488 |        1 |    176 | 913726913 | 913730328 |        0 | (81,14) |          31 |       8451 |     32 | 11111111111111111111111000100000 |
 11 |   5312 |        1 |    176 | 913726913 | 913730328 |        0 | (82,13) |          31 |       8451 |     32 | 11111111111111111111111000100000 |
 12 |   5128 |        1 |    181 | 913726913 | 913730328 |        0 | (79,57) |          31 |       8451 |     32 | 11111111111111111111111000100000 |
 13 |   4952 |        1 |    176 | 913726913 | 913730328 |        0 | (80,25) |          31 |       8451 |     32 | 11111111111111111111111000100000 |
 14 |   4776 |        1 |    176 | 913726913 | 913730328 |        0 | (80,26) |          31 |       8451 |     32 | 11111111111111111111111000100000 |
 15 |   4600 |        1 |    176 | 913726913 | 913730328 |        0 | (84,1)  |          31 |       8451 |     32 | 11111111111111111111111000100000 |
 16 |   4424 |        1 |    176 | 913726913 | 913730328 |        0 | (84,2)  |          31 |       8451 |     32 | 11111111111111111111111000100000 |
 17 |   4248 |        1 |    176 | 913726913 | 913730328 |        0 | (84,3)  |          31 |       8451 |     32 | 11111111111111111111111000100000 |
 18 |   2880 |        1 |   1364 | 913726913 | 913730328 |        0 | (84,4)  |          31 |       8451 |     32 | 11111111111111111111110100110000 |
 19 |   2696 |        1 |    179 | 913726914 |         0 |        0 | (75,19) |          31 |      10499 |     32 | 11111111111111111111111000100000 |
 20 |   2520 |        1 |    176 | 913726914 |         0 |        0 | (75,20) |          31 |      10499 |     32 | 11111111111111111111111000100000 |
 21 |   2336 |        1 |    179 | 913726914 |         0 |        0 | (75,21) |          31 |      10499 |     32 | 11111111111111111111111000100000 |
(21 rows)

(In the interest of full disclosure, this was a dumb cp -a of the live datadir where the processes had been stuck for a day, and where I was unable to open a client session).

8451 = HEAP_KEYS_UPDATED + 259 atts?

Note that after I vacuum pg_statistic, it looks like this:

ts=# SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, t_hoff, t_bits, t_oid
FROM heap_page_items(get_raw_page('pg_statistic', 75));

 lp | lp_off | lp_flags | lp_len |  t_xmin   | t_xmax | t_field3 | t_ctid  | t_infomask2 | t_infomask | t_hoff | t_bits                           | t_oid
----+--------+----------+--------+-----------+--------+----------+---------+-------------+------------+--------+----------------------------------+-------
  1 |      0 |        0 |      0 |           |        |          |         |             |            |        |                                  |
...
 18 |      0 |        0 |      0 |           |        |          |         |             |            |        |                                  |
 19 |   8008 |        1 |    179 | 913726914 |      0 |        0 | (75,19) |          31 |      10499 |     32 | 11111111111111111111111000100000 |
 20 |   7832 |        1 |    176 | 913726914 |      0 |        0 | (75,20) |          31 |      10499 |     32 | 11111111111111111111111000100000 |
 21 |   7648 |        1 |    179 | 913726914 |      0 |        0 | (75,21) |          31 |      10499 |     32 | 11111111111111111111111000100000 |

ts=# VACUUM VERBOSE pg_statistic;
|INFO: vacuuming "pg_catalog.pg_statistic"
|INFO: scanned index "pg_statistic_relid_att_inh_index" to remove 278403 row versions
|DETAIL: CPU: user: 0.10 s, system: 0.00 s, elapsed: 0.14 s
|INFO: "pg_statistic": removed 278403 dead item identifiers in 4747 pages
|DETAIL: CPU: user: 0.12 s, system: 0.07 s, elapsed: 1.99 s
|INFO: index "pg_statistic_relid_att_inh_index" now contains 1101 row versions in 758 pages
|DETAIL: 277271 index row versions were removed.
|747 index pages were newly deleted.
|747 index pages are currently deleted, of which 0 are currently reusable.
|CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
|INFO: "pg_statistic": found 277216 removable, 1101 nonremovable row versions in 4758 out of 4758 pages
|DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 920282115
|0 pages removed.
|Skipped 0 pages due to buffer pins, 0 frozen pages.
|CPU: user: 1.75 s, system: 0.18 s, elapsed: 6.52 s.
|INFO: "pg_statistic": truncated 4758 to 96 pages
|DETAIL: CPU: user: 0.02 s, system: 0.02 s, elapsed: 0.06 s
|INFO: vacuuming "pg_toast.pg_toast_2619"
|INFO: scanned index "pg_toast_2619_index" to remove 30 row versions
|DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
|INFO: "pg_toast_2619": removed 30 dead item identifiers in 11 pages
|DETAIL: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
|INFO: index "pg_toast_2619_index" now contains 115 row versions in 2 pages
|DETAIL: 3 index row versions were removed.
|0 index pages were newly deleted.
|0 index pages are currently deleted, of which 0 are currently reusable.
|CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s.
|INFO: "pg_toast_2619": found 29 removable, 115 nonremovable row versions in 43 out of 43 pages
|DETAIL: 0 dead row versions cannot be removed yet, oldest xmin: 920282115
|0 pages removed.
|Skipped 0 pages due to buffer pins, 0 frozen pages.
|CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.03 s.
|VACUUM

Before: pg_catalog | pg_statistic | table | postgres | permanent | heap | 38 MB   |
After:  pg_catalog | pg_statistic | table | postgres | permanent | heap | 1192 kB |

I also have nearly-intact bt f from the partial core:

#0 0x00000000004fa063 in heap_prune_chain (prstate=0x7ffe7a0cd0c0, rootoffnum=4, buffer=14138) at pruneheap.c:592
        lp = <optimized out>
        tupdead = <optimized out>
        recent_dead = <optimized out>
        rootlp = 0x2aaab2089e24
        nchain = 0
        tup = {t_len = 564, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 75}, ip_posid = 4}, t_tableOid = 2619, t_data = 0x2aaab208bbc8}
        ndeleted = 0
        priorXmax = 0
        htup = <optimized out>
        maxoff = 21
        offnum = 4
        ...
#1 heap_page_prune (relation=relation@entry=0x7f0349466d28, buffer=buffer@entry=14138, vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, old_snap_xmin=old_snap_xmin@entry=0, old_snap_ts=old_snap_ts@entry=0, report_stats=report_stats@entry=false, off_loc=<optimized out>, off_loc@entry=0x1d1b3fc) at pruneheap.c:278
        itemid = 0x2aaab2089e24
        ndeleted = 0
        page = 0x2aaab2089e00 <Address 0x2aaab2089e00 out of bounds>
        offnum = 4
        maxoff = 21
        ...
#2 0x00000000004fd9bf in lazy_scan_prune (vacrel=vacrel@entry=0x1d1b390, buf=buf@entry=14138, blkno=blkno@entry=75, page=page@entry=0x2aaab2089e00 <Address 0x2aaab2089e00 out of bounds>, vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, prunestate=prunestate@entry=0x7ffe7a0ced80) at vacuumlazy.c:1712
        rel = 0x7f0349466d28
        offnum = 4
        maxoff = 21
        itemid = 0x2aaab2089e24
        tuple = {t_len = 564, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 75}, ip_posid = 4}, t_tableOid = 2619, t_data = 0x2aaab208bbc8}
        res = <optimized out>
        tuples_deleted = 0
        lpdead_items = 0
        new_dead_tuples = 0
        num_tuples = 0
        live_tuples = 0
        nfrozen = 0

--
Justin
On Sun, Jun 06, 2021 at 11:00:38AM -0700, Peter Geoghegan wrote:
> How about dumping the page image out, and sharing it with the list?
> This procedure should work fine from gdb:
>
> https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Dumping_a_page_image_from_within_GDB
>
> I suggest that you dump the "page" pointer inside lazy_scan_prune(). I
> imagine that you have the instance already stuck in an infinite loop,
> so what we'll probably see from the page image is the page after the
> first prune and another no-progress prune.

The cluster was again rejecting connections with "too many clients already".

I was able to open a shell this time, but it immediately froze when I tried to tab-complete "pg_stat_acti"...

I was able to dump the page image, though - attached. I can send you its "data" privately, if desirable. I'll also try to step through this.
postgres=# SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, t_hoff, t_bits, t_oid
FROM heap_page_items(pg_read_binary_file('/tmp/dump_block.page'));

 lp | lp_off | lp_flags | lp_len |  t_xmin   |  t_xmax   | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff | t_bits                           | t_oid
----+--------+----------+--------+-----------+-----------+----------+--------+-------------+------------+--------+----------------------------------+-------
  1 |   1320 |        1 |    259 | 926025112 |         0 |        0 | (1,1)  |       32799 |      10499 |     32 | 11111111111111111111111000100000 |
  2 |   8088 |        1 |    104 | 926018702 |         0 |        0 | (1,2)  |       32799 |      10497 |     32 | 11111111111111111111100000000000 |
  3 |      0 |        0 |      0 |           |           |          |        |             |            |        |                                  |
  4 |   7904 |        1 |    179 | 926018702 |         0 |        0 | (1,4)  |       32799 |      10499 |     32 | 11111111111111111111111000100000 |
  5 |   7728 |        1 |    176 | 926018702 |         0 |        0 | (1,5)  |       32799 |      10499 |     32 | 11111111111111111111111000100000 |
  6 |   7464 |        1 |    259 | 926014884 | 926025112 |        0 | (1,1)  |       49183 |       9475 |     32 | 11111111111111111111111000100000 |
  7 |      2 |        2 |      0 |           |           |          |        |             |            |        |                                  |
  8 |      4 |        2 |      0 |           |           |          |        |             |            |        |                                  |
  9 |     19 |        2 |      0 |           |           |          |        |             |            |        |                                  |
 10 |      0 |        0 |      0 |           |           |          |        |             |            |        |                                  |
 11 |     20 |        2 |      0 |           |           |          |        |             |            |        |                                  |
 12 |   5792 |        1 |   1666 | 926014887 |         0 |        0 | (1,12) |          31 |      10499 |     32 | 11111111111111111111101000100000 |
 13 |      5 |        2 |      0 |           |           |          |        |             |            |        |                                  |
 14 |   3912 |        1 |   1880 | 925994211 |         0 |        0 | (1,14) |          31 |      10499 |     32 | 11111111111111111111110100110000 |
 15 |      0 |        3 |      0 |           |           |          |        |             |            |        |                                  |
 16 |     18 |        2 |      0 |           |           |          |        |             |            |        |                                  |
 17 |      0 |        3 |      0 |           |           |          |        |             |            |        |                                  |
 18 |   1936 |        1 |   1976 | 926013259 |         0 |        0 | (1,18) |       32799 |      10499 |     32 | 11111111111111111111110100110000 |
 19 |   1760 |        1 |    176 | 926014887 |         0 |        0 | (1,19) |       32799 |      10499 |     32 | 11111111111111111111111000100000 |
 20 |   1584 |        1 |    176 | 926014889 |         0 |        0 | (1,20) |       32799 |      10499 |     32 | 11111111111111111111111000100000 |
 21 |      6 |        2 |      0 |           |           |          |        |             |            |        |                                  |
 22 |      0 |        3 |      0 |           |           |          |        |             |            |        |                                  |
 23 |      0 |        3 |      0 |           |           |          |        |             |            |        |                                  |
 24 |      0 |        3 |      0 |           |           |          |        |             |            |        |                                  |
 25 |      0 |        3 |      0 |           |           |          |        |             |            |        |                                  |
 26 |      0 |        3 |      0 |           |           |          |        |             |            |        |                                  |
 27 |      0 |        3 |      0 |           |           |          |        |             |            |        |                                  |
 28 |      0 |        3 |      0 |           |           |          |        |             |            |        |                                  |
(28 rows)

No great surprise that it's again in pg_statistic.

#0 GetPrivateRefCountEntry (buffer=buffer@entry=411, do_move=do_move@entry=false) at bufmgr.c:313
#1 0x00000000007ecb4f in GetPrivateRefCount (buffer=411) at bufmgr.c:398
#2 BufferGetBlockNumber (buffer=buffer@entry=411) at bufmgr.c:2762
#3 0x00000000004fa0f3 in heap_prune_chain (prstate=0x7fff7e4a9180, rootoffnum=7, buffer=411) at pruneheap.c:625
#4 heap_page_prune (relation=relation@entry=0x7fe636faed28, buffer=buffer@entry=411, vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, old_snap_xmin=old_snap_xmin@entry=0, old_snap_ts=old_snap_ts@entry=0, report_stats=report_stats@entry=false, off_loc=<optimized out>, off_loc@entry=0x12b433c) at pruneheap.c:278
#5 0x00000000004fd9bf in lazy_scan_prune (vacrel=vacrel@entry=0x12b42d0, buf=buf@entry=411, blkno=blkno@entry=1, page=page@entry=0x2aaaab54be00 "J\f", vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, prunestate=prunestate@entry=0x7fff7e4aae40) at vacuumlazy.c:1712
#6 0x0000000000500263 in lazy_scan_heap (aggressive=<optimized out>, params=0x12ce89c, vacrel=<optimized out>) at vacuumlazy.c:1347
#7 heap_vacuum_rel (rel=0x7fe636faed28, params=0x12ce89c, bstrategy=<optimized out>) at vacuumlazy.c:612
#8 0x000000000067418a in table_relation_vacuum (bstrategy=<optimized out>, params=0x12ce89c, rel=0x7fe636faed28) at ../../../src/include/access/tableam.h:1678
#9 vacuum_rel (relid=2619, relation=<optimized out>, params=params@entry=0x12ce89c) at vacuum.c:2001

--
Justin
Attachment
On Sun, Jun 06, 2021 at 01:59:10PM -0400, Tom Lane wrote:
> Matthias van de Meent <boekewurm+postgres@gmail.com> writes:
> > My suspicion is that for some tuple on that page
> > HeapTupleSatisfiesVacuum() returns HEAPTUPLE_DEAD for a tuple that it
> > thinks should have been cleaned up by heap_page_prune, but isn't. This
> > would result in an infinite loop in lazy_scan_prune where the
> > condition on vacuumlazy.c:1800 will always be true, but the retry will
> > not do the job it's expected to do.
>
> Since Justin's got a debugger on the process already, it probably
> wouldn't be too hard to confirm or disprove that theory by stepping
> through the code.

Breakpoint 2, HeapTupleSatisfiesVacuum (htup=htup@entry=0x7fff7e4a9ca0, OldestXmin=926025113, buffer=buffer@entry=411) at heapam_visibility.c:1163
1163    heapam_visibility.c: No such file or directory.
(gdb) fin
Run till exit from #0 HeapTupleSatisfiesVacuum (htup=htup@entry=0x7fff7e4a9ca0, OldestXmin=926025113, buffer=buffer@entry=411) at heapam_visibility.c:1163
lazy_scan_prune (vacrel=vacrel@entry=0x12b42d0, buf=buf@entry=411, blkno=blkno@entry=1, page=page@entry=0x2aaaab54be00 "J\f", vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, prunestate=prunestate@entry=0x7fff7e4aae40) at vacuumlazy.c:1790
1790    vacuumlazy.c: No such file or directory.
Value returned is $23 = HEAPTUPLE_DEAD

offnum = 6
tuple = {t_len = 259, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 1}, ip_posid = 6}, t_tableOid = 2619, t_data = 0x2aaaab54db28}
res = HEAPTUPLE_DEAD

(gdb) p *itemid
$51 = {lp_off = 7464, lp_flags = 1, lp_len = 259}
(gdb) p *tuple->t_data
$54 = {t_choice = {t_heap = {t_xmin = 926014884, t_xmax = 926025112, t_field3 = {t_cid = 0, t_xvac = 0}}, t_datum = {datum_len_ = 926014884, datum_typmod = 926025112, datum_typeid = 0}}, t_ctid = {ip_blkid = {bi_hi = 0, bi_lo = 1}, ip_posid = 1}, t_infomask2 = 49183, t_infomask = 9475, t_hoff = 32 ' ', t_bits = 0x2aaaab54db3f "\377\377\177\004"}

lp_flags = LP_NORMAL (1) ??
t_infomask2 = HEAP_ONLY_TUPLE | HEAP_HOT_UPDATED + 31 atts
t_infomask = HEAP_UPDATED | HEAP_XMAX_COMMITTED | HEAP_XMIN_COMMITTED | HEAP_HASVARWIDTH | HEAP_HASNULL

--
Justin
Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic

From: Matthias van de Meent
On Tue, 8 Jun 2021 at 13:03, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> The cluster was again rejecting with "too many clients already".
>
> I was able to open a shell this time, but it immediately froze when I tried to
> tab complete "pg_stat_acti"...
>
> I was able to dump the page image, though - attached. I can send you its
> "data" privately, if desirable. I'll also try to step through this.

Could you attach a dump of lazy_scan_prune's vacrel, all the global visibility states (GlobalVisCatalogRels, and possibly GlobalVisSharedRels, GlobalVisDataRels, and GlobalVisTempRels), and heap_page_prune's PruneState? Additionally, the locals of lazy_scan_prune (more specifically, the 'offnum' when it enters heap_page_prune) would also be appreciated, as it helps identify the tuple.

I've been looking into what might have done this, and I'm currently stuck for lack of information about GlobalVisCatalogRels and the PruneState.

One curiosity I did notice is that the t_xmax of the problematic tuples is exactly one lower than the OldestXmin. Not weird, but a curiosity.

With regards,

Matthias van de Meent.

PS. Attached a few of my current research notes, which are mainly comparisons between heap_prune_satisfies_vacuum and HeapTupleSatisfiesVacuum.
Attachment
On Tue, Jun 08, 2021 at 01:54:41PM +0200, Matthias van de Meent wrote:
> On Tue, 8 Jun 2021 at 13:03, Justin Pryzby <pryzby@telsasoft.com> wrote:
> > [...]
> > I was able to dump the page image, though - attached. I can send you its
> > "data" privately, if desirable. I'll also try to step through this.
>
> Could you attach a dump of lazy_scan_prune's vacrel, all the global
> visibility states (GlobalVisCatalogRels, and possibly
> GlobalVisSharedRels, GlobalVisDataRels, and GlobalVisTempRels), and
> heap_page_prune's PruneState?
(gdb) p *vacrel
$56 = {rel = 0x7fe636faed28, indrels = 0x12b4440, nindexes = 1, do_index_vacuuming = true, do_index_cleanup = true,
  do_failsafe = false, bstrategy = 0x1210340, lps = 0x0, old_rel_pages = 81, old_live_tuples = 1100,
  relfrozenxid = 921613998, relminmxid = 53878631, OldestXmin = 926025113, FreezeLimit = 876025113,
  MultiXactCutoff = 49085856, relnamespace = 0x12b4460 "pg_catalog", relname = 0x12b4488 "pg_statistic",
  indname = 0x0, blkno = 1, offnum = 6, phase = VACUUM_ERRCB_PHASE_SCAN_HEAP, dead_tuples = 0x127a980,
  rel_pages = 81, scanned_pages = 2, pinskipped_pages = 0, frozenskipped_pages = 0, tupcount_pages = 2,
  pages_removed = 0, lpdead_item_pages = 1, nonempty_pages = 1, lock_waiter_detected = false,
  new_rel_tuples = 0, new_live_tuples = 0, indstats = 0x12b4568, num_index_scans = 0, tuples_deleted = 0,
  lpdead_items = 3, new_dead_tuples = 0, num_tuples = 14, live_tuples = 14}
(gdb) p GlobalVisCatalogRels
$57 = {definitely_needed = {value = 926025113}, maybe_needed = {value = 926025112}}
(gdb) p GlobalVisSharedRels
$58 = {definitely_needed = {value = 926025113}, maybe_needed = {value = 926025112}}
(gdb) p GlobalVisDataRels
$59 = {definitely_needed = {value = 926025113}, maybe_needed = {value = 926025113}}
(gdb) p GlobalVisTempRels
$60 = {definitely_needed = {value = 926025113}, maybe_needed = {value = 926025113}}

I don't know when you want prstate from, but here it is at some point:

(gdb) p *prstate
$77 = {rel = 0x7fe636faed28, vistest = 0xe7bcc0 <GlobalVisCatalogRels>, old_snap_ts = 0, old_snap_xmin = 0,
  old_snap_used = false, new_prune_xid = 0, latestRemovedXid = 0, nredirected = 0, ndead = 0, nunused = 0,

> Additionally, the locals of lazy_scan_prune (more specifically, the
> 'offnum' when it enters heap_page_prune) would also be appreciated, as
> it helps indicate the tuple.
Breakpoint 1, heap_page_prune (relation=relation@entry=0x7fe636faed28, buffer=buffer@entry=411,
    vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>, old_snap_xmin=old_snap_xmin@entry=0,
    old_snap_ts=old_snap_ts@entry=0, report_stats=report_stats@entry=false,
    off_loc=off_loc@entry=0x12b433c) at pruneheap.c:225
225     pruneheap.c: No such file or directory.
(gdb) up
#1  0x00000000004fd9bf in lazy_scan_prune (vacrel=vacrel@entry=0x12b42d0, buf=buf@entry=411, blkno=blkno@entry=1,
    page=page@entry=0x2aaaab54be00 "J\f", vistest=vistest@entry=0xe7bcc0 <GlobalVisCatalogRels>,
    prunestate=prunestate@entry=0x7fff7e4aae40) at vacuumlazy.c:1712
1712    vacuumlazy.c: No such file or directory.
(gdb) info locals
rel = 0x7fe636faed28
offnum = 6
maxoff = 28
itemid = 0x2aaaab54be2c
tuple = {t_len = 259, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 1}, ip_posid = 6}, t_tableOid = 2619, t_data = 0x2aaaab54db28}
res = <optimized out>
tuples_deleted = 0
lpdead_items = 0
new_dead_tuples = 0
num_tuples = 0
live_tuples = 0
nfrozen = 0

Maybe you need to know that this is also returning RECENTLY_DEAD.
Breakpoint 4, heap_prune_satisfies_vacuum (prstate=prstate@entry=0x7fff7e4a9180, tup=tup@entry=0x7fff7e4a8f10,
    buffer=buffer@entry=411) at pruneheap.c:423
423     in pruneheap.c
(gdb)
Run till exit from #0  heap_prune_satisfies_vacuum (prstate=prstate@entry=0x7fff7e4a9180,
    tup=tup@entry=0x7fff7e4a8f10, buffer=buffer@entry=411) at pruneheap.c:423
0x00000000004fa887 in heap_prune_chain (prstate=0x7fff7e4a9180, rootoffnum=6, buffer=411) at pruneheap.c:560
560     in pruneheap.c
Value returned is $72 = HEAPTUPLE_RECENTLY_DEAD

tup = {t_len = 259, t_self = {ip_blkid = {bi_hi = 0, bi_lo = 1}, ip_posid = 6}, t_tableOid = 2619, t_data = 0x2aaaab54db28}

(gdb) p *htup
$82 = {t_choice = {t_heap = {t_xmin = 926014884, t_xmax = 926025112, t_field3 = {t_cid = 0, t_xvac = 0}},
    t_datum = {datum_len_ = 926014884, datum_typmod = 926025112, datum_typeid = 0}},
  t_ctid = {ip_blkid = {bi_hi = 0, bi_lo = 1}, ip_posid = 1}, t_infomask2 = 49183, t_infomask = 9475,
  t_hoff = 32 ' ', t_bits = 0x2aaaab54db3f "\377\377\177\004"}

--
Justin
System Administrator
Telsasoft
+1-952-707-8581
Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
From: Matthias van de Meent
On Tue, 8 Jun 2021 at 14:11, Justin Pryzby <pryzby@telsasoft.com> wrote:
>
> On Tue, Jun 08, 2021 at 01:54:41PM +0200, Matthias van de Meent wrote:
> > [...]
> > Could you attach a dump of lazy_scan_prune's vacrel, all the global
> > visibility states (GlobalVisCatalogRels, and possibly
> > GlobalVisSharedRels, GlobalVisDataRels, and GlobalVisTempRels), and
> > heap_page_prune's PruneState?
>
> (gdb) p *vacrel
> $56 = {... OldestXmin = 926025113, ...}
>
> (gdb) p GlobalVisCatalogRels
> $57 = {definitely_needed = {value = 926025113}, maybe_needed = {value = 926025112}}

This maybe_needed is older than the OldestXmin, which indeed gives us
this problematic behaviour: heap_prune_satisfies_vacuum considers one
more transaction to be unvacuumable, and thus indeed won't vacuum that
tuple that HeapTupleSatisfiesVacuum does want to be vacuumed.

The new open question is now: Why is
GlobalVisCatalogRels->maybe_needed < OldestXmin? IIRC
GlobalVisCatalogRels->maybe_needed is constructed from the same
ComputeXidHorizonsResult->catalog_oldest_nonremovable which later is
returned to be used in vacrel->OldestXmin.

> Maybe you need to know that this is also returning RECENTLY_DEAD.

I had expected that, but good to have confirmation.

Thanks for the information!

With regards,

Matthias van de Meent.
On Tue, Jun 08, 2021 at 02:27:14PM +0200, Matthias van de Meent wrote:
> Thanks for the information!

I created an apparently-complete core file by first doing this:
| echo 127 |sudo tee /proc/22591/coredump_filter
*and updated wiki:Developer_FAQ to work with huge pages

I'm planning to kill the process shortly if nobody asks for anything else.
On 2021-Jun-06, Justin Pryzby wrote:
> However, I also found an autovacuum chewing 100% CPU, and it appears the
> problem is actually because autovacuum has locked a page of pg-statistic, and
> every other process then gets stuck waiting in the planner. I checked a few
> and found these:
> [...]

Hmm ... I wonder if this could be related to commits d9d076222f5b,
c98763bf51bf, etc.  I don't have any connecting thoughts other than the
tuple visibility code being involved.  Do you see any procs with the
PROC_IN_SAFE_IC flag set?

--
Álvaro Herrera               39°49'30"S 73°17'W
Hi,

On 2021-06-08 14:27:14 +0200, Matthias van de Meent wrote:
> heap_prune_satisfies_vacuum considers 1 more transaction to be
> unvacuumable, and thus indeed won't vacuum that tuple that
> HeapTupleSatisfiesVacuum does want to be vacuumed.
>
> The new open question is now: Why is
> GlobalVisCatalogRels->maybe_needed < OldestXmin? IIRC
> GlobalVisCatalogRels->maybe_needed is constructed from the same
> ComputeXidHorizonsResult->catalog_oldest_nonremovable which later is
> returned to be used in vacrel->OldestXmin.

The horizon used by pruning is only updated once per transaction (well,
approximately). What presumably is happening is that the retry loop is
retrying, without updating the horizon, therefore the same thing is
happening over and over again?

Greetings,

Andres Freund
On Tue, Jun 08, 2021 at 12:06:02PM -0400, Alvaro Herrera wrote:
> On 2021-Jun-06, Justin Pryzby wrote:
> > However, I also found an autovacuum chewing 100% CPU, and it appears the
> > problem is actually because autovacuum has locked a page of pg-statistic, and
> > every other process then gets stuck waiting in the planner. I checked a few
> > and found these:
>
> Hmm ... I wonder if this could be related to commits d9d076222f5b,
> c98763bf51bf, etc.  I don't have any connecting thoughts other than the
> tuple visibility code being involved.  Do you see any procs with the
> PROC_IN_SAFE_IC flag set?

Can you give me a hint how to do that from a corefile ?  I got this far:

(gdb) p MyProc->links
$13 = {prev = 0x0, next = 0x0}

I do have a reindex script which runs CIC, and it does look suspicious.

<<Mon Jun 7 22:00:54 CDT 2021: starting db=ts
...
Mon Jun 7 22:01:16 CDT 2021: ts: pg_subscription_rel: pg_subscription_rel_srrelid_srsubid_index(reindex system)...
Mon Jun 7 22:01:16 CDT 2021: ts: pg_subscription: pg_subscription_oid_index(reindex system)...
Mon Jun 7 22:01:16 CDT 2021: ts: pg_subscription: pg_subscription_subname_index(reindex system)...
Mon Jun 7 22:01:16 CDT 2021: ts: pg_subscription: pg_toast.pg_toast_6100_index(reindex system)...
Mon Jun 7 22:01:17 CDT 2021: ts: pg_statistic_ext_data: pg_statistic_ext_data_stxoid_index(reindex system)...
Mon Jun 7 22:01:17 CDT 2021: ts: pg_statistic_ext_data: pg_toast.pg_toast_3429_index(reindex system)...
Mon Jun 7 22:01:17 CDT 2021: ts: pg_statistic_ext: pg_statistic_ext_name_index(reindex system)...
Mon Jun 7 22:01:17 CDT 2021: ts: pg_statistic_ext: pg_statistic_ext_oid_index(reindex system)...
Mon Jun 7 22:01:17 CDT 2021: ts: pg_statistic_ext: pg_statistic_ext_relid_index(reindex system)...
Mon Jun 7 22:01:17 CDT 2021: ts: pg_statistic_ext: pg_toast.pg_toast_3381_index(reindex system)...
Mon Jun 7 22:01:17 CDT 2021: ts: pg_statistic: pg_statistic_relid_att_inh_index(reindex system)...
Mon Jun 7 22:02:56 CDT 2021: ts: pg_statistic: pg_toast.pg_toast_2619_index(reindex system)...
Mon Jun 7 22:02:57 CDT 2021: ts: pg_statio_all_tables_snap: pg_statio_all_tables_snap_t_idx(reindex non-partitioned)...
ERROR: canceling statement due to statement timeout
reindex: warning, dropping invalid/unswapped index: pg_statio_all_tables_snap_t_idx_ccnew
Mon Jun 7 23:02:57 CDT 2021: ts: pg_statio_all_tables_snap: pg_toast.pg_toast_33011_index(reindex system)...
Mon Jun 7 23:02:57 CDT 2021: ts: pg_statio_all_indexes_snap: pg_statio_all_indexes_snap_t_idx(reindex non-partitioned)...
ERROR: canceling statement due to statement timeout
reindex: warning, dropping invalid/unswapped index: pg_statio_all_indexes_snap_t_idx_ccnew
Tue Jun 8 00:02:57 CDT 2021: ts: pg_statio_all_indexes_snap: pg_toast.pg_toast_33020_index(reindex system)...
Tue Jun 8 00:02:57 CDT 2021: ts: pg_shseclabel: pg_shseclabel_object_index(reindex system)...
Tue Jun 8 00:02:58 CDT 2021: ts: pg_shseclabel: pg_toast.pg_toast_3592_index(reindex system)...
Tue Jun 8 00:02:58 CDT 2021: ts: pg_shdescription: pg_shdescription_o_c_index(reindex system)...
Tue Jun 8 00:02:58 CDT 2021: ts: pg_shdescription: pg_toast.pg_toast_2396_index(reindex system)...
...
Tue Jun 8 01:21:20 CDT 2021: ts: pg_aggregate: pg_aggregate_fnoid_index(reindex system)...
Tue Jun 8 01:21:20 CDT 2021: ts: pg_aggregate: pg_toast.pg_toast_2600_index(reindex system)...
Tue Jun 8 01:21:20 CDT 2021: ts: permissions: perm_group_idx(reindex non-partitioned)...
ERROR: canceling statement due to statement timeout
reindex: warning, dropping invalid/unswapped index: perm_group_idx_ccnew
Tue Jun 8 02:21:20 CDT 2021: ts: permissions: perm_user_idx(reindex non-partitioned)...
ERROR: canceling statement due to statement timeout
reindex: warning, dropping invalid/unswapped index: perm_user_idx_ccnew
Tue Jun 8 03:21:20 CDT 2021: ts: permissions: pg_toast.pg_toast_33577_index(reindex system)...
Tue Jun 8 03:21:20 CDT 2021: ts: patchfiles: patchfiles_filename_idx(reindex non-partitioned)...
ERROR: canceling statement due to statement timeout
reindex: warning, dropping invalid/unswapped index: patchfiles_filename_idx_ccnew
Tue Jun 8 04:21:21 CDT 2021: ts: patchfiles: patchfiles_pkey(reindex non-partitioned)...
ERROR: canceling statement due to statement timeout
reindex: warning, dropping invalid/unswapped index: patchfiles_pkey_ccnew

=> It's strange that these timed out, since the statio tables are less than
15MB, and permissions and patchfiles are both under 100kB.
And it seems to say that it timed out after less than 1sec.

They're running this:
| PGOPTIONS="--deadlock_timeout=333ms -cstatement-timeout=3600s" psql -c "REINDEX INDEX CONCURRENTLY $i"
And if it times out, it then runs: $PSQL "DROP INDEX CONCURRENTLY $bad"

I've killed it a little bit ago, but I should've saved the start time of the
autovacuum.
I found this:

#5  heap_vacuum_rel (rel=0x7fe636faed28, params=0x12ce89c, bstrategy=<optimized out>) at vacuumlazy.c:612
        starttime = 676436464463888
        ru0 = {tv = {tv_sec = 1623121264, tv_usec = 463887}, ru = {ru_utime = {tv_sec = 0, tv_usec = 77418},
            ru_stime = {tv_sec = 0, tv_usec = 13866}, {ru_maxrss = 7440, __ru_maxrss_word = 7440},
            {ru_ixrss = 0, __ru_ixrss_word = 0}, {ru_idrss = 0, __ru_idrss_word = 0},
            {ru_isrss = 0, __ru_isrss_word = 0}, {ru_minflt = 1826, __ru_minflt_word = 1826},
            {ru_majflt = 1, __ru_majflt_word = 1}, {ru_nswap = 0, __ru_nswap_word = 0},
            {ru_inblock = 2008, __ru_inblock_word = 2008}, {ru_oublock = 192, __ru_oublock_word = 192},
            {ru_msgsnd = 0, __ru_msgsnd_word = 0}, {ru_msgrcv = 0, __ru_msgrcv_word = 0},
            {ru_nsignals = 0, __ru_nsignals_word = 0}, {ru_nvcsw = 29, __ru_nvcsw_word = 29},
            {ru_nivcsw = 9, __ru_nivcsw_word = 9}}}

$ date -d @1623121264
Mon Jun 7 22:01:04 CDT 2021
$ date -d '2000-01-01 UTC + 676436464seconds'
Mon Jun 7 22:01:04 CDT 2021

--
Justin
On 2021-Jun-08, Justin Pryzby wrote:
> On Tue, Jun 08, 2021 at 12:06:02PM -0400, Alvaro Herrera wrote:
> > [...] Do you see any procs with the PROC_IN_SAFE_IC flag set?
>
> Can you give me a hint how to do that from a corefile ?

(gdb) set $i=0
(gdb) set $total = ProcGlobal->allProcCount
(gdb) while($i<$total)
 >print ProcGlobal->allProcs[$i++]->statusFlags
 >end

--
Álvaro Herrera               Valdivia, Chile
"In fact, the basic problem with Perl 5's subroutines is that they're not
crufty enough, so the cruft leaks out into user-defined code instead, by
the Conservation of Cruft Principle."  (Larry Wall, Apocalypse 6)
Reminds me of the other bug that you also reported about a year ago,
Justin - which was never fixed. The one with the duplicate tids from a cic
(see pg 14 open item).

Is this essentially the same environment as the one that led to your other
bug report?

--
Peter Geoghegan
(Sent from my phone)
On Tue, Jun 08, 2021 at 02:01:51PM -0400, Alvaro Herrera wrote:
> On 2021-Jun-08, Justin Pryzby wrote:
> > Can you give me a hint how to do that from a corefile ?
>
> (gdb) set $i=0
> (gdb) set $total = ProcGlobal->allProcCount
> (gdb) while($i<$total)
>  >print ProcGlobal->allProcs[$i++]->statusFlags
>  >end

They're all zero except for:

$201 = 1 '\001'
$202 = 3 '\003'
$203 = 1 '\001'

src/include/storage/proc.h-#define PROC_IS_AUTOVACUUM	0x01	/* is it an autovac worker? */
src/include/storage/proc.h-#define PROC_IN_VACUUM	0x02	/* currently running lazy vacuum */
src/include/storage/proc.h:#define PROC_IN_SAFE_IC	0x04	/* currently running CREATE INDEX

--
Justin
On Tue, Jun 08, 2021 at 12:34:04PM -0500, Justin Pryzby wrote:
> On Tue, Jun 08, 2021 at 12:06:02PM -0400, Alvaro Herrera wrote:
> > Hmm ... I wonder if this could be related to commits d9d076222f5b,
> > c98763bf51bf, etc.  I don't have any connecting thoughts other than the
> > tuple visibility code being involved.  Do you see any procs with the
> > PROC_IN_SAFE_IC flag set?
>
> I do have a reindex script which runs CIC, and it does look suspicious.
>
> [... reindex log trimmed ...]
>
> => It's strange that these timed out, since the statio tables are less than
> 15MB, and permissions and patchfiles are both under 100kB.
>
> And it seems to say that it timed out after less than 1sec.

Oops, no: it timed out after 3600sec, as requested.

> They're running this:
> | PGOPTIONS="--deadlock_timeout=333ms -cstatement-timeout=3600s" psql -c "REINDEX INDEX CONCURRENTLY $i"
> And if it times out, it then runs: $PSQL "DROP INDEX CONCURRENTLY $bad"
...
> $ date -d @1623121264
> Mon Jun 7 22:01:04 CDT 2021

Which is probably because the reindex was waiting for the vacuum process
to finish (or maybe waiting on the page that vacuum had locked?).

--
Justin
On 2021-Jun-08, Justin Pryzby wrote:
> They're all zero except for:
>
> $201 = 1 '\001'
> $202 = 3 '\003'
> $203 = 1 '\001'
>
> src/include/storage/proc.h-#define PROC_IS_AUTOVACUUM	0x01	/* is it an autovac worker? */
> src/include/storage/proc.h-#define PROC_IN_VACUUM	0x02	/* currently running lazy vacuum */
> src/include/storage/proc.h:#define PROC_IN_SAFE_IC	0x04	/* currently running CREATE INDEX

Ah okay, not related then.  Thanks for checking.

--
Álvaro Herrera               39°49'30"S 73°17'W
On Tue, Jun 8, 2021 at 12:27 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > They're running this:
> > | PGOPTIONS="--deadlock_timeout=333ms -cstatement-timeout=3600s" psql -c "REINDEX INDEX CONCURRENTLY $i"
> > And if it times out, it then runs: $PSQL "DROP INDEX CONCURRENTLY $bad"
> ...
> > $ date -d @1623121264
> > Mon Jun 7 22:01:04 CDT 2021

Perhaps reindex was waiting on the VACUUM process to finish, while VACUUM
was (in effect) busy waiting on the REINDEX to finish. If the bug is hard
to reproduce then it might just be that the circumstances that lead to
livelock require that things line up exactly at the heap page + XID level
-- which I'd expect to be tricky to reproduce.

As I said upthread, I'm almost certain that the "goto retry" added by
commit 8523492d is a factor here -- that is what I mean by busy waiting
inside VACUUM. It's possible that busy waiting like this happens much more
often than an actual undetected deadlock/livelock. We only expect to "goto
retry" in the event of a concurrently aborting transaction.

The other bug that you reported back in July of last year [1] (which
involved a "REINDEX INDEX pg_class_tblspc_relfilenode_index") was pretty
easy to recreate, just by running the REINDEX in a tight loop. Could you
describe how tricky it is to repro this issue now?

If you instrument the "goto retry" code added to lazy_scan_prune() by
commit 8523492d, then you might notice that it is hit in contexts that it
was never intended to work with. If you can reduce reproducing the problem
to reproducing hitting that goto in the absence of an aborted transaction,
then it might be a lot easier to produce a simple repro. The
livelock/deadlock is probably nothing more than the worst consequence of
the same issue, and so may not need to be reproduced directly to fix the
issue.

[1] https://www.postgresql.org/message-id/CAH2-WzkjjCoq5Y4LeeHJcjYJVxGm3M3SAWZ0%3D6J8K1FPSC9K0w%40mail.gmail.com

--
Peter Geoghegan
On Tue, Jun 08, 2021 at 01:52:40PM -0700, Peter Geoghegan wrote:
> Perhaps reindex was waiting on the VACUUM process to finish, while
> VACUUM was (in effect) busy waiting on the REINDEX to finish.

But when the reindex exited, the vacuum kept spinning until I sent SIGABRT
12 hours later.

> The other bug that you reported back in July of last year [1] (which
> involved a "REINDEX INDEX pg_class_tblspc_relfilenode_index") was
> pretty easy to recreate, just by running the REINDEX in a tight loop.
> Could you describe how tricky it is to repro this issue now?

I didn't try to reproduce it, but now hit it twice in 3 days.
(Actually, I did try to reproduce it, by running tight loops around
vacuum/analyze pg_statistic, which didn't work. Maybe because reindex is
what's important.)

I mentioned that we've been running pg14b1 since 2021-05-20. So it ran
fine for 13 days before breaking in an obvious way.

OH - in the first instance, I recorded the stuck process, but not its
timestamp. It looks like that autovacuum process *also* started right
after 10pm, which is when the reindex job starts. So it seems like
REINDEX may trigger this pretty consistently:

(gdb) frame 4
#4  heap_vacuum_rel (rel=0x7f0349466d28, params=0x1c77b7c, bstrategy=<optimized out>) at vacuumlazy.c:612
612     vacuumlazy.c: No such file or directory.
(gdb) info locals
starttime = 676177375524485

$ date -d '2000-01-01 UTC + 676177375seconds'
Fri Jun 4 22:02:55 CDT 2021

> If you instrument the "goto retry" code added to lazy_scan_prune() by
> commit 8523492d, then you might notice that it is hit in contexts that
> it was never intended to work with. If you can reduce reproducing the
> problem to reproducing hitting that goto in the absence of an aborted
> transaction, then it might be a lot easier to produce a simple repro.

I'm not sure what you're suggesting ?  Maybe I should add some NOTICES there.

I'm not sure why/if pg_statistic is special, but I guess when analyze
happens, it gets updated, and eventually processed by autovacuum. The main
table here is a partitioned table which receives UPDATEs which move tuples
into a different partition (probably more often than what's recommended).

autovacuum_analyze_threshold    | 2
autovacuum_analyze_scale_factor | 0.005
autovacuum_vacuum_scale_factor  | 0.005
log_autovacuum_min_duration     | 9000
checkpoint_timeout              | 60
wal_level                       | minimal

In pg14, the parent table is auto-analyzed.

--
Justin
On Tue, Jun 8, 2021 at 2:23 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> I'm not sure what you're suggesting ? Maybe I should add some NOTICES there.

Here is one approach that might work: Can you check if the assertion added
by the attached patch fails very quickly with your test case?

This does nothing more than trigger an assertion failure in the event of
retrying a second time for any given heap page. Theoretically that could
happen without there being any bug -- in principle we might have to retry
several times for the same page. In practice the chances of it happening
even once are vanishingly low, though -- so two times strongly signals a
bug. It was quite hard to hit the "goto restart" even once during my
testing. There is still no test coverage for the line of code because it's
so hard to hit.

If you find that the assertion is hit pretty quickly with the same
workload then you've all but reproduced the issue, probably in far less
time. And, if you know that there were no concurrently aborting
transactions then you can be 100% sure that you have reproduced the issue
-- this goto is only supposed to be executed when a transaction that was
in progress during the heap_page_prune() aborts after it returns, but
before we call HeapTupleSatisfiesVacuum() for one of the aborted-xact
tuples. It's supposed to be a super narrow thing.

> I'm not sure why/if pg_statistic is special, but I guess when analyze
> happens, it gets updated, and eventually processed by autovacuum.

pg_statistic is probably special, though only in a superficial way: it is
the system catalog that tends to be the most frequently vacuumed in
practice.

> In pg14, the parent table is auto-analyzed.

I wouldn't expect that to matter. The "ANALYZE portion" of the VACUUM
ANALYZE won't have started at the point that we get stuck.

--
Peter Geoghegan
Attachment
On Tue, Jun 8, 2021 at 4:03 AM Justin Pryzby <pryzby@telsasoft.com> wrote:
> postgres=# SELECT lp, lp_off, lp_flags, lp_len, t_xmin, t_xmax, t_field3, t_ctid, t_infomask2, t_infomask, t_hoff, t_bits, t_oid
>            FROM heap_page_items(pg_read_binary_file('/tmp/dump_block.page'));
>  lp | lp_off | lp_flags | lp_len |  t_xmin   |  t_xmax   | t_field3 | t_ctid | t_infomask2 | t_infomask | t_hoff |              t_bits              | t_oid
> ----+--------+----------+--------+-----------+-----------+----------+--------+-------------+------------+--------+----------------------------------+-------
>   1 |   1320 |        1 |    259 | 926025112 |         0 |        0 | (1,1)  |       32799 |      10499 |     32 | 11111111111111111111111000100000 |
> *** SNIP ***
>   6 |   7464 |        1 |    259 | 926014884 | 926025112 |        0 | (1,1)  |       49183 |       9475 |     32 | 11111111111111111111111000100000 |

As I understand it from your remarks + gdb output from earlier [1], the
tuple at offset number 6 is the tuple that triggers the suspicious "goto
restart" here. There was a regular UPDATE (not a HOT UPDATE) that produced
a successor version on the same heap page -- which is lp 1. Here are the
t_infomask details for both tuples:

lp 6: HEAP_HASNULL|HEAP_HASVARWIDTH|HEAP_XMIN_COMMITTED|HEAP_XMAX_COMMITTED|HEAP_UPDATED <-- points to (1,1)
lp 1: HEAP_HASNULL|HEAP_HASVARWIDTH|HEAP_XMIN_COMMITTED|HEAP_XMAX_INVALID|HEAP_UPDATED <-- This is (1,1)

So if lp 1's xmin and lp 6's xmax XID/Xact committed (i.e., if XID
926025112 committed), why shouldn't HeapTupleSatisfiesVacuum() think that
lp 6 is DEAD (and not RECENTLY_DEAD)? You also say that vacuumlazy.c's
OldestXmin is 926025113, so it is hard to fault HTSV here. The only way it
could be wrong is if the hint bits were somehow spuriously set, which
seems unlikely to me.

[1] https://postgr.es/m/20210608113333.GC16435@telsasoft.com

--
Peter Geoghegan
On Tue, Jun 8, 2021 at 5:27 AM Matthias van de Meent
<boekewurm+postgres@gmail.com> wrote:
> > (gdb) p *vacrel
> > $56 = {... OldestXmin = 926025113, ...}
> >
> > (gdb) p GlobalVisCatalogRels
> > $57 = {definitely_needed = {value = 926025113}, maybe_needed = {value = 926025112}}
>
> This maybe_needed is older than the OldestXmin, which indeed gives us
> this problematic behaviour:

Good catch.

> heap_prune_satisfies_vacuum considers 1 more transaction to be
> unvacuumable, and thus indeed won't vacuum that tuple that
> HeapTupleSatisfiesVacuum does want to be vacuumed.

Following up from my email from an hour ago here. Since I have no reason
to suspect HeapTupleSatisfiesVacuum (per the earlier analysis), this is
very much starting to look like a heap_prune_satisfies_vacuum() problem.
And therefore likely a problem in the snapshot scalability work.

Note that GlobalVisCatalogRels.maybe_needed is 926025112, which doesn't
match OldestXmin in VACUUM (that's 926025113). Though both
GlobalVisDataRels.definitely_needed and GlobalVisDataRels.maybe_needed
*are* 926025113, and therefore agree with VACUUM's OldestXmin. But this is
pg_statistic we're vacuuming, and so GlobalVisCatalogRels is what matters.

> The new open question is now: Why is
> GlobalVisCatalogRels->maybe_needed < OldestXmin? IIRC
> GlobalVisCatalogRels->maybe_needed is constructed from the same
> ComputeXidHorizonsResult->catalog_oldest_nonremovable which later is
> returned to be used in vacrel->OldestXmin.

Exactly.

--
Peter Geoghegan
Peter Geoghegan <pg@bowt.ie> writes:
> On Tue, Jun 8, 2021 at 5:27 AM Matthias van de Meent wrote:
>>> (gdb) p GlobalVisCatalogRels
>>> $57 = {definitely_needed = {value = 926025113}, maybe_needed = {value = 926025112}}

>> This maybe_needed is older than the OldestXmin, which indeed gives us
>> this problematic behaviour:

> Good catch.

I wonder if this is a variant of the problem shown at

https://www.postgresql.org/message-id/2591376.1621196582%40sss.pgh.pa.us

where maybe_needed was visibly quite insane. This value is less visibly insane, but it's still wrong. It might be interesting to try running this test case with the extra assertions I proposed there, to try to narrow down where it's going off the rails.

			regards, tom lane
On Tue, Jun 08, 2021 at 02:38:37PM -0700, Peter Geoghegan wrote:
> On Tue, Jun 8, 2021 at 2:23 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> > I'm not sure what you're suggesting ? Maybe I should add some NOTICES there.
>
> Here is one approach that might work: Can you check if the assertion
> added by the attached patch fails very quickly with your test case?

I reproduced the issue on a new/fresh cluster like this:

./postgres -D data -c autovacuum_naptime=1 -c autovacuum_analyze_scale_factor=0.005 -c log_autovacuum_min_duration=-1
psql -h /tmp postgres -c "CREATE TABLE t(i int); INSERT INTO t SELECT generate_series(1,99999); CREATE INDEX ON t(i);"
time while psql -h /tmp postgres -qc 'REINDEX (CONCURRENTLY) INDEX t_i_idx'; do :; done&
time while psql -h /tmp postgres -qc 'ANALYZE pg_attribute'; do :; done&

TRAP: FailedAssertion("restarts == 0", File: "vacuumlazy.c", Line: 1803, PID: 10367)
postgres: autovacuum worker postgres(ExceptionalCondition+0x99)[0x5633f3ad6b09]
postgres: autovacuum worker postgres(+0x1c0a37)[0x5633f36cca37]
postgres: autovacuum worker postgres(heap_vacuum_rel+0xfca)[0x5633f36cf75a]
postgres: autovacuum worker postgres(+0x305fed)[0x5633f3811fed]
postgres: autovacuum worker postgres(vacuum+0x61a)[0x5633f38137ea]
postgres: autovacuum worker postgres(+0x409dd3)[0x5633f3915dd3]
postgres: autovacuum worker postgres(+0x40ae46)[0x5633f3916e46]
postgres: autovacuum worker postgres(AutoVacuumUpdateDelay+0x0)[0x5633f3916f50]
postgres: autovacuum worker postgres(+0x41985b)[0x5633f392585b]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x12890)[0x7f085c591890]
/lib/x86_64-linux-gnu/libc.so.6(__select+0x17)[0x7f085bafaff7]
postgres: autovacuum worker postgres(+0x419d06)[0x5633f3925d06]
postgres: autovacuum worker postgres(PostmasterMain+0xcbb)[0x5633f39277bb]
postgres: autovacuum worker postgres(main+0x4d4)[0x5633f3660a14]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xe7)[0x7f085ba05b97]
postgres: autovacuum worker postgres(_start+0x2a)[0x5633f3660aba]
2021-06-08 19:10:36.875 CDT postmaster[13483] LOG: server process (PID 10367) was terminated by signal 6: Aborted
2021-06-08 19:10:36.875 CDT postmaster[13483] DETAIL: Failed process was running: autovacuum: VACUUM pg_toast.pg_toast_2619
2021-06-08 19:10:36.875 CDT postmaster[13483] LOG: terminating any other active server processes
WARNING: terminating connection because of crash of another server process
DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory.
HINT: In a moment you should be able to reconnect to the database and repeat your command.
server closed the connection unexpectedly
	This probably means the server terminated abnormally before or while processing the request.
connection to server was lost

real	0m14.477s
On Tue, Jun 8, 2021 at 5:11 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
> I wonder if this is a variant of the problem shown at
>
> https://www.postgresql.org/message-id/2591376.1621196582%40sss.pgh.pa.us
>
> where maybe_needed was visibly quite insane. This value is
> less visibly insane, but it's still wrong. It might be
> interesting to try running this test case with the extra
> assertions I proposed there, to try to narrow down where
> it's going off the rails.

Oh yeah. Justin didn't say anything about upgrading using pg_upgrade (just something about upgrading the kernel). Did you use pg_upgrade here, Justin?

I'm going to see Andres in person in 20 minutes time (for the first time in over a year!). I'll discuss this issue with him.

--
Peter Geoghegan
On Tue, Jun 8, 2021 at 5:18 PM Justin Pryzby <pryzby@telsasoft.com> wrote:
> I reproduced the issue on a new/fresh cluster like this:
>
> ./postgres -D data -c autovacuum_naptime=1 -c autovacuum_analyze_scale_factor=0.005 -c log_autovacuum_min_duration=-1
> psql -h /tmp postgres -c "CREATE TABLE t(i int); INSERT INTO t SELECT generate_series(1,99999); CREATE INDEX ON t(i);"
> time while psql -h /tmp postgres -qc 'REINDEX (CONCURRENTLY) INDEX t_i_idx'; do :; done&
> time while psql -h /tmp postgres -qc 'ANALYZE pg_attribute'; do :; done&

I don't have time to try this out myself today, but offhand I'm pretty confident that this is sufficient to reproduce the underlying bug itself. And if that's true then I guess it can't have anything to do with the pg_upgrade/pg_resetwal issue Tom just referenced, despite the apparent similarity.

Thanks

--
Peter Geoghegan
On Wed, Jun 9, 2021 at 2:17 AM Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2021-06-08 14:27:14 +0200, Matthias van de Meent wrote:
> > heap_prune_satisfies_vacuum considers 1 more transaction to be
> > unvacuumable, and thus indeed won't vacuum that tuple that
> > HeapTupleSatisfiesVacuum does want to be vacuumed.
> >
> > The new open question is now: Why is
> > GlobalVisCatalogRels->maybe_needed < OldestXmin? IIRC
> > GlobalVisCatalogRels->maybe_needed is constructed from the same
> > ComputeXidHorizonsResult->catalog_oldest_nonremovable which later is
> > returned to be used in vacrel->OldestXmin.
>
> The horizon used by pruning is only updated once per transaction (well,
> approximately). What presumably is happening is that the retry loop is
> retrying, without updating the horizon, therefore the same thing is
> happening over and over again?

When we calculated vacrel->OldestXmin in vacuum_set_xid_limits(), vacrel->OldestXmin and GlobalVisCatalogRels->maybe_needed must have been the same value. That is, both were 926025113. After that, vacrel->OldestXmin is not changed throughout lazy vacuum, whereas GlobalVisCatalogRels->maybe_needed could be updated (right?).

Is there any chance that GlobalVisCatalogRels->maybe_needed goes backward? For example, a case where, when re-calculating catalog_oldest_nonremovable (i.e. updating GlobalVisCatalogRels->maybe_needed), we take into account a process that has an old XID but was ignored last time for some reason (e.g., its statusFlags).

Regards,

--
Masahiko Sawada
EDB: https://www.enterprisedb.com/
On Tue, Jun 08, 2021 at 05:47:28PM -0700, Peter Geoghegan wrote:
> I don't have time to try this out myself today, but offhand I'm pretty
> confident that this is sufficient to reproduce the underlying bug
> itself. And if that's true then I guess it can't have anything to do
> with the pg_upgrade/pg_resetwal issue Tom just referenced, despite the
> apparent similarity.

Agreed. It took me a couple of minutes to get autovacuum to run in an infinite loop with a standalone instance. Nice catch, Justin!

--
Michael
Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
From
Matthias van de Meent
Date:
On Wed, 9 Jun 2021 at 04:42, Michael Paquier <michael@paquier.xyz> wrote:
>
> On Tue, Jun 08, 2021 at 05:47:28PM -0700, Peter Geoghegan wrote:
> > I don't have time to try this out myself today, but offhand I'm pretty
> > confident that this is sufficient to reproduce the underlying bug
> > itself. And if that's true then I guess it can't have anything to do
> > with the pg_upgrade/pg_resetwal issue Tom just referenced, despite the
> > apparent similarity.
>
> Agreed. It took me a couple of minutes to get autovacuum to run in an
> infinite loop with a standalone instance. Nice catch, Justin!

I believe that I've found the culprit: GetOldestNonRemovableTransactionId(rel) does not use the exact same conditions for returning OldestXmin as GlobalVisTestFor(rel) does. This results in different minimal XIDs, and subsequently this failure.

The attached patch fixes this inconsistency, and adds a set of asserts to ensure that GetOldestNonRemovableTransactionId is equal to the maybe_needed of the GlobalVisTest of that relation, plus some at GlobalVisUpdateApply such that it will fail whenever it is called with arguments that would move the horizons in the wrong direction. Note that there was no problem in GlobalVisUpdateApply, but it helped me determine that that part was not the source of the problem, and I think that having this safeguard is a net positive.

Another approach might be changing GlobalVisTestFor(rel) instead to reflect the conditions in GetOldestNonRemovableTransactionId.

With the attached prototype patch, I was unable to reproduce the problematic case in 10 minutes. Without it, I got the problematic behaviour in seconds.

With regards,

Matthias
Hi,

Good find!

On 2021-06-09 17:42:34 +0200, Matthias van de Meent wrote:
> I believe that I've found the culprit:
> GetOldestNonRemovableTransactionId(rel) does not use the exact same
> conditions for returning OldestXmin as GlobalVisTestFor(rel) does.
> This results in different minimal XIDs, and subsequently this failure.

Specifically, the issue is that it uses the innocuous looking

	else if (RelationIsAccessibleInLogicalDecoding(rel))
		return horizons.catalog_oldest_nonremovable;

but that's not sufficient, because

#define RelationIsAccessibleInLogicalDecoding(relation) \
	(XLogLogicalInfoActive() && \
	 RelationNeedsWAL(relation) && \
	 (IsCatalogRelation(relation) || RelationIsUsedAsCatalogTable(relation)))

is never true if wal_level < logical. So what it is missing is the IsCatalogRelation(rel) || bit.

> The attached patch fixes this inconsistency

I think I prefer applying the fix and the larger changes separately.

> Another approach might be changing GlobalVisTestFor(rel) instead to
> reflect the conditions in GetOldestNonRemovableTransactionId.

No, that'd not be correct, afaict.

Greetings,

Andres Freund
On Wed, Jun 9, 2021 at 11:45 AM Andres Freund <andres@anarazel.de> wrote:
> Good find!

+1

> > The attached patch fixes this inconsistency
>
> I think I prefer applying the fix and the larger changes separately.

I wonder if it's worth making the goto inside lazy_scan_prune verify that the heap tuple matches what we expect. I'm sure that we would have found this issue far sooner if that had been in place already. Though I'm less sure of how much value adding such a check has now.

--
Peter Geoghegan
On Wed, 9 Jun 2021 at 20:45, Andres Freund <andres@anarazel.de> wrote:
>
> Specifically, the issue is that it uses the innocuous looking
>
> 	else if (RelationIsAccessibleInLogicalDecoding(rel))
> 		return horizons.catalog_oldest_nonremovable;
>
> but that's not sufficient, because
>
> #define RelationIsAccessibleInLogicalDecoding(relation) \
> 	(XLogLogicalInfoActive() && \
> 	 RelationNeedsWAL(relation) && \
> 	 (IsCatalogRelation(relation) || RelationIsUsedAsCatalogTable(relation)))
>
> it is never true if wal_level < logical. So what it is missing is the
> IsCatalogRelation(rel) || bit.

Correct.

> > The attached patch fixes this inconsistency
>
> I think I prefer applying the fix and the larger changes separately.

Feel free to change anything in that patch, it was a prototype; or give me a notice if you want me to split the patch.

> > Another approach might be changing GlobalVisTestFor(rel) instead to
> > reflect the conditions in GetOldestNonRemovableTransactionId.
>
> No, that'd not be correct, afaict.

Alright, I wasn't sure of that myself.

With regards,

Matthias van de Meent.
On Wed, 9 Jun 2021 at 22:45, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Wed, Jun 9, 2021 at 11:45 AM Andres Freund <andres@anarazel.de> wrote:
> > Good find!
>
> +1
>
> > > The attached patch fixes this inconsistency
> >
> > I think I prefer applying the fix and the larger changes separately.
>
> I wonder if it's worth making the goto inside lazy_scan_prune verify
> that the heap tuple matches what we expect. I'm sure that we would
> have found this issue far sooner if that had been in place already.
> Though I'm less sure of how much value adding such a check has now.

Could you elaborate on what this "matches what we expect" entails?

Apart from this, I'm also quite certain that the goto branch that created this infinite loop should have been dead code: in a correctly working system, the GlobalVis*Rels should always be at least as strict as vacrel->OldestXmin, but at the same time only the GlobalVis*Rels can be updated (i.e. move their horizon forward) during the vacuum. As such, heap_prune_satisfies_vacuum should never fail to vacuum a tuple that also satisfies the condition of HeapTupleSatisfiesVacuum. That is, unless we're also going to change code to update / move forward vacrel->OldestXmin in lazy_scan_prune between the HPSV call and the loop with HTSV.

With regards,

Matthias van de Meent
On Thu, Jun 10, 2021 at 8:49 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
> Could you elaborate on what this "matches what we expect" entails?
>
> Apart from this, I'm also quite certain that the goto-branch that
> created this infinite loop should have been dead code: In a correctly
> working system, the GlobalVis*Rels should always be at least as strict
> as the vacrel->OldestXmin, but at the same time only GlobalVis*Rels
> can be updated (i.e. move their horizon forward) during the vacuum. As
> such, heap_prune_satisfies_vacuum should never fail to vacuum a tuple
> that also satisfies the condition of HeapTupleSatisfiesVacuum.

It's true that these two similar functions should be in perfect agreement in general (given the same OldestXmin). That in itself doesn't mean that they must always agree about a tuple in practice, when they're called in turn inside lazy_scan_prune(). In particular, nothing stops a transaction that was still in progress when heap_prune_satisfies_vacuum ran (and saw some tuples it inserted) from concurrently aborting. That will render the same tuples fully DEAD inside HeapTupleSatisfiesVacuum(). So we need to restart using the goto purely to cover that case. See the commit message of commit 8523492d4e3.

By "matches what we expect", I meant "involves a just-aborted transaction". We could defensively verify that the inserting transaction concurrently aborted at the point of retrying/calling heap_page_prune() a second time. If there is no aborted transaction involved (as was the case with this bug), then we can be confident that something is seriously broken.

--
Peter Geoghegan
On Thu, 10 Jun 2021 at 18:03, Peter Geoghegan <pg@bowt.ie> wrote:
>
> It's true that these two similar functions should be in perfect
> agreement in general (given the same OldestXmin). That in itself
> doesn't mean that they must always agree about a tuple in practice,
> when they're called in turn inside lazy_scan_prune(). In particular,
> nothing stops a transaction that was in progress to
> heap_prune_satisfies_vacuum (when it saw some tuples it inserted)
> concurrently aborting. That will render the same tuples fully DEAD
> inside HeapTupleSatisfiesVacuum(). So we need to restart using the
> goto purely to cover that case. See the commit message of commit
> 8523492d4e3.

I totally overlooked that HeapTupleSatisfiesVacuumHorizon does the heavyweight XID validation and does return HEAPTUPLE_DEAD in those recently rolled back cases. Thank you for reminding me.

> By "matches what we expect", I meant "involves a just-aborted
> transaction". We could defensively verify that the inserting
> transaction concurrently aborted at the point of retrying/calling
> heap_page_prune() a second time. If there is no aborted transaction
> involved (as was the case with this bug), then we can be confident
> that something is seriously broken.

I believe there are more cases than only the rolled back case, but checking for those cases would potentially help, yes.

With regards,

Matthias van de Meent.
On Thu, Jun 10, 2021 at 9:57 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
> > By "matches what we expect", I meant "involves a just-aborted
> > transaction". We could defensively verify that the inserting
> > transaction concurrently aborted at the point of retrying/calling
> > heap_page_prune() a second time. If there is no aborted transaction
> > involved (as was the case with this bug), then we can be confident
> > that something is seriously broken.
>
> I believe there are more cases than only the rolled back case, but
> checking for those cases would potentially help, yes.

Why do you believe that there are other cases?

I'm not aware of any case that causes lazy_scan_prune() to retry using the goto, other than the aborted transaction case I described (excluding the bug that you diagnosed, which was of course never supposed to happen). If it really is possible to observe a retry for any other reason then I'd very much like to know all the details - it might well signal a distinct bug of the same general variety.

--
Peter Geoghegan
Hi,

On 2021-06-10 17:49:05 +0200, Matthias van de Meent wrote:
> Apart from this, I'm also quite certain that the goto-branch that
> created this infinite loop should have been dead code: In a correctly
> working system, the GlobalVis*Rels should always be at least as strict
> as the vacrel->OldestXmin, but at the same time only GlobalVis*Rels
> can be updated (i.e. move their horizon forward) during the vacuum. As
> such, heap_prune_satisfies_vacuum should never fail to vacuum a tuple
> that also satisfies the condition of HeapTupleSatisfiesVacuum. That
> is, unless we're also going to change code to update / move forward
> vacrel->OldestXmin in lazy_scan_prune between the HPSV call and the
> loop with HTSV.

Consider the case of a transaction that inserted a row aborting. That tuple will be "fully dead" regardless of any xid horizons.

Greetings,

Andres Freund
On Thu, 10 Jun 2021 at 19:07, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, Jun 10, 2021 at 9:57 AM Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > I believe there are more cases than only the rolled back case, but
> > checking for those cases would potentially help, yes.
>
> Why do you believe that there are other cases?
>
> I'm not aware of any case that causes lazy_scan_prune() to retry using
> the goto, other than the aborted transaction case I described
> (excluding the bug that you diagnosed, which was of course never
> supposed to happen). If it really is possible to observe a retry for
> any other reason then I'd very much like to know all the details - it
> might well signal a distinct bug of the same general variety.

I see one exit for HEAPTUPLE_DEAD on a potentially recently committed xvac (?), and we might also check against recently committed transactions if xmin == xmax, although apparently that is not implemented right now.

With regards,

Matthias van de Meent
On Thu, Jun 10, 2021 at 10:29 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
> I see one exit for HEAPTUPLE_DEAD on a potentially recently committed
> xvac (?), and we might also check against recently committed
> transactions if xmin == xmax, although apparently that is not
> implemented right now.

I don't follow. Perhaps you can produce a test case?

--
Peter Geoghegan
On 2021-Jun-10, Peter Geoghegan wrote:

> On Thu, Jun 10, 2021 at 10:29 AM Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > I see one exit for HEAPTUPLE_DEAD on a potentially recently committed
> > xvac (?), and we might also check against recently committed
> > transactions if xmin == xmax, although apparently that is not
> > implemented right now.

xvac was used by the pre-9.0 VACUUM FULL, so I don't think it's possible to see a recently committed one. (I think you'd have to find a table that was pg_upgraded from 8.4 or older, with leftover tuples from an aborted VACUUM FULL, and never vacuumed after that.) A scenario with such a tuple on disk is not impossible [in theory], but if it does exist, then the VACUUM FULL would not be in the possibly-visible horizon.

--
Álvaro Herrera               Valdivia, Chile
"Linux transformed my computer from a 'machine for doing things' into a
genuinely entertaining device, about which I learn something new every
day" (Jaime Salinas)
Hi,

On 2021-06-08 19:18:18 -0500, Justin Pryzby wrote:
> I reproduced the issue on a new/fresh cluster like this:
>
> ./postgres -D data -c autovacuum_naptime=1 -c autovacuum_analyze_scale_factor=0.005 -c log_autovacuum_min_duration=-1
> psql -h /tmp postgres -c "CREATE TABLE t(i int); INSERT INTO t SELECT generate_series(1,99999); CREATE INDEX ON t(i);"
> time while psql -h /tmp postgres -qc 'REINDEX (CONCURRENTLY) INDEX t_i_idx'; do :; done&
> time while psql -h /tmp postgres -qc 'ANALYZE pg_attribute'; do :; done&
>
> TRAP: FailedAssertion("restarts == 0", File: "vacuumlazy.c", Line: 1803, PID: 10367)

Has anybody looked at getting test coverage for the retry path? Not with the goal of triggering an assertion, just to have at least basic coverage.

The problem with writing a test is likely to find a way to halfway reliably schedule a transaction abort after pruning, but before the tuple-removal loop. Does anybody see a trick to do so?

Greetings,

Andres Freund
On Thu, Jun 10, 2021 at 5:58 PM Andres Freund <andres@anarazel.de> wrote:
> The problem with writing a test is likely to find a way to halfway
> reliably schedule a transaction abort after pruning, but before the
> tuple-removal loop? Does anybody see a trick to do so?

I asked Alexander about using his pending stop events infrastructure patch to test this code, back when it did the tupgone stuff rather than loop:

https://postgr.es/m/CAH2-Wz=Tb7bAgCFt0VFA0YJ5Vd1RxJqZRc

I can't see any better way.

ISTM that it would be much more useful to focus on adding an assertion (or maybe even a "can't happen" error) that fails when the DEAD/goto path is reached with a tuple whose xmin wasn't aborted. If that was in place then we would have caught the bug in GetOldestNonRemovableTransactionId() far sooner. That might actually catch other bugs in the future.

--
Peter Geoghegan
Hi,

On 2021-06-10 18:49:50 -0700, Peter Geoghegan wrote:
> ISTM that it would be much more useful to focus on adding an assertion
> (or maybe even a "can't happen" error) that fails when the DEAD/goto
> path is reached with a tuple whose xmin wasn't aborted. If that was in
> place then we would have caught the bug in
> GetOldestNonRemovableTransactionId() far sooner. That might actually
> catch other bugs in the future.

I'm not convinced - right now we don't exercise this path in tests at all. More assertions won't change that - stuff that can be triggered in production-ish loads doesn't help during development. I do think that that makes it far too easy to have state management bugs (e.g. a wrong pincount in retry cases or such).

Greetings,

Andres Freund
Peter Geoghegan <pg@bowt.ie> writes:
> ISTM that it would be much more useful to focus on adding an assertion
> (or maybe even a "can't happen" error) that fails when the DEAD/goto
> path is reached with a tuple whose xmin wasn't aborted. If that was in
> place then we would have caught the bug in
> GetOldestNonRemovableTransactionId() far sooner. That might actually
> catch other bugs in the future.

Sounds like a good idea. If we expect that path to be taken only rarely, then a test-and-elog would be worth its keep. Otherwise maybe it should just be an Assert.

			regards, tom lane
On Thu, Jun 10, 2021 at 7:00 PM Andres Freund <andres@anarazel.de> wrote:
> I'm not convinced - right now we don't exercise this path in tests at
> all. More assertions won't change that - stuff that can be triggered in
> production-ish loads doesn't help during development. I do think that
> that makes it far too easy to have state management bugs (e.g. a wrong
> pincount in retry cases or such).

The code in lazy_scan_prune() led to our detecting this bug (albeit in a fairly nasty way). The problematic VACUUM operations never actually exercised the goto as originally designed, for the purpose it was intended for. Perhaps we should add test coverage for the intended behavior too, but that doesn't seem particularly relevant right now.

--
Peter Geoghegan
Hi,

On 2021-06-10 19:15:59 -0700, Peter Geoghegan wrote:
> The code in lazy_scan_prune() led to our detecting this bug (albeit in
> a fairly nasty way). The problematic VACUUM operations never actually
> exercised the goto as originally designed, for the purpose it was
> intended for. Perhaps we should add test coverage for the intended
> behavior too, but that doesn't seem particularly relevant right now.

Well, I'd like to add assertions ensuring the retry path is only entered when correct - but I feel hesitant about doing so when I can't exercise that path reliably in at least some of the situations.

Greetings,

Andres Freund
On Thu, Jun 10, 2021 at 7:38 PM Andres Freund <andres@anarazel.de> wrote:
> Well, I'd like to add assertions ensuring the retry path is only entered
> when correct - but I feel hesitant about doing so when I can't exercise
> that path reliably in at least some of the situations.

I originally tested the lazy_scan_prune() goto in the obvious way: by adding a pg_usleep() just after its heap_page_prune() call. I'm not too worried about the restart corrupting state or something, because the state is pretty trivial. In any case the infrastructure to exercise the goto inside the tests doesn't exist yet -- I don't see any way around that on HEAD.

OTOH I *am* concerned about the goto doing the wrong thing due to bugs in distant code. I cannot imagine any possible downside to at least asserting HeapTupleHeaderXminInvalid() against the "concurrently inserted then aborted" tuple. That simple measure would have been enough to at least catch this particular bug far sooner.

--
Peter Geoghegan
On Thu, 10 Jun 2021 at 19:43, Peter Geoghegan <pg@bowt.ie> wrote:
>
> On Thu, Jun 10, 2021 at 10:29 AM Matthias van de Meent
> <boekewurm+postgres@gmail.com> wrote:
> > I see one exit for HEAPTUPLE_DEAD on a potentially recently committed
> > xvac (?), and we might also check against recently committed
> > transactions if xmin == xmax, although apparently that is not
> > implemented right now.
>
> I don't follow. Perhaps you can produce a test case?

If you were to delete a tuple in the same transaction that you create it (without checkpoints / subtransactions), I would assume that this would allow us to vacuum the tuple, as the only snapshot that could see the tuple must commit or roll back. In any case, inside the transaction the tuple is not visible anymore, and outside the transaction the tuple will never be seen. That being the case, any such tuple that has xmin == xmax should be vacuumable at any time, except that you might want to wait for the transaction to have committed/rolled back to prevent any race conditions with (delayed) index insertions.

Example:

BEGIN;
INSERT INTO tab VALUES (1);
DELETE FROM tab;
-- At this point, the tuple inserted cannot be seen in any
-- current or future snapshot, and could thus be vacuumed.
COMMIT;

Because I am not quite yet well versed with the xid assignment and heapam deletion subsystems, it could very well be that either this case is impossible to reach, or that the heapam tuple delete logic already applies this at tuple delete time.

With regards,

Matthias van de Meent
Hi,

On 2021-06-14 11:53:47 +0200, Matthias van de Meent wrote:
> If you were to delete a tuple in the same transaction that you create
> it (without checkpoints / subtransactions), I would assume that this
> would allow us to vacuum the tuple, as the only snapshot that could
> see the tuple must commit or roll back.

Right now we do not do so, but I think we talked about adding such logic a couple times.

I think a more robust assertion than aborted-ness could be to assert that repeated retries are not allowed to have the same "oldest xid" as a previous retry. With oldest xid being the older of xmin/xmax?

Greetings,

Andres Freund
Hi,

> @@ -4032,6 +4039,24 @@ GlobalVisTestShouldUpdate(GlobalVisState *state)
>  static void
>  GlobalVisUpdateApply(ComputeXidHorizonsResult *horizons)
>  {
> +	/* assert non-decreasing nature of horizons */
> +	Assert(FullTransactionIdFollowsOrEquals(
> +			   FullXidRelativeTo(horizons->latest_completed,
> +								 horizons->shared_oldest_nonremovable),
> +			   GlobalVisSharedRels.maybe_needed));
> +	Assert(FullTransactionIdFollowsOrEquals(
> +			   FullXidRelativeTo(horizons->latest_completed,
> +								 horizons->catalog_oldest_nonremovable),
> +			   GlobalVisCatalogRels.maybe_needed));
> +	Assert(FullTransactionIdFollowsOrEquals(
> +			   FullXidRelativeTo(horizons->latest_completed,
> +								 horizons->data_oldest_nonremovable),
> +			   GlobalVisDataRels.maybe_needed));
> +	Assert(FullTransactionIdFollowsOrEquals(
> +			   FullXidRelativeTo(horizons->latest_completed,
> +								 horizons->temp_oldest_nonremovable),
> +			   GlobalVisTempRels.maybe_needed));
> +
>  	GlobalVisSharedRels.maybe_needed =
>  		FullXidRelativeTo(horizons->latest_completed,
>  						  horizons->shared_oldest_nonremovable);

Thinking more about it, I don't think these are correct. See the following comment in procarray.c:

 * Note: despite the above, it's possible for the calculated values to move
 * backwards on repeated calls. The calculated values are conservative, so
 * that anything older is definitely not considered as running by anyone
 * anymore, but the exact values calculated depend on a number of things. For
 * example, if there are no transactions running in the current database, the
 * horizon for normal tables will be latestCompletedXid. If a transaction
 * begins after that, its xmin will include in-progress transactions in other
 * databases that started earlier, so another call will return a lower value.
 * Nonetheless it is safe to vacuum a table in the current database with the
 * first result.
There are also replication-related effects: a walsender * process can set its xmin based on transactions that are no longer running * on the primary but are still being replayed on the standby, thus possibly * making the values go backwards. In this case there is a possibility that * we lose data that the standby would like to have, but unless the standby * uses a replication slot to make its xmin persistent there is little we can * do about that --- data is only protected if the walsender runs continuously * while queries are executed on the standby. (The Hot Standby code deals * with such cases by failing standby queries that needed to access * already-removed data, so there's no integrity bug.) The computed values * are also adjusted with vacuum_defer_cleanup_age, so increasing that setting * on the fly is another easy way to make horizons move backwards, with no * consequences for data integrity. Greetings, Andres Freund
Re: pg14b1 stuck in lazy_scan_prune/heap_page_prune of pg_statistic
From: Matthias van de Meent
On Tue, 15 Jun 2021 at 03:22, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> > @@ -4032,6 +4039,24 @@ GlobalVisTestShouldUpdate(GlobalVisState *state)
> > static void
> > GlobalVisUpdateApply(ComputeXidHorizonsResult *horizons)
> > {
> > +	/* assert non-decreasing nature of horizons */
>
> Thinking more about it, I don't think these are correct. See the
> following comment in procarray.c:
>
> * Note: despite the above, it's possible for the calculated values to move
> * backwards on repeated calls.

So the implicit assumption in heap_page_prune that HeapTupleSatisfiesVacuum(OldestXmin) is always consistent with heap_prune_satisfies_vacuum(vacrel) has never been true. In that case, we'll need to redo the condition in heap_page_prune as well.

PFA my adapted patch that fixes this new-ish issue, and does not include the (incorrect) assertions in GlobalVisUpdateApply. I've tested this against the reproducing case, both with and without the fix in GetOldestNonRemovableTransactionId, and it fails fall into an infinite loop.

I would appreciate it if someone could validate the new logic in the HEAPTUPLE_DEAD case. Although I believe it correctly handles the case where the vistest non-removable horizon moved backwards, a second pair of eyes would be appreciated.

With regards,

Matthias van de Meent
On Wed, Jun 16, 2021 at 3:59 AM Matthias van de Meent <boekewurm+postgres@gmail.com> wrote:
> On Tue, 15 Jun 2021 at 03:22, Andres Freund <andres@anarazel.de> wrote:
> > > @@ -4032,6 +4039,24 @@ GlobalVisTestShouldUpdate(GlobalVisState *state)
> > > static void
> > > GlobalVisUpdateApply(ComputeXidHorizonsResult *horizons)
> > > {
> > > +	/* assert non-decreasing nature of horizons */
> >
> > Thinking more about it, I don't think these are correct. See the
> > following comment in procarray.c:
> >
> > * Note: despite the above, it's possible for the calculated values to move
> > * backwards on repeated calls.
>
> So the implicit assumption in heap_page_prune that
> HeapTupleSatisfiesVacuum(OldestXmin) is always consistent with
> heap_prune_satisfies_vacuum(vacrel) has never been true. In that case,
> we'll need to redo the condition in heap_page_prune as well.

I don't think that this shows that the assumption within lazy_scan_prune() (the assumption that both "satisfies vacuum" functions agree) is wrong, with the obvious exception of cases involving the bug that Justin reported. GlobalVis*.maybe_needed is supposed to be conservative.

> PFA my adapted patch that fixes this new-ish issue, and does not
> include the (incorrect) assertions in GlobalVisUpdateApply. I've
> tested this against the reproducing case, both with and without the
> fix in GetOldestNonRemovableTransactionId, and it fails fall into an
> infinite loop.
>
> I would appreciate it if someone could validate the new logic in the
> HEAPTUPLE_DEAD case. Although I believe it correctly handles the case
> where the vistest non-removable horizon moved backwards, a second pair
> of eyes would be appreciated.

If you look at the lazy_scan_prune() logic immediately prior to commit 8523492d4e3, you'll see that it used to have a HEAPTUPLE_DEAD case that didn't involve a restart -- this was the "tupgone" mechanism. Back then we actually had to remove any corresponding index tuples from indexes in this rare case. Plus there was a huge amount of complicated mechanism to handle a very rare case, all of which was removed by commit 8523492d4e3. Things like extra recovery conflict code just for this rare case, or needing to acquire a super-exclusive lock on pages during VACUUM's second heap pass. This is all cruft that I was happy to get rid of.

This is a good discussion of the tupgone stuff and the problems it caused, which is good background information:

https://www.postgresql.org/message-id/20200724165514.dnu5hr4vvgkssf5p%40alap3.anarazel.de

Even if it were true that heap_prune_satisfies_vacuum() won't agree with HeapTupleSatisfiesVacuum() after repeated retries within lazy_scan_prune(), it would probably be best if we then made code outside vacuumlazy.c agree with the lazy_scan_prune() assumption, rather than the other way around.

Have you actually been able to demonstrate a problem involving lazy_scan_prune()'s goto, other than the main GetOldestNonRemovableTransactionId() bug reported by Justin?

--
Peter Geoghegan
On Wed, Jun 16, 2021 at 9:03 AM Peter Geoghegan <pg@bowt.ie> wrote:
> On Wed, Jun 16, 2021 at 3:59 AM Matthias van de Meent
> > So the implicit assumption in heap_page_prune that
> > HeapTupleSatisfiesVacuum(OldestXmin) is always consistent with
> > heap_prune_satisfies_vacuum(vacrel) has never been true. In that case,
> > we'll need to redo the condition in heap_page_prune as well.
>
> I don't think that this shows that the assumption within
> lazy_scan_prune() (the assumption that both "satisfies vacuum"
> functions agree) is wrong, with the obvious exception of cases
> involving the bug that Justin reported. GlobalVis*.maybe_needed is
> supposed to be conservative.

I suppose it's true that they can disagree because we call vacuum_set_xid_limits() to get an OldestXmin inside vacuumlazy.c before calling GlobalVisTestFor() inside vacuumlazy.c to get a vistest. But that only implies that a tuple that would have been considered RECENTLY_DEAD inside lazy_scan_prune() (it just missed being considered DEAD according to OldestXmin) is seen as an LP_DEAD stub line pointer. Which really means it's DEAD to lazy_scan_prune() anyway. These days the only way that lazy_scan_prune() can consider a tuple fully DEAD is if it's no longer a tuple -- it has to actually be an LP_DEAD stub line pointer.

It's really no different from an opportunistic prune that concurrently prunes tuples that VACUUM would have seen as RECENTLY_DEAD if it were going solely on the OldestXmin cutoff. There are certain kinds of tables where non-HOT updates and opportunistic pruning constantly leave behind loads of LP_DEAD items; pruning inside VACUUM won't do much of the total required pruning at all. That means that some DEAD/LP_DEAD items will become dead long after a VACUUM starts, while nevertheless being removed by the same VACUUM. Of course there is no way for lazy_scan_prune() to distinguish one LP_DEAD item from another -- they're all stubs without tuple storage, and without a tuple header with XIDs.
-- Peter Geoghegan
Hi,

On 2021-06-16 12:59:33 +0200, Matthias van de Meent wrote:
> PFA my adapted patch that fixes this new-ish issue, and does not
> include the (incorrect) assertions in GlobalVisUpdateApply. I've
> tested this against the reproducing case, both with and without the
> fix in GetOldestNonRemovableTransactionId, and it fails fall into an
> infinite loop.

Could you share your testcase? I've been working on a series of patches to address this (I'll share in a bit), and I've run quite a few tests, and didn't hit any infinite loops.

> diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
> index 4b600e951a..f4320d5a34 100644
> --- a/src/backend/access/heap/vacuumlazy.c
> +++ b/src/backend/access/heap/vacuumlazy.c
> @@ -1675,6 +1675,12 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
>   * that any items that make it into the dead_tuples array are simple LP_DEAD
>   * line pointers, and that every remaining item with tuple storage is
>   * considered as a candidate for freezing.
> + *
> + * Note: It is possible that vistest's window moves back from the
> + * vacrel->OldestXmin (see ComputeXidHorizons). To prevent an infinite
> + * loop where we bounce between HeapTupleSatisfiesVacuum and
> + * heap_prune_satisfies_vacuum who disagree on the [almost]deadness of
> + * a tuple, we only retry when we know HTSV agrees with HPSV.
>   */

HTSV is quite widely used because HeapTupleSatisfiesVacuum is quite widely used. HPSV isn't, so it's a bit confusing to use this.

Greetings,

Andres Freund
Hi,

On 2021-06-16 09:46:07 -0700, Peter Geoghegan wrote:
> On Wed, Jun 16, 2021 at 9:03 AM Peter Geoghegan <pg@bowt.ie> wrote:
> > I don't think that this shows that the assumption within
> > lazy_scan_prune() (the assumption that both "satisfies vacuum"
> > functions agree) is wrong, with the obvious exception of cases
> > involving the bug that Justin reported. GlobalVis*.maybe_needed is
> > supposed to be conservative.
>
> I suppose it's true that they can disagree because we call
> vacuum_set_xid_limits() to get an OldestXmin inside vacuumlazy.c
> before calling GlobalVisTestFor() inside vacuumlazy.c to get a
> vistest. But that only implies that a tuple that would have been
> considered RECENTLY_DEAD inside lazy_scan_prune() (it just missed
> being considered DEAD according to OldestXmin) is seen as an LP_DEAD
> stub line pointer. Which really means it's DEAD to lazy_scan_prune()
> anyway. These days the only way that lazy_scan_prune() can consider a
> tuple fully DEAD is if it's no longer a tuple -- it has to actually be
> an LP_DEAD stub line pointer.

I think it's more complicated than that - "before" isn't a guarantee when the horizon can go backwards. Consider the case where a hot_standby_feedback=on replica without a slot connects - that can result in the xid horizon going backwards.

I think a good way to address this might be to have GlobalVisUpdateApply() ensure that maybe_needed does not go backwards within one backend. This is *nearly* already guaranteed within vacuum, except for the case where a catalog access between vacuum_set_xid_limits() and GlobalVisTestFor() could lead to an attempt at pruning, which could theoretically move maybe_needed backwards if in between those two steps a replica connected that caused the horizon to go backwards.

Greetings,

Andres Freund
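[Editor's note] The per-backend clamp Andres describes can be sketched as follows. The struct and function names below are simplified stand-ins for the real GlobalVisState / GlobalVisUpdateApply() in procarray.c, not PostgreSQL code: a freshly computed horizon is only applied when it is newer than what the backend already has, so maybe_needed is monotonic within one backend even when ComputeXidHorizons() returns an older value.

```c
#include <assert.h>
#include <stdint.h>

/*
 * FullTransactionId never wraps in practice (64-bit), so a plain
 * integer comparison is a valid "follows or equals" test here.
 */
typedef uint64_t FullTransactionId;

typedef struct
{
    FullTransactionId maybe_needed;     /* oldest xid we may still need */
} VisState;

/*
 * Apply a freshly computed horizon, clamped so that maybe_needed never
 * moves backwards within this backend.  A stale (older) computed value
 * -- e.g. after a hot_standby_feedback replica connects -- is simply
 * ignored, which is safe because the previous, newer horizon was
 * already proven conservative.
 */
static void
vis_update_apply(VisState *state, FullTransactionId computed)
{
    if (computed > state->maybe_needed)
        state->maybe_needed = computed;
}
```

Under this scheme the infinite-loop scenario disappears: pruning can never see a horizon older than the one used for the vacuum's OldestXmin earlier in the same backend.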
On Wed, 16 Jun 2021 at 21:12, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2021-06-16 12:59:33 +0200, Matthias van de Meent wrote:
> > PFA my adapted patch that fixes this new-ish issue, and does not
> > include the (incorrect) assertions in GlobalVisUpdateApply. I've
> > tested this against the reproducing case, both with and without the
> > fix in GetOldestNonRemovableTransactionId, and it fails fall into an
> > infinite loop.

* Fails _to_ fall into an infinite loop. Sorry, failed to add a "to"; it passes the tests.

> Could you share your testcase? I've been working on a series of patches
> to address this (I'll share in a bit), and I've run quite a few tests,
> and didn't hit any infinite loops.

Basically, I've tested using the test case shared earlier: two sessions spamming connections with "reindex concurrently some_index" and "analyze pg_attribute" against the same database.

> > diff --git a/src/backend/access/heap/vacuumlazy.c b/src/backend/access/heap/vacuumlazy.c
> > index 4b600e951a..f4320d5a34 100644
> > --- a/src/backend/access/heap/vacuumlazy.c
> > +++ b/src/backend/access/heap/vacuumlazy.c
> > @@ -1675,6 +1675,12 @@ lazy_scan_heap(LVRelState *vacrel, VacuumParams *params, bool aggressive)
> >   * that any items that make it into the dead_tuples array are simple LP_DEAD
> >   * line pointers, and that every remaining item with tuple storage is
> >   * considered as a candidate for freezing.
> > + *
> > + * Note: It is possible that vistest's window moves back from the
> > + * vacrel->OldestXmin (see ComputeXidHorizons). To prevent an infinite
> > + * loop where we bounce between HeapTupleSatisfiesVacuum and
> > + * heap_prune_satisfies_vacuum who disagree on the [almost]deadness of
> > + * a tuple, we only retry when we know HTSV agrees with HPSV.
> >   */
>
> HTSV is quite widely used because HeapTupleSatisfiesVacuum is quite
> widely used. HPSV isn't, so it's a bit confusing to use this.

Sure. I thought it was fine to shorten, as the full function name was named just one line above and it's a long name, but I'm fine with either.

Kind regards,

Matthias
On Wed, 16 Jun 2021 at 21:22, Andres Freund <andres@anarazel.de> wrote:
>
> Hi,
>
> On 2021-06-16 09:46:07 -0700, Peter Geoghegan wrote:
> > I suppose it's true that they can disagree because we call
> > vacuum_set_xid_limits() to get an OldestXmin inside vacuumlazy.c
> > before calling GlobalVisTestFor() inside vacuumlazy.c to get a
> > vistest. But that only implies that a tuple that would have been
> > considered RECENTLY_DEAD inside lazy_scan_prune() (it just missed
> > being considered DEAD according to OldestXmin) is seen as an LP_DEAD
> > stub line pointer. Which really means it's DEAD to lazy_scan_prune()
> > anyway. These days the only way that lazy_scan_prune() can consider a
> > tuple fully DEAD is if it's no longer a tuple -- it has to actually be
> > an LP_DEAD stub line pointer.
>
> I think it's more complicated than that - "before" isn't a guarantee when the
> horizon can go backwards. Consider the case where a hot_standby_feedback=on
> replica without a slot connects - that can result in the xid horizon going
> backwards.
>
> I think a good way to address this might be to have GlobalVisUpdateApply()
> ensure that maybe_needed does not go backwards within one backend.
>
> This is *nearly* already guaranteed within vacuum, except for the case where a
> catalog access between vacuum_set_xid_limits() and GlobalVisTestFor() could
> lead to an attempt at pruning, which could move maybe_needed to go backwards
> theoretically if inbetween those two steps a replica connected that causes the
> horizon to go backwards.

I'm tempted to suggest "update one of GlobalVisUpdateApply / ComputeXidHorizons to be non-decreasing". We already have the information that any previous GlobalVis*->maybe_needed is correct, and that if maybe_needed has been higher, that higher value is still correct, so we might just as well update the code to cover that case.

There are some cases where this might be dangerous: new transactions after a time with no active backends (in this case it should be fine to guarantee a non-decreasing GlobalVisTestNonRemovableHorizon), and walsender. I'm uncertain whether or not it's dangerous to _not_ roll back maybe_needed for a new walsender backend (e.g. the backend might want to construct a snapshot of (then) before GlobalVisTestNonRemovableHorizon), especially when considering the comment in ProcessStandbyHSFeedbackMessage:

 * There is a small window for a race condition here: although we just
 * checked that feedbackXmin precedes nextXid, the nextXid could have
 * gotten advanced between our fetching it and applying the xmin below,
 * perhaps far enough to make feedbackXmin wrap around. In that case the
 * xmin we set here would be "in the future" and have no effect. No point
 * in worrying about this since it's too late to save the desired data
 * anyway. Assuming that the standby sends us an increasing sequence of
 * xmins, this could only happen during the first reply cycle, else our
 * own xmin would prevent nextXid from advancing so far.

At the very least, changing GlobalVisUpdateApply/ComputeXidHorizons would increase the potential amount of data lost in such race conditions, if any.

As a further note, my suggested changes in vacuumlazy (specifically, the 'continue' path added in lazy_scan_prune in my recent v2 patchset) are likely incorrect because of a potential undocumented requirement of heap_page_prune: leave no dead tuples with xmax < vacrel->OldestXmin. I realised that with my patch we would allow some of these tuples to continue to exist IFF the GlobalVisTestNonRemovableHorizon moved back during the vacuum, which would violate such a requirement.

Kind regards,

Matthias van de Meent.
On Wed, Jun 16, 2021 at 12:22 PM Andres Freund <andres@anarazel.de> wrote:
> I think it's more complicated than that - "before" isn't a guarantee when the
> horizon can go backwards. Consider the case where a hot_standby_feedback=on
> replica without a slot connects - that can result in the xid horizon going
> backwards.

Oh yeah, I think that I get it now. Tell me if this sounds right to you:

It's not so much that HeapTupleSatisfiesVacuum() "disagrees" with heap_prune_satisfies_vacuum() in a way that actually matters to VACUUM. While there does seem to be a fairly mundane bug in GetOldestNonRemovableTransactionId() that really is a matter of disagreement between the two functions, the fundamental issue is deeper than that. The fundamental issue is that it's not okay to assume that the XID horizon won't go backwards. This probably matters for lots of reasons. The most obvious reason is that in theory it could cause lazy_scan_prune() to get stuck in about the same way as Justin reported with the GetOldestNonRemovableTransactionId() bug.

This isn't an issue in the backbranches because we're using the same OldestXmin value directly when calling heap_page_prune(). We only ever have one xid horizon cutoff like that per VACUUM (we only have OldestXmin, no vistest), so clearly it's not a problem.

> I think a good way to address this might be to have GlobalVisUpdateApply()
> ensure that maybe_needed does not go backwards within one backend.
>
> This is *nearly* already guaranteed within vacuum, except for the case where a
> catalog access between vacuum_set_xid_limits() and GlobalVisTestFor() could
> lead to an attempt at pruning, which could move maybe_needed to go backwards
> theoretically if inbetween those two steps a replica connected that causes the
> horizon to go backwards.

This would at least be easy to test. I like the idea of adding invariants.

--
Peter Geoghegan
Hi,

On 2021-06-16 12:12:23 -0700, Andres Freund wrote:
> Could you share your testcase? I've been working on a series of patches
> to address this (I'll share in a bit), and I've run quite a few tests,
> and didn't hit any infinite loops.

Sorry for not yet doing that. Unfortunately I have an ongoing family health issue (& associated travel) claiming time and energy :(.

I've pushed the minimal fix due to beta 2.

Beyond beta 2 I am thinking of the below to unify the horizon determination:

static inline GlobalVisHorizonKind
GlobalVisHorizonKindForRel(Relation rel)
{
	if (!rel)
		return VISHORIZON_SHARED;

	/*
	 * Other relkinds currently don't contain xids, nor always the necessary
	 * logical decoding markers.
	 */
	Assert(rel->rd_rel->relkind == RELKIND_RELATION ||
		   rel->rd_rel->relkind == RELKIND_MATVIEW ||
		   rel->rd_rel->relkind == RELKIND_TOASTVALUE);

	if (rel->rd_rel->relisshared || RecoveryInProgress())
		return VISHORIZON_SHARED;
	else if (IsCatalogRelation(rel) ||
			 RelationIsAccessibleInLogicalDecoding(rel))
		return VISHORIZON_CATALOG;
	else if (!RELATION_IS_LOCAL(rel))
		return VISHORIZON_DATA;
	else
		return VISHORIZON_TEMP;
}

That's then used in GetOldestNonRemovableTransactionId() and GlobalVisTestFor(). Makes sense?

Regards,

Andres
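[Editor's note] The point of the function above is that the relation-to-horizon mapping lives in one place, so GetOldestNonRemovableTransactionId() and GlobalVisTestFor() can no longer disagree. A self-contained sketch of that dispatch, with stand-in types (`RelStub` and its boolean fields model just the properties the real code reads via Relation and IsCatalogRelation() etc.):

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum
{
    VISHORIZON_SHARED,
    VISHORIZON_CATALOG,
    VISHORIZON_DATA,
    VISHORIZON_TEMP
} GlobalVisHorizonKind;

/* Stand-in for the handful of Relation properties the dispatch reads. */
typedef struct
{
    bool is_shared;     /* relisshared */
    bool is_catalog;    /* IsCatalogRelation() or logical decoding */
    bool is_local;      /* RELATION_IS_LOCAL(): a temp relation */
} RelStub;

static bool in_recovery = false;    /* RecoveryInProgress() stand-in */

/*
 * Single source of truth for which horizon applies to a relation;
 * every caller that needs a removal horizon goes through this, so two
 * code paths cannot classify the same relation differently.
 */
static GlobalVisHorizonKind
horizon_kind_for_rel(const RelStub *rel)
{
    if (rel == NULL || rel->is_shared || in_recovery)
        return VISHORIZON_SHARED;
    else if (rel->is_catalog)
        return VISHORIZON_CATALOG;
    else if (!rel->is_local)
        return VISHORIZON_DATA;
    else
        return VISHORIZON_TEMP;
}
```

With this shape, a bug like the one Justin hit (pg_statistic classified under one horizon when computing OldestXmin and under another when building the vistest) becomes structurally impossible, since both callers share the classification.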
On Wed, Jun 16, 2021 at 1:21 PM Peter Geoghegan <pg@bowt.ie> wrote:
> Oh yeah, I think that I get it now. Tell me if this sounds right to you:
>
> It's not so much that HeapTupleSatisfiesVacuum() "disagrees" with
> heap_prune_satisfies_vacuum() in a way that actually matters to
> VACUUM. While there does seem to be a fairly mundane bug in
> GetOldestNonRemovableTransactionId() that really is a matter of
> disagreement between the two functions, the fundamental issue is
> deeper than that. The fundamental issue is that it's not okay to
> assume that the XID horizon won't go backwards. This probably matters
> for lots of reasons. The most obvious reason is that in theory it
> could cause lazy_scan_prune() to get stuck in about the same way as
> Justin reported, with the GetOldestNonRemovableTransactionId() bug.

Any update on this, Andres?

--
Peter Geoghegan
Hi,

On 2021-06-21 05:29:19 -0700, Andres Freund wrote:
> On 2021-06-16 12:12:23 -0700, Andres Freund wrote:
> > Could you share your testcase? I've been working on a series of patches
> > to address this (I'll share in a bit), and I've run quite a few tests,
> > and didn't hit any infinite loops.
>
> Sorry for not yet doing that. Unfortunately I have an ongoing family
> health issue (& associated travel) claiming time and energy :(.
>
> I've pushed the minimal fix due to beta 2.
>
> Beyond beta 2 I am thinking of the below to unify the horizon
> determination:
>
> static inline GlobalVisHorizonKind
> GlobalVisHorizonKindForRel(Relation rel)

I finally pushed this cleanup.

Greetings,

Andres Freund