Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune() - Mailing list pgsql-bugs
From | Alexander Lakhin |
---|---|
Subject | Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune() |
Date | |
Msg-id | d5d5af5d-ba46-aff3-9f91-776c70246cc3@gmail.com Whole thread Raw |
Responses |
Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune()
Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune() Re: BUG #17257: (auto)vacuum hangs within lazy_scan_prune() |
List | pgsql-bugs |
29.10.2021 16:00, PG Bug reporting form wrote: > The following bug has been logged on the website: > > Bug reference: 17257 > Logged by: Alexander Lakhin > Email address: exclusion@gmail.com > PostgreSQL version: 14.0 > Operating system: Ubuntu 20.04 > Description: > > When running concurrent installchecks (x100) I've observed the autovacuum > process hanging: > I can propose the debugging patch to reproduce the issue that replaces the hang with the assert and modifies a pair of crash-causing test scripts to simplify the reproducing. (Sorry, I have no time now to prune down the scripts further as I have to leave for a week.) The reproducing script is: ### export PGDATABASE=regression createdb regression echo " vacuum (verbose, skip_locked, index_cleanup off) pg_catalog.pg_class; select pg_sleep(random()/50); " >/tmp/pseudo-autovacuum.sql ( timeout 10m bash -c "while true; do psql -f src/test/regress/sql/inherit.sql >>psql1.log 2>&1; coredumpctl --no-pager >/dev/null 2>&1 && break; done" ) & ( timeout 10m bash -c "while true; do psql -f src/test/regress/sql/vacuum.sql >>psql2.log 2>&1; coredumpctl --no-pager >/dev/null 2>&1 && break; done" ) & pgbench -n -f /tmp/pseudo-autovacuum.sql -C -c 40 -T 600 >pgbench.log 2>&1 & wait ### (I've set: autovacuum=off fsync=off in postgresql.conf) It leads to: TIME PID UID GID SIG COREFILE EXE Fri 2021-10-29 16:15:59 MSK 2337121 1000 1000 6 present .../usr/local/pgsql/bin/postgres real 2m19,425s user 4m26,078s sys 0m31,658s Core was generated by `postgres: law regression [local] VACUUM '. Program terminated with signal SIGABRT, Aborted. #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 50 ../sysdeps/unix/sysv/linux/raise.c: No such file or directory. (gdb) bt #0 __GI_raise (sig=sig@entry=6) at ../sysdeps/unix/sysv/linux/raise.c:50 #1 0x00007f3d17e88859 in __GI_abort () at abort.c:79 #2 0x0000557358fbecb8 in ExceptionalCondition (conditionName=conditionName@entry=0x557359027544 "numretries < 100", errorType=errorType@entry=0x55735901800b "FailedAssertion", fileName=0x7ffef5bbfa70 "\231\354\373XsU", fileName@entry=0x557359027520 "vacuumlazy.c", lineNumber=lineNumber@entry=1726) at assert.c:69 #3 0x0000557358ba0bdb in lazy_scan_prune (vacrel=0x55735a508d60, buf=16184, blkno=270, page=0x7f3d16ef6f80 "", vistest=0x5573592dd540 <GlobalVisCatalogRels>, prunestate=0x7ffef5bc0fb0) at vacuumlazy.c:1823 #4 0x0000557358ba38c5 in lazy_scan_heap (aggressive=false, params=0x7ffef5bc13a0, vacrel=<optimized out>) at vacuumlazy.c:1384 #5 heap_vacuum_rel (rel=0x7f3d183ebec8, params=0x7ffef5bc13a0, bstrategy=<optimized out>) at vacuumlazy.c:638 #6 0x0000557358cec785 in table_relation_vacuum (bstrategy=<optimized out>, params=0x7ffef5bc13a0, rel=0x7f3d183ebec8) at ../../../src/include/access/tableam.h:1678 #7 vacuum_rel (relid=1259, relation=<optimized out>, params=params@entry=0x7ffef5bc13a0) at vacuum.c:2034 #8 0x0000557358cedf9e in vacuum (relations=0x55735a57bea8, params=0x7ffef5bc13a0, bstrategy=<optimized out>, isTopLevel=<optimized out>) at vacuum.c:475 #9 0x0000557358cee4f5 in ExecVacuum (pstate=pstate@entry=0x55735a4ab730, vacstmt=vacstmt@entry=0x55735a48a408, isTopLevel=isTopLevel@entry=true) at vacuum.c:268 #10 0x0000557358e9de4e in standard_ProcessUtility (pstmt=0x55735a48a758, queryString=0x55735a4895e0 "vacuum (verbose, skip_locked, index_cleanup off) pg_catalog.pg_class;", readOnlyTree=<optimized out>, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x55735a48a848, qc=0x7ffef5bc1700) at utility.c:858 #11 0x0000557358e9c411 in PortalRunUtility (portal=portal@entry=0x55735a4ed040, pstmt=pstmt@entry=0x55735a48a758, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x55735a48a848, qc=qc@entry=0x7ffef5bc1700) at pquery.c:1155 #12 0x0000557358e9c54d in PortalRunMulti (portal=portal@entry=0x55735a4ed040, isTopLevel=isTopLevel@entry=true, setHoldSnapshot=setHoldSnapshot@entry=false, dest=dest@entry=0x55735a48a848, altdest=altdest@entry=0x55735a48a848, qc=qc@entry=0x7ffef5bc1700) at pquery.c:1312 #13 0x0000557358e9cbe9 in PortalRun (portal=portal@entry=0x55735a4ed040, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x55735a48a848, altdest=altdest@entry=0x55735a48a848, qc=0x7ffef5bc1700) at pquery.c:788 #14 0x0000557358e9895b in exec_simple_query ( query_string=0x55735a4895e0 "vacuum (verbose, skip_locked, index_cleanup off) pg_catalog.pg_class;") at postgres.c:1214 #15 0x0000557358e9a561 in PostgresMain (argc=argc@entry=1, argv=argv@entry=0x7ffef5bc1b70, dbname=<optimized out>, username=<optimized out>) at postgres.c:4486 #16 0x0000557358e064dd in BackendRun (port=0x55735a4aeb00, port=0x55735a4aeb00) at postmaster.c:4506 #17 BackendStartup (port=0x55735a4aeb00) at postmaster.c:4228 #18 ServerLoop () at postmaster.c:1745 #19 0x0000557358e07481 in PostmasterMain (argc=<optimized out>, argv=<optimized out>) at postmaster.c:1417 #20 0x0000557358b352e2 in main (argc=3, argv=0x55735a4838a0) at main.c:209 Sometimes the server crashes other way as described in bug #17255. And I've also seen the following crash (maybe it's yet another bug, but I can't explore it now): Core was generated by `postgres: law regression [local] EXPLAIN '. Program terminated with signal SIGSEGV, Segmentation fault. #0 RelationBuildPartitionDesc (omit_detached=true, rel=0x7f09b87a27e8) at partdesc.c:230 230 datum = heap_getattr(tuple, Anum_pg_class_relpartbound, (gdb) bt #0 RelationBuildPartitionDesc (omit_detached=true, rel=0x7f09b87a27e8) at partdesc.c:230 #1 RelationGetPartitionDesc (rel=0x7f09b87a27e8, omit_detached=<optimized out>) at partdesc.c:110 #2 0x000055ec12169032 in PartitionDirectoryLookup (pdir=0x55ec12ee7660, rel=rel@entry=0x7f09b87a27e8) at partdesc.c:425 #3 0x000055ec12158b35 in set_relation_partition_info (relation=0x7f09b87a27e8, rel=0x55ec12f337c0, root=0x55ec12effae0) at plancat.c:2197 #4 get_relation_info (root=<optimized out>, relationObjectId=<optimized out>, inhparent=<optimized out>, rel=<optimized out>) at plancat.c:472 #5 0x000055ec1215d26f in build_simple_rel (root=root@entry=0x55ec12effae0, relid=5, parent=parent@entry=0x55ec12ee7330) at relnode.c:307 #6 0x000055ec1214f495 in expand_partitioned_rtentry (root=0x55ec12effae0, relinfo=0x55ec12ee7330, parentrte=0x55ec12e006d0, parentRTindex=1, parentrel=0x7f09b87a0630, top_parentrc=0x0, lockmode=1) at inherit.c:398 #7 0x000055ec1214f67e in expand_inherited_rtentry (root=root@entry=0x55ec12effae0, rel=0x55ec12ee7330, rte=0x55ec12e006d0, rti=rti@entry=1) at inherit.c:143 #8 0x000055ec1212eb8d in add_other_rels_to_query (root=root@entry=0x55ec12effae0) at initsplan.c:163 #9 0x000055ec12131e3f in query_planner (root=root@entry=0x55ec12effae0, qp_callback=qp_callback@entry=0x55ec12132b20 <standard_qp_callback>, qp_extra=qp_extra@entry=0x7ffcffe4e890) at planmain.c:264 #10 0x000055ec121371b7 in grouping_planner (root=<optimized out>, tuple_fraction=<optimized out>) at planner.c:1442 #11 0x000055ec12139d0e in subquery_planner (glob=glob@entry=0x55ec12ef3eb8, parse=parse@entry=0x55ec12e005b8, parent_root=parent_root@entry=0x0, hasRecursion=hasRecursion@entry=false, tuple_fraction=tuple_fraction@entry=0) at planner.c:1019 #12 0x000055ec1213a383 in standard_planner (parse=0x55ec12e005b8, query_string=<optimized out>, cursorOptions=2048, boundParams=<optimized out>) at planner.c:400 #13 0x000055ec1220d7f8 in pg_plan_query (querytree=0x55ec12e005b8, querytree@entry=0x7ffcffe4ead0, query_string=query_string@entry=0x55ec12dff520 "explain (costs off) select * from range_list_parted;", cursorOptions=cursorOptions@entry=2048, boundParams=boundParams@entry=0x0) at postgres.c:847 #14 0x000055ec12025fef in ExplainOneQuery (query=0x7ffcffe4ead0, cursorOptions=2048, into=0x0, es=0x55ec12ef3bd0, queryString=0x55ec12dff520 "explain (costs off) select * from range_list_parted;", params=0x0, queryEnv=0x0) at explain.c:397 #15 0x000055ec1202679f in ExplainQuery (pstate=0x55ec12ef3a70, stmt=0x55ec12e003d8, params=0x0, dest=0x55ec12ef39d8) at ../../../src/include/nodes/nodes.h:604 #16 0x000055ec122132c9 in standard_ProcessUtility (pstmt=0x55ec12e01058, queryString=0x55ec12dff520 "explain (costs off) select * from range_list_parted;", readOnlyTree=<optimized out>, context=PROCESS_UTILITY_TOPLEVEL, params=0x0, queryEnv=0x0, dest=0x55ec12ef39d8, qc=0x7ffcffe4ed60) at utility.c:862 #17 0x000055ec122116cf in PortalRunUtility (portal=portal@entry=0x55ec12e63060, pstmt=0x55ec12e01058, isTopLevel=<optimized out>, setHoldSnapshot=setHoldSnapshot@entry=true, dest=dest@entry=0x55ec12ef39d8, qc=qc@entry=0x7ffcffe4ed60) at pquery.c:1155 #18 0x000055ec12211bb0 in FillPortalStore (portal=0x55ec12e63060, isTopLevel=<optimized out>) at ../../../src/include/nodes/nodes.h:604 #19 0x000055ec12211ebd in PortalRun (portal=portal@entry=0x55ec12e63060, count=count@entry=9223372036854775807, isTopLevel=isTopLevel@entry=true, run_once=run_once@entry=true, dest=dest@entry=0x55ec12e01148, altdest=altdest@entry=0x55ec12e01148, qc=0x7ffcffe4ef60) at pquery.c:760 #20 0x000055ec1220dcf5 in exec_simple_query ( query_string=0x55ec12dff520 "explain (costs off) select * from range_list_parted;") at postgres.c:1214 #21 0x000055ec1220f886 in PostgresMain (argc=argc@entry=1, argv=argv@entry=0x7ffcffe4f4b0, dbname=<optimized out>, username=<optimized out>) at postgres.c:4486 #22 0x000055ec1218071a in BackendRun (port=<optimized out>, port=<optimized out>) at postmaster.c:4506 #23 BackendStartup (port=<optimized out>) at postmaster.c:4228 #24 ServerLoop () at postmaster.c:1745 #25 0x000055ec12181682 in PostmasterMain (argc=argc@entry=3, argv=argv@entry=0x55ec12df9ce0) at postmaster.c:1417 #26 0x000055ec11eca3a0 in main (argc=3, argv=0x55ec12df9ce0) at main.c:209 Best regards, Alexander
Attachment
pgsql-bugs by date: