Re: Inval reliability, especially for inplace updates - Mailing list pgsql-hackers

From: Paul A Jungwirth <pj@illuminatedcomputing.com>
Subject: Re: Inval reliability, especially for inplace updates
Msg-id: CA+renyWSo3WigOO68ydGCQmDdZGLvXFGP3mLGo03XN09AKtDig@mail.gmail.com
In response to: Re: Inval reliability, especially for inplace updates (Paul A Jungwirth <pj@illuminatedcomputing.com>)
List: pgsql-hackers
On Thu, Jul 31, 2025 at 9:53 AM Paul A Jungwirth <pj@illuminatedcomputing.com> wrote:
> On Thu, Oct 31, 2024 at 09:20:52PM -0700, Noah Misch wrote:
> > Here, one of the autovacuum workers had the guilty stack trace, appearing at
> > the end of this message. heap_inplace_update_and_unlock() calls
> > CacheInvalidateHeapTupleInplace() while holding BUFFER_LOCK_EXCLUSIVE on a
> > buffer of pg_class. CacheInvalidateHeapTupleInplace() may call
> > CatalogCacheInitializeCache(), which opens the cache's rel. If there's not a
> > valid relcache entry for the catcache's rel, we scan pg_class to make a valid
> > relcache entry. The ensuing hang makes sense.
>
> Personally I never expected that catcache could depend on relcache,
> since it seems lower-level. But it makes sense that you need a
> relcache of pg_class at least, so their relationship is more
> complicated than just layers.
>
> I'm struggling to understand how your explanation incorporates
> *concurrency*, since a self-deadlock only involves locks from one
> backend. But the point is that a concurrent invalidation causes the
> relcache entry to vanish, so that we need to rebuild it. (We can't get
> this far without having built the relcache for pg_class once already.)
>
> Specifically, we drop the table while autovacuum is updating its
> statistics. But how is that possible? Don't both those things
> exclusive-lock the row in pg_class? I must be misunderstanding.

I was curious, so I decided to look into this some more. We have a
self-deadlock, but it's rare; concurrency is necessary to trigger it. The
patch fixes the double-locking, but what was the sequence that caused the
problem? The only thing reproi.sh does is add & drop tables over and over.

I looked at a few reproductions, and the situation was always the same: one
of the autovacuum workers was stuck analyzing pg_attribute, with a stack
trace like this (no different from Noah's):

#0  __futex_abstimed_wait_common (futex_word=futex_word@entry=0x7fa76af77238, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>, cancel=cancel@entry=true) at ./nptl/futex-internal.c:103
#1  0x00007fa7742d5f7b in __GI___futex_abstimed_wait_cancelable64 (futex_word=futex_word@entry=0x7fa76af77238, expected=expected@entry=0, clockid=clockid@entry=0, abstime=abstime@entry=0x0, private=<optimized out>) at ./nptl/futex-internal.c:139
#2  0x00007fa7742e0cff in do_futex_wait (sem=sem@entry=0x7fa76af77238, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:111
#3  0x00007fa7742e0d90 in __new_sem_wait_slow64 (sem=sem@entry=0x7fa76af77238, abstime=0x0, clockid=0) at ./nptl/sem_waitcommon.c:183
#4  0x00007fa7742e0df9 in __new_sem_wait (sem=sem@entry=0x7fa76af77238) at ./nptl/sem_wait.c:42
#5  0x00005583c641a6a8 in PGSemaphoreLock (sema=0x7fa76af77238) at pg_sema.c:327
#6  0x00005583c652e1da in LWLockAcquire (lock=0x7fa76b4fb4e4, mode=mode@entry=LW_SHARED) at lwlock.c:1318
#7  0x00005583c64df801 in LockBuffer (buffer=36, mode=mode@entry=1) at bufmgr.c:4182
#8  0x00005583c5eda4b3 in heapam_index_fetch_tuple (scan=0x7fa76adc1b30, tid=0x7fa76adc3fb8, snapshot=0x5583e8ede3d8, slot=0x7fa76adc1030, call_again=0x7fa76adc3fbe, all_dead=0x7ffebf643e6f) at heapam_handler.c:146
#9  0x00005583c5efb548 in table_index_fetch_tuple (scan=<optimized out>, tid=tid@entry=0x7fa76adc3fb8, snapshot=<optimized out>, slot=slot@entry=0x7fa76adc1030, call_again=call_again@entry=0x7fa76adc3fbe, all_dead=all_dead@entry=0x7ffebf643e6f) at ../../../../src/include/access/tableam.h:1226
#10 0x00005583c5efd1a7 in index_fetch_heap (scan=scan@entry=0x7fa76adc3f58, slot=slot@entry=0x7fa76adc1030) at indexam.c:622
#11 0x00005583c5efd362 in index_getnext_slot (scan=0x7fa76adc3f58, direction=direction@entry=ForwardScanDirection, slot=0x7fa76adc1030) at indexam.c:682
#12 0x00005583c5efa32a in systable_getnext (sysscan=sysscan@entry=0x7fa76adc3dd0) at genam.c:512
#13 0x00005583c677fd20 in ScanPgRelation (targetRelId=<optimized out>, indexOK=indexOK@entry=true, force_non_historic=force_non_historic@entry=false) at relcache.c:385
#14 0x00005583c6780cf8 in RelationReloadNailed (relation=relation@entry=0x7fa7740fd698) at relcache.c:2381
#15 0x00005583c678b67e in RelationClearRelation (relation=relation@entry=0x7fa7740fd698, rebuild=<optimized out>) at relcache.c:2527
#16 0x00005583c678ce85 in RelationFlushRelation (relation=0x7fa7740fd698) at relcache.c:2839
#17 0x00005583c678cf00 in RelationCacheInvalidateEntry (relationId=<optimized out>) at relcache.c:2900
#18 0x00005583c67714fe in LocalExecuteInvalidationMessage (msg=0x7ffebf644220) at inval.c:666
#19 0x00005583c651585b in ReceiveSharedInvalidMessages (invalFunction=invalFunction@entry=0x5583c677130d <LocalExecuteInvalidationMessage>, resetFunction=resetFunction@entry=0x5583c67700cf <InvalidateSystemCaches>) at sinval.c:121
#20 0x00005583c677012c in AcceptInvalidationMessages () at inval.c:766
#21 0x00005583c651f8d2 in LockRelationOid (relid=<optimized out>, lockmode=1) at lmgr.c:137
#22 0x00005583c5e503db in relation_open (relationId=relationId@entry=1259, lockmode=lockmode@entry=1) at relation.c:56
#23 0x00005583c5f595e5 in table_open (relationId=relationId@entry=1259, lockmode=lockmode@entry=1) at table.c:43
#24 0x00005583c677fc8c in ScanPgRelation (targetRelId=<optimized out>, indexOK=<optimized out>, force_non_historic=force_non_historic@entry=false) at relcache.c:368
#25 0x00005583c67807c1 in RelationReloadIndexInfo (relation=relation@entry=0x7fa774141028) at relcache.c:2257
#26 0x00005583c6780dff in RelationReloadNailed (relation=relation@entry=0x7fa774141028) at relcache.c:2359
#27 0x00005583c678b67e in RelationClearRelation (relation=relation@entry=0x7fa774141028, rebuild=<optimized out>) at relcache.c:2527
#28 0x00005583c678ce85 in RelationFlushRelation (relation=0x7fa774141028) at relcache.c:2839
#29 0x00005583c678cf00 in RelationCacheInvalidateEntry (relationId=<optimized out>) at relcache.c:2900
#30 0x00005583c67714fe in LocalExecuteInvalidationMessage (msg=0x7ffebf644640) at inval.c:666
#31 0x00005583c651585b in ReceiveSharedInvalidMessages (invalFunction=invalFunction@entry=0x5583c677130d <LocalExecuteInvalidationMessage>, resetFunction=resetFunction@entry=0x5583c67700cf <InvalidateSystemCaches>) at sinval.c:121
#32 0x00005583c677012c in AcceptInvalidationMessages () at inval.c:766
#33 0x00005583c651f8d2 in LockRelationOid (relid=<optimized out>, lockmode=1) at lmgr.c:137
#34 0x00005583c5e503db in relation_open (relationId=relationId@entry=1259, lockmode=lockmode@entry=1) at relation.c:56
#35 0x00005583c5f595e5 in table_open (relationId=relationId@entry=1259, lockmode=lockmode@entry=1) at table.c:43
#36 0x00005583c677fc8c in ScanPgRelation (targetRelId=<optimized out>, indexOK=indexOK@entry=true, force_non_historic=force_non_historic@entry=false) at relcache.c:368
#37 0x00005583c6780cf8 in RelationReloadNailed (relation=relation@entry=0x7fa7740fd698) at relcache.c:2381
#38 0x00005583c678b67e in RelationClearRelation (relation=relation@entry=0x7fa7740fd698, rebuild=<optimized out>) at relcache.c:2527
#39 0x00005583c678ce85 in RelationFlushRelation (relation=0x7fa7740fd698) at relcache.c:2839
#40 0x00005583c678cf00 in RelationCacheInvalidateEntry (relationId=<optimized out>) at relcache.c:2900
#41 0x00005583c67714fe in LocalExecuteInvalidationMessage (msg=0x7ffebf644a40) at inval.c:666
#42 0x00005583c651585b in ReceiveSharedInvalidMessages (invalFunction=invalFunction@entry=0x5583c677130d <LocalExecuteInvalidationMessage>, resetFunction=resetFunction@entry=0x5583c67700cf <InvalidateSystemCaches>) at sinval.c:121
#43 0x00005583c677012c in AcceptInvalidationMessages () at inval.c:766
#44 0x00005583c651f8d2 in LockRelationOid (relid=<optimized out>, lockmode=1) at lmgr.c:137
#45 0x00005583c5e503db in relation_open (relationId=1259, lockmode=lockmode@entry=1) at relation.c:56
#46 0x00005583c5f595e5 in table_open (relationId=<optimized out>, lockmode=lockmode@entry=1) at table.c:43
#47 0x00005583c676b11a in CatalogCacheInitializeCache (cache=cache@entry=0x5583e8e79f80) at catcache.c:929
#48 0x00005583c676e3c2 in PrepareToInvalidateCacheTuple (relation=relation@entry=0x7fa7740fd698, tuple=tuple@entry=0x7fa76adc3a88, newtuple=newtuple@entry=0x0, function=function@entry=0x5583c676fac8 <RegisterCatcacheInvalidation>, context=context@entry=0x7fa76adc38b8) at catcache.c:2164
#49 0x00005583c676fdc3 in CacheInvalidateHeapTupleCommon (relation=relation@entry=0x7fa7740fd698, tuple=tuple@entry=0x7fa76adc3a88, newtuple=newtuple@entry=0x0, prepare_callback=prepare_callback@entry=0x5583c676fe93 <PrepareInplaceInvalidationState>) at inval.c:1350
#50 0x00005583c6770ab9 in CacheInvalidateHeapTupleInplace (relation=relation@entry=0x7fa7740fd698, tuple=tuple@entry=0x7fa76adc3a88, newtuple=newtuple@entry=0x0) at inval.c:1461
#51 0x00005583c5ec8dc9 in heap_inplace_update_and_unlock (relation=0x7fa7740fd698, oldtup=<optimized out>, tuple=0x7fa76adc3a88, buffer=36) at heapam.c:6311
#52 0x00005583c5efb33a in systable_inplace_update_finish (state=0x7fa76adc3748, tuple=<optimized out>) at genam.c:884
#53 0x00005583c61fbc92 in vac_update_relstats (relation=relation@entry=0x7fa774102b20, num_pages=num_pages@entry=89, num_tuples=4460, num_all_visible_pages=<optimized out>, hasindex=hasindex@entry=true, frozenxid=frozenxid@entry=0, minmulti=minmulti@entry=0, frozenxid_updated=0x0, minmulti_updated=0x0, in_outer_xact=false) at vacuum.c:1455
#54 0x00005583c60ec3f2 in do_analyze_rel (onerel=onerel@entry=0x7fa774102b20, params=params@entry=0x5583e8e9bda4, va_cols=va_cols@entry=0x0, acquirefunc=<optimized out>, relpages=89, inh=inh@entry=false, in_outer_xact=false, elevel=13) at analyze.c:645
#55 0x00005583c60ed155 in analyze_rel (relid=<optimized out>, relation=0x5583e8ef9fa0, params=params@entry=0x5583e8e9bda4, va_cols=0x0, in_outer_xact=<optimized out>, bstrategy=<optimized out>) at analyze.c:262
#56 0x00005583c61fd2d4 in vacuum (relations=0x5583e8efc008, params=params@entry=0x5583e8e9bda4, bstrategy=<optimized out>, bstrategy@entry=0x5583e8ef2e00, isTopLevel=isTopLevel@entry=true) at vacuum.c:493
#57 0x00005583c641e58b in autovacuum_do_vac_analyze (tab=tab@entry=0x5583e8e9bda0, bstrategy=bstrategy@entry=0x5583e8ef2e00) at autovacuum.c:3180
#58 0x00005583c6420f84 in do_autovacuum () at autovacuum.c:2503
#59 0x00005583c64219ae in AutoVacWorkerMain (argc=argc@entry=0, argv=argv@entry=0x0) at autovacuum.c:1716
#60 0x00005583c6421bb3 in StartAutoVacWorker () at autovacuum.c:1494
#61 0x00005583c6433c22 in StartAutovacuumWorker () at postmaster.c:5536
#62 0x00005583c64344ca in sigusr1_handler (postgres_signal_arg=<optimized out>) at postmaster.c:5241
#63 <signal handler called>
#64 0x00007fa77434e904 in __GI___select (nfds=nfds@entry=8, readfds=readfds@entry=0x7ffebf6461c0, writefds=writefds@entry=0x0, exceptfds=exceptfds@entry=0x0, timeout=timeout@entry=0x7ffebf6461b0) at ../sysdeps/unix/sysv/linux/select.c:69
#65 0x00005583c6435442 in ServerLoop () at postmaster.c:1773
#66 0x00005583c6437c5e in PostmasterMain (argc=argc@entry=1, argv=argv@entry=0x5583e8e0fe00) at postmaster.c:1481
#67 0x00005583c62d57f5 in main (argc=1, argv=0x5583e8e0fe00) at main.c:202

In frame 55 we are analyzing relation 1249 (pg_attribute):

(gdb) f 55
#55 0x00005583c60ed155 in analyze_rel (relid=<optimized out>, relation=0x5583e8ef9fa0, params=params@entry=0x5583e8e9bda4, va_cols=0x0, in_outer_xact=<optimized out>, bstrategy=<optimized out>) at analyze.c:262
262             do_analyze_rel(onerel, params, va_cols, acquirefunc,
(gdb) p *relation
$1 = {type = T_RangeVar, catalogname = 0x0, schemaname = 0x5583e8ef3308 "pg_catalog", relname = 0x5583e8ef32e0 "pg_attribute", inh = true, relpersistence = 112 'p', alias = 0x0, location = -1}
(gdb) p *onerel
$2 = {rd_node = {spcNode = 1663, dbNode = 5, relNode = 1249}, rd_smgr = 0x5583e8eb98c0, rd_refcnt = 2, rd_backend = -1, rd_islocaltemp = false, rd_isnailed = true, rd_isvalid = true, rd_indexvalid = true, rd_statvalid = false, rd_createSubid = 0, rd_newRelfilenodeSubid = 0, rd_firstRelfilenodeSubid = 0, rd_droppedSubid = 0, rd_rel = 0x7fa774102d38, rd_att = 0x7fa774102e50, rd_id = 1249, rd_lockInfo = {lockRelId = {relId = 1249, dbId = 5}}, rd_rules = 0x0, rd_rulescxt = 0x0, trigdesc = 0x0, rd_rsdesc = 0x0, rd_fkeylist = 0x0, rd_fkeyvalid = false, rd_partkey = 0x0, rd_partkeycxt = 0x0, rd_partdesc = 0x0, rd_pdcxt = 0x0, rd_partdesc_nodetached = 0x0, rd_pddcxt = 0x0, rd_partdesc_nodetached_xmin = 0, rd_partcheck = 0x0, rd_partcheckvalid = false, rd_partcheckcxt = 0x0, rd_indexlist = 0x7fa774120298, rd_pkindex = 2659, rd_replidindex = 0, rd_statlist = 0x0, rd_indexattr = 0x0, rd_keyattr = 0x0, rd_pkattr = 0x0, rd_idattr = 0x0, rd_pubdesc = 0x0, rd_options = 0x0, rd_amhandler = 3, rd_tableam = 0x5583c6b0bf00 <heapam_methods>, rd_index = 0x0, rd_indextuple = 0x0, rd_indexcxt = 0x0, rd_indam = 0x0, rd_opfamily = 0x0, rd_opcintype = 0x0, rd_support = 0x0, rd_supportinfo = 0x0, rd_indoption = 0x0, rd_indexprs = 0x0, rd_indpred = 0x0, rd_exclops = 0x0, rd_exclprocs = 0x0, rd_exclstrats = 0x0, rd_indcollation = 0x0, rd_opcoptions = 0x0, rd_amcache = 0x0, rd_fdwroutine = 0x0, rd_toastoid = 0, pgstat_enabled = true, pgstat_info = 0x5583e8eb0090}

It makes sense that pg_attribute would experience a lot of churn, since we
just add & drop tables.

We call vac_update_relstats, which does an in-place update to pg_class
(frame 53). Within vac_update_relstats, we call systable_inplace_update_begin,
which LWLocks the buffer we want to update. That's the first part of the
self-deadlock, but it's no problem yet.

We proceed in vac_update_relstats to systable_inplace_update_finish. That
sends an inplace invalidation message for the pg_class tuple (frame 50). So
we need to replace the tuple in the pg_class catcache. But before we do
that, we make sure the catcache is initialized (frame 47):

    /* Just in case cache hasn't finished initialization yet... */
    if (ccp->cc_tupdesc == NULL)
        CatalogCacheInitializeCache(ccp);

That locks pg_class with AccessShareLock, which looks for pending
invalidation messages (frame 43).
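(As an aside, the in-place nature of that pg_class write is easy to see from
plain SQL. This is just an illustration I put together, not part of the
repro; it assumes superuser, and the values only move if the stats actually
changed:

    -- before: note ctid and xmin on pg_attribute's own row in pg_class
    SELECT ctid, xmin, relpages, reltuples
    FROM pg_class
    WHERE relname = 'pg_attribute';

    ANALYZE pg_attribute;

    -- after: relpages/reltuples may have moved, but ctid and xmin are
    -- unchanged, because the tuple was overwritten in place rather than
    -- going through the normal MVCC update path.
    SELECT ctid, xmin, relpages, reltuples
    FROM pg_class
    WHERE relname = 'pg_attribute';

Anyway, back to the stuck worker.)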
Those pending messages include one invalidating the relcache for pg_class:

(gdb) f 41
#41 0x00005583c67714fe in LocalExecuteInvalidationMessage (msg=0x7ffebf644a40) at inval.c:666
666             RelationCacheInvalidateEntry(msg->rc.relId);
(gdb) p *msg
$9 = {id = -2 '\376', cc = {id = -2 '\376', dbId = 5, hashValue = 1259}, cat = {id = -2 '\376', dbId = 5, catId = 1259}, rc = {id = -2 '\376', dbId = 5, relId = 1259}, sm = {id = -2 '\376', backend_hi = -47 '\321', backend_lo = 59642, rnode = {spcNode = 5, dbNode = 1259, relNode = 32679}}, rm = {id = -2 '\376', dbId = 5}, sn = {id = -2 '\376', dbId = 5, relId = 1259}}

(Btw, where did this message come from?)

So we have to rebuild the relcache for pg_class. Obviously we need to look up
its entry in (heh) pg_class (frame 37). So we open that table with an
AccessShareLock, which looks for pending invalidation messages (frame 32). We
find one telling us to invalidate 2662 (pg_class_oid_index):

(gdb) f 30
#30 0x00005583c67714fe in LocalExecuteInvalidationMessage (msg=0x7ffebf644640) at inval.c:666
666             RelationCacheInvalidateEntry(msg->rc.relId);
(gdb) p *msg
$12 = {id = -2 '\376', cc = {id = -2 '\376', dbId = 5, hashValue = 2662}, cat = {id = -2 '\376', dbId = 5, catId = 2662}, rc = {id = -2 '\376', dbId = 5, relId = 2662}, sm = {id = -2 '\376', backend_hi = -47 '\321', backend_lo = 59642, rnode = {spcNode = 5, dbNode = 2662, relNode = 32679}}, rm = {id = -2 '\376', dbId = 5}, sn = {id = -2 '\376', dbId = 5, relId = 2662}}

I guess that is coming from the add & drop table commands.

So we reload the relcache entry for 2662 (frame 26). That opens pg_class with
AccessShareLock, which looks for pending invalidation messages (frame 20). We
find one telling us to invalidate pg_class:

(gdb) f 18
#18 0x00005583c67714fe in LocalExecuteInvalidationMessage (msg=0x7ffebf644220) at inval.c:666
666             RelationCacheInvalidateEntry(msg->rc.relId);
(gdb) p *msg
$14 = {id = -2 '\376', cc = {id = -2 '\376', dbId = 5, hashValue = 1259}, cat = {id = -2 '\376', dbId = 5, catId = 1259}, rc = {id = -2 '\376', dbId = 5, relId = 1259}, sm = {id = -2 '\376', backend_hi = 75 'K', backend_lo = 48996, rnode = {spcNode = 5, dbNode = 1259, relNode = 32679}}, rm = {id = -2 '\376', dbId = 5}, sn = {id = -2 '\376', dbId = 5, relId = 1259}}

So we rebuild the relcache for pg_class, which means scanning pg_class
(frame 14). We try to lock one of its buffers (frame 7), and we
self-deadlock.

So the problem is triggered by an invalidation message for the pg_class
relcache entry. But what sends that message? (Actually it seems we got two
of them.) Why does anything we're doing invalidate pg_class? Normally an
inplace update to pg_class sends an invalidation message for the relation
described by that tuple, not for all of pg_class itself (see
CacheInvalidateHeapTupleCommon).

I couldn't figure that out until I put a breakpoint at the top of
RegisterRelcacheInvalidation and ran `vacuum analyze`. When we analyze
pg_class (and btw the repro script gives it a lot of churn), we call
vac_update_relstats on it, so we are doing an inplace update of pg_class on
the tuple *about* pg_class. That's why we send the invalidation for the
whole table.

The concurrency, then, is analyzing pg_attribute in one worker and pg_class
in another. No locks would prevent that. It doesn't have to be pg_attribute;
it could be some user table. But since the repro script adds & drops tables,
pg_attribute is getting churned.

Yours,

-- 
Paul ~{:-)
pj@illuminatedcomputing.com
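P.S. For anyone who wants to approximate the concurrency by hand against a
build without the fix, the shape of the workload is roughly the following.
This is only a sketch (the table name is made up and this is not Noah's
reproi.sh; the manual sessions need superuser, and actually landing in the
narrow window still takes luck):

    -- session 1: catalog churn, run in a tight loop
    CREATE TABLE churn_tmp (id int);
    DROP TABLE churn_tmp;

    -- session 2: stand-in for the worker analyzing pg_class
    VACUUM ANALYZE pg_class;

    -- session 3: stand-in for the worker analyzing pg_attribute
    VACUUM ANALYZE pg_attribute;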