Thread: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
The following bug has been logged on the website: Bug reference: 17994 Logged by: Alexander Lakhin Email address: exclusion@gmail.com PostgreSQL version: 16beta1 Operating system: Ubuntu 22.04 Description: The following script: for ((n=1;n<=10;n++)); do echo "ITERATION $n" createdb test numclients=50 for ((c=1;c<=$numclients;c++)); do cat << EOF | psql test -o /dev/null & CREATE PUBLICATION testpub_$c FOR ALL TABLES; CREATE TYPE rec_$c AS (f1 text, f2 text); CREATE TEMP TABLE tbl_$c (f rec_$c); INSERT INTO tbl_$c VALUES ((repeat('1234567890', 100000), 'a')); UPDATE tbl_$c SET f.f2 = 'b'; EOF done wait dropdb test grep 'TRAP:' server.log && break; done invokes errors or a server crash: ITERATION 1 ITERATION 2 ERROR: unsupported byval length: 32639 ITERATION 3 ITERATION 4 server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. connection to server was lost WARNING: terminating connection because of crash of another server process DETAIL: The postmaster has commanded this server process to roll back the current transaction and exit, because another server process exited abnormally and possibly corrupted shared memory. HINT: In a moment you should be able to reconnect to the database and repeat your command. dropdb: error: database removal failed: server closed the connection unexpectedly This probably means the server terminated abnormally before or while processing the request. TRAP: failed Assert("(thisatt->attalign) == TYPALIGN_SHORT"), File: "heaptuple.c", Line: 1312, PID: 1737623 The stack trace of the crash: Core was generated by `postgres: law test [local] UPDATE '. Program terminated with signal SIGABRT, Aborted. warning: Section `.reg-xstate/1737623' in core file too small. #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140071061268416) at ./nptl/pthread_kill.c:44 44 ./nptl/pthread_kill.c: No such file or directory. (gdb) bt #0 __pthread_kill_implementation (no_tid=0, signo=6, threadid=140071061268416) at ./nptl/pthread_kill.c:44 #1 __pthread_kill_internal (signo=6, threadid=140071061268416) at ./nptl/pthread_kill.c:78 #2 __GI___pthread_kill (threadid=140071061268416, signo=signo@entry=6) at ./nptl/pthread_kill.c:89 #3 0x00007f64d7c27476 in __GI_raise (sig=sig@entry=6) at ../sysdeps/posix/raise.c:26 #4 0x00007f64d7c0d7f3 in __GI_abort () at ./stdlib/abort.c:79 #5 0x0000555ede8dc3b3 in ExceptionalCondition ( conditionName=0x555edeb49700 "(thisatt->attalign) == TYPALIGN_SHORT", fileName=0x555edeb49020 "heaptuple.c", lineNumber=1312) at assert.c:66 #6 0x0000555edc7b76d7 in heap_deform_tuple (tuple=0x7ffe49cfe570, tupleDesc=0x7f64ca4562d8, values=0x62500005ad40, isnull=0x62500005b130) at heaptuple.c:1312 #7 0x0000555edd389d0b in ExecEvalFieldStoreDeForm (state=0x62500005b230, op=0x62500005b338, econtext=0x62500005ac20) at execExprInterp.c:3210 #8 0x0000555edd3729b4 in ExecInterpExpr (state=0x62500005b230, econtext=0x62500005ac20, isnull=0x7ffe49cfe900) at execExprInterp.c:1446 #9 0x0000555edd3791f2 in ExecInterpExprStillValid (state=0x62500005b230, econtext=0x62500005ac20, isNull=0x7ffe49cfe900) at execExprInterp.c:1870 #10 0x0000555edd406e84 in ExecEvalExprSwitchContext (state=0x62500005b230, econtext=0x62500005ac20, isNull=0x7ffe49cfe900) at ../../../src/include/executor/executor.h:355 #11 0x0000555edd4070ab in ExecProject (projInfo=0x62500005b228) at ../../../src/include/executor/executor.h:389 #12 0x0000555edd40883e in ExecScan (node=0x62500005ab10, accessMtd=0x555edd5843cf <SeqNext>, recheckMtd=0x555edd5847cd <SeqRecheck>) at execScan.c:237 #13 0x0000555edd58484e in ExecSeqScan (pstate=0x62500005ab10) at nodeSeqscan.c:112 #14 0x0000555edd3f5a71 in ExecProcNodeFirst (node=0x62500005ab10) at execProcnode.c:464 #15 0x0000555edd5495a5 in ExecProcNode (node=0x62500005ab10) at ../../../src/include/executor/executor.h:273 #16 0x0000555edd5674b3 in ExecModifyTable (pstate=0x62500005a490) at nodeModifyTable.c:3616 #17 0x0000555edd3f5a71 in ExecProcNodeFirst (node=0x62500005a490) at execProcnode.c:464 #18 0x0000555edd3af500 in ExecProcNode (node=0x62500005a490) at ../../../src/include/executor/executor.h:273 #19 0x0000555edd3be25a in ExecutePlan (estate=0x62500005a218, planstate=0x62500005a490, use_parallel_mode=false, operation=CMD_UPDATE, sendTuples=false, numberTuples=0, direction=ForwardScanDirection, dest=0x625000072770, execute_once=true) at execMain.c:1670 #20 0x0000555edd3b17f2 in standard_ExecutorRun (queryDesc=0x619000001598, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:365 #21 0x0000555edd3b0e37 in ExecutorRun (queryDesc=0x619000001598, direction=ForwardScanDirection, count=0, execute_once=true) at execMain.c:309 #22 0x0000555ede0d688a in ProcessQuery (plan=0x625000006c28, sourceText=0x625000005218 "UPDATE tbl_14 SET f.f2 = 'b';", params=0x0, queryEnv=0x0, dest=0x625000072770, qc=0x7ffe49cff1c0) at pquery.c:160 #23 0x0000555ede0deb02 in PortalRunMulti (portal=0x625000025a18, isTopLevel=true, setHoldSnapshot=false, dest=0x625000072770, altdest=0x625000072770, qc=0x7ffe49cff1c0) at pquery.c:1277 #24 0x0000555ede0dba90 in PortalRun (portal=0x625000025a18, count=9223372036854775807, isTopLevel=true, run_once=true, dest=0x625000072770, altdest=0x625000072770, qc=0x7ffe49cff1c0) at pquery.c:791 #25 0x0000555ede0c592d in exec_simple_query (query_string=0x625000005218 "UPDATE tbl_14 SET f.f2 = 'b';") at postgres.c:1274 #26 0x0000555ede0d347a in PostgresMain (dbname=0x62900001b358 "test", username=0x6250000020f8 "law") at postgres.c:4632 #27 0x0000555eddcf401f in BackendRun (port=0x614000016040) at postmaster.c:4461 #28 0x0000555eddcf2723 in BackendStartup (port=0x614000016040) at postmaster.c:4189 #29 0x0000555eddce84dd in ServerLoop () at postmaster.c:1779 #30 0x0000555eddce69fe in PostmasterMain (argc=3, argv=0x6030000006a0) at postmaster.c:1463 #31 0x0000555edd67850e in main (argc=3, argv=0x6030000006a0) at main.c:198 (gdb) frame 6 #6 0x0000555edc7b76d7 in heap_deform_tuple (tuple=0x7ffe49cfe570, tupleDesc=0x7f64ca4562d8, values=0x62500005ad40, isnull=0x62500005b130) at heaptuple.c:1312 1312 off = att_align_nominal(off, thisatt->attalign); (gdb) p thisatt->attalign $1 = 0 '\000' (gdb) frame 7 #7 0x0000555edd389d0b in ExecEvalFieldStoreDeForm (state=0x62500005b230, op=0x62500005b338, econtext=0x62500005ac20) at execExprInterp.c:3210 3210 heap_deform_tuple(&tmptup, tupDesc, (gdb) p *tupDesc $2 = {natts = 18406, tdtypeid = 1952409456, tdtypmod = 1953718639, tdrefcount = 859320671, constr = 0x3338, attrs = 0x7f64ca4562f0} With some debug logging added I see (note the natts value): 2023-06-24 15:23:12.488 MSK [1741232] LOG: get_cached_rowtype() after lookup_type_cache| type_id: 16389, typentry->tupDesc: 0x7ff12f045640, typentry->tupDesc->natts: 2 2023-06-24 15:23:12.488 MSK [1741232] STATEMENT: UPDATE tbl_5 SET f.f2 = 'b'; 2023-06-24 15:23:12.488 MSK [1741293] LOG: statement: CREATE TYPE rec_23 AS (f1 text, f2 text); 2023-06-24 15:23:12.488 MSK [1741232] LOG: ExecEvalFieldStoreDeForm() before heap_deform_tuple| tupDesc: 0x7ff12f045640, tupDesc->natts: 16427 2023-06-24 15:23:12.488 MSK [1741232] STATEMENT: UPDATE tbl_5 SET f.f2 = 'b'; TRAP: failed Assert("(thisatt->attalign) == TYPALIGN_SHORT"), File: "heaptuple.c", Line: 1312, PID: 1741232 So it looks like *tupDesc memory was overridden due to relcache invalidation during ExecEvalFieldStoreDeForm() execution after the get_cached_rowtype() call. The pg_usleep(1000) call added inside ExecEvalFieldStoreDeForm() after get_cached_rowtype() greatly increases the probability of the crash. Reproduced on REL_11_STABLE (since 96e38fa5e) .. master.
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
PG Bug reporting form <noreply@postgresql.org> writes: > The following script: > ... > invokes errors or a server crash: Thanks for the report! The good news is that the case shown here is easily fixed, as attached. get_cached_rowtype() explicitly specifies that its result doesn't hold good across database access if you don't increment the tupdesc's refcount. Most of the callers get this right, but ExecEvalFieldStoreDeForm() is doing things in the wrong order by fetching the tupdesc before detoasting the input datum. We could fix that by temporarily incrementing the refcount, but I don't see any reason we can't just reorder the steps to make it safe. The bad news is that while investigating this I came across another hazard that seems much messier to fix. To wit, if you have a tuple with "missing" pass-by-ref columns, then some of the columns output by heap_deform_tuple() will contain pointers into the tupdesc (cf. getmissingattr()). That's not sufficient lifespan in the scenario we're dealing with here: if the tupdesc gets trashed anytime before the eventual ExecEvalFieldStoreForm(), we'll have garbage in the result tuple. Worse, I fear there may be many other places with latent bugs of the same ilk. Before the attmissingval code was added, one could assume that the result of heap_deform_tuple would hold good as long as you had a pin on the source tuple. But now it depends on metadata as well, and I don't think we have a coherent story about that. Any thoughts what to do? I considered making getmissingattr apply datumCopy, but that would probably lead to unpleasant memory leaks. regards, tom lane diff --git a/src/backend/executor/execExprInterp.c b/src/backend/executor/execExprInterp.c index 7a4d7a4eee..851946a927 100644 --- a/src/backend/executor/execExprInterp.c +++ b/src/backend/executor/execExprInterp.c @@ -2015,7 +2015,8 @@ CheckOpSlotCompatibility(ExprEvalStep *op, TupleTableSlot *slot) * changed: if not NULL, *changed is set to true on any update * * The returned TupleDesc is not guaranteed pinned; caller must pin it - * to use it across any operation that might incur cache invalidation. + * to use it across any operation that might incur cache invalidation, + * including for example detoasting of input tuples. * (The TupleDesc is always refcounted, so just use IncrTupleDescRefCount.) * * NOTE: because composite types can change contents, we must be prepared @@ -3174,17 +3175,6 @@ ExecEvalFieldSelect(ExprState *state, ExprEvalStep *op, ExprContext *econtext) void ExecEvalFieldStoreDeForm(ExprState *state, ExprEvalStep *op, ExprContext *econtext) { - TupleDesc tupDesc; - - /* Lookup tupdesc if first time through or if type changes */ - tupDesc = get_cached_rowtype(op->d.fieldstore.fstore->resulttype, -1, - op->d.fieldstore.rowcache, NULL); - - /* Check that current tupdesc doesn't have more fields than we allocated */ - if (unlikely(tupDesc->natts > op->d.fieldstore.ncolumns)) - elog(ERROR, "too many columns in composite type %u", - op->d.fieldstore.fstore->resulttype); - if (*op->resnull) { /* Convert null input tuple into an all-nulls row */ @@ -3200,6 +3190,7 @@ ExecEvalFieldStoreDeForm(ExprState *state, ExprEvalStep *op, ExprContext *econte Datum tupDatum = *op->resvalue; HeapTupleHeader tuphdr; HeapTupleData tmptup; + TupleDesc tupDesc; tuphdr = DatumGetHeapTupleHeader(tupDatum); tmptup.t_len = HeapTupleHeaderGetDatumLength(tuphdr); @@ -3207,6 +3198,20 @@ ExecEvalFieldStoreDeForm(ExprState *state, ExprEvalStep *op, ExprContext *econte tmptup.t_tableOid = InvalidOid; tmptup.t_data = tuphdr; + /* + * Lookup tupdesc if first time through or if type changes. Because + * we don't pin the tupdesc, we must not do this lookup until after + * doing DatumGetHeapTupleHeader: that could do database access while + * detoasting the datum. + */ + tupDesc = get_cached_rowtype(op->d.fieldstore.fstore->resulttype, -1, + op->d.fieldstore.rowcache, NULL); + + /* Check that current tupdesc doesn't have more fields than allocated */ + if (unlikely(tupDesc->natts > op->d.fieldstore.ncolumns)) + elog(ERROR, "too many columns in composite type %u", + op->d.fieldstore.fstore->resulttype); + heap_deform_tuple(&tmptup, tupDesc, op->d.fieldstore.values, op->d.fieldstore.nulls);
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Hi, On 2023-06-28 14:49:34 -0400, Tom Lane wrote: > The bad news is that while investigating this I came across > another hazard that seems much messier to fix. To wit, if > you have a tuple with "missing" pass-by-ref columns, then > some of the columns output by heap_deform_tuple() will > contain pointers into the tupdesc (cf. getmissingattr()). > That's not sufficient lifespan in the scenario we're dealing > with here: if the tupdesc gets trashed anytime before the > eventual ExecEvalFieldStoreForm(), we'll have garbage in the > result tuple. What are the scenarios that could lead to the tupledesc being released before ExecEvalFieldStoreForm()? I guess a function call that does an ALTER TABLE might do the trick? But shouldn't such cases be prohibited by CheckTableNotInUse()? If other sessions caused the tupledesc to be changed, we should already hang onto the old definition via the RememberToFreeTupleDescAtEOX() mechanism? What about erroring out if the get_cached_rowtype() in ExecEvalFieldStoreForm() indicates the tupledesc changed? Hm, I guess that could cause spurious errors, our mechanism for determining whether tupledescs changed is pretty coarse. > Worse, I fear there may be many other places with latent bugs of the > same ilk. Before the attmissingval code was added, one could assume > that the result of heap_deform_tuple would hold good as long as you > had a pin on the source tuple. But now it depends on metadata as > well, and I don't think we have a coherent story about that. > > Any thoughts what to do? I considered making getmissingattr > apply datumCopy, but that would probably lead to unpleasant > memory leaks. Ugh. This is pretty nasty. The only real thing I can think of is to use refcounted tupledescs more widely. Part of the problem would be dealing with releasing the refcounts, another what to do about non-refcounted descs. WRT the former, I guess we could add a variant of ResourceOwnerRememberTupleDesc() that doesn't warn on commit if there are unreleased references. Greetings, Andres Freund
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Andres Freund <andres@anarazel.de> writes: > On 2023-06-28 14:49:34 -0400, Tom Lane wrote: >> That's not sufficient lifespan in the scenario we're dealing >> with here: if the tupdesc gets trashed anytime before the >> eventual ExecEvalFieldStoreForm(), we'll have garbage in the >> result tuple. > What are the scenarios that could lead to the tupledesc being released before > ExecEvalFieldStoreForm()? I guess a function call that does an ALTER TABLE > might do the trick? But shouldn't such cases be prohibited by > CheckTableNotInUse()? Any old cache flush could do it; you don't need a local trigger event. (Alexander's original test case uses CREATE PUBLICATION to trigger the cache flush, but that's just an easy way to make it more-or-less deterministic.) > If other sessions caused the tupledesc to be changed, > we should already hang onto the old definition via the > RememberToFreeTupleDescAtEOX() mechanism? I believe the tupdesc in question is actually in the typcache, which doesn't have anything like RememberToFreeTupleDescAtEOX (which is a horrid hack anyway if you ask me). We could probably make things better for this specific case by teaching the typcache not to replace a cached tupdesc unless its contents actually change. But that just makes it harder to get to a bug instance; it's not a cure-all. regards, tom lane
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Hi, On 2023-06-28 15:52:28 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > If other sessions caused the tupledesc to be changed, > > we should already hang onto the old definition via the > > RememberToFreeTupleDescAtEOX() mechanism? > > I believe the tupdesc in question is actually in the typcache, > which doesn't have anything like RememberToFreeTupleDescAtEOX > (which is a horrid hack anyway if you ask me). It's in the typecache, but that just uses the relcache's tupledesc for non-record composites. But looks like it doesn't suffice, because TypeCacheRelCallback() releases the refcount the typecache held, regardless of the tupledesc having changed meaningfully or not. So even if there can't have been "important" changes to the tupledesc due to locking, we end up with the newer tupledesc on a second lookup... I agree that the RememberToFreeTupleDescAtEOX thing is a ugly hack, but I don't think it's easy to come up with something good... > We could probably make things better for this specific case by > teaching the typcache not to replace a cached tupdesc unless its > contents actually change. But that just makes it harder to get > to a bug instance; it's not a cure-all. Yea :(. Greetings, Andres Freund
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Hi, On 2023-06-28 15:52:28 -0400, Tom Lane wrote:Andres Freund <andres@anarazel.de> writes:If other sessions caused the tupledesc to be changed, we should already hang onto the old definition via the RememberToFreeTupleDescAtEOX() mechanism?I believe the tupdesc in question is actually in the typcache, which doesn't have anything like RememberToFreeTupleDescAtEOX (which is a horrid hack anyway if you ask me).It's in the typecache, but that just uses the relcache's tupledesc for non-record composites. But looks like it doesn't suffice, because TypeCacheRelCallback() releases the refcount the typecache held, regardless of the tupledesc having changed meaningfully or not. So even if there can't have been "important" changes to the tupledesc due to locking, we end up with the newer tupledesc on a second lookup... I agree that the RememberToFreeTupleDescAtEOX thing is a ugly hack, but I don't think it's easy to come up with something good...We could probably make things better for this specific case by teaching the typcache not to replace a cached tupdesc unless its contents actually change. But that just makes it harder to get to a bug instance; it's not a cure-all.Yea :(.
:-(
I thought about whether the datumCopy() idea might be manageable, but getmissingattr() is inlined by heap_getttr() which has distressingly large number of call sites, including in third party code.
Could we maybe cache missing values elsewhere in a way that's less volatile? That's an extremely vague idea, just thinking out loud.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
On 2023-06-28 We 16:54, Andres Freund wrote:Hi, On 2023-06-28 15:52:28 -0400, Tom Lane wrote:Andres Freund <andres@anarazel.de> writes:If other sessions caused the tupledesc to be changed, we should already hang onto the old definition via the RememberToFreeTupleDescAtEOX() mechanism?I believe the tupdesc in question is actually in the typcache, which doesn't have anything like RememberToFreeTupleDescAtEOX (which is a horrid hack anyway if you ask me).It's in the typecache, but that just uses the relcache's tupledesc for non-record composites. But looks like it doesn't suffice, because TypeCacheRelCallback() releases the refcount the typecache held, regardless of the tupledesc having changed meaningfully or not. So even if there can't have been "important" changes to the tupledesc due to locking, we end up with the newer tupledesc on a second lookup... I agree that the RememberToFreeTupleDescAtEOX thing is a ugly hack, but I don't think it's easy to come up with something good...We could probably make things better for this specific case by teaching the typcache not to replace a cached tupdesc unless its contents actually change. But that just makes it harder to get to a bug instance; it's not a cure-all.Yea :(.
:-(
I thought about whether the datumCopy() idea might be manageable, but getmissingattr() is inlined by heap_getttr() which has distressingly large number of call sites, including in third party code.
Could we maybe cache missing values elsewhere in a way that's less volatile? That's an extremely vague idea, just thinking out loud.
After (not) sleeping on this overnight, and discussing it with a colleague this morning, here's a suggestion. We have a hash table, keyed by (tdtypeid, attnum) where we store a datumCopy'd version of the value. If it's present just return the value instead of getting it from the tupdesc. The hash table is blown away at the end of the transaction. Assuming that's workable I think it would not be a large patch.
My colleague did ask if we had live reproducible case.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Andrew Dunstan <andrew@dunslane.net> writes: > After (not) sleeping on this overnight, and discussing it with a > colleague this morning, here's a suggestion. We have a hash table, keyed > by (tdtypeid, attnum) where we store a datumCopy'd version of the value. > If it's present just return the value instead of getting it from the > tupdesc. The hash table is blown away at the end of the transaction. > Assuming that's workable I think it would not be a large patch. That sounds possibly workable. I'm a bit concerned about added overhead, and about whether the hashtable needs invalidation support. It might be better to key it off (relfilenode, attnum). > My colleague did ask if we had live reproducible case. I have not attempted to build one, but it'd probably be fairly easy to do if you turn on debug_discard_caches. regards, tom lane
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Andrew Dunstan <andrew@dunslane.net> writes:After (not) sleeping on this overnight, and discussing it with a colleague this morning, here's a suggestion. We have a hash table, keyed by (tdtypeid, attnum) where we store a datumCopy'd version of the value. If it's present just return the value instead of getting it from the tupdesc. The hash table is blown away at the end of the transaction. Assuming that's workable I think it would not be a large patch.That sounds possibly workable.
OK, good, we have a plan.
I'm a bit concerned about added overhead, and about whether the hashtable needs invalidation support. It might be better to key it off (relfilenode, attnum).
re overhead: getmissingattr isn't called at all except for "off the end" attributes. Setting up the hash table at most once per txn doesn't seem likely to cost much, and even that won't be done if getmissingattr isn't called.
re relfilenode: we don't have it in getmissingattr, so that would involve looking it up or whacking around the API. Neither of those seem good. Do you have any reason for thinking tdtypeid will not be sufficient?
re invalidation: that seems to suggest that the missing value could change under us. I don't think it can, but if it can then more than just this is broken. If not, how would invalidation affect us?
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Andrew Dunstan <andrew@dunslane.net> writes: > On 2023-06-29 Th 10:26, Tom Lane wrote: >> I'm a bit concerned about added >> overhead, and about whether the hashtable needs invalidation support. >> It might be better to key it off (relfilenode, attnum). > re relfilenode: we don't have it in getmissingattr, so that would > involve looking it up or whacking around the API. Yeah, I was afraid of that. > re invalidation: that seems to suggest that the missing value could > change under us. I don't think it can, but if it can then more than just > this is broken. If not, how would invalidation affect us? The scenario I'm afraid of is that we cache a missingval for table X, then X gets dropped, then a new table Y gets created that by bad luck has the same type OID as X did, then we add a column to Y that requires a missingval, and now we have an entry in the hashtable that matches Y but contains the wrong value. Admittedly, it seems very low probability that this would all happen within the span of one transaction, so maybe we can get away with ignoring the case. But if we used relfilenode, we'd have at least a little more protection because of the tombstone files that prevent immediate re-use of a relfilenode OID. I'm not sure that it'd be bulletproof even with relfilenode, though. Maybe we should bite the bullet and provide invalidation based on a pg_type inval callback. regards, tom lane
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Andrew Dunstan <andrew@dunslane.net> writes:On 2023-06-29 Th 10:26, Tom Lane wrote:I'm a bit concerned about added overhead, and about whether the hashtable needs invalidation support. It might be better to key it off (relfilenode, attnum).re relfilenode: we don't have it in getmissingattr, so that would involve looking it up or whacking around the API.Yeah, I was afraid of that.re invalidation: that seems to suggest that the missing value could change under us. I don't think it can, but if it can then more than just this is broken. If not, how would invalidation affect us?The scenario I'm afraid of is that we cache a missingval for table X, then X gets dropped, then a new table Y gets created that by bad luck has the same type OID as X did, then we add a column to Y that requires a missingval, and now we have an entry in the hashtable that matches Y but contains the wrong value. Admittedly, it seems very low probability that this would all happen within the span of one transaction, so maybe we can get away with ignoring the case. But if we used relfilenode, we'd have at least a little more protection because of the tombstone files that prevent immediate re-use of a relfilenode OID. I'm not sure that it'd be bulletproof even with relfilenode, though. Maybe we should bite the bullet and provide invalidation based on a pg_type inval callback.
Yeah, Robert has just convinced me, so I'll do it like that. It doesn't look too hard.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Andrew Dunstan <andrew@dunslane.net> writes: > On 2023-06-29 Th 15:25, Tom Lane wrote: >> Maybe we should bite the bullet and provide >> invalidation based on a pg_type inval callback. > Yeah, Robert has just convinced me, so I'll do it like that. It doesn't > look too hard. Oh, I have a better idea. We're only going to need all this for pass-by-ref types, right? Why not make the hash key be the value itself? Wrap it in a bytea perhaps to avoid needing a bespoke hash function. regards, tom lane
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Andrew Dunstan <andrew@dunslane.net> writes:On 2023-06-29 Th 15:25, Tom Lane wrote:Maybe we should bite the bullet and provide invalidation based on a pg_type inval callback.Yeah, Robert has just convinced me, so I'll do it like that. It doesn't look too hard.Oh, I have a better idea. We're only going to need all this for pass-by-ref types, right?
Yes, the value we get back for byval types isn't a pointer that might disappear.
Why not make the hash key be the value itself? Wrap it in a bytea perhaps to avoid needing a bespoke hash function.
Not sure I understand.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Andrew Dunstan <andrew@dunslane.net> writes: > On 2023-06-29 Th 18:41, Tom Lane wrote: >> Why not make the hash key be the value >> itself? Wrap it in a bytea perhaps to avoid needing a bespoke >> hash function. > Not sure I understand. Say the missingval for a particular column is text 'abc'. We don't actually care which column it is, all we need is a copy of that datum that will stay put for the rest of the transaction. So I'm thinking that the lookup key for the hash table should actually be the contents of the datum, and we don't need to store anything else at all. (If we happen to have two columns with the same missingval, they can perfectly well share this hash entry.) Then there's no question of invalidation, or at least the existing invalidation mechanisms for tupdescs do all we need. regards, tom lane
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Andrew Dunstan <andrew@dunslane.net> writes:On 2023-06-29 Th 18:41, Tom Lane wrote:Why not make the hash key be the value itself? Wrap it in a bytea perhaps to avoid needing a bespoke hash function.Not sure I understand.Say the missingval for a particular column is text 'abc'. We don't actually care which column it is, all we need is a copy of that datum that will stay put for the rest of the transaction. So I'm thinking that the lookup key for the hash table should actually be the contents of the datum, and we don't need to store anything else at all. (If we happen to have two columns with the same missingval, they can perfectly well share this hash entry.) Then there's no question of invalidation, or at least the existing invalidation mechanisms for tupdescs do all we need.
OK, I get it. Do we have a routine to wrap a Datum in a bytea, or do I need to write one? It's slightly amusing that the original patch for fast defaults stored the missing value as a bytea, but someone (Andres IIRC) preferred that we store it as a one element array, so that's what we went with.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Andrew Dunstan <andrew@dunslane.net> writes: > OK, I get it. Do we have a routine to wrap a Datum in a bytea, or do I > need to write one? I don't recall that functionality being exported anywhere, but it'd only be a couple lines of code in any case. (Whether it's actually worth doing, I'm not sure about. It only helps if it lets you use existing hashtable support routines, but I'm not quite sure that any of those exist for bytea either.) regards, tom lane
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Andrew Dunstan <andrew@dunslane.net> writes:OK, I get it. Do we have a routine to wrap a Datum in a bytea, or do I need to write one?I don't recall that functionality being exported anywhere, but it'd only be a couple lines of code in any case. (Whether it's actually worth doing, I'm not sure about. It only helps if it lets you use existing hashtable support routines, but I'm not quite sure that any of those exist for bytea either.)
Me either. I think this might call for too much invention so I'm going to revert to plan A. The invalidation code won't be very much, and it should be a fairly rare event, so it doesn't need to be very clever.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Andrew Dunstan <andrew@dunslane.net> writes: > Me either. I think this might call for too much invention so I'm going > to revert to plan A. The invalidation code won't be very much, and it > should be a fairly rare event, so it doesn't need to be very clever. The problem with rarely-executed code is that it's also hard to test. regards, tom lane
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Andrew Dunstan <andrew@dunslane.net> writes:Me either. I think this might call for too much invention so I'm going to revert to plan A. The invalidation code won't be very much, and it should be a fairly rare event, so it doesn't need to be very clever.The problem with rarely-executed code is that it's also hard to test.
Your Plan B in the end proved less difficult than I thought, and certainly seems more robust than having to tangle with invalidations. I didn't try to do anything to wrap the values in a bytea, it didn't seem necessary. Here's a patch - it's not terribly long or invasive. I haven't tried backpatching it yet.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
Attachment
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Hi, On 2023-07-01 09:14:39 -0400, Andrew Dunstan wrote: > On 2023-06-30 Fr 15:08, Tom Lane wrote: > > Andrew Dunstan<andrew@dunslane.net> writes: > > > Me either. I think this might call for too much invention so I'm going > > > to revert to plan A. The invalidation code won't be very much, and it > > > should be a fairly rare event, so it doesn't need to be very clever. > > The problem with rarely-executed code is that it's also hard to test. > Your Plan B in the end proved less difficult than I thought, and certainly > seems more robust than having to tangle with invalidations. I didn't try to > do anything to wrap the values in a bytea, it didn't seem necessary. Here's > a patch - it's not terribly long or invasive. I haven't tried backpatching > it yet. Hm. Is tying this to top-level transactions the right choice? What about sequences like SAVEPOINT x; ALTER TABLE ... ADD COLUMN ... DEFAULT; SELECT use_default FROM ...; ROLLBACK TO SAVEPOINT x; SAVEPOINT y; ALTER TABLE ... ADD COLUMN ... DEFAULT; SELECT use_default FROM ...; ROLLBACK TO SAVEPOINT y; Isn't that going to break the assumption that the key is unique within a transaction? I think at the very least you'd need to make Separately, will this work correctly with procedures keeping values alive across transactions? I don't feel like I have a good grasp at how that whole area is supposed to work... Greetings, Andres Freund
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Andres Freund <andres@anarazel.de> writes: > Isn't that going to break the assumption that the key is unique within a > transaction? Huh? "abc" is "abc", no matter what. At least if Andrew did what I suggested (I didn't look at the patch yet). > Separately, will this work correctly with procedures keeping values alive > across transactions? That might be an issue. But couldn't we make this cache just live for the life of the process? It's unlikely to get large. regards, tom lane
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Hi, On 2023-07-02 17:57:03 -0400, Tom Lane wrote: > Andres Freund <andres@anarazel.de> writes: > > Isn't that going to break the assumption that the key is unique within a > > transaction? > > Huh? "abc" is "abc", no matter what. At least if Andrew did what > I suggested (I didn't look at the patch yet). Yea, I think that was a brainfart after too briefly skimming the code. > > Separately, will this work correctly with procedures keeping values alive > > across transactions? > > That might be an issue. But couldn't we make this cache just live for > the life of the process? It's unlikely to get large. I don't have a good handle about how big it'd end up being in some of the less common workloads. I can imagine workloads with temp tables or such churning through a lot of default values - often the "keyed by value" approach will save the day, but I imagine not always. .oO(Perhaps we need to add a boehm style GC ... No.) Perhaps we could defer resetting the cache to when we're not inside a procedure? I kinda wonder if this isn't basically the start of a "string interning" style infrastructure, except for more types than just strings... I've wondered about having that quite a few times. Greetings, Andres Freund
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Hi, On 2023-07-02 17:57:03 -0400, Tom Lane wrote:Andres Freund <andres@anarazel.de> writes:Isn't that going to break the assumption that the key is unique within a transaction?Huh? "abc" is "abc", no matter what. At least if Andrew did what I suggested (I didn't look at the patch yet).Yea, I think that was a brainfart after too briefly skimming the code.Separately, will this work correctly with procedures keeping values alive across transactions?That might be an issue. But couldn't we make this cache just live for the life of the process? It's unlikely to get large.I don't have a good handle about how big it'd end up being in some of the less common workloads. I can imagine workloads with temp tables or such churning through a lot of default values - often the "keyed by value" approach will save the day, but I imagine not always.
The maximum number of entries in the table is the number of pg_attribute rows with atthasmissing = true and attbyval = false. In practice I suspect that's mostly going to be fairly low.
.oO(Perhaps we need to add a boehm style GC ... No.) Perhaps we could defer resetting the cache to when we're not inside a procedure?
I'm kinda leaning towards Tom's suggestion to just make it session-persistent.
I kinda wonder if this isn't basically the start of a "string interning" style infrastructure, except for more types than just strings... I've wondered about having that quite a few times.
maybe.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
On 2023-07-02 Su 18:13, Andres Freund wrote:Hi, On 2023-07-02 17:57:03 -0400, Tom Lane wrote:Andres Freund <andres@anarazel.de> writes:Isn't that going to break the assumption that the key is unique within a transaction?Huh? "abc" is "abc", no matter what. At least if Andrew did what I suggested (I didn't look at the patch yet).Yea, I think that was a brainfart after too briefly skimming the code.Separately, will this work correctly with procedures keeping values alive across transactions?That might be an issue. But couldn't we make this cache just live for the life of the process? It's unlikely to get large.I don't have a good handle about how big it'd end up being in some of the less common workloads. I can imagine workloads with temp tables or such churning through a lot of default values - often the "keyed by value" approach will save the day, but I imagine not always.The maximum number of entries in the table is the number of pg_attribute rows with atthasmissing = true and attbyval = false. In practice I suspect that's mostly going to be fairly low.
.oO(Perhaps we need to add a boehm style GC ... No.) Perhaps we could defer resetting the cache to when we're not inside a procedure?
I'm kinda leaning towards Tom's suggestion to just make it session-persistent.
The thread seems to have died down a bit. Do we have a consensus on Tom's approach?
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Hi, On 2023-07-08 08:48:00 -0400, Andrew Dunstan wrote: > On 2023-07-02 Su 22:15, Andrew Dunstan wrote: > > > > > Separately, will this work correctly with procedures keeping values alive > > > > > across transactions? > > > > That might be an issue. But couldn't we make this cache just live for > > > > the life of the process? It's unlikely to get large. > > > I don't have a good handle about how big it'd end up being in some of the less > > > common workloads. I can imagine workloads with temp tables or such churning > > > through a lot of default values - often the "keyed by value" approach will > > > save the day, but I imagine not always. > > > > The maximum number of entries in the table is the number of pg_attribute > > rows with atthasmissing = true and attbyval = false. In practice I > > suspect that's mostly going to be fairly low. It's not really bound by that, because the set of rows can change over time. Particularly with temp tables. > The thread seems to have died down a bit. Do we have a consensus on Tom's > approach? I guess so. It's far from pretty, but nobody really has come up with something better. Greetings, Andres Freund
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Hi, On 2023-07-08 08:48:00 -0400, Andrew Dunstan wrote:On 2023-07-02 Su 22:15, Andrew Dunstan wrote:Separately, will this work correctly with procedures keeping values alive across transactions?That might be an issue. But couldn't we make this cache just live for the life of the process? It's unlikely to get large.I don't have a good handle about how big it'd end up being in some of the less common workloads. I can imagine workloads with temp tables or such churning through a lot of default values - often the "keyed by value" approach will save the day, but I imagine not always.The maximum number of entries in the table is the number of pg_attribute rows with atthasmissing = true and attbyval = false. In practice I suspect that's mostly going to be fairly low.It's not really bound by that, because the set of rows can change over time. Particularly with temp tables.
How many times are people going to add a new column with a non-null default to a temp table? Usually you know the shape you want for a temp table when you create it, I should think. Even in a long-running pgbouncer session I wouldn't expect this to balloon substantially.
The thread seems to have died down a bit. Do we have a consensus on Tom's approach?I guess so. It's far from pretty, but nobody really has come up with something better.
OK, I'll send a revised patch.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
On 2023-07-10 Mo 15:51, Andres Freund wrote:Hi, On 2023-07-08 08:48:00 -0400, Andrew Dunstan wrote:On 2023-07-02 Su 22:15, Andrew Dunstan wrote:Separately, will this work correctly with procedures keeping values alive across transactions?That might be an issue. But couldn't we make this cache just live for the life of the process? It's unlikely to get large.I don't have a good handle about how big it'd end up being in some of the less common workloads. I can imagine workloads with temp tables or such churning through a lot of default values - often the "keyed by value" approach will save the day, but I imagine not always.The maximum number of entries in the table is the number of pg_attribute rows with atthasmissing = true and attbyval = false. In practice I suspect that's mostly going to be fairly low.It's not really bound by that, because the set of rows can change over time. Particularly with temp tables.
How many times are people going to add a new column with a non-null default to a temp table? Usually you know the shape you want for a temp table when you create it, I should think. Even in a long-running pgbouncer session I wouldn't expect this to balloon substantially.
The thread seems to have died down a bit. Do we have a consensus on Tom's approach?I guess so. It's far from pretty, but nobody really has come up with something better.
OK, I'll send a revised patch.
Here it is. The nice thing here is that the code changes are entirely confined to heaptuple.c
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
Attachment
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Hello, 28.06.2023 21:49, Tom Lane wrote: > The bad news is that while investigating this I came across > another hazard that seems much messier to fix. To wit, if > you have a tuple with "missing" pass-by-ref columns, then > some of the columns output by heap_deform_tuple() will > contain pointers into the tupdesc (cf. getmissingattr()). > That's not sufficient lifespan in the scenario we're dealing > with here: if the tupdesc gets trashed anytime before the > eventual ExecEvalFieldStoreForm(), we'll have garbage in the > result tuple. Please excuse me, as I'm definitely late to the party, but may I ask some questions to understand the issue discussed? I still can't see a practical way to get garbage data in a tuple with missing columns, so maybe I'm missing something (except for columns). If we talk about ExecEvalFieldStoreDeForm() -> heap_deform_tuple() -> getmissingattr(), then I can't find a way to add a non-null default for a record type to exploit this path. The rowtypes test contains: -- at the moment this will not work due to ALTER TABLE inadequacy: alter table fullname add column suffix text default ''; ERROR: cannot alter table "fullname" because column "people.fn" uses its row type So it seems to me, that ExecEvalFieldStoreDeForm() can't be called for a record/tuple having missing non-null attributes. And even if ALTER TABLE was "adequate", does it mean that ExecEvalFieldStoreDeForm() would return default values from the underlying table type? > Worse, I fear there may be many other places with latent bugs of the > same ilk. Before the attmissingval code was added, one could assume > that the result of heap_deform_tuple would hold good as long as you > had a pin on the source tuple. But now it depends on metadata as > well, and I don't think we have a coherent story about that. But is it true, that having a pin on tupdesc is enough to make sure that the result of heap_deform_tuple() holds good? If so, then probably we should find places, where tupdesc gets unpinned before that result is saved somewhere. I see the following callers of the heap_deform_tuple(): src/backend/replication/logical/reorderbuffer.c ReorderBufferToastReplace(): heap_deform_tuple() followed by heap_form_tuple() src/backend/executor/execTuples.c ExecForceStoreHeapTuple(), ExecForceStoreMinimalTuple(), ExecStoreHeapTupleDatum(): use long-living and hopefully pinned slot->tts_tupleDescriptor src/backend/executor/execExprInterp.c ExecEvalFieldStoreDeForm(): the above question applies here src/backend/executor/spi.c SPI_modifytuple(): heap_deform_tuple() followed by heap_form_tuple() src/backend/utils/adt/rowtypes.c record_out(), hash_record(), hash_record_extended(): tupdesc released after extracting/processing data record_send(): tupdesc released after sending data record_cmp(), record_eq(), record_image_cmp(), record_image_eq(): tupdesc1. tupdesc2 released after comparing data src/backend/utils/adt/jsonfuncs.c populate_record(): heap_deform_tuple() followed by heap_form_tuple() src/backend/utils/adt/expandedrecord.c deconstruct_expanded_record() uses long-living (pinned) erh->er_tupdesc src/backend/utils/adt/ri_triggers.c RI_Initial_Check(), RI_PartitionRemove_Check(): use long-living? SPI_tuptable->tupdesc src/backend/access/heap/heapam_handler.c reform_and_rewrite_tuple(): uses RelationGetDescr(OldHeap) (pinned, I suppose) src/backend/access/heap/heaptoast.c heap_toast_delete(), heap_toast_insert_or_update() use long-living rel->rd_att; toast_flatten_tuple(): heap_deform_tuple(..., tupleDesc, ...) followed by heap_form_tuple(..., tupleDesc, ...) toast_flatten_tuple_to_datum(): heap_deform_tuple followed by detoasting attrs src/backend/access/heap/heapam.c ExtractReplicaIdentity(): heap_deform_tuple(..., desc, ...) followed by heap_form_tuple(..., desc, ...) src/backend/access/common/heaptuple.c heap_modify_tuple(): heap_deform_tuple(..., tupleDesc, ...) followed by heap_form_tuple(..., tupleDesc, ...) heap_modify_tuple_by_cols(): heap_deform_tuple(..., tupleDesc, ...) followed by heap_form_tuple(..., tupleDesc, ...) src/backend/access/common/tupconvert.c execute_attr_map_tuple(): heap_deform_tuple() followed by heap_form_tuple() src/test/regress/regress.c make_tuple_indirect(): heap_deform_tuple() followed by heap_form_tuple() src/pl/plpgsql/src/pl_exec.c exec_move_row() called with tupdesc = NULL, SPI_tuptable->tupdesc, or tupdesc released after the call contrib/hstore/hstore_io.c hstore_from_record(), hstore_populate_record(): tupdesc released after extracting data And even if I missed some possibly problematic calls, maybe it's worth to consider fixing exactly that places... Also, besides heap_deform_tuple(), getmissingattr() is called from heap_getattr(). And heap_getattr() is used in many places, but most of them are protected too due to tupdesc pinned. Two suspicious places that I found are GetAttributeByName() and GetAttributeByNum(), so when using these functions you can really get the missing attribute value with the tupdesc released. But what circumstances do we need to end up with an invalid data? 1) Write a function that will call GetAttributeByName() and hold it's result inside for some time. 2) Add a passed-by-ref column with a default value to a table. 3) Pass to that function a tuple without the "missing" column (just "SELECT func(table) FROM table" won't do, but func(OLD) in a trigger should). 4) Trigger a relcache invalidation. 5) Perform a database access inside that function to process the invalidation. 6) Access the result of GetAttributeByName() after that. Maybe we could construct such test case, e.g. add to pg_regress one more SQL-accessible C function, bu9ccvhhihvcnb·t I wonder, is this the way to go? On the other hand, probably there are extensions, that use GetAttributeByName() in the non-safe way (as the function documentation doesn't warn about such issues), but maybe just perform datumCopy inside that function (and similar one(s))? Though, perhaps I just don't notice the elephant in the core... Best regards, Alexander
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
On 2023-07-15 Sa 03:00, Alexander Lakhin wrote: > Hello, > > 28.06.2023 21:49, Tom Lane wrote: >> The bad news is that while investigating this I came across >> another hazard that seems much messier to fix. To wit, if >> you have a tuple with "missing" pass-by-ref columns, then >> some of the columns output by heap_deform_tuple() will >> contain pointers into the tupdesc (cf. getmissingattr()). >> That's not sufficient lifespan in the scenario we're dealing >> with here: if the tupdesc gets trashed anytime before the >> eventual ExecEvalFieldStoreForm(), we'll have garbage in the >> result tuple. > > Please excuse me, as I'm definitely late to the party, but may I ask some > questions to understand the issue discussed? I still can't see a > practical > way to get garbage data in a tuple with missing columns, so maybe I'm > missing something (except for columns). > > If we talk about ExecEvalFieldStoreDeForm() -> heap_deform_tuple() -> > getmissingattr(), then I can't find a way to add a non-null default for a > record type to exploit this path. The rowtypes test contains: > -- at the moment this will not work due to ALTER TABLE inadequacy: > alter table fullname add column suffix text default ''; > ERROR: cannot alter table "fullname" because column "people.fn" uses > its row type > > So it seems to me, that ExecEvalFieldStoreDeForm() can't be called for a > record/tuple having missing non-null attributes. And even if ALTER > TABLE was > "adequate", does it mean that ExecEvalFieldStoreDeForm() would return > default > values from the underlying table type? > >> Worse, I fear there may be many other places with latent bugs of the >> same ilk. Before the attmissingval code was added, one could assume >> that the result of heap_deform_tuple would hold good as long as you >> had a pin on the source tuple. But now it depends on metadata as >> well, and I don't think we have a coherent story about that. > > But is it true, that having a pin on tupdesc is enough to make sure that > the result of heap_deform_tuple() holds good? > > If so, then probably we should find places, where tupdesc gets unpinned > before that result is saved somewhere. > I see the following callers of the heap_deform_tuple(): > src/backend/replication/logical/reorderbuffer.c > ReorderBufferToastReplace(): heap_deform_tuple() followed by > heap_form_tuple() > src/backend/executor/execTuples.c > ExecForceStoreHeapTuple(), ExecForceStoreMinimalTuple(), > ExecStoreHeapTupleDatum(): use long-living and hopefully pinned > slot->tts_tupleDescriptor > src/backend/executor/execExprInterp.c > ExecEvalFieldStoreDeForm(): the above question applies here > src/backend/executor/spi.c > SPI_modifytuple(): heap_deform_tuple() followed by heap_form_tuple() > src/backend/utils/adt/rowtypes.c > record_out(), hash_record(), hash_record_extended(): tupdesc > released after extracting/processing data > record_send(): tupdesc released after sending data > record_cmp(), record_eq(), record_image_cmp(), record_image_eq(): > tupdesc1. tupdesc2 released after comparing data > src/backend/utils/adt/jsonfuncs.c > populate_record(): heap_deform_tuple() followed by heap_form_tuple() > src/backend/utils/adt/expandedrecord.c > deconstruct_expanded_record() uses long-living (pinned) > erh->er_tupdesc > src/backend/utils/adt/ri_triggers.c > RI_Initial_Check(), RI_PartitionRemove_Check(): use long-living? > SPI_tuptable->tupdesc > src/backend/access/heap/heapam_handler.c > reform_and_rewrite_tuple(): uses RelationGetDescr(OldHeap) > (pinned, I suppose) > src/backend/access/heap/heaptoast.c > heap_toast_delete(), heap_toast_insert_or_update() use long-living > rel->rd_att; > toast_flatten_tuple(): heap_deform_tuple(..., tupleDesc, ...) > followed by heap_form_tuple(..., tupleDesc, ...) > toast_flatten_tuple_to_datum(): heap_deform_tuple followed by > detoasting attrs > src/backend/access/heap/heapam.c > ExtractReplicaIdentity(): heap_deform_tuple(..., desc, ...) > followed by heap_form_tuple(..., desc, ...) > src/backend/access/common/heaptuple.c > heap_modify_tuple(): heap_deform_tuple(..., tupleDesc, ...) > followed by heap_form_tuple(..., tupleDesc, ...) > heap_modify_tuple_by_cols(): heap_deform_tuple(..., tupleDesc, > ...) followed by heap_form_tuple(..., tupleDesc, ...) > src/backend/access/common/tupconvert.c > execute_attr_map_tuple(): heap_deform_tuple() followed by > heap_form_tuple() > src/test/regress/regress.c > make_tuple_indirect(): heap_deform_tuple() followed by > heap_form_tuple() > src/pl/plpgsql/src/pl_exec.c > exec_move_row() called with tupdesc = NULL, SPI_tuptable->tupdesc, > or tupdesc released after the call > contrib/hstore/hstore_io.c > hstore_from_record(), hstore_populate_record(): tupdesc released > after extracting data > > And even if I missed some possibly problematic calls, maybe it's worth to > consider fixing exactly that places... > > Also, besides heap_deform_tuple(), getmissingattr() is called from > heap_getattr(). And heap_getattr() is used in many places, but most of > them > are protected too due to tupdesc pinned. > Two suspicious places that I found are GetAttributeByName() and > GetAttributeByNum(), so when using these functions you can really get the > missing attribute value with the tupdesc released. But what > circumstances do > we need to end up with an invalid data? > 1) Write a function that will call GetAttributeByName() and hold it's > result inside for some time. > 2) Add a passed-by-ref column with a default value to a table. > 3) Pass to that function a tuple without the "missing" column (just > "SELECT func(table) FROM table" won't do, but func(OLD) in a trigger > should). > 4) Trigger a relcache invalidation. > 5) Perform a database access inside that function to process the > invalidation. > 6) Access the result of GetAttributeByName() after that. > > Maybe we could construct such test case, e.g. add to pg_regress one more > SQL-accessible C function, bu9ccvhhihvcnb·t I wonder, is this the way > to go? > > On the other hand, probably there are extensions, that use > GetAttributeByName() in the non-safe way (as the function documentation > doesn't warn about such issues), but maybe just perform datumCopy inside > that function (and similar one(s))? > Though, perhaps I just don't notice the elephant in the core... I was waiting to see if others had a reaction to this, but ... I think the last point (possible unsafe use by extensions) is reason enough to proceed with the proposed mitigation, which is pretty simple and non-invasive.. cheers andrew -- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Hello, On 2023-Jul-12, Andrew Dunstan wrote: > + entry = hash_search(missing_cache, &key, HASH_ENTER, &found); > + > + if (!found) > + { > + /* cache miss, so we need a non-transient copy of the datum */ > + oldctx = MemoryContextSwitchTo(TopMemoryContext); > + entry->value = > + datumCopy(attrmiss->am_value, false, att->attlen); > + MemoryContextSwitchTo(oldctx); > + } Hmm ... when exactly do these values get freed if no longer needed? Is the theory that leaking them is not relevant? -- Álvaro Herrera 48°01'N 7°57'E — https://www.EnterpriseDB.com/ "Oh, great altar of passive entertainment, bestow upon me thy discordant images at such speed as to render linear thought impossible" (Calvin a la TV)
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Hello, On 2023-Jul-12, Andrew Dunstan wrote:+ entry = hash_search(missing_cache, &key, HASH_ENTER, &found); + + if (!found) + { + /* cache miss, so we need a non-transient copy of the datum */ + oldctx = MemoryContextSwitchTo(TopMemoryContext); + entry->value = + datumCopy(attrmiss->am_value, false, att->attlen); + MemoryContextSwitchTo(oldctx); + }Hmm ... when exactly do these values get freed if no longer needed? Is the theory that leaking them is not relevant?
Not sure I understand "relevant" here. They don't get freed. There will be at most one entry per row in pg_attribute where atthasmissing is true. I originally suggested cleaning them up at transaction end, but there was an argument that that might not make them sufficiently long-lived, and I don't know of any time more coarse grained when we can conveniently clean them up.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Andrew Dunstan <andrew@dunslane.net> writes: > On 2023-08-21 Mo 08:33, Alvaro Herrera wrote: >> Hmm ... when exactly do these values get freed if no longer needed? Is >> the theory that leaking them is not relevant? > Not sure I understand "relevant" here. They don't get freed. There will > be at most one entry per row in pg_attribute where atthasmissing is > true. Yeah. My feeling is that as long as we don't make duplicate hashtable entries, there will never be so many entries that it'd be worth the cost and intellectual complexity of trying to clean them up. When and if that theory is disproven, we can think harder; but for now I think this approach is good enough. regards, tom lane
Re: BUG #17994: Invalidating relcache corrupts tupDesc inside ExecEvalFieldStoreDeForm()
Andrew Dunstan <andrew@dunslane.net> writes:On 2023-08-21 Mo 08:33, Alvaro Herrera wrote:Hmm ... when exactly do these values get freed if no longer needed? Is the theory that leaking them is not relevant?Not sure I understand "relevant" here. They don't get freed. There will be at most one entry per row in pg_attribute where atthasmissing is true.Yeah. My feeling is that as long as we don't make duplicate hashtable entries, there will never be so many entries that it'd be worth the cost and intellectual complexity of trying to clean them up. When and if that theory is disproven, we can think harder; but for now I think this approach is good enough.
Fix pushed to all relevant branches.
cheers
andrew
-- Andrew Dunstan EDB: https://www.enterprisedb.com