Thread: TupleDesc refcounting
Attached is a WIP patch that adds reference counting for TupleDescs. Two issues that I ran into while implementing it: (1) How should the lifetime of a TupleDesc be managed? The existing ResourceOwner stuff seems to assume that it is managing "per-query" resources. A TupleDesc will often live beyond the lifetime of a single transaction, and might even be created before transactions can be started (e.g. formrdesc() in relcache.c, when we're creating relcache entries for nailed-in-cache relations early in InitPostgres()). In current sources, the lifetime of a TupleDesc is the lifetime of the memory context in which it is allocated (and/or whenever FreeTupleDesc() is invoked). We could imitate that behavior by optionally linking a ResourceOwner with each MemoryContext, and releasing the ResourceOwner when the MemoryContext is reset or deleted. However, I'm not sure that that's the right approach... (2) The existing ResourceOwner users issue a warning if the resource they are managing is not explicitly released before a transaction successfully commits (so they elog(WARNING)). I don't see the need to be that strict for TupleDescs -- as we do with palloc() without a matching pfree(), I think it should be okay to just silently clean up "leaked" TupleDescs when releasing a ResourceOwner. Thoughts? -Neil
Attachment
Neil Conway <neilc@samurai.com> writes: > (1) How should the lifetime of a TupleDesc be managed? The existing > ResourceOwner stuff seems to assume that it is managing "per-query" > resources. Yeah. I was envisioning two different approaches: for TupleDesc references from long-lived data structures (ie, typcache or relcache) just increment or decrement the count when the referencing data structure changes. ResourceOwner would be used for dynamic within-query references --- in practice, local variables. If the reference would be forgotten during an elog longjmp then you need a ResourceOwner to backstop it, otherwise not. You could conceivably set up a "cache ResourceOwner" with indefinite lifespan to make the cache case more like the local-variable case, but I think this would merely be a performance drag with no real value. There's no point in a ResourceOwner if it will never be called on to release resources, and a cache-lifespan ResourceOwner wouldn't be. > In current sources, the lifetime of a TupleDesc is the lifetime of the > memory context in which it is allocated (and/or whenever FreeTupleDesc() > is invoked). We could imitate that behavior by optionally linking a > ResourceOwner with each MemoryContext, and releasing the ResourceOwner > when the MemoryContext is reset or deleted. However, I'm not sure that > that's the right approach... No. We really only need this mechanism for the case where a tupledesc allocated in a long-lived context (again, think relcache or typcache) is going to be dynamically referenced by shorter-lived code. Tying it to MemoryContexts won't help because CacheMemoryContext never gets reset. > (2) The existing ResourceOwner users issue a warning if the resource > they are managing is not explicitly released before a transaction > successfully commits (so they elog(WARNING)). I don't see the need to be > that strict for TupleDescs -- as we do with palloc() without a matching > pfree(), I think it should be okay to just silently clean up "leaked" > TupleDescs when releasing a ResourceOwner. The reason why those warnings are issued is to catch code that is failing to manage references properly. I think that motivation applies perfectly well to tuple descriptors too. There is no good reason for code to forget to release the descriptor reference in non-error paths. No time to look at the patch itself right now ... regards, tom lane
On Tue, 2006-01-10 at 18:05 -0500, Tom Lane wrote: > Yeah. I was envisioning two different approaches: for TupleDesc > references from long-lived data structures (ie, typcache or relcache) > just increment or decrement the count when the referencing data > structure changes. ResourceOwner would be used for dynamic within-query > references --- in practice, local variables. If the reference would > be forgotten during an elog longjmp then you need a ResourceOwner to > backstop it, otherwise not. Ah, I see what you mean. In implementing this, I wasn't sure the best way to provide these two sorts of TupleDesc references. My first thought was to add a "use ResourceOwner?" boolean parameter to the routines that create and destroy references to TupleDescs: extern TupleDesc CreateTemplateTupleDesc(int natts, bool hasoid, bool use_resowner); extern TupleDesc CreateTupleDesc(int natts, bool hasoid, Form_pg_attribute *attrs, bool use_resowner); extern TupleDesc CreateTupleDescCopy(TupleDesc tupdesc, bool use_resowner); extern TupleDesc CreateTupleDescCopyConstr(TupleDesc tupdesc, bool use_resowner); extern void IncrTupleDescRefCount(TupleDesc tupdesc, bool use_resowner); extern void DecrTupleDescRefCount(TupleDesc tupdesc, bool use_resowner); But that requires making *all* the call-sites of these routines care about whether a ResourceOwner is being used. I think most call-sites will want to use a ResourceOwner, so that ought to be the default. Avoiding changing the function signatures also means we don't break a fairly widely-used API. Instead, how about defining: extern void IncrTupleDescPersistentRefCount(TupleDesc tdesc); (the name could do with improvement). This would increment the TupleDesc's "persistent reference count" -- that is, non-persistent references are automatically released at transaction end, whereas persistent ones are only released via: extern void DecrTupleDescPersistentRefCount(TupleDesc tdesc); One downside is that creating a new, persistent TupleDesc requires a little pain: TupleDesc tdesc = make_your_tdesc(); IncrTupleDescPersistentRefCount(tdesc); DecrTupleDescRefCount(tdesc); That means we will unnecessarily inform the ResourceOwner about the TupleDesc, and then tell it to immediately forget about it. However, I think that's acceptable: ResourceOwnerRememberTupleDesc() is trivial, and ResourceOwnerForgetTupleDesc() searches beginning at the most-recently-added TupleDesc, so it should also be very cheap. -Neil
Neil Conway <neilc@samurai.com> writes: > Ah, I see what you mean. In implementing this, I wasn't sure the best > way to provide these two sorts of TupleDesc references. My first thought > was to add a "use ResourceOwner?" boolean parameter to the routines that > create and destroy references to TupleDescs: No, I wouldn't do that. I would keep the routines you mention ignorant of ResourceOwner, because I think that the vast majority of tupdesc usage will NOT be using ResourceOwners. Only the places where a pointer to a cached tupdesc is handed out need to deal with this. This excludes practically all of the executor, for instance. If you're finding yourself writing a large and invasive patch, I think you're doing it wrong. I'm envisioning something pretty localized. regards, tom lane
On Thu, 2006-01-12 at 10:40 -0500, Tom Lane wrote: > If you're finding yourself writing a large and invasive patch, I think > you're doing it wrong. I think I might be :-) Attached is a revised version of the patch (still WIP). In summary: - TupleDescs are allocated in TopMemoryContext, and released when their reference count falls to zero. - normal TupleDesc references are released automatically (with a warning) when the ResourceOwner is released, but it is possible to obtain a "persistent reference" to a TupleDesc that is only released manually - I've begun updating the code to reflect this, but I haven't finished yet. Either the required changes are more extensive than expected or I'm doing something wrong: adding reference counting for TupleDescs changes the rules for TupleDesc lifetime. Plenty of code was doing stuff like: /* we need this TupleDesc to live for a while */ old_cxt = MemoryContextSwitchTo(LongLivedContext); tdesc = CreateTupleDescCopy(tdesc); MemoryContextSwitchTo(old_cxt); which no longer has the intended effect: the CurrentMemoryContext does not control the lifetime of a TupleDesc. I've only begun to make the required changes: the patch passes initdb, but fails during the regression tests. I've done a cursory job of updating the relcache and syscache to use the new refcounting rules, but I haven't really done the executor, the utility commands, or the other miscellaneous parts of the system that manipulate TupleDescs. I believe most of the remaining work is a Small Matter of Programming: updating code to use reference counts rather than depending on the lifetime of the CurrentMemoryContext. However, if I'm going down the wrong track, I thought I'd float the patch now rather than going further down this path. Comments welcome. -Neil
Attachment
Neil Conway <neilc@samurai.com> writes: > On Thu, 2006-01-12 at 10:40 -0500, Tom Lane wrote: >> If you're finding yourself writing a large and invasive patch, I think >> you're doing it wrong. > I think I might be :-) Yipes ... this seems far more invasive than I think is justified. In particular the notion of storing *every* tupdesc in TopMemoryContext seems completely wrong. That's not working with the context system, that's working against it. What I had in mind was an optional refcounting facility, whereas you seem to be going in the direction of making it mandatory. The reason I think it should be optional is that most uses of tupdescs just don't need it. We have a grand total of one place where a refcount (or forced local copy) seems demonstrably necessary. Surely changing a whole lot of code to accommodate that one place is not the right tradeoff. My inclination at this point is to forget the whole thing and just patch the callers of lookup_rowtype_tupdesc that need to copy the tupdesc. regards, tom lane
On Sun, 2006-01-15 at 12:08 -0500, Tom Lane wrote: > My inclination at this point is to forget the whole thing and just patch > the callers of lookup_rowtype_tupdesc that need to copy the tupdesc. Actually, I think I finally understand how to implement this patch sanely. I had thought that the lifetime of a TupleDesc should be dictated by either the memory context in which it is allocated, OR its reference count. This leads us down the road toward mandatory reference counting, which I agree is a net loss. However, since we're primarily concerned with TupleDescs allocated in CacheMemoryContext and that context is never reset, we can use the reference count *just* to manage the "external" references to TupleDescs. That should make the patch far less invasive. (I have the feeling you've been suggesting this all along, I've just been too thick-skulled to understand you.) I'll hopefully have a patch implementing this finished by tomorrow evening. -Neil
Neil Conway <neilc@samurai.com> writes: > ... However, since we're primarily > concerned with TupleDescs allocated in CacheMemoryContext and that > context is never reset, we can use the reference count *just* to manage > the "external" references to TupleDescs. That should make the patch far > less invasive. (I have the feeling you've been suggesting this all > along, I've just been too thick-skulled to understand you.) Actually, what I've been thinking about is applying reference counting only to the specific tupdescs that need it --- which at this point is really only the ones in the typcache, though we might later extend the use of the facility to other places. I suspect you'll find that it's convenient to treat typcache's own link to a tupdesc as a reference count too, so it's not strictly "external" references that should be counted. The point is that we don't need to force all the code into that same mold, especially not all at once. > I'll hopefully have a patch implementing this finished by tomorrow > evening. OK. I'm going to go ahead and develop a non-refcount patch (copy tupdescs where needed) to put into the back branches. I won't apply it to HEAD unless you say so, though, as I'm sure it'd create a merge problem for you. regards, tom lane
On Tue, 2006-01-17 at 09:36 -0500, Tom Lane wrote: > I suspect you'll find that it's convenient to treat typcache's own > link to a tupdesc as a reference count too, so it's not strictly > "external" references that should be counted. The problem with this is that incrementing a TupleDesc's refcount informs the CurrentResourceOwner. That causes the refcount to be decremented when the resource owner is released, which is wrong when the TupleDesc should live beyond the current query (as it should in the typcache). We could avoid that by creating a CacheResourceOwner that is never reset, as discussed earlier, but at present I'm not sure if there's any point, so I've just used a "dead" field. Attached is a revised patch. Reference counting is only used for lookup_rowtype_tupdesc(): as discussed there might be other places that would benefit from this, but I haven't looked at that yet. There was some call-sites of lookup_rowtype_tupdesc() where it doesn't seem to be easy to use reference counting. For example, consider the implementation of get_expr_result_type(): some code paths within that function call lookup_rowtype_tupdesc() to produce the returned TupleDesc, and some do not. The easiest fix seemed to be just making a copy of the TupleDesc for the lookup_rowtype_tupdesc() cases. The patch is WIP: a few regression tests fail due to TupleDesc leaks I haven't fixed yet but that should be easily fixable, and there are a few other minor issues to address. I'm just posting the patch now to get any feedback. (Apologies for not getting this done earlier, I had a touch of the flu yesterday...) -Neil
Attachment
Neil Conway <neilc@samurai.com> writes: > On Tue, 2006-01-17 at 09:36 -0500, Tom Lane wrote: >> I suspect you'll find that it's convenient to treat typcache's own >> link to a tupdesc as a reference count too, so it's not strictly >> "external" references that should be counted. > The problem with this is that incrementing a TupleDesc's refcount > informs the CurrentResourceOwner. Not if it's just "tupdesc->refcount++;" ... ;-) Seriously, I was imagining a two-level structure where tupdesc itself just has operations on the order of refcount++; if (--refcount <= 0) FreeTupleDesc(); and then a separate layer on top of that adds ResourceOwner management. This is precisely because the cache code requires non-resource-owner- managed links (or at least, has no particular use for the services of a ResourceOwner). > There was some call-sites of lookup_rowtype_tupdesc() where it doesn't > seem to be easy to use reference counting. For example, consider the > implementation of get_expr_result_type(): some code paths within that > function call lookup_rowtype_tupdesc() to produce the returned > TupleDesc, and some do not. The easiest fix seemed to be just making a > copy of the TupleDesc for the lookup_rowtype_tupdesc() cases. Yeah, I noticed this while making the back-branch patch. I agree that any given routine had better have a consistent return convention: if the tupdesc needs to be refcount-decremented when done, that must be the case in all code paths, since the caller can't be expected to know which case applies. My recollection though is that the non-lookup cases return freshly-built-in-local-storage tupdescs, which could certainly be set up as having refcount 1 so that the decrement would work correctly. It may or may not be worth the trouble compared to just copying, though. You should think about whether a routine is a performance hotspot before going out of your way to avoid a copy step. > (Apologies for not getting this done earlier, I had a touch of the flu > yesterday...) There's no hurry about it. Hope you're feeling better. regards, tom lane
On Thu, 2006-01-19 at 04:14 -0500, Neil Conway wrote: > The patch is WIP: a few regression tests fail due to TupleDesc leaks I > haven't fixed yet but that should be easily fixable, and there are a few > other minor issues to address. I'm just posting the patch now to get any > feedback. Attached is a revised patch: all the regression tests patch, and I'm not aware of any major remaining issues. Barring any objections, I'll apply this tomorrow. (I'll take a look at using refcounting for TupleDescs in other parts of the system after that...) -Neil
Attachment
Neil Conway <neilc@samurai.com> writes: > Attached is a revised patch: all the regression tests patch, and I'm not > aware of any major remaining issues. Barring any objections, I'll apply > this tomorrow. I object ... this is still trying to enforce tupdesc refcounting on much more of the system than I think useful or prudent. I'll try to come up with an alternative patch. regards, tom lane
On Sun, 2006-01-22 at 13:43 -0500, Tom Lane wrote: > I object ... this is still trying to enforce tupdesc refcounting on much > more of the system than I think useful or prudent. Well, the patch adds management of TupleDesc refcounts solely for the return value of lookup_rowtype_tupdesc(). If we're going to use reference counting there at all, I don't see a way do make the refcounting any less invasive. -Neil
I'm finally getting back to looking at the problem of reference-counting cached TupleDescs as was discussed in January. I had objected to the last patch Neil posted: http://archives.postgresql.org/pgsql-patches/2006-01/msg00243.php on the grounds that it seemed too complicated. On looking at it closely, I realize that a lot of the complexity comes from my insistence that the ResourceOwner mechanism ought to warn about any tupdesc references that haven't been explicitly released before transaction end. (Neil had questioned upthread whether that was really necessary ... he was right.) The problem is that execQual.c would like to hang onto tupdesc references across multiple executions of expression nodes, and there's not any explicit code to clean up expression trees, so no easy place to put a refcount decrement. Neil's patch arranges to clean up such references by using expression shutdown callbacks. The patch doesn't quite work as-is because it neglects the fact that shutdown callbacks are triggered during a node rescan as well as node shutdown. This could be fixed, but I'm thinking that it's a lot of mechanism to add when the ResourceOwner could perfectly well clean up the refcounts instead. The point of the no-remaining-refs rule was to catch code that should have released a refcount and failed to. We could keep that error checking if we made ResourceOwner recognize two different kinds of tupdesc refcount, one to warn about and one not to. I'm not sure if that's worth the trouble or not --- any opinions? Also, I looked at whether TupleTableSlots could sensibly use refcounted tuples, and I'm now of the opinion that it'd be a waste of time. There aren't any places where the executor uses a tupdesc obtained from cache as a TupleTableSlot's descriptor; rather, all the tupdescs involved are built in query-lifespan memory during plan startup. So I'm thinking we should actually get rid of tts_shouldFreeDesc and not have execTuples.c do any tupdesc-freeing at all. Retail freeing of structures that are in a memory context that's about to be freed is a waste of cycles. We've gotten rid of a fair amount of redundant pfree's that used to occur during plan shutdown, and these are some more that we could do without. Comments? regards, tom lane
I wrote: > I'm finally getting back to looking at the problem of reference-counting > cached TupleDescs as was discussed in January. I had objected to the > last patch Neil posted: > http://archives.postgresql.org/pgsql-patches/2006-01/msg00243.php > on the grounds that it seemed too complicated. On looking at it > closely, I realize that a lot of the complexity comes from my insistence > that the ResourceOwner mechanism ought to warn about any tupdesc > references that haven't been explicitly released before transaction end. After much thrashing I concluded that my original instinct was right and we really do want to insist on cleanup of tupdesc references during normal processing. The approach I tried to take effectively meant leaking tupdesc references until end of query, which is really bad news for SQL-language functions --- a function that grabs a tupdesc reference during plan startup would then accumulate a tupdesc reference during every invocation, leading to indefinite bloat in the ResourceOwner over a long query. So I ended up applying something pretty close to Neil's patch. I did modify it to not bother reference-counting tupdescs that aren't actually in any cache. I also fixed up TupleTableSlots to do reference-count processing if ExecSetSlotDescriptor is handed a reference-counted tupdesc (my assertion yesterday that this never happens was wrong), but not to bother cleaning up non-ref-counted descriptors. All in all, a lot of thrashing for only marginal improvement :-( regards, tom lane