Thread: lookup_rowtype_tupdesc considered harmful
I've been running the regression tests with the sinval-reset stress testing program I posted yesterday: http://archives.postgresql.org/pgsql-hackers/2006-01/msg00244.php I've seen several crashes caused by using a tuple descriptor obtained from lookup_rowtype_tupdesc() after the descriptor has been recycled due to cache flushes. On reflection I think that lookup_rowtype_tupdesc is simply misdesigned. We can't have it handing back a pointer to a data structure of unspecified lifetime. One possibility is to give it an API comparable to the syscache lookup functions, ie you get a reference-counted pointer that you have to explicitly release when done with it. This would also presumably require extending the ResourceOwner mechanism to track these reference counts. A simpler alternative is just to make every caller copy the passed-back tupdesc immediately if it's going to do more than the most trivial work with it; or even put the copying step right into lookup_rowtype_tupdesc to make sure no one can omit it. Here we'd be paying palloc and data copying overhead to keep the code simple. However, it's arguable that none of the uses of lookup_rowtype_tupdesc are performance-critical enough to justify adding a lot of infrastructure to avoid this. One big strike against the reference-count approach is that it'd be difficult to back-patch such a solution into existing branches, since it would amount to an incompatible API change. If there are any add-on modules out there that use lookup_rowtype_tupdesc, they'd start failing because of not releasing the pointer. Any opinions what to do? regards, tom lane
On Sun, 2006-01-08 at 20:04 -0500, Tom Lane wrote: > On reflection I think that lookup_rowtype_tupdesc is simply misdesigned. > We can't have it handing back a pointer to a data structure of unspecified > lifetime. One possibility is to give it an API comparable to the > syscache lookup functions, ie you get a reference-counted pointer that > you have to explicitly release when done with it. This seems a cleaner approach. > One big strike against the reference-count approach is that it'd be > difficult to back-patch such a solution into existing branches, since > it would amount to an incompatible API change. Perhaps adding a palloc() to lookup_rowtype_tupdesc() can be done for back branches (which should be minimally invasive), and a reference-counting API can be implemented for 8.2. I'd be happy to volunteer to do the refcounting changes, in the (probably unlikely :) ) event you'd rather not do the work yourself. -Neil
Neil Conway <neilc@samurai.com> writes: > On Sun, 2006-01-08 at 20:04 -0500, Tom Lane wrote: >> On reflection I think that lookup_rowtype_tupdesc is simply misdesigned. >> We can't have it handing back a pointer to a data structure of unspecified >> lifetime. One possibility is to give it an API comparable to the >> syscache lookup functions, ie you get a reference-counted pointer that >> you have to explicitly release when done with it. > This seems a cleaner approach. I have not been able to think of an efficient way to make it work while still handing back a simple TupleDesc pointer --- seems like we'd have to contort the API somehow so that the "release" function can find the reference count. Any thoughts about that? > Perhaps adding a palloc() to lookup_rowtype_tupdesc() can be done for > back branches (which should be minimally invasive), and a > reference-counting API can be implemented for 8.2. Agreed, it doesn't necessarily have to be the same solution in the back branches. OTOH I am still not convinced it's worth the trouble to implement a reference-counting solution. > I'd be happy to volunteer to do the refcounting changes, in the > (probably unlikely :) ) event you'd rather not do the work yourself. I've got plenty of stuff to do, will be happy to let you take up this problem, if you have time to do something with it soon. regards, tom lane
On Mon, 2006-01-09 at 12:57 -0500, Tom Lane wrote: > I have not been able to think of an efficient way to make it work while > still handing back a simple TupleDesc pointer --- seems like we'd have > to contort the API somehow so that the "release" function can find the > reference count. Any thoughts about that? Perhaps the "release" function can just take a type OID. We could then use that to lookup the OID's typcache entry, which would be a convenient place to store the reference count (especially if we do generalized typcache invalidation, per discussion on -patches). That means two hash table lookups for each lookup/release pair, which isn't ideal but doesn't seem too bad. > I've got plenty of stuff to do, will be happy to let you take up this > problem, if you have time to do something with it soon. I'll take a look. -Neil
Neil Conway <neilc@samurai.com> writes: > On Mon, 2006-01-09 at 12:57 -0500, Tom Lane wrote: >> I have not been able to think of an efficient way to make it work while >> still handing back a simple TupleDesc pointer --- seems like we'd have >> to contort the API somehow so that the "release" function can find the >> reference count. Any thoughts about that? > Perhaps the "release" function can just take a type OID. We could then > use that to lookup the OID's typcache entry, which would be a convenient > place to store the reference count (especially if we do generalized > typcache invalidation, per discussion on -patches). That means two hash > table lookups for each lookup/release pair, which isn't ideal but > doesn't seem too bad. Nah, I don't think this works. The problem is that after an inval, you may have to provide an updated TupleDesc to new callers while old callers still have open reference counts to the old TupleDesc. The type OID isn't enough info to let you determine which generation of TupleDesc is being released. However, you might be able to adopt the same trick used in catcache.c: the callers think they have pointers to HeapTuples and are unaware that that is just a field of a larger struct. Add a reference count and a "dead" flag, and a "magic" value for safety checking, and you've got it. regards, tom lane
On Mon, 2006-01-09 at 14:51 -0500, Tom Lane wrote: > Nah, I don't think this works. The problem is that after an inval, > you may have to provide an updated TupleDesc to new callers while > old callers still have open reference counts to the old TupleDesc. Good point. > However, you might be able to adopt the same trick used in catcache.c: > the callers think they have pointers to HeapTuples and are unaware that > that is just a field of a larger struct. Add a reference count and a > "dead" flag, and a "magic" value for safety checking, and you've got it. Hmm, okay. There's the additional complication that we need to handle record types (see RecordCacheArray in typcache.c). Since I don't think we need reference counting for those, I'm envisioning something like: TupleDesc lookup_rowtype_tupdesc(Oid type_id, int32 typmod); void release_rowtype_tupdesc(TupleDesc tdesc); /* better name? */ TypeCacheEntry *lookup_type_cache(Oid type_id, int flags); void release_type_cache(TypeCacheEntry *tentry); where lookup_rowtype_tupdesc() returns a pointer to this struct: typedef struct { struct tupleDesc tdesc; /* must be first field */ TypeCacheEntry *tentry; /* pointer to owning TypeCacheEntry, or NULL if this is a recordtype */ } MagicTupleDesc; and where TypeCacheEntry has been modified to contain a reference count and an "is dead?" flag. Is there actually a need for the (ugly) "magic value" hackery used by catcache? -Neil
Neil Conway <neilc@samurai.com> writes: > Hmm, okay. There's the additional complication that we need to handle > record types (see RecordCacheArray in typcache.c). Since I don't think > we need reference counting for those, Yeah, you do. See record_out for instance, and reflect on the fact that it can have no idea what the called I/O functions are liable to do. > where lookup_rowtype_tupdesc() returns a pointer to this struct: > typedef struct > { > struct tupleDesc tdesc; /* must be first field */ > TypeCacheEntry *tentry; /* pointer to owning TypeCacheEntry, > or NULL if this is a record type */ > } MagicTupleDesc; > and where TypeCacheEntry has been modified to contain a reference count > and an "is dead?" flag. No, the refcount and isdead flags should be in *this* struct, and there's really no need for a back-link to TypeCacheEntry. Think harder about the situation where there are both old and new tupdescs outstanding for a single typcache entry. > Is there actually a need for the (ugly) "magic > value" hackery used by catcache? It's a safety check to make sure that what got passed to release_rowtype_tupdesc is really one of these animals and not just any old TupleDesc. It's not *necessary*, granted, but given that the compiler is not going to help people avoid such errors, I think it's prudent. regards, tom lane
I had a further thought about this. What we're really talking about here is a reference-counted TupleDesc structure: it's got no necessary connection to TypeCacheEntry at all. And in fact there are other places in the system that could use such a facility. For instance, TupleTableSlot has a "shouldFreeDesc" flag that sort of allows partial, one-way sharing of descs between slots, but it would work a lot better if the descriptors contained reference counts instead. And it might be worth insisting that users of a relation tupledesc from the relcache increment its refcount while they use it --- that would get rid of some rather shaky assumptions about when it's OK to free the tupledesc during relcache updates. I'm thinking it might be worth the cost to just add a refcount field to the standard tupleDesc struct. (We don't need an isdead flag: more elegantly, just include the TypeCacheEntry's own reference to the tupdesc in the refcount.) Add macros IncrTupleDescRefCount and DecrTupleDescRefCount, the latter freeing the struct if the count goes to zero, and voila. I'm not quite sure whether the tupledesc creation routines should initialize the count to zero or to one, but otherwise it all seems pretty clear. Thoughts? regards, tom lane
On Tue, 2006-01-10 at 09:47 -0500, Tom Lane wrote: > I had a further thought about this. What we're really talking about > here is a reference-counted TupleDesc structure: it's got no necessary > connection to TypeCacheEntry at all. Yeah, I came to basically the same conclusion when implementing the patch ("RefCountedTupleDesc" vs. a normal "TupleDesc"). I assumed that there was a need to have two variants of the structure (one refcounted, one not), but you're right that just adding refcounting to TupleDesc directly doesn't have a lot of overhead, and avoids the confusion of two similar-but-not-identical structs. I'll post a patch when it's finished (hopefully tonight). -Neil