Thread: lookup_rowtype_tupdesc considered harmful

lookup_rowtype_tupdesc considered harmful

From
Tom Lane
Date:
I've been running the regression tests with the sinval-reset stress
testing program I posted yesterday:
http://archives.postgresql.org/pgsql-hackers/2006-01/msg00244.php
I've seen several crashes caused by using a tuple descriptor obtained
from lookup_rowtype_tupdesc() after the descriptor has been recycled
due to cache flushes.

On reflection I think that lookup_rowtype_tupdesc is simply misdesigned.
We can't have it handing back a pointer to a data structure of unspecified
lifetime.  One possibility is to give it an API comparable to the
syscache lookup functions, ie you get a reference-counted pointer that
you have to explicitly release when done with it.  This would also
presumably require extending the ResourceOwner mechanism to track these
reference counts.

A simpler alternative is just to make every caller copy the passed-back
tupdesc immediately if it's going to do more than the most trivial work
with it; or even put the copying step right into lookup_rowtype_tupdesc
to make sure no one can omit it.  Here we'd be paying palloc and data
copying overhead to keep the code simple.  However, it's arguable that
none of the uses of lookup_rowtype_tupdesc are performance-critical
enough to justify adding a lot of infrastructure to avoid this.

One big strike against the reference-count approach is that it'd be
difficult to back-patch such a solution into existing branches, since
it would amount to an incompatible API change.  If there are any add-on
modules out there that use lookup_rowtype_tupdesc, they'd start failing
because of not releasing the pointer.

Any opinions what to do?
        regards, tom lane


Re: lookup_rowtype_tupdesc considered harmful

From
Neil Conway
Date:
On Sun, 2006-01-08 at 20:04 -0500, Tom Lane wrote:
> On reflection I think that lookup_rowtype_tupdesc is simply misdesigned.
> We can't have it handing back a pointer to a data structure of unspecified
> lifetime.  One possibility is to give it an API comparable to the
> syscache lookup functions, ie you get a reference-counted pointer that
> you have to explicitly release when done with it.

This seems a cleaner approach.

> One big strike against the reference-count approach is that it'd be
> difficult to back-patch such a solution into existing branches, since
> it would amount to an incompatible API change.

Perhaps adding a palloc() to lookup_rowtype_tupdesc() can be done for
back branches (which should be minimally invasive), and a
reference-counting API can be implemented for 8.2.

I'd be happy to volunteer to do the refcounting changes, in the
(probably unlikely :) ) event you'd rather not do the work yourself.

-Neil




Re: lookup_rowtype_tupdesc considered harmful

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> On Sun, 2006-01-08 at 20:04 -0500, Tom Lane wrote:
>> On reflection I think that lookup_rowtype_tupdesc is simply misdesigned.
>> We can't have it handing back a pointer to a data structure of unspecified
>> lifetime.  One possibility is to give it an API comparable to the
>> syscache lookup functions, ie you get a reference-counted pointer that
>> you have to explicitly release when done with it.

> This seems a cleaner approach.

I have not been able to think of an efficient way to make it work while
still handing back a simple TupleDesc pointer --- seems like we'd have
to contort the API somehow so that the "release" function can find the
reference count.  Any thoughts about that?

> Perhaps adding a palloc() to lookup_rowtype_tupdesc() can be done for
> back branches (which should be minimally invasive), and a
> reference-counting API can be implemented for 8.2.

Agreed, it doesn't necessarily have to be the same solution in the back
branches.  OTOH I am still not convinced it's worth the trouble to
implement a reference-counting solution.

> I'd be happy to volunteer to do the refcounting changes, in the
> (probably unlikely :) ) event you'd rather not do the work yourself.

I've got plenty of stuff to do, will be happy to let you take up this
problem, if you have time to do something with it soon.
        regards, tom lane


Re: lookup_rowtype_tupdesc considered harmful

From
Neil Conway
Date:
On Mon, 2006-01-09 at 12:57 -0500, Tom Lane wrote:
> I have not been able to think of an efficient way to make it work while
> still handing back a simple TupleDesc pointer --- seems like we'd have
> to contort the API somehow so that the "release" function can find the
> reference count.  Any thoughts about that?

Perhaps the "release" function can just take a type OID. We could then
use that to lookup the OID's typcache entry, which would be a convenient
place to store the reference count (especially if we do generalized
typcache invalidation, per discussion on -patches). That means two hash
table lookups for each lookup/release pair, which isn't ideal but
doesn't seem too bad.

> I've got plenty of stuff to do, will be happy to let you take up this
> problem, if you have time to do something with it soon.

I'll take a look.

-Neil




Re: lookup_rowtype_tupdesc considered harmful

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> On Mon, 2006-01-09 at 12:57 -0500, Tom Lane wrote:
>> I have not been able to think of an efficient way to make it work while
>> still handing back a simple TupleDesc pointer --- seems like we'd have
>> to contort the API somehow so that the "release" function can find the
>> reference count.  Any thoughts about that?

> Perhaps the "release" function can just take a type OID. We could then
> use that to lookup the OID's typcache entry, which would be a convenient
> place to store the reference count (especially if we do generalized
> typcache invalidation, per discussion on -patches). That means two hash
> table lookups for each lookup/release pair, which isn't ideal but
> doesn't seem too bad.

Nah, I don't think this works.  The problem is that after an inval,
you may have to provide an updated TupleDesc to new callers while
old callers still have open reference counts to the old TupleDesc.
The type OID isn't enough info to let you determine which generation
of TupleDesc is being released.

However, you might be able to adopt the same trick used in catcache.c:
the callers think they have pointers to HeapTuples and are unaware that
that is just a field of a larger struct.  Add a reference count and a
"dead" flag, and a "magic" value for safety checking, and you've got it.
        regards, tom lane


Re: lookup_rowtype_tupdesc considered harmful

From
Neil Conway
Date:
On Mon, 2006-01-09 at 14:51 -0500, Tom Lane wrote:
> Nah, I don't think this works.  The problem is that after an inval,
> you may have to provide an updated TupleDesc to new callers while
> old callers still have open reference counts to the old TupleDesc.

Good point.

> However, you might be able to adopt the same trick used in catcache.c:
> the callers think they have pointers to HeapTuples and are unaware that
> that is just a field of a larger struct.  Add a reference count and a
> "dead" flag, and a "magic" value for safety checking, and you've got it.

Hmm, okay. There's the additional complication that we need to handle
record types (see RecordCacheArray in typcache.c). Since I don't think
we need reference counting for those, I'm envisioning something like:

TupleDesc lookup_rowtype_tupdesc(Oid type_id, int32 typmod);
void release_rowtype_tupdesc(TupleDesc tdesc); /* better name? */

TypeCacheEntry *lookup_type_cache(Oid type_id, int flags);
void release_type_cache(TypeCacheEntry *tentry);

where lookup_rowtype_tupdesc() returns a pointer to this struct:

typedef struct
{   struct tupleDesc tdesc; /* must be first field */
   TypeCacheEntry *tentry; /* pointer to owning TypeCacheEntry,                              or NULL if this is a
recordtype */
 
} MagicTupleDesc;

and where TypeCacheEntry has been modified to contain a reference count
and an "is dead?" flag. Is there actually a need for the (ugly) "magic
value" hackery used by catcache?

-Neil




Re: lookup_rowtype_tupdesc considered harmful

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> Hmm, okay. There's the additional complication that we need to handle
> record types (see RecordCacheArray in typcache.c). Since I don't think
> we need reference counting for those,

Yeah, you do.  See record_out for instance, and reflect on the fact that
it can have no idea what the called I/O functions are liable to do.

> where lookup_rowtype_tupdesc() returns a pointer to this struct:

> typedef struct
> {
>     struct tupleDesc tdesc; /* must be first field */

>     TypeCacheEntry *tentry; /* pointer to owning TypeCacheEntry,
>                                or NULL if this is a record type */
> } MagicTupleDesc;

> and where TypeCacheEntry has been modified to contain a reference count
> and an "is dead?" flag.

No, the refcount and isdead flags should be in *this* struct, and
there's really no need for a back-link to TypeCacheEntry.  Think harder
about the situation where there are both old and new tupdescs
outstanding for a single typcache entry.

> Is there actually a need for the (ugly) "magic
> value" hackery used by catcache?

It's a safety check to make sure that what got passed to
release_rowtype_tupdesc is really one of these animals and not just any
old TupleDesc.  It's not *necessary*, granted, but given that the compiler
is not going to help people avoid such errors, I think it's prudent.
        regards, tom lane


Re: lookup_rowtype_tupdesc considered harmful

From
Tom Lane
Date:
I had a further thought about this.  What we're really talking about
here is a reference-counted TupleDesc structure: it's got no necessary
connection to TypeCacheEntry at all.  And in fact there are other places
in the system that could use such a facility.  For instance,
TupleTableSlot has a "shouldFreeDesc" flag that sort of allows partial,
one-way sharing of descs between slots, but it would work a lot better
if the descriptors contained reference counts instead.  And it might be
worth insisting that users of a relation tupledesc from the relcache
increment its refcount while they use it --- that would get rid of some
rather shaky assumptions about when it's OK to free the tupledesc during
relcache updates.

I'm thinking it might be worth the cost to just add a refcount field to
the standard tupleDesc struct.  (We don't need an isdead flag: more
elegantly, just include the TypeCacheEntry's own reference to the
tupdesc in the refcount.)  Add macros IncrTupleDescRefCount and
DecrTupleDescRefCount, the latter freeing the struct if the count
goes to zero, and voila.

I'm not quite sure whether the tupledesc creation routines should
initialize the count to zero or to one, but otherwise it all seems
pretty clear.  Thoughts?
        regards, tom lane


Re: lookup_rowtype_tupdesc considered harmful

From
Neil Conway
Date:
On Tue, 2006-01-10 at 09:47 -0500, Tom Lane wrote:
> I had a further thought about this.  What we're really talking about
> here is a reference-counted TupleDesc structure: it's got no necessary
> connection to TypeCacheEntry at all.

Yeah, I came to basically the same conclusion when implementing the
patch ("RefCountedTupleDesc" vs. a normal "TupleDesc"). I assumed that
there was a need to have two variants of the structure (one refcounted,
one not), but you're right that just adding refcounting to TupleDesc
directly doesn't have a lot of overhead, and avoids the confusion of two
similar-but-not-identical structs. I'll post a patch when it's finished
(hopefully tonight).

-Neil