Thread: TupleDesc refcounting

TupleDesc refcounting

From
Neil Conway
Date:
Attached is a WIP patch that adds reference counting for TupleDescs. Two
issues that I ran into while implementing it:

(1) How should the lifetime of a TupleDesc be managed? The existing
ResourceOwner stuff seems to assume that it is managing "per-query"
resources. A TupleDesc will often live beyond the lifetime of a single
transaction, and might even be created before transactions can be
started (e.g. formrdesc() in relcache.c, when we're creating relcache
entries for nailed-in-cache relations early in InitPostgres()).

In current sources, the lifetime of a TupleDesc is the lifetime of the
memory context in which it is allocated (and/or whenever FreeTupleDesc()
is invoked). We could imitate that behavior by optionally linking a
ResourceOwner with each MemoryContext, and releasing the ResourceOwner
when the MemoryContext is reset or deleted. However, I'm not sure that
that's the right approach...

(2) The existing ResourceOwner users issue a warning if the resource
they are managing is not explicitly released before a transaction
successfully commits (so they elog(WARNING)). I don't see the need to be
that strict for TupleDescs -- as we do with palloc() without a matching
pfree(), I think it should be okay to just silently clean up "leaked"
TupleDescs when releasing a ResourceOwner.

Thoughts?

-Neil


Attachment

Re: TupleDesc refcounting

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> (1) How should the lifetime of a TupleDesc be managed? The existing
> ResourceOwner stuff seems to assume that it is managing "per-query"
> resources.

Yeah.  I was envisioning two different approaches: for TupleDesc
references from long-lived data structures (ie, typcache or relcache)
just increment or decrement the count when the referencing data
structure changes.  ResourceOwner would be used for dynamic within-query
references --- in practice, local variables.  If the reference would
be forgotten during an elog longjmp then you need a ResourceOwner to
backstop it, otherwise not.

You could conceivably set up a "cache ResourceOwner" with indefinite
lifespan to make the cache case more like the local-variable case,
but I think this would merely be a performance drag with no real value.
There's no point in a ResourceOwner if it will never be called on to
release resources, and a cache-lifespan ResourceOwner wouldn't be.

> In current sources, the lifetime of a TupleDesc is the lifetime of the
> memory context in which it is allocated (and/or whenever FreeTupleDesc()
> is invoked). We could imitate that behavior by optionally linking a
> ResourceOwner with each MemoryContext, and releasing the ResourceOwner
> when the MemoryContext is reset or deleted. However, I'm not sure that
> that's the right approach...

No.  We really only need this mechanism for the case where a tupledesc
allocated in a long-lived context (again, think relcache or typcache)
is going to be dynamically referenced by shorter-lived code.  Tying it
to MemoryContexts won't help because CacheMemoryContext never gets
reset.

> (2) The existing ResourceOwner users issue a warning if the resource
> they are managing is not explicitly released before a transaction
> successfully commits (so they elog(WARNING)). I don't see the need to be
> that strict for TupleDescs -- as we do with palloc() without a matching
> pfree(), I think it should be okay to just silently clean up "leaked"
> TupleDescs when releasing a ResourceOwner.

The reason why those warnings are issued is to catch code that is
failing to manage references properly.  I think that motivation applies
perfectly well to tuple descriptors too.  There is no good reason for
code to forget to release the descriptor reference in non-error paths.

No time to look at the patch itself right now ...

            regards, tom lane

Re: TupleDesc refcounting

From
Neil Conway
Date:
On Tue, 2006-01-10 at 18:05 -0500, Tom Lane wrote:
> Yeah.  I was envisioning two different approaches: for TupleDesc
> references from long-lived data structures (ie, typcache or relcache)
> just increment or decrement the count when the referencing data
> structure changes.  ResourceOwner would be used for dynamic within-query
> references --- in practice, local variables.  If the reference would
> be forgotten during an elog longjmp then you need a ResourceOwner to
> backstop it, otherwise not.

Ah, I see what you mean. In implementing this, I wasn't sure the best
way to provide these two sorts of TupleDesc references. My first thought
was to add a "use ResourceOwner?" boolean parameter to the routines that
create and destroy references to TupleDescs:

extern TupleDesc CreateTemplateTupleDesc(int natts, bool hasoid,
                                         bool use_resowner);
extern TupleDesc CreateTupleDesc(int natts, bool hasoid,
                                 Form_pg_attribute *attrs,
                                 bool use_resowner);
extern TupleDesc CreateTupleDescCopy(TupleDesc tupdesc,
                                     bool use_resowner);
extern TupleDesc CreateTupleDescCopyConstr(TupleDesc tupdesc,
                                           bool use_resowner);
extern void IncrTupleDescRefCount(TupleDesc tupdesc,
                                  bool use_resowner);
extern void DecrTupleDescRefCount(TupleDesc tupdesc,
                                  bool use_resowner);

But that requires making *all* the call-sites of these routines care
about whether a ResourceOwner is being used. I think most call-sites
will want to use a ResourceOwner, so that ought to be the default.
Avoiding changing the function signatures also means we don't break a
fairly widely-used API. Instead, how about defining:

extern void IncrTupleDescPersistentRefCount(TupleDesc tdesc);

(the name could do with improvement). This would increment the
TupleDesc's "persistent reference count" -- that is, non-persistent
references are automatically released at transaction end, whereas
persistent ones are only released via:

extern void DecrTupleDescPersistentRefCount(TupleDesc tdesc);

One downside is that creating a new, persistent TupleDesc requires a
little pain:

TupleDesc tdesc = make_your_tdesc();
IncrTupleDescPersistentRefCount(tdesc);
DecrTupleDescRefCount(tdesc);

That means we will unnecessarily inform the ResourceOwner about the
TupleDesc, and then tell it to immediately forget about it. However, I
think that's acceptable: ResourceOwnerRememberTupleDesc() is trivial,
and ResourceOwnerForgetTupleDesc() searches beginning at the
most-recently-added TupleDesc, so it should also be very cheap.

-Neil



Re: TupleDesc refcounting

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> Ah, I see what you mean. In implementing this, I wasn't sure the best
> way to provide these two sorts of TupleDesc references. My first thought
> was to add a "use ResourceOwner?" boolean parameter to the routines that
> create and destroy references to TupleDescs:

No, I wouldn't do that.  I would keep the routines you mention ignorant
of ResourceOwner, because I think that the vast majority of tupdesc
usage will NOT be using ResourceOwners.  Only the places where a pointer
to a cached tupdesc is handed out need to deal with this.  This excludes
practically all of the executor, for instance.

If you're finding yourself writing a large and invasive patch, I think
you're doing it wrong.  I'm envisioning something pretty localized.

            regards, tom lane

Re: TupleDesc refcounting

From
Neil Conway
Date:
On Thu, 2006-01-12 at 10:40 -0500, Tom Lane wrote:
> If you're finding yourself writing a large and invasive patch, I think
> you're doing it wrong.

I think I might be :-)

Attached is a revised version of the patch (still WIP). In summary:

- TupleDescs are allocated in TopMemoryContext, and released when their
reference count falls to zero.

- normal TupleDesc references are released automatically (with a
warning) when the ResourceOwner is released, but it is possible to
obtain a "persistent reference" to a TupleDesc that is only released
manually

- I've begun updating the code to reflect this, but I haven't finished
yet. Either the required changes are more extensive than expected or I'm
doing something wrong: adding reference counting for TupleDescs changes
the rules for TupleDesc lifetime. Plenty of code was doing stuff like:

/* we need this TupleDesc to live for a while */
old_cxt = MemoryContextSwitchTo(LongLivedContext);
tdesc = CreateTupleDescCopy(tdesc);
MemoryContextSwitchTo(old_cxt);

which no longer has the intended effect: the CurrentMemoryContext does
not control the lifetime of a TupleDesc. I've only begun to make the
required changes: the patch passes initdb, but fails during the
regression tests. I've done a cursory job of updating the relcache and
syscache to use the new refcounting rules, but I haven't really done the
executor, the utility commands, or the other miscellaneous parts of the
system that manipulate TupleDescs.

I believe most of the remaining work is a Small Matter of Programming:
updating code to use reference counts rather than depending on the
lifetime of the CurrentMemoryContext. However, if I'm going down the
wrong track, I thought I'd float the patch now rather than going further
down this path. Comments welcome.

-Neil


Attachment

Re: TupleDesc refcounting

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> On Thu, 2006-01-12 at 10:40 -0500, Tom Lane wrote:
>> If you're finding yourself writing a large and invasive patch, I think
>> you're doing it wrong.

> I think I might be :-)

Yipes ... this seems far more invasive than I think is justified.
In particular the notion of storing *every* tupdesc in TopMemoryContext
seems completely wrong.  That's not working with the context system,
that's working against it.

What I had in mind was an optional refcounting facility, whereas you
seem to be going in the direction of making it mandatory.  The reason
I think it should be optional is that most uses of tupdescs just don't
need it.  We have a grand total of one place where a refcount (or forced
local copy) seems demonstrably necessary.  Surely changing a whole lot
of code to accommodate that one place is not the right tradeoff.

My inclination at this point is to forget the whole thing and just patch
the callers of lookup_rowtype_tupdesc that need to copy the tupdesc.

            regards, tom lane

Re: TupleDesc refcounting

From
Neil Conway
Date:
On Sun, 2006-01-15 at 12:08 -0500, Tom Lane wrote:
> My inclination at this point is to forget the whole thing and just patch
> the callers of lookup_rowtype_tupdesc that need to copy the tupdesc.

Actually, I think I finally understand how to implement this patch
sanely. I had thought that the lifetime of a TupleDesc should be
dictated by either the memory context in which it is allocated, OR its
reference count. This leads us down the road toward mandatory reference
counting, which I agree is a net loss. However, since we're primarily
concerned with TupleDescs allocated in CacheMemoryContext and that
context is never reset, we can use the reference count *just* to manage
the "external" references to TupleDescs. That should make the patch far
less invasive. (I have the feeling you've been suggesting this all
along, I've just been too thick-skulled to understand you.)

I'll hopefully have a patch implementing this finished by tomorrow
evening.

-Neil



Re: TupleDesc refcounting

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> ... However, since we're primarily
> concerned with TupleDescs allocated in CacheMemoryContext and that
> context is never reset, we can use the reference count *just* to manage
> the "external" references to TupleDescs. That should make the patch far
> less invasive. (I have the feeling you've been suggesting this all
> along, I've just been too thick-skulled to understand you.)

Actually, what I've been thinking about is applying reference counting
only to the specific tupdescs that need it --- which at this point is
really only the ones in the typcache, though we might later extend the
use of the facility to other places.  I suspect you'll find that it's
convenient to treat typcache's own link to a tupdesc as a reference
count too, so it's not strictly "external" references that should be
counted.  The point is that we don't need to force all the code into
that same mold, especially not all at once.

> I'll hopefully have a patch implementing this finished by tomorrow
> evening.

OK.  I'm going to go ahead and develop a non-refcount patch (copy
tupdescs where needed) to put into the back branches.  I won't apply it
to HEAD unless you say so, though, as I'm sure it'd create a merge
problem for you.

            regards, tom lane

Re: TupleDesc refcounting

From
Neil Conway
Date:
On Tue, 2006-01-17 at 09:36 -0500, Tom Lane wrote:
> I suspect you'll find that it's convenient to treat typcache's own
> link to a tupdesc as a reference count too, so it's not strictly
> "external" references that should be counted.

The problem with this is that incrementing a TupleDesc's refcount
informs the CurrentResourceOwner. That causes the refcount to be
decremented when the resource owner is released, which is wrong when the
TupleDesc should live beyond the current query (as it should in the
typcache). We could avoid that by creating a CacheResourceOwner that is
never reset, as discussed earlier, but at present I'm not sure if
there's any point, so I've just used a "dead" field.

Attached is a revised patch. Reference counting is only used for
lookup_rowtype_tupdesc(): as discussed there might be other places that
would benefit from this, but I haven't looked at that yet.

There was some call-sites of lookup_rowtype_tupdesc() where it doesn't
seem to be easy to use reference counting. For example, consider the
implementation of get_expr_result_type(): some code paths within that
function call lookup_rowtype_tupdesc() to produce the returned
TupleDesc, and some do not. The easiest fix seemed to be just making a
copy of the TupleDesc for the lookup_rowtype_tupdesc() cases.

The patch is WIP: a few regression tests fail due to TupleDesc leaks I
haven't fixed yet but that should be easily fixable, and there are a few
other minor issues to address. I'm just posting the patch now to get any
feedback.

(Apologies for not getting this done earlier, I had a touch of the flu
yesterday...)

-Neil


Attachment

Re: TupleDesc refcounting

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> On Tue, 2006-01-17 at 09:36 -0500, Tom Lane wrote:
>> I suspect you'll find that it's convenient to treat typcache's own
>> link to a tupdesc as a reference count too, so it's not strictly
>> "external" references that should be counted.

> The problem with this is that incrementing a TupleDesc's refcount
> informs the CurrentResourceOwner.

Not if it's just "tupdesc->refcount++;" ... ;-)

Seriously, I was imagining a two-level structure where tupdesc itself
just has operations on the order of

    refcount++;

    if (--refcount <= 0)
        FreeTupleDesc();

and then a separate layer on top of that adds ResourceOwner management.
This is precisely because the cache code requires non-resource-owner-
managed links (or at least, has no particular use for the services of a
ResourceOwner).

> There was some call-sites of lookup_rowtype_tupdesc() where it doesn't
> seem to be easy to use reference counting. For example, consider the
> implementation of get_expr_result_type(): some code paths within that
> function call lookup_rowtype_tupdesc() to produce the returned
> TupleDesc, and some do not. The easiest fix seemed to be just making a
> copy of the TupleDesc for the lookup_rowtype_tupdesc() cases.

Yeah, I noticed this while making the back-branch patch.  I agree that
any given routine had better have a consistent return convention: if
the tupdesc needs to be refcount-decremented when done, that must be
the case in all code paths, since the caller can't be expected to know
which case applies.  My recollection though is that the non-lookup cases
return freshly-built-in-local-storage tupdescs, which could certainly be
set up as having refcount 1 so that the decrement would work correctly.
It may or may not be worth the trouble compared to just copying, though.
You should think about whether a routine is a performance hotspot before
going out of your way to avoid a copy step.

> (Apologies for not getting this done earlier, I had a touch of the flu
> yesterday...)

There's no hurry about it.  Hope you're feeling better.

            regards, tom lane

Re: TupleDesc refcounting

From
Neil Conway
Date:
On Thu, 2006-01-19 at 04:14 -0500, Neil Conway wrote:
> The patch is WIP: a few regression tests fail due to TupleDesc leaks I
> haven't fixed yet but that should be easily fixable, and there are a few
> other minor issues to address. I'm just posting the patch now to get any
> feedback.

Attached is a revised patch: all the regression tests patch, and I'm not
aware of any major remaining issues. Barring any objections, I'll apply
this tomorrow.

(I'll take a look at using refcounting for TupleDescs in other parts of
the system after that...)

-Neil


Attachment

Re: TupleDesc refcounting

From
Tom Lane
Date:
Neil Conway <neilc@samurai.com> writes:
> Attached is a revised patch: all the regression tests patch, and I'm not
> aware of any major remaining issues. Barring any objections, I'll apply
> this tomorrow.

I object ... this is still trying to enforce tupdesc refcounting on much
more of the system than I think useful or prudent.

I'll try to come up with an alternative patch.

            regards, tom lane

Re: TupleDesc refcounting

From
Neil Conway
Date:
On Sun, 2006-01-22 at 13:43 -0500, Tom Lane wrote:
> I object ... this is still trying to enforce tupdesc refcounting on much
> more of the system than I think useful or prudent.

Well, the patch adds management of TupleDesc refcounts solely for the
return value of lookup_rowtype_tupdesc(). If we're going to use
reference counting there at all, I don't see a way do make the
refcounting any less invasive.

-Neil



Re: TupleDesc refcounting

From
Tom Lane
Date:
I'm finally getting back to looking at the problem of reference-counting
cached TupleDescs as was discussed in January.  I had objected to the
last patch Neil posted:
http://archives.postgresql.org/pgsql-patches/2006-01/msg00243.php
on the grounds that it seemed too complicated.  On looking at it
closely, I realize that a lot of the complexity comes from my insistence
that the ResourceOwner mechanism ought to warn about any tupdesc
references that haven't been explicitly released before transaction end.
(Neil had questioned upthread whether that was really necessary ... he
was right.)  The problem is that execQual.c would like to hang onto
tupdesc references across multiple executions of expression nodes, and
there's not any explicit code to clean up expression trees, so no easy
place to put a refcount decrement.  Neil's patch arranges to clean up
such references by using expression shutdown callbacks.  The patch
doesn't quite work as-is because it neglects the fact that shutdown
callbacks are triggered during a node rescan as well as node shutdown.
This could be fixed, but I'm thinking that it's a lot of mechanism to
add when the ResourceOwner could perfectly well clean up the refcounts
instead.

The point of the no-remaining-refs rule was to catch code that should
have released a refcount and failed to.  We could keep that error
checking if we made ResourceOwner recognize two different kinds of
tupdesc refcount, one to warn about and one not to.  I'm not sure if
that's worth the trouble or not --- any opinions?

Also, I looked at whether TupleTableSlots could sensibly use refcounted
tuples, and I'm now of the opinion that it'd be a waste of time.
There aren't any places where the executor uses a tupdesc obtained from
cache as a TupleTableSlot's descriptor; rather, all the tupdescs
involved are built in query-lifespan memory during plan startup.
So I'm thinking we should actually get rid of tts_shouldFreeDesc and
not have execTuples.c do any tupdesc-freeing at all.  Retail freeing
of structures that are in a memory context that's about to be freed
is a waste of cycles.  We've gotten rid of a fair amount of redundant
pfree's that used to occur during plan shutdown, and these are some
more that we could do without.

Comments?

            regards, tom lane

Re: TupleDesc refcounting

From
Tom Lane
Date:
I wrote:
> I'm finally getting back to looking at the problem of reference-counting
> cached TupleDescs as was discussed in January.  I had objected to the
> last patch Neil posted:
> http://archives.postgresql.org/pgsql-patches/2006-01/msg00243.php
> on the grounds that it seemed too complicated.  On looking at it
> closely, I realize that a lot of the complexity comes from my insistence
> that the ResourceOwner mechanism ought to warn about any tupdesc
> references that haven't been explicitly released before transaction end.

After much thrashing I concluded that my original instinct was right and
we really do want to insist on cleanup of tupdesc references during
normal processing.  The approach I tried to take effectively meant
leaking tupdesc references until end of query, which is really bad news
for SQL-language functions --- a function that grabs a tupdesc reference
during plan startup would then accumulate a tupdesc reference during
every invocation, leading to indefinite bloat in the ResourceOwner over
a long query.

So I ended up applying something pretty close to Neil's patch.  I did
modify it to not bother reference-counting tupdescs that aren't actually
in any cache.  I also fixed up TupleTableSlots to do reference-count
processing if ExecSetSlotDescriptor is handed a reference-counted
tupdesc (my assertion yesterday that this never happens was wrong),
but not to bother cleaning up non-ref-counted descriptors.

All in all, a lot of thrashing for only marginal improvement :-(

            regards, tom lane