Thread: compiling with RELCACHE_FORCE_RELEASE doesn't pass regression

compiling with RELCACHE_FORCE_RELEASE doesn't pass regression

From
Jeff Davis
Date:
Compiling with RELCACHE_FORCE_RELEASE doesn't pass "make check" on my
machine.

Is it supposed to pass?

Regards,Jeff Davis



Re: compiling with RELCACHE_FORCE_RELEASE doesn't pass regression

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> Compiling with RELCACHE_FORCE_RELEASE doesn't pass "make check" on my
> machine.

What happens exactly?

> Is it supposed to pass?

It shouldn't crash, but I'm not certain whether you'd see any
visible diffs in the tests.
        regards, tom lane


Re: compiling with RELCACHE_FORCE_RELEASE doesn't pass regression

From
Jeff Davis
Date:
On Wed, 2010-09-01 at 15:31 -0400, Tom Lane wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
> > Compiling with RELCACHE_FORCE_RELEASE doesn't pass "make check" on my
> > machine.
>
> What happens exactly?
>
> > Is it supposed to pass?
>
> It shouldn't crash, but I'm not certain whether you'd see any
> visible diffs in the tests.

I do:

  CFLAGS="-O0 -DRELCACHE_FORCE_RELEASE" ./configure --enable-debug \
    --enable-depend --enable-cassert

I have attached regression.diffs after a "make check". The diffs don't
look trivial, and actually look quite strange to me.

It happens deterministically for me, so I assume that anyone on linux
x86-64 would see the same thing.

Regards,
    Jeff Davis

Attachment

Re: compiling with RELCACHE_FORCE_RELEASE doesn't pass regression

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> On Wed, 2010-09-01 at 15:31 -0400, Tom Lane wrote:
>> Jeff Davis <pgsql@j-davis.com> writes:
>>> Compiling with RELCACHE_FORCE_RELEASE doesn't pass "make check" on my
>>> machine.

>> What happens exactly?

> I have attached regression.diffs after a "make check". The diffs don't
> look trivial, and actually look quite strange to me.

Hmm, sorta looks like that breaks something related to checking for
composite types.  That would be a bug.  Will look.
        regards, tom lane


Re: compiling with RELCACHE_FORCE_RELEASE doesn't pass regression

From
Tom Lane
Date:
I wrote:
> Jeff Davis <pgsql@j-davis.com> writes:
>> I have attached regression.diffs after a "make check". The diffs don't
>> look trivial, and actually look quite strange to me.

> Hmm, sorta looks like that breaks something related to checking for
> composite types.  That would be a bug.  Will look.

So here is the culprit:

#0  0x00000000007c08e7 in flush_rowtype_cache (type_id=49022) at typcache.c:531
#1  0x00000000007b4797 in RelationClearRelation (relation=0x7fc490ccebe0,    rebuild=0 '\000') at relcache.c:1929
#2  0x00000000007b4131 in RelationClose (relation=0x7fc490ccebe0)   at relcache.c:1680
#3  0x0000000000472721 in relation_close (relation=0x7fc490ccebe0, lockmode=1)   at heapam.c:1051
#4  0x00000000007c01fe in lookup_type_cache (type_id=49022, flags=64)   at typcache.c:299
#5  0x00000000007c023f in lookup_rowtype_tupdesc_internal (type_id=49022,    typmod=-1, noError=0 '\000') at
typcache.c:321
#6  0x00000000007c03e7 in lookup_rowtype_tupdesc_copy (type_id=49022,    typmod=-1) at typcache.c:395
#7  0x00000000007cd087 in get_expr_result_type (expr=0x1395d08,    resultTypeId=0x0, resultTupleDesc=0x7fffbd6e5830) at
funcapi.c:256

lookup_type_cache() opens the relcache entry to copy its tupdesc, does
so, and closes the relcache entry again.  But then (if
RELCACHE_FORCE_RELEASE is on) we blow away the relcache entry ...
and RelationClearRelation carefully reaches around and blows away
the tupdesc in the typcache too!

So that's overly aggressive, but I'm not sure what's a simple fix for
it.  If we just take out that flush_rowtype_cache call in
RelationClearRelation, I think we run the risk of the typcache not being
flushed when it needs to be, namely when an ALTER TABLE changes the
table's rowtype at an instant when some particular backend doesn't
have an active relcache entry for it.

Probably the best fix would be to make typcache flushing fully
independent of the relcache, but that would mean making sure that all
ALTER TABLE variants that affect the rowtype will issue an explicit
typcache flush.  That seems a bit too invasive to be back-patchable.
I'm not entirely sure this sort of failure can occur without
RELCACHE_FORCE_RELEASE, but I'm definitely not sure it can't, so a
backpatchable fix would be nice.

Thoughts?
        regards, tom lane


Re: compiling with RELCACHE_FORCE_RELEASE doesn't pass regression

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> I have attached regression.diffs after a "make check". The diffs don't
> look trivial, and actually look quite strange to me.

BTW, if I dike out the flush_rowtype_cache call at relcache.c:1929,
the regression tests do pass for me, so it seems there is only one
bug exposed by this test (as of HEAD anyway).  I don't find that
to be a credible real fix, though, as stated previously.

Once we do have a fix in place for this, it might be a good idea for
someone to run a buildfarm member with -DRELCACHE_FORCE_RELEASE ...
        regards, tom lane


Re: compiling with RELCACHE_FORCE_RELEASE doesn't pass regression

From
Tom Lane
Date:
I wrote:
> Probably the best fix would be to make typcache flushing fully
> independent of the relcache, but that would mean making sure that all
> ALTER TABLE variants that affect the rowtype will issue an explicit
> typcache flush.  That seems a bit too invasive to be back-patchable.
> I'm not entirely sure this sort of failure can occur without
> RELCACHE_FORCE_RELEASE, but I'm definitely not sure it can't, so a
> backpatchable fix would be nice.

After a bit more study it seems that there is a reasonably
back-patchable approach to this.  We can continue to drive flushing of
composite-type typcache entries off of relcache flush, but it has to
occur when we do RelationCacheInvalidateEntry() or
RelationCacheInvalidate() due to a SI invalidate event, not just
anytime a relcache entry is closed.  We can do that by plugging in a
callback function with CacheRegisterRelcacheCallback.

Because the callback will only have the relation OID not the type OID,
it will have to scan the whole TypeCacheHash to see if there's a
matching entry.  However, that's not as bad as it sounds, because there
aren't likely to be very many entries in that hashtable.  I put in some
quick-hack instrumentation to see how big the table gets during the
regression tests, and find that of the hundred-odd backends launched
during the tests, none get above 26 typcache entries, and only 8 get as
many as 10 entries.  Based on those numbers, I'm not sure it'd ever be
worth adding the additional infrastructure to allow a direct hash lookup
instead.
        regards, tom lane


Re: compiling with RELCACHE_FORCE_RELEASE doesn't pass regression

From
Jeff Davis
Date:
On Wed, 2010-09-01 at 20:57 -0400, Tom Lane wrote:
> I wrote:
> > Probably the best fix would be to make typcache flushing fully
> > independent of the relcache, but that would mean making sure that all
> > ALTER TABLE variants that affect the rowtype will issue an explicit
> > typcache flush.  That seems a bit too invasive to be back-patchable.
> > I'm not entirely sure this sort of failure can occur without
> > RELCACHE_FORCE_RELEASE, but I'm definitely not sure it can't, so a
> > backpatchable fix would be nice.
> 
> After a bit more study it seems that there is a reasonably
> back-patchable approach to this.  We can continue to drive flushing of
> composite-type typcache entries off of relcache flush, but it has to
> occur when we do RelationCacheInvalidateEntry() or
> RelationCacheInvalidate() due to a SI invalidate event, not just
> anytime a relcache entry is closed.  We can do that by plugging in a
> callback function with CacheRegisterRelcacheCallback.
> 

I think I see how this fixes the problem, but I still don't completely
understand.

Why can't we just make a real copy of the tuple descriptor for the type
cache entry, rather than sharing it between the relcache and the type
cache?

Thank you for the quick response.

Regards,Jeff Davis



Re: compiling with RELCACHE_FORCE_RELEASE doesn't pass regression

From
Tom Lane
Date:
Jeff Davis <pgsql@j-davis.com> writes:
> I think I see how this fixes the problem, but I still don't completely
> understand.

> Why can't we just make a real copy of the tuple descriptor for the type
> cache entry, rather than sharing it between the relcache and the type
> cache?

The issue isn't really about whether we're sharing the physical copy of
the tupdesc.  The problem the code is trying to deal with is making sure
that the typcache's copy gets thrown away (so it can be refreshed on
next use) when the relation's rowtype changes, due to ALTER TABLE ADD
COLUMN for example.  So we need to do that whenever we get a SI inval
event for the rel.  We were driving that purely off of relcache
flushes, which meant that discarding a relcache entry had to force a
typcache flush, since nothing would happen if a SI inval arrived at an
instant where we had no relcache entry for the rel.  Now the typcache is
wired directly to the SI inval events, so it'll get a call whether there
is a corresponding relcache entry or not.
        regards, tom lane