Thread: compiling with RELCACHE_FORCE_RELEASE doesn't pass regression
Compiling with RELCACHE_FORCE_RELEASE doesn't pass "make check" on my machine. Is it supposed to pass? Regards,Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > Compiling with RELCACHE_FORCE_RELEASE doesn't pass "make check" on my > machine. What happens exactly? > Is it supposed to pass? It shouldn't crash, but I'm not certain whether you'd see any visible diffs in the tests. regards, tom lane
On Wed, 2010-09-01 at 15:31 -0400, Tom Lane wrote: > Jeff Davis <pgsql@j-davis.com> writes: > > Compiling with RELCACHE_FORCE_RELEASE doesn't pass "make check" on my > > machine. > > What happens exactly? > > > Is it supposed to pass? > > It shouldn't crash, but I'm not certain whether you'd see any > visible diffs in the tests. I do: CFLAGS="-O0 -DRELCACHE_FORCE_RELEASE" ./configure --enable-debug \ --enable-depend --enable-cassert I have attached regression.diffs after a "make check". The diffs don't look trivial, and actually look quite strange to me. It happens deterministically for me, so I assume that anyone on linux x86-64 would see the same thing. Regards, Jeff Davis
Attachment
Jeff Davis <pgsql@j-davis.com> writes: > On Wed, 2010-09-01 at 15:31 -0400, Tom Lane wrote: >> Jeff Davis <pgsql@j-davis.com> writes: >>> Compiling with RELCACHE_FORCE_RELEASE doesn't pass "make check" on my >>> machine. >> What happens exactly? > I have attached regression.diffs after a "make check". The diffs don't > look trivial, and actually look quite strange to me. Hmm, sorta looks like that breaks something related to checking for composite types. That would be a bug. Will look. regards, tom lane
I wrote: > Jeff Davis <pgsql@j-davis.com> writes: >> I have attached regression.diffs after a "make check". The diffs don't >> look trivial, and actually look quite strange to me. > Hmm, sorta looks like that breaks something related to checking for > composite types. That would be a bug. Will look. So here is the culprit: #0 0x00000000007c08e7 in flush_rowtype_cache (type_id=49022) at typcache.c:531 #1 0x00000000007b4797 in RelationClearRelation (relation=0x7fc490ccebe0, rebuild=0 '\000') at relcache.c:1929 #2 0x00000000007b4131 in RelationClose (relation=0x7fc490ccebe0) at relcache.c:1680 #3 0x0000000000472721 in relation_close (relation=0x7fc490ccebe0, lockmode=1) at heapam.c:1051 #4 0x00000000007c01fe in lookup_type_cache (type_id=49022, flags=64) at typcache.c:299 #5 0x00000000007c023f in lookup_rowtype_tupdesc_internal (type_id=49022, typmod=-1, noError=0 '\000') at typcache.c:321 #6 0x00000000007c03e7 in lookup_rowtype_tupdesc_copy (type_id=49022, typmod=-1) at typcache.c:395 #7 0x00000000007cd087 in get_expr_result_type (expr=0x1395d08, resultTypeId=0x0, resultTupleDesc=0x7fffbd6e5830) at funcapi.c:256 lookup_type_cache() opens the relcache entry to copy its tupdesc, does so, and closes the relcache entry again. But then (if RELCACHE_FORCE_RELEASE is on) we blow away the relcache entry ... and RelationClearRelation carefully reaches around and blows away the tupdesc in the typcache too! So that's overly aggressive, but I'm not sure what's a simple fix for it. If we just take out that flush_rowtype_cache call in RelationClearRelation, I think we run the risk of the typcache not being flushed when it needs to be, namely when an ALTER TABLE changes the table's rowtype at an instant when some particular backend doesn't have an active relcache entry for it. Probably the best fix would be to make typcache flushing fully independent of the relcache, but that would mean making sure that all ALTER TABLE variants that affect the rowtype will issue an explicit typcache flush. That seems a bit too invasive to be back-patchable. I'm not entirely sure this sort of failure can occur without RELCACHE_FORCE_RELEASE, but I'm definitely not sure it can't, so a backpatchable fix would be nice. Thoughts? regards, tom lane
Jeff Davis <pgsql@j-davis.com> writes: > I have attached regression.diffs after a "make check". The diffs don't > look trivial, and actually look quite strange to me. BTW, if I dike out the flush_rowtype_cache call at relcache.c:1929, the regression tests do pass for me, so it seems there is only one bug exposed by this test (as of HEAD anyway). I don't find that to be a credible real fix, though, as stated previously. Once we do have a fix in place for this, it might be a good idea for someone to run a buildfarm member with -DRELCACHE_FORCE_RELEASE ... regards, tom lane
I wrote: > Probably the best fix would be to make typcache flushing fully > independent of the relcache, but that would mean making sure that all > ALTER TABLE variants that affect the rowtype will issue an explicit > typcache flush. That seems a bit too invasive to be back-patchable. > I'm not entirely sure this sort of failure can occur without > RELCACHE_FORCE_RELEASE, but I'm definitely not sure it can't, so a > backpatchable fix would be nice. After a bit more study it seems that there is a reasonably back-patchable approach to this. We can continue to drive flushing of composite-type typcache entries off of relcache flush, but it has to occur when we do RelationCacheInvalidateEntry() or RelationCacheInvalidate() due to a SI invalidate event, not just anytime a relcache entry is closed. We can do that by plugging in a callback function with CacheRegisterRelcacheCallback. Because the callback will only have the relation OID not the type OID, it will have to scan the whole TypeCacheHash to see if there's a matching entry. However, that's not as bad as it sounds, because there aren't likely to be very many entries in that hashtable. I put in some quick-hack instrumentation to see how big the table gets during the regression tests, and find that of the hundred-odd backends launched during the tests, none get above 26 typcache entries, and only 8 get as many as 10 entries. Based on those numbers, I'm not sure it'd ever be worth adding the additional infrastructure to allow a direct hash lookup instead. regards, tom lane
On Wed, 2010-09-01 at 20:57 -0400, Tom Lane wrote: > I wrote: > > Probably the best fix would be to make typcache flushing fully > > independent of the relcache, but that would mean making sure that all > > ALTER TABLE variants that affect the rowtype will issue an explicit > > typcache flush. That seems a bit too invasive to be back-patchable. > > I'm not entirely sure this sort of failure can occur without > > RELCACHE_FORCE_RELEASE, but I'm definitely not sure it can't, so a > > backpatchable fix would be nice. > > After a bit more study it seems that there is a reasonably > back-patchable approach to this. We can continue to drive flushing of > composite-type typcache entries off of relcache flush, but it has to > occur when we do RelationCacheInvalidateEntry() or > RelationCacheInvalidate() due to a SI invalidate event, not just > anytime a relcache entry is closed. We can do that by plugging in a > callback function with CacheRegisterRelcacheCallback. > I think I see how this fixes the problem, but I still don't completely understand. Why can't we just make a real copy of the tuple descriptor for the type cache entry, rather than sharing it between the relcache and the type cache? Thank you for the quick response. Regards,Jeff Davis
Jeff Davis <pgsql@j-davis.com> writes: > I think I see how this fixes the problem, but I still don't completely > understand. > Why can't we just make a real copy of the tuple descriptor for the type > cache entry, rather than sharing it between the relcache and the type > cache? The issue isn't really about whether we're sharing the physical copy of the tupdesc. The problem the code is trying to deal with is making sure that the typcache's copy gets thrown away (so it can be refreshed on next use) when the relation's rowtype changes, due to ALTER TABLE ADD COLUMN for example. So we need to do that whenever we get a SI inval event for the rel. We were driving that purely off of relcache flushes, which meant that discarding a relcache entry had to force a typcache flush, since nothing would happen if a SI inval arrived at an instant where we had no relcache entry for the rel. Now the typcache is wired directly to the SI inval events, so it'll get a call whether there is a corresponding relcache entry or not. regards, tom lane