Re: [HACKERS] couldn't rollback cache ? - Mailing list pgsql-hackers
From | Tom Lane |
---|---|
Subject | Re: [HACKERS] couldn't rollback cache ? |
Date | |
Msg-id | 29176.937576653@sss.pgh.pa.us Whole thread Raw |
In response to | couldn't rollback cache ? ("Hiroshi Inoue" <Inoue@tpf.co.jp>) |
Responses |
RE: [HACKERS] couldn't rollback cache ?
|
List | pgsql-hackers |
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > But as far as I see,neither relation cache nor system catalog cache > aren't be rollbacked correctly. > I added time_qualification_check to SearchSysCache() on trial > (see the patch at the end of this posting). Hmm. This must be a bug of very long standing; surprising it hasn't been noticed before. I think you are probably right, because a little glimpsing shows that SearchSysCache() is the *only* place in the whole system where HeapKeyTest() is called directly --- everyone else goes through HeapTupleSatisfies() which adds a timequal check of one sort or another. I don't know the timequal stuff at all, but it seems likely that we want one here. (Vadim, is this fix right?) > After this change, > abort; > ABORT > select * from t1; > ERROR: Relation t1 does not have attribute dt1 > Seems relation cache is not invalidated yet. > I also tried to add time_qualification_check to RelationId(Name)- > CacheGetRelation(). But unfortunately,Relation doesn't have > such a information. I think the real bug here is in inval.c: see InvalidationMessageCacheInvalidate, which scans pending SI messages at abort time. If we had committed, we'd have sent an SI message telling other backends to refresh their relcache entries for t1; so there is an entry for t1 in the pending-SI-message list. We can use that entry to tell us to invalidate our own relcache entry instead. It looks like this is done correctly for tuple SI messages but not for relation SI messages --- and in fact the code sez /* XXX ignore this--is this correct ??? */ Evidently not. (BTW, please add some comments to this code! It's not obvious that what it's doing is throwing away cache entries that have been changed by a transaction now being aborted.) It would probably be a good idea to switch the order of the operations in AtAbort_Cache() in xact.c, so that relcache reference counts get reset to 0 before we do the pending-SI-message scan. That way, we could just discard the bogus relcache entry and not waste time rebuilding an entry we might not need again. (This might even be essential to avoid an error in the aborted-table-create case; not sure. The routine RelationCacheAbort() didn't exist till last week, so the present call order is certainly not gospel.) We could probably do this in other ways too, like marking all relcache entries with the transaction ID of their last change and using that to detect what to throw away. But the SI message queue is already there so might as well use it... regards, tom lane
pgsql-hackers by date: