Thread: couldn't rollback cache ?
Hello all,

The cache invalidation mechanism was much improved, thanks to Tom.  But as
far as I can see, neither the relation cache nor the system catalog cache is
rolled back correctly.  This needs to be solved if we are to execute DDL
statements inside transactions.  For example:

create table t1 (id int4);
CREATE
begin;
BEGIN
alter table t1 add column dt1 text;
ADD
select * from t1;
id|dt1
--+---
(0 rows)

abort;
ABORT
visco=> select * from t1;
id|dt1
--+---
(0 rows)

I added a time_qualification_check to SearchSysCache() on trial (see the
patch at the end of this posting).  After this change:

abort;
ABORT
select * from t1;
ERROR:  Relation t1 does not have attribute dt1

It seems the relation cache is not invalidated yet.  I also tried to add a
time_qualification_check to RelationId(Name)CacheGetRelation(), but
unfortunately Relation doesn't carry such information.

Any ideas ?  Comments ?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp

*** utils/cache/catcache.c.orig	Mon Jul 26 12:45:14 1999
--- utils/cache/catcache.c	Fri Sep 17 08:57:50 1999
***************
*** 872,878 ****
  						   cache->cc_skey,
  						   res);
  		if (res)
! 			break;
  	}
  
  	/* ----------------
--- 872,881 ----
  						   cache->cc_skey,
  						   res);
  		if (res)
! 		{
! 			if (HeapTupleSatisfiesNow(ct->ct_tup->t_data))
! 				break;
! 		}
  	}
  
  	/* ----------------
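For concreteness, here is a minimal standalone C sketch of the idea behind the
patch above.  It is not PostgreSQL source: CachedTuple, tuple_satisfies_now()
and cache_lookup() are invented names, and the visibility rule is a
deliberately simplified stand-in for HeapTupleSatisfiesNow().  The point is
only that a cache hit should count only while the cached tuple is still
visible; an entry created by a transaction that later aborted silently turns
into a cache miss.

/*
 * Illustrative sketch only, not PostgreSQL code.  Models a catalog-cache
 * lookup that rejects cached tuples which no longer pass a "satisfies now"
 * time qualification.
 */
#include <stdbool.h>
#include <stdio.h>

typedef unsigned int TransactionId;
#define InvalidTransactionId 0

typedef struct CachedTuple
{
    unsigned int  oid;            /* key of the cached catalog tuple */
    TransactionId xmin;           /* inserting transaction */
    bool          xmin_committed; /* did the inserter commit? */
    TransactionId xmax;           /* deleting transaction, or invalid */
    bool          xmax_committed; /* did the deleter commit? */
    const char   *payload;        /* stand-in for the tuple body */
} CachedTuple;

/* crude SnapshotNow-style check: visible iff inserted by a committed (or our
 * own, still running) transaction and not deleted by one */
static bool
tuple_satisfies_now(const CachedTuple *tup, TransactionId my_xid)
{
    bool inserted = tup->xmin_committed || tup->xmin == my_xid;
    bool deleted  = tup->xmax != InvalidTransactionId &&
                    (tup->xmax_committed || tup->xmax == my_xid);
    return inserted && !deleted;
}

/* lookup that, like the patched SearchSysCache(), skips stale entries */
static const CachedTuple *
cache_lookup(const CachedTuple *cache, int n, unsigned int oid,
             TransactionId my_xid)
{
    for (int i = 0; i < n; i++)
        if (cache[i].oid == oid && tuple_satisfies_now(&cache[i], my_xid))
            return &cache[i];
    return NULL;                  /* caller would re-read from the heap */
}

int
main(void)
{
    /* entry built by transaction 42, which later aborted (so xmin was
     * never marked committed) */
    CachedTuple cache[] = {
        {1001, 42, false, InvalidTransactionId, false,
         "pg_attribute row for dt1"}
    };
    TransactionId my_xid = 43;    /* we are a later transaction */

    const CachedTuple *hit = cache_lookup(cache, 1, 1001, my_xid);
    printf("lookup after abort: %s\n",
           hit ? hit->payload : "cache miss, re-read heap");
    return 0;
}

Built with a C99 compiler, the lookup reports a cache miss, which is what the
aborted ALTER TABLE case above needs.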
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > But as far as I see,neither relation cache nor system catalog cache > aren't be rollbacked correctly. > I added time_qualification_check to SearchSysCache() on trial > (see the patch at the end of this posting). Hmm. This must be a bug of very long standing; surprising it hasn't been noticed before. I think you are probably right, because a little glimpsing shows that SearchSysCache() is the *only* place in the whole system where HeapKeyTest() is called directly --- everyone else goes through HeapTupleSatisfies() which adds a timequal check of one sort or another. I don't know the timequal stuff at all, but it seems likely that we want one here. (Vadim, is this fix right?) > After this change, > abort; > ABORT > select * from t1; > ERROR: Relation t1 does not have attribute dt1 > Seems relation cache is not invalidated yet. > I also tried to add time_qualification_check to RelationId(Name)- > CacheGetRelation(). But unfortunately,Relation doesn't have > such a information. I think the real bug here is in inval.c: see InvalidationMessageCacheInvalidate, which scans pending SI messages at abort time. If we had committed, we'd have sent an SI message telling other backends to refresh their relcache entries for t1; so there is an entry for t1 in the pending-SI-message list. We can use that entry to tell us to invalidate our own relcache entry instead. It looks like this is done correctly for tuple SI messages but not for relation SI messages --- and in fact the code sez /* XXX ignore this--is this correct ??? */ Evidently not. (BTW, please add some comments to this code! It's not obvious that what it's doing is throwing away cache entries that have been changed by a transaction now being aborted.) It would probably be a good idea to switch the order of the operations in AtAbort_Cache() in xact.c, so that relcache reference counts get reset to 0 before we do the pending-SI-message scan. That way, we could just discard the bogus relcache entry and not waste time rebuilding an entry we might not need again. (This might even be essential to avoid an error in the aborted-table-create case; not sure. The routine RelationCacheAbort() didn't exist till last week, so the present call order is certainly not gospel.) We could probably do this in other ways too, like marking all relcache entries with the transaction ID of their last change and using that to detect what to throw away. But the SI message queue is already there so might as well use it... regards, tom lane
Tom Lane wrote:
> 
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > But as far as I can see, neither the relation cache nor the system
> > catalog cache is rolled back correctly.
> > I added a time_qualification_check to SearchSysCache() on trial
> > (see the patch at the end of this posting).
> 
> Hmm.  This must be a bug of very long standing; surprising it hasn't
> been noticed before.  I think you are probably right, because a little
> glimpsing shows that SearchSysCache() is the *only* place in the whole
> system where HeapKeyTest() is called directly --- everyone else goes
> through HeapTupleSatisfies() which adds a timequal check of one sort or
> another.  I don't know the timequal stuff at all, but it seems likely
> that we want one here.  (Vadim, is this fix right?)

Sorry, but currently I have no ability to deal with anything but WAL.
As for the cache/SI issues, I would like to see a shared catalog cache
implemented (and the current SI stuff removed), but I have not been able
to do it for ~ 2 years  -:(

Vadim
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Friday, September 17, 1999 10:58 PM
> To: Hiroshi Inoue
> Cc: pgsql-hackers
> Subject: Re: [HACKERS] couldn't rollback cache ?
> 
> 
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > But as far as I can see, neither the relation cache nor the system
> > catalog cache is rolled back correctly.
> > I added a time_qualification_check to SearchSysCache() on trial
> > (see the patch at the end of this posting).
> 
> Hmm.  This must be a bug of very long standing; surprising it hasn't
> been noticed before.  I think you are probably right, because a little
> glimpsing shows that SearchSysCache() is the *only* place in the whole
> system where HeapKeyTest() is called directly --- everyone else goes
> through HeapTupleSatisfies() which adds a timequal check of one sort or
> another.  I don't know the timequal stuff at all, but it seems likely
> that we want one here.  (Vadim, is this fix right?)
> 
> > After this change:
> > abort;
> > ABORT
> > select * from t1;
> > ERROR:  Relation t1 does not have attribute dt1
> 
> > It seems the relation cache is not invalidated yet.
> > I also tried to add a time_qualification_check to
> > RelationId(Name)CacheGetRelation(), but unfortunately Relation
> > doesn't carry such information.
> 
> I think the real bug here is in inval.c: see
> InvalidationMessageCacheInvalidate, which scans pending SI messages
> at abort time.  If we had committed, we'd have sent an SI message
> telling other backends to refresh their relcache entries for t1;
> so there is an entry for t1 in the pending-SI-message list.  We can
> use that entry to tell us to invalidate our own relcache entry instead.
> It looks like this is done correctly for tuple SI messages but not for

I think it's not done correctly for tuple SI messages either.
I didn't use the current cache invalidation mechanism when I made the
patch for SearchSysCache() because of the following 2 reasons.

1. SI messages are eaten by CommandCounterIncrement().  So they
   may vanish before the transaction ends/aborts.

2. The tuples which should be invalidated in case of abort are different
   from the ones in case of commit.
   In case of commit, the deleted old tuples should be invalidated for
   all backends.
   In case of abort, the insert(updat)ed new tuples should be
   invalidated for the insert(updat)ing backend.
   Currently heap_insert() calls RelationInvalidateHeapTuple() for an
   inserted new tuple, but heap_replace() doesn't call
   RelationInvalidateHeapTuple() for an updated new tuple.  I don't
   understand which is right.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
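To make point 2 above concrete, here is a standalone C sketch of the
asymmetry (invented names throughout, not the real heap_replace() or
RelationInvalidateHeapTuple() code, and TupleId is merely a stand-in for an
ItemPointer): on an update, the old tuple is what other backends may have
cached and must drop if we commit, while the new tuple is what our own
caches must drop if we abort.

#include <stdio.h>

typedef unsigned int TupleId;   /* stand-in for an ItemPointer */

typedef enum { INVAL_OTHERS_ON_COMMIT, INVAL_SELF_ON_ABORT } WhenToApply;

typedef struct Invalidation
{
    TupleId     tid;
    WhenToApply when;
} Invalidation;

#define MAX_PENDING 16
static Invalidation pending[MAX_PENDING];
static int          npending = 0;

static void register_invalidation(TupleId tid, WhenToApply when)
{
    if (npending < MAX_PENDING)
        pending[npending++] = (Invalidation){ tid, when };
}

/*
 * Model of an UPDATE: the old physical tuple is what other backends may have
 * cached (drop it there if we commit); the new, still-uncommitted tuple is
 * what our own caches may hold (drop it here if we abort).
 */
static void model_heap_replace(TupleId old_tuple, TupleId new_tuple)
{
    register_invalidation(old_tuple, INVAL_OTHERS_ON_COMMIT);
    register_invalidation(new_tuple, INVAL_SELF_ON_ABORT);
}

static void end_of_transaction(int committed)
{
    for (int i = 0; i < npending; i++)
    {
        if (committed && pending[i].when == INVAL_OTHERS_ON_COMMIT)
            printf("commit: tell other backends to drop tuple %u\n",
                   pending[i].tid);
        if (!committed && pending[i].when == INVAL_SELF_ON_ABORT)
            printf("abort: drop our own cached copy of tuple %u\n",
                   pending[i].tid);
    }
    npending = 0;
}

int main(void)
{
    model_heap_replace(900, 901);   /* the update replaces tuple 900 with 901 */
    end_of_transaction(0);          /* the transaction aborts */
    return 0;
}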
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: > I think it's not done correctly for tuple SI messages either. > I didn't use current cache invalidation mechanism when I made the > patch for SearchSysCache() because of the following 2 reasons. > 1. SI messages are eaten by CommandCounterIncrement(). So they > may vanish before transaction end/aborts. I think this is OK. The sending backend does not send the SI message in the first place until it has committed. Other backends can read the messages at CommandCounterIncrement; it doesn't matter whether the other backends later commit or abort their own transactions. I think. Do you have a counterexample? > 2. The tuples which should be invalidated in case of abort are different > from ones in case of commit. > In case of commit,deleting old tuples should be invalidated for all > backends. > In case of abort,insert(updat)ing new tuples should be invalidated > for the insert(updat)ing backend. I wonder whether it wouldn't be cleaner to identify the target tuples by OID instead of ItemPointer. That way would work for both new and update tuples... > Currently heap_insert() calls RelationInvalidateHeapTuple() for a > inserting new tuple but heap_replace() doesn't call RelationInvalid- > ateHeapTuple() for a updating new tuple. I don't understand which > is right. Hmm. Invalidating the old tuple is the right thing for heap_replace in terms of sending a message to other backends at commit; it's the old tuple that they might have cached and need to get rid of. But for getting rid of this backend's uncommitted new tuple in case of abort, it's not the right thing. OTOH, your change to add a time qual check to SearchSysCache would fix that, wouldn't it? But invalidating by OID would make the issue moot. Possibly heap_insert doesn't need to be calling RelationInvalidateHeapTuple at all --- a new tuple can't be cached by any other backend, by definition, until it has committed; so there's no need to send out an SI message for it. That call must be there to ensure that the local cache gets purged of the tuple in case of abort. Maybe we could remove that call (and reduce SI traffic) if we rely on a time qual to purge bogus entries from the local caches after abort. regards, tom lane
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Monday, September 20, 1999 11:28 PM
> To: Hiroshi Inoue
> Cc: pgsql-hackers
> Subject: Re: [HACKERS] couldn't rollback cache ?
> 
> 
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > I think it's not done correctly for tuple SI messages either.
> > I didn't use the current cache invalidation mechanism when I made the
> > patch for SearchSysCache() because of the following 2 reasons.
> 
> > 1. SI messages are eaten by CommandCounterIncrement().  So they
> >    may vanish before the transaction ends/aborts.
> 
> I think this is OK.  The sending backend does not send the SI message
> in the first place until it has committed.  Other backends can read

Doesn't the sending backend send the SI message when Command-
CounterIncrement() is executed ?

AtCommit_Cache() is called not only from CommitTransaction() but
also from CommandCounterIncrement().  AtCommit_Cache() in
CommandCounterIncrement() eats the local invalidation messages and
registers the SI information (this seems too early for other backends,
though it's not so harmful).  Then AtStart_Cache() eats the SI
information and invalidates the related syscache and relcache entries
for the backend (this seems right).  At this point, the invalidation
info for the backend vanishes.  Isn't that right ?

> the messages at CommandCounterIncrement; it doesn't matter whether the
> other backends later commit or abort their own transactions.  I think.
> Do you have a counterexample?
> 
> > 2. The tuples which should be invalidated in case of abort are different
> >    from the ones in case of commit.
> >    In case of commit, the deleted old tuples should be invalidated for
> >    all backends.
> >    In case of abort, the insert(updat)ed new tuples should be
> >    invalidated for the insert(updat)ing backend.
> 
> I wonder whether it wouldn't be cleaner to identify the target tuples
> by OID instead of ItemPointer.  That way would work for both new and
> updated tuples...

This may be a better way, because the cache entries which should be
invalidated are invalidated.  However, we may also invalidate
still-valid cache entries by OID (it's not so harmful).  Even time
qualification is useless in this case.

> > Currently heap_insert() calls RelationInvalidateHeapTuple() for an
> > inserted new tuple, but heap_replace() doesn't call
> > RelationInvalidateHeapTuple() for an updated new tuple.  I don't
> > understand which is right.
> 
> Hmm.  Invalidating the old tuple is the right thing for heap_replace in
> terms of sending a message to other backends at commit; it's the old
> tuple that they might have cached and need to get rid of.  But for
> getting rid of this backend's uncommitted new tuple in case of abort,
> it's not the right thing.  OTOH, your change to add a time qual check
> to SearchSysCache would fix that, wouldn't it?

Probably, because time qualification is applied to uncommitted tuples.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
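A trace-style sketch of the sequence described above, using a toy counter
rather than the real xact.c/inval.c code (the _model suffixes mark every name
here as invented): because CommandCounterIncrement() goes through the same
AtCommit_Cache() path as CommitTransaction(), the backend's own pending
invalidation list is already empty by the time an abort arrives.

#include <stdio.h>

static int local_pending = 0;   /* local invalidation messages not yet flushed */

static void RegisterInvalid_model(void)         /* DDL queues a message */
{ local_pending++; }

static void AtCommit_Cache_model(void)          /* eats the local list */
{
    printf("AtCommit_Cache: flushing %d pending message(s) to SI\n",
           local_pending);
    local_pending = 0;
}

static void CommandCounterIncrement_model(void) /* also calls AtCommit_Cache */
{ AtCommit_Cache_model(); }

int main(void)
{
    /* inside one transaction: ALTER TABLE ... ; SELECT ... ; then abort */
    RegisterInvalid_model();          /* the ALTER queues invalidations     */
    CommandCounterIncrement_model();  /* next command: list already flushed */

    /* abort time: nothing left to tell us which cache entries to discard */
    printf("at abort: %d pending message(s) left to replay locally\n",
           local_pending);
    return 0;
}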
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes: >> I think this is OK. The sending backend does not send the SI message >> in the first place until it has committed. Other backends can read > Doesn't the sending backend send the SI message when Command- > CounterIncrement() is executed ? > AtCommit_Cache() is called not only from CommitTransaction() but > also from CommandCounterIncrement(). Oooh, you are right. I think that is a bug. We should postpone sending SI messages until commit. Also, if I recall correctly, CommandCounterIncrement() still gets called after we have decided to abort the current transaction (while we are eating commands looking for END or ABORT). It shouldn't do anything to the pending-SI list then either. > AtCommit_Cache() in CommandCounterIncrement() eats local > invalidation messages and register SI information (this seems too > early for other backends though it's not so harmful). Then AtAtart_ > Cache() eats the SI information and invalidates related syscache > and relcache for the backend(this seems right). At this point,invali- > dation info for the backend vanishes. Isn't it right ? I think it is OK for AtStart_Cache to read *incoming* SI messages, if those relate to transactions that other backends have committed. But we should sit on our own list of pending outgoing messages until we know we are committing (or aborting). >> I wonder whether it wouldn't be cleaner to identify the target tuples >> by OID instead of ItemPointer. That way would work for both new and >> update tuples... > This may be a better way because the cache entry which should be > invalidated are invalidated. > However,we may invalidate still valid cache entry by OID(it's not so > harmful). Even time qualification is useless in this case. Doesn't bother me --- we'll just re-read it. We'd have to do some work in that case anyway to verify whether we have the correct copy of the tuple. >> OTOH, your change to add a time qual check >> to SearchSysCache would fix that, wouldn't it? > Probably. Because time qualification is applied for uncommitted tuples. One thing we need to think about here: as it stands, the syscache will only store a single copy of any particular tuple (maybe I should say "of a particular OID"). But there might be several copies of that tuple with different t_min/t_max in the database. With your change to check time qual, as soon as we realize that the copy we have no longer SatisfiesNow(), we'll go look for a new copy. And we'll go look for a new copy after receiving a SI message indicating someone else has committed an update. The question is, are there any *other* times where we need to look for a new copy? I think we are OK if we change SI message sending to only send after commit, but I'm not sure about it. regards, tom lane
> 
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> >> I think this is OK.  The sending backend does not send the SI message
> >> in the first place until it has committed.  Other backends can read
> 
> > Doesn't the sending backend send the SI message when Command-
> > CounterIncrement() is executed ?
> > AtCommit_Cache() is called not only from CommitTransaction() but
> > also from CommandCounterIncrement().
> 
> Oooh, you are right.  I think that is a bug.  We should postpone
> sending SI messages until commit.  Also, if I recall correctly,
> CommandCounterIncrement() still gets called after we have decided to
> abort the current transaction (while we are eating commands looking for
> END or ABORT).  It shouldn't do anything to the pending-SI list then
> either.
> 
> > AtCommit_Cache() in CommandCounterIncrement() eats the local
> > invalidation messages and registers the SI information (this seems too
> > early for other backends, though it's not so harmful).  Then
> > AtStart_Cache() eats the SI information and invalidates the related
> > syscache and relcache entries for the backend (this seems right).  At
> > this point, the invalidation info for the backend vanishes.  Isn't that
> > right ?
> 
> I think it is OK for AtStart_Cache to read *incoming* SI messages,
> if those relate to transactions that other backends have committed.
> But we should sit on our own list of pending outgoing messages until
> we know we are committing (or aborting).

What about the delet(updat)ing backend itself ?  Shouldn't that backend
invalidate the old tuples it is delet(updat)ing ?

> >> I wonder whether it wouldn't be cleaner to identify the target tuples
> >> by OID instead of ItemPointer.  That way would work for both new and
> >> updated tuples...
> [snip]
> One thing we need to think about here: as it stands, the syscache will
> only store a single copy of any particular tuple (maybe I should say
> "of a particular OID").  But there might be several copies of that tuple
> with different t_min/t_max in the database.  With your change to check
> the time qual, as soon as we realize that the copy we have no longer
> SatisfiesNow(), we'll go look for a new copy.  And we'll go look
> for a new copy after receiving an SI message indicating someone else has
> committed an update.  The question is, are there any *other* times where
> we need to look for a new copy?  I think we are OK if we change SI
> message sending to only send after commit, but I'm not sure about it.

What kind of tuples are read into the system catalog cache ?  And what
should we do to invalidate those tuples just in time ?  As far as I see:

1. HEAP_XMIN_INVALID
   i.e. the tuple wasn't inserted (the insertion was rolled back).
   No backend regards this tuple as valid.

2. HEAP_XMIN_??????? && HEAP_XMAX_INVALID
   i.e. the tuple is being inserted now.
   Only the inserting backend regards this tuple as valid.
   The time qualification check which my patch does would work in this
   case.  Otherwise an SI message should be sent to the backend when the
   insertion is rolled back.

3. HEAP_XMIN_??????? && HEAP_XMAX_???????
   i.e. the tuple is being deleted after insertion in the same
   transaction.
   No backend regards this tuple as valid.

4. HEAP_XMIN_COMMITTED && HEAP_XMAX_INVALID
   i.e. the tuple is inserted, not deleted and not being deleted.
   HeapTupleSatisfies..() doesn't take effect.
   An SI message should be sent to all backends immediately after the
   tuple is deleted.  An SI message should be sent to a backend
   immediately after that backend has marked the tuple as being deleted.

5. HEAP_XMIN_COMMITTED && HEAP_XMAX_???????
   i.e. the tuple is being deleted now.
   The deleting backend doesn't regard this tuple as valid.
   If sending SI messages to other backends is postponed until commit,
   the tuple is invalidated correctly.  Otherwise a check such as the
   one my patch does would be necessary.

6. HEAP_XMAX_COMMITTED
   i.e. the tuple is deleted.
   SnapshotNow never regards this tuple as valid.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp
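To check this classification against a concrete rule, here is a small
standalone C program (invented types and a deliberately simplified
SnapshotNow-style predicate, not the real HeapTupleSatisfiesNow()) that runs
the six states through the check from the point of view of the
inserting/deleting backend and of any other backend.  It reproduces the
classification above, and also shows why state 4 is the one that time
qualification alone cannot handle: the tuple still looks valid everywhere,
so only an SI message can force a re-read.

#include <stdbool.h>
#include <stdio.h>

typedef unsigned int Xid;
#define INVALID_XID 0

typedef enum { IN_PROGRESS, COMMITTED, ABORTED } XidState;

typedef struct Tup
{
    const char *label;
    Xid         xmin;       /* inserting transaction (0 = none) */
    XidState    xmin_state;
    Xid         xmax;       /* deleting transaction (0 = none)  */
    XidState    xmax_state;
} Tup;

/* simplified SnapshotNow-style rule: visible iff the inserter committed (or
 * is us and still running) and the deleter, if any, has neither committed
 * nor is us */
static bool satisfies_now(const Tup *t, Xid me)
{
    bool inserted = (t->xmin_state == COMMITTED) ||
                    (t->xmin_state == IN_PROGRESS && t->xmin == me);
    bool deleted  = (t->xmax != INVALID_XID) &&
                    ((t->xmax_state == COMMITTED) ||
                     (t->xmax_state == IN_PROGRESS && t->xmax == me));
    return inserted && !deleted;
}

int main(void)
{
    const Xid worker = 10;   /* the backend doing the insert/delete */
    const Xid other  = 20;   /* any other backend                   */

    const Tup cases[] = {
        {"1. insert aborted (XMIN_INVALID)",     10, ABORTED,      0, IN_PROGRESS},
        {"2. being inserted (xmax invalid)",     10, IN_PROGRESS,  0, IN_PROGRESS},
        {"3. inserted+deleted in same xact",     10, IN_PROGRESS, 10, IN_PROGRESS},
        {"4. committed, not deleted",            10, COMMITTED,    0, IN_PROGRESS},
        {"5. committed, being deleted",          10, COMMITTED,   10, IN_PROGRESS},
        {"6. delete committed (XMAX_COMMITTED)", 10, COMMITTED,   10, COMMITTED},
    };

    for (int i = 0; i < 6; i++)
        printf("%-40s worker:%s  others:%s\n", cases[i].label,
               satisfies_now(&cases[i], worker) ? "valid  " : "invalid",
               satisfies_now(&cases[i], other)  ? "valid"   : "invalid");
    return 0;
}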