Thread: couldn't rollback cache ?

couldn't rollback cache ?

From: "Hiroshi Inoue"
Hello all,

The cache invalidation mechanism has been much improved.
Thanks to Tom.

But as far as I can see, neither the relation cache nor the system
catalog cache is rolled back correctly.
This must be solved if we are to execute DDL statements inside
transactions.

For example,
    create table t1 (id int4);
    CREATE
    begin;
    BEGIN
    alter table t1 add column dt1 text;
    ADD
    select * from t1;
    id|dt1
    --+---
    (0 rows)
    abort;
    ABORT
    visco=> select * from t1;
    id|dt1
    --+---
    (0 rows)

I added time_qualification_check to SearchSysCache() on trial
(see the patch at the end of this posting).

After this change,

    ...
    abort;
    ABORT
    select * from t1;
    ERROR:  Relation t1 does not have attribute dt1

Seems relation cache is not invalidated yet.
I also tried to add time_qualification_check to RelationId(Name)-
CacheGetRelation(). But unfortunately, Relation doesn't have
such information.

Any ideas ?
Comments ?

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp

*** utils/cache/catcache.c.orig    Mon Jul 26 12:45:14 1999
--- utils/cache/catcache.c    Fri Sep 17 08:57:50 1999
***************
*** 872,878 ****
                      cache->cc_skey,
                      res);
          if (res)
!             break;
      }
  
      /* ----------------
--- 872,881 ----
                      cache->cc_skey,
                      res);
          if (res)
!         {
!             if (HeapTupleSatisfiesNow(ct->ct_tup->t_data))
!                 break;
!         }
      }
  
      /* ----------------
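
For readability, the patched scan loop reads roughly as below when written
out as straight C.  Only the HeapTupleSatisfiesNow() test comes from the
patch; the surrounding loop and the argument order of HeapKeyTest() are
paraphrased for illustration and may not match the source exactly.

    for (elt = DLGetHead(cache->cc_cache[hash]); elt; elt = DLGetSucc(elt))
    {
        ct = (CatCTup *) DLE_VAL(elt);
        /* does this cached tuple match the search key? */
        HeapKeyTest(ct->ct_tup, cache->cc_tupdesc, cache->cc_nkeys,
                    cache->cc_skey, res);
        if (res)
        {
            /* new: accept the cached tuple only if it is still visible now */
            if (HeapTupleSatisfiesNow(ct->ct_tup->t_data))
                break;
        }
    }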



Re: [HACKERS] couldn't rollback cache ?

From: Tom Lane
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> But as far as I can see, neither the relation cache nor the system
> catalog cache is rolled back correctly.
> I added time_qualification_check to SearchSysCache() on trial
> (see the patch at the end of this posting).

Hmm.  This must be a bug of very long standing; surprising it hasn't
been noticed before.  I think you are probably right, because a little
glimpsing shows that SearchSysCache() is the *only* place in the whole
system where HeapKeyTest() is called directly --- everyone else goes
through HeapTupleSatisfies() which adds a timequal check of one sort or
another.  I don't know the timequal stuff at all, but it seems likely
that we want one here.  (Vadim, is this fix right?)

> After this change,
>     abort;
>     ABORT
>     select * from t1;
>     ERROR:  Relation t1 does not have attribute dt1

> Seems relation cache is not invalidated yet.
> I also tried to add time_qualification_check to RelationId(Name)-
> CacheGetRelation(). But unfortunately, Relation doesn't have
> such information.

I think the real bug here is in inval.c: see
InvalidationMessageCacheInvalidate, which scans pending SI messages
at abort time.  If we had committed, we'd have sent an SI message
telling other backends to refresh their relcache entries for t1;
so there is an entry for t1 in the pending-SI-message list.  We can
use that entry to tell us to invalidate our own relcache entry instead.
It looks like this is done correctly for tuple SI messages but not for
relation SI messages --- and in fact the code sez
    /* XXX ignore this--is this correct ??? */
Evidently not.  (BTW, please add some comments to this code!  It's
not obvious that what it's doing is throwing away cache entries that
have been changed by a transaction now being aborted.)
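
Concretely, the fix would amount to handling the relation-message case
instead of ignoring it, along these lines (the message field names and the
relcache entry point used here are assumptions for illustration, not actual
code):

        case 'r':       /* relation cache inval message (kind value assumed) */
            /*
             * Flush our own relcache entry for the relation the aborted
             * transaction modified, so it gets rebuilt from the catalogs
             * the next time the relation is opened.
             */
            RelationIdInvalidateRelationCacheByRelationId(
                message->any.relation.relId);
            break;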

It would probably be a good idea to switch the order of the operations
in AtAbort_Cache() in xact.c, so that relcache reference counts get
reset to 0 before we do the pending-SI-message scan.  That way, we could
just discard the bogus relcache entry and not waste time rebuilding an
entry we might not need again.  (This might even be essential to avoid
an error in the aborted-table-create case; not sure.  The routine
RelationCacheAbort() didn't exist till last week, so the present call
order is certainly not gospel.)
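
In other words, roughly this (the routine names are taken from the
discussion above and should be treated as assumptions about the current
xact.c, not verified source):

    static void
    AtAbort_Cache(void)
    {
        /*
         * Reset relcache reference counts first, so that the pending-SI
         * scan that follows can simply discard a relcache entry changed
         * by the aborted transaction instead of rebuilding it.
         */
        RelationCacheAbort();
        RegisterInvalid(false);     /* scan pending SI messages for abort */
    }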

We could probably do this in other ways too, like marking all relcache
entries with the transaction ID of their last change and using that to
detect what to throw away.  But the SI message queue is already there
so might as well use it...
        regards, tom lane


Re: [HACKERS] couldn't rollback cache ?

From: Vadim Mikheev
Tom Lane wrote:
> 
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > But as far as I can see, neither the relation cache nor the system
> > catalog cache is rolled back correctly.
> > I added time_qualification_check to SearchSysCache() on trial
> > (see the patch at the end of this posting).
> 
> Hmm.  This must be a bug of very long standing; surprising it hasn't
> been noticed before.  I think you are probably right, because a little
> glimpsing shows that SearchSysCache() is the *only* place in the whole
> system where HeapKeyTest() is called directly --- everyone else goes
> through HeapTupleSatisfies() which adds a timequal check of one sort or
> another.  I don't know the timequal stuff at all, but it seems likely
> that we want one here.  (Vadim, is this fix right?)

Sorry, but currently I have no ability to deal with anything
but WAL. As for cache/SI issues, I would like to see shared 
catalog cache implemented (and remove current SI stuff),
but I was not able to do it for ~ 2 years, -:(

Vadim


RE: [HACKERS] couldn't rollback cache ?

From: "Hiroshi Inoue"
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Friday, September 17, 1999 10:58 PM
> To: Hiroshi Inoue
> Cc: pgsql-hackers
> Subject: Re: [HACKERS] couldn't rollback cache ? 
> 
> 
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > But as far as I can see, neither the relation cache nor the system
> > catalog cache is rolled back correctly.
> > I added time_qualification_check to SearchSysCache() on trial
> > (see the patch at the end of this posting).
> 
> Hmm.  This must be a bug of very long standing; surprising it hasn't
> been noticed before.  I think you are probably right, because a little
> glimpsing shows that SearchSysCache() is the *only* place in the whole
> system where HeapKeyTest() is called directly --- everyone else goes
> through HeapTupleSatisfies() which adds a timequal check of one sort or
> another.  I don't know the timequal stuff at all, but it seems likely
> that we want one here.  (Vadim, is this fix right?)
> 
> > After this change,
> >     abort;
> >     ABORT
> >     select * from t1;
> >     ERROR:  Relation t1 does not have attribute dt1
> 
> > Seems relation cache is not invalidated yet.
> > I also tried to add time_qualification_check to RelationId(Name)-
> > CacheGetRelation(). But unfortunately, Relation doesn't have
> > such information.
> 
> I think the real bug here is in inval.c: see
> InvalidationMessageCacheInvalidate, which scans pending SI messages
> at abort time.  If we had committed, we'd have sent an SI message
> telling other backends to refresh their relcache entries for t1;
> so there is an entry for t1 in the pending-SI-message list.  We can
> use that entry to tell us to invalidate our own relcache entry instead.
> It looks like this is done correctly for tuple SI messages but not for

I think it's not done correctly for tuple SI messages either.
I didn't use the current cache invalidation mechanism when I made the
patch for SearchSysCache() because of the following 2 reasons.

1. SI messages are eaten by CommandCounterIncrement(). So they
   may vanish before the transaction ends/aborts.

2. The tuples which should be invalidated in case of abort are different
   from the ones in case of commit.
   In case of commit, the old tuples being deleted should be invalidated
   for all backends.
   In case of abort, the new tuples being inserted (updated) should be
   invalidated for the inserting (updating) backend.
   Currently heap_insert() calls RelationInvalidateHeapTuple() for a
   newly inserted tuple, but heap_replace() doesn't call
   RelationInvalidateHeapTuple() for the new (updated) tuple.  I don't
   understand which is right.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp


Re: [HACKERS] couldn't rollback cache ?

From: Tom Lane
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> I think it's not done correctly for tuple SI messages either.
> I didn't use current cache invalidation mechanism when I made the
> patch for SearchSysCache() because of the following 2 reasons. 

> 1. SI messages are eaten by CommandCounterIncrement(). So they
>     may vanish before transaction end/aborts.

I think this is OK.  The sending backend does not send the SI message
in the first place until it has committed.  Other backends can read
the messages at CommandCounterIncrement; it doesn't matter whether the
other backends later commit or abort their own transactions.  I think.
Do you have a counterexample?

> 2. The tuples which should be invalidated in case of abort are different
>     from ones in case of commit.
>     In case of commit,deleting old tuples should be invalidated for all
>     backends.
>     In case of abort,insert(updat)ing new tuples should be invalidated 
>     for the insert(updat)ing backend.

I wonder whether it wouldn't be cleaner to identify the target tuples
by OID instead of ItemPointer.  That way would work for both new and
update tuples...
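
Something like this hypothetical message layout is what I have in mind
(names invented purely for illustration; nothing like this exists in the
current source):

    typedef struct CatalogInvalByOid
    {
        int     cacheId;        /* which syscache to purge from */
        Oid     tupleOid;       /* OID shared by old and new tuple versions */
    } CatalogInvalByOid;

Since every version of a catalog row carries the same OID, one such entry
covers both the committed old tuple (which other backends may have cached)
and the uncommitted new tuple (which our own backend must discard at abort).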

>     Currently heap_insert() calls RelationInvalidateHeapTuple() for a 
>     inserting new tuple but heap_replace() doesn't call RelationInvalid-
>     ateHeapTuple() for a updating new tuple. I don't understand which
>     is right.

Hmm.  Invalidating the old tuple is the right thing for heap_replace in
terms of sending a message to other backends at commit; it's the old
tuple that they might have cached and need to get rid of.  But for
getting rid of this backend's uncommitted new tuple in case of abort,
it's not the right thing.  OTOH, your change to add a time qual check
to SearchSysCache would fix that, wouldn't it?  But invalidating by OID
would make the issue moot.

Possibly heap_insert doesn't need to be calling
RelationInvalidateHeapTuple at all --- a new tuple can't be cached by
any other backend, by definition, until it has committed; so there's no
need to send out an SI message for it.  That call must be there to
ensure that the local cache gets purged of the tuple in case of abort.
Maybe we could remove that call (and reduce SI traffic) if we rely on
a time qual to purge bogus entries from the local caches after abort.
        regards, tom lane


RE: [HACKERS] couldn't rollback cache ?

From: "Hiroshi Inoue"
> -----Original Message-----
> From: Tom Lane [mailto:tgl@sss.pgh.pa.us]
> Sent: Monday, September 20, 1999 11:28 PM
> To: Hiroshi Inoue
> Cc: pgsql-hackers
> Subject: Re: [HACKERS] couldn't rollback cache ?
>
>
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> > I think it's not done correctly for tuple SI messages either.
> > I didn't use current cache invalidation mechanism when I made the
> > patch for SearchSysCache() because of the following 2 reasons.
>
> > 1. SI messages are eaten by CommandCounterIncrement(). So they
> >     may vanish before transaction end/aborts.
>
> I think this is OK.  The sending backend does not send the SI message
> in the first place until it has committed.  Other backends can read

Doesn't the sending backend send the SI message when Command-
CounterIncrement() is executed ?
AtCommit_Cache() is called not only from CommitTransaction() but
also from CommandCounterIncrement().

AtCommit_Cache() in CommandCounterIncrement() eats the local
invalidation messages and registers the SI information (this seems too
early for other backends, though it's not so harmful). Then
AtStart_Cache() eats the SI information and invalidates the related
syscache and relcache entries for the backend (this seems right). At
this point the invalidation info for the backend vanishes. Isn't that
right?
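
In outline the sequence looks roughly like this (simplified, not a
verbatim copy of xact.c):

    void
    CommandCounterIncrement(void)
    {
        /* ... bump the command counter ... */
        AtCommit_Cache();   /* eats local invalidations and registers the SI
                             * information, although we have not committed yet */
        AtStart_Cache();    /* eats that SI information and invalidates our own
                             * syscache/relcache entries */
    }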

> the messages at CommandCounterIncrement; it doesn't matter whether the
> other backends later commit or abort their own transactions.  I think.
> Do you have a counterexample?
>
> > 2. The tuples which should be invalidated in case of abort are different
> >     from ones in case of commit.
> >     In case of commit,deleting old tuples should be invalidated for all
> >     backends.
> >     In case of abort,insert(updat)ing new tuples should be invalidated
> >     for the insert(updat)ing backend.
>
> I wonder whether it wouldn't be cleaner to identify the target tuples
> by OID instead of ItemPointer.  That way would work for both new and
> update tuples...
>

This may be a better way, because the cache entries which should be
invalidated are invalidated.
However, we may also invalidate still-valid cache entries by OID (it's
not so harmful).  Even time qualification is useless in this case.

> >     Currently heap_insert() calls RelationInvalidateHeapTuple() for a
> >     inserting new tuple but heap_replace() doesn't call RelationInvalid-
> >     ateHeapTuple() for a updating new tuple. I don't understand which
> >     is right.
>
> Hmm.  Invalidating the old tuple is the right thing for heap_replace in
> terms of sending a message to other backends at commit; it's the old
> tuple that they might have cached and need to get rid of.  But for
> getting rid of this backend's uncommitted new tuple in case of abort,
> it's not the right thing.  OTOH, your change to add a time qual check
> to SearchSysCache would fix that, wouldn't it?

Probably.  Because time qualification is applied for uncommitted tuples.

Regards.

Hiroshi Inoue
Inoue@tpf.co.jp



Re: [HACKERS] couldn't rollback cache ?

From: Tom Lane
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
>> I think this is OK.  The sending backend does not send the SI message
>> in the first place until it has committed.  Other backends can read

> Doesn't the sending backend send the SI message when Command-
> CounterIncrement() is executed ?
> AtCommit_Cache() is called not only from CommitTransaction() but
> also from CommandCounterIncrement().

Oooh, you are right.  I think that is a bug.  We should postpone
sending SI messages until commit.  Also, if I recall correctly,
CommandCounterIncrement() still gets called after we have decided to
abort the current transaction (while we are eating commands looking for
END or ABORT).  It shouldn't do anything to the pending-SI list then
either.

> AtCommit_Cache() in CommandCounterIncrement() eats the local
> invalidation messages and registers the SI information (this seems too
> early for other backends, though it's not so harmful). Then
> AtStart_Cache() eats the SI information and invalidates the related
> syscache and relcache entries for the backend (this seems right). At
> this point the invalidation info for the backend vanishes. Isn't that
> right?

I think it is OK for AtStart_Cache to read *incoming* SI messages,
if those relate to transactions that other backends have committed.
But we should sit on our own list of pending outgoing messages until
we know we are committing (or aborting).
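
That is, something like this division of labor (SendPendingInvalidations()
is a placeholder name for whatever routine we end up with, not existing
code):

    void
    CommandCounterIncrement(void)
    {
        /* ... */
        AtStart_Cache();    /* DO read incoming SI sent by other backends */
        /* but do NOT broadcast our own pending invalidations here */
    }

    void
    CommitTransaction(void)
    {
        /* ... */
        SendPendingInvalidations();  /* placeholder: flush our outgoing SI
                                      * messages only once we know we commit */
    }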

>> I wonder whether it wouldn't be cleaner to identify the target tuples
>> by OID instead of ItemPointer.  That way would work for both new and
>> update tuples...

> This may be a better way because the cache entry which should be
> invalidated are invalidated.
> However,we may invalidate still valid cache entry by OID(it's not so
> harmful).  Even time qualification is useless in this case.

Doesn't bother me --- we'll just re-read it.  We'd have to do some work
in that case anyway to verify whether we have the correct copy of the
tuple.

>> OTOH, your change to add a time qual check
>> to SearchSysCache would fix that, wouldn't it?

> Probably.  Because time qualification is applied for uncommitted tuples.

One thing we need to think about here: as it stands, the syscache will
only store a single copy of any particular tuple (maybe I should say
"of a particular OID").  But there might be several copies of that tuple
with different t_min/t_max in the database.  With your change to check
time qual, as soon as we realize that the copy we have no longer
SatisfiesNow(), we'll go look for a new copy.  And we'll go look
for a new copy after receiving a SI message indicating someone else has
committed an update.  The question is, are there any *other* times where
we need to look for a new copy?  I think we are OK if we change SI
message sending to only send after commit, but I'm not sure about it.
        regards, tom lane


RE: [HACKERS] couldn't rollback cache ?

From: "Hiroshi Inoue"
> 
> "Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> >> I think this is OK.  The sending backend does not send the SI message
> >> in the first place until it has committed.  Other backends can read
> 
> > Doesn't the sending backend send the SI message when Command-
> > CounterIncrement() is executed ?
> > AtCommit_Cache() is called not only from CommitTransaction() but
> > also from CommandCounterIncrement().
> 
> Oooh, you are right.  I think that is a bug.  We should postpone
> sending SI messages until commit.  Also, if I recall correctly,
> CommandCounterIncrement() still gets called after we have decided to
> abort the current transaction (while we are eating commands looking for
> END or ABORT).  It shouldn't do anything to the pending-SI list then
> either.
> 
> > AtCommit_Cache() in CommandCounterIncrement() eats the local
> > invalidation messages and registers the SI information (this seems too
> > early for other backends, though it's not so harmful). Then
> > AtStart_Cache() eats the SI information and invalidates the related
> > syscache and relcache entries for the backend (this seems right). At
> > this point the invalidation info for the backend vanishes. Isn't that
> > right?
> 
> I think it is OK for AtStart_Cache to read *incoming* SI messages,
> if those relate to transactions that other backends have committed.
> But we should sit on our own list of pending outgoing messages until
> we know we are committing (or aborting).
>

What about the delet(updat)ing backend itself?
Shouldn't that backend invalidate the old tuples it is delet(updat)ing?

> >> I wonder whether it wouldn't be cleaner to identify the target tuples
> >> by OID instead of ItemPointer.  That way would work for both new and
> >> update tuples...
>

[snip]
> One thing we need to think about here: as it stands, the syscache will
> only store a single copy of any particular tuple (maybe I should say
> "of a particular OID").  But there might be several copies of that tuple
> with different t_min/t_max in the database.  With your change to check
> time qual, as soon as we realize that the copy we have no longer
> SatisfiesNow(), we'll go look for a new copy.  And we'll go look
> for a new copy after receiving a SI message indicating someone else has
> committed an update.  The question is, are there any *other* times where
> we need to look for a new copy?  I think we are OK if we change SI
> message sending to only send after commit, but I'm not sure about it.
>

What kind of tuples are read into the system catalog cache?
And what should we do to invalidate those tuples just in time?
As far as I see (a rough code sketch of these cases follows the list):

1. HEAP_XMIN_INVALID
   i.e. the tuple wasn't inserted.
   No backend regards this tuple as valid.

2. HEAP_XMIN_??????? && HEAP_XMAX_INVALID
   i.e. the tuple is being inserted now.
   Only the inserting backend regards this tuple as valid.
   The time qualification check which my patch does would work in this
   case.  Otherwise an SI message should be sent to the backend when the
   insertion is rolled back.

3. HEAP_XMIN_??????? && HEAP_XMAX_???????
   i.e. the tuple is being deleted after insertion in a transaction now.
   No backend regards this tuple as valid.

4. HEAP_XMIN_COMMITTED && HEAP_XMAX_INVALID
   i.e. the tuple is inserted, not deleted and not being deleted.
   HeapTupleSatisfies..() doesn't take effect.
   An SI message should be sent to all backends immediately after the
   tuple is deleted.
   An SI message should be sent to a backend immediately after that
   backend marked the tuple as being deleted.

5. HEAP_XMIN_COMMITTED && HEAP_XMAX_???????
   i.e. the tuple is being deleted now.
   The deleting backend doesn't regard this tuple as valid.
   If SI messages to other backends are postponed until commit, the
   tuple is invalidated correctly.  Otherwise a check such as my patch
   does would be necessary.

6. HEAP_XMAX_COMMITTED
   i.e. the tuple is deleted.
   SnapshotNow never regards this tuple as valid.
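
As a rough illustration only, the six cases above map onto the t_infomask
hint bits as sketched below.  A real visibility check must also consult the
transaction log, as HeapTupleSatisfiesNow() does; an unset hint bit does not
by itself prove the transaction is still open.

    #include "postgres.h"
    #include "access/htup.h"

    static int
    classify_cached_tuple(HeapTupleHeader t)
    {
        if (t->t_infomask & HEAP_XMIN_INVALID)
            return 1;       /* never inserted: invalid for every backend */

        if (!(t->t_infomask & HEAP_XMIN_COMMITTED))
        {
            if (t->t_infomask & HEAP_XMAX_INVALID)
                return 2;   /* being inserted: valid only to the inserter */
            return 3;       /* inserted and deleted in the same xact: invalid */
        }

        /* xmin committed from here on */
        if (t->t_infomask & HEAP_XMAX_COMMITTED)
            return 6;       /* deleted: SnapshotNow never sees it */
        if (t->t_infomask & HEAP_XMAX_INVALID)
            return 4;       /* live: needs an SI message when later deleted */
        return 5;           /* being deleted: invalid to the deleter only */
    }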

Regards.
Hiroshi Inoue
Inoue@tpf.co.jp