Re: [HACKERS] couldn't rollback cache ? - Mailing list pgsql-hackers

From Tom Lane
Subject Re: [HACKERS] couldn't rollback cache ?
Date
Msg-id 29176.937576653@sss.pgh.pa.us
Whole thread Raw
In response to couldn't rollback cache ?  ("Hiroshi Inoue" <Inoue@tpf.co.jp>)
Responses RE: [HACKERS] couldn't rollback cache ?
List pgsql-hackers
"Hiroshi Inoue" <Inoue@tpf.co.jp> writes:
> But as far as I see,neither relation cache nor system catalog cache 
> aren't be rollbacked correctly.
> I added time_qualification_check to SearchSysCache() on trial
> (see the patch at the end of this posting).

Hmm.  This must be a bug of very long standing; surprising it hasn't
been noticed before.  I think you are probably right, because a little
glimpsing shows that SearchSysCache() is the *only* place in the whole
system where HeapKeyTest() is called directly --- everyone else goes
through HeapTupleSatisfies() which adds a timequal check of one sort or
another.  I don't know the timequal stuff at all, but it seems likely
that we want one here.  (Vadim, is this fix right?)

> After this change,
>     abort;
>     ABORT
>     select * from t1;
>     ERROR:  Relation t1 does not have attribute dt1

> Seems relation cache is not invalidated yet.
> I also tried to add time_qualification_check to RelationId(Name)-
> CacheGetRelation(). But unfortunately,Relation doesn't have
> such a information.

I think the real bug here is in inval.c: see
InvalidationMessageCacheInvalidate, which scans pending SI messages
at abort time.  If we had committed, we'd have sent an SI message
telling other backends to refresh their relcache entries for t1;
so there is an entry for t1 in the pending-SI-message list.  We can
use that entry to tell us to invalidate our own relcache entry instead.
It looks like this is done correctly for tuple SI messages but not for
relation SI messages --- and in fact the code sez        /* XXX ignore this--is this correct ??? */
Evidently not.  (BTW, please add some comments to this code!  It's
not obvious that what it's doing is throwing away cache entries that
have been changed by a transaction now being aborted.)

It would probably be a good idea to switch the order of the operations
in AtAbort_Cache() in xact.c, so that relcache reference counts get
reset to 0 before we do the pending-SI-message scan.  That way, we could
just discard the bogus relcache entry and not waste time rebuilding an
entry we might not need again.  (This might even be essential to avoid
an error in the aborted-table-create case; not sure.  The routine
RelationCacheAbort() didn't exist till last week, so the present call
order is certainly not gospel.)

We could probably do this in other ways too, like marking all relcache
entries with the transaction ID of their last change and using that to
detect what to throw away.  But the SI message queue is already there
so might as well use it...
        regards, tom lane


pgsql-hackers by date:

Previous
From: Hannu Krosing
Date:
Subject: Re: [HACKERS] Hierarchical query?
Next
From: Tom Lane
Date:
Subject: Re: [HACKERS] Bug