Re: mvcc catalo gsnapshots and TopTransactionContext - Mailing list pgsql-hackers

From Andres Freund
Subject Re: mvcc catalo gsnapshots and TopTransactionContext
Date
Msg-id 20130806070659.GF31572@alap2.anarazel.de
Whole thread Raw
In response to Re: mvcc catalo gsnapshots and TopTransactionContext  (Noah Misch <noah@leadboat.com>)
Responses Re: mvcc catalo gsnapshots and TopTransactionContext  (Robert Haas <robertmhaas@gmail.com>)
Re: mvcc catalo gsnapshots and TopTransactionContext  (Noah Misch <noah@leadboat.com>)
List pgsql-hackers
On 2013-08-05 13:09:31 -0400, Noah Misch wrote:
> On Fri, Jul 12, 2013 at 11:42:23AM +0200, Andres Freund wrote:
> > On 2013-07-11 15:09:45 -0400, Tom Lane wrote:
> > > It never has been, and never will be, allowed to call the catcache code
> > > without being in a transaction.  What do you think will happen if the
> > > requested row isn't in cache?  A table access, that's what, and that
> > > absolutely requires being in a transaction.
> 
> > I'd like to add an Assert like in the attached patch making sure we're
> > in a transaction. Otherwise it's far too easy not to hit an error during
> > development because everything is cached - and syscache usage isn't
> > always obvious from the outside to the naive or the tired.
> 
> The following test case reliably hits this new assertion:
> 
> create table t (c int);
> begin;
> create index on t (c);
> savepoint q;
> insert into t values (1);
> select 1/0;
> 
> Stack trace:
> 
> #2  0x0000000000761159 in ExceptionalCondition (conditionName=<value optimized out>, errorType=<value optimized out>,
fileName=<valueoptimized out>, lineNumber=<value optimized out>) at assert.c:54
 
> #3  0x000000000074fc53 in SearchCatCache (cache=0xc53048, v1=139479, v2=0, v3=0, v4=0) at catcache.c:1072
> #4  0x000000000075c011 in SearchSysCache (cacheId=0, key1=369539, key2=6, key3=18446744073709551615, key4=0) at
syscache.c:909
> #5  0x0000000000757e57 in RelationReloadIndexInfo (relation=0x7f2dd5fffea0) at relcache.c:1770
> #6  0x000000000075817b in RelationClearRelation (relation=0x7f2dd5fffea0, rebuild=1 '\001') at relcache.c:1926
> #7  0x00000000007588c6 in RelationFlushRelation (relationId=139479) at relcache.c:2076
> #8  RelationCacheInvalidateEntry (relationId=139479) at relcache.c:2138
> #9  0x000000000075185d in LocalExecuteInvalidationMessage (msg=0xcde778) at inval.c:546
> #10 0x0000000000751185 in ProcessInvalidationMessages (hdr=0xc0d390, func=0x7517d0 <LocalExecuteInvalidationMessage>)
atinval.c:422
 
> #11 0x0000000000751a3f in AtEOSubXact_Inval (isCommit=0 '\000') at inval.c:973
> #12 0x00000000004cb986 in AbortSubTransaction () at xact.c:4250
> #13 0x00000000004cbf05 in AbortCurrentTransaction () at xact.c:2863
> #14 0x00000000006968f6 in PostgresMain (argc=1, argv=0x7fff7cf71610, dbname=0xc13b60 "test", username=0xbedc88 "nm")
atpostgres.c:3848
 
> #15 0x000000000064c2b5 in BackendRun () at postmaster.c:4044
> #16 BackendStartup () at postmaster.c:3733
> #17 ServerLoop () at postmaster.c:1571
> #18 0x000000000064fa8d in PostmasterMain (argc=1, argv=0xbe84a0) at postmaster.c:1227
> #19 0x00000000005e2dcd in main (argc=1, argv=0xbe84a0) at main.c:196
> 
> When we call AtEOSubXact_Inval() or AtEOXact_Inval() with a relation still
> open, we can potentially get a relcache rebuild and therefore a syscache
> lookup as shown above.  CommitSubTransaction() is also potentially affected,
> though I don't have an SQL-level test case for that.  It calls
> CommandCounterIncrement() after moving to TRANS_COMMIT.  That CCI had better
> find no invalidations of open relations, or we'll make syscache lookups.  (In
> simple tests, any necessary invalidations tend to happen at the CCI in
> CommitTransactionCommand(), so the later CCI does in fact find nothing to do.
> I have little confidence that should be counted upon, though.)

> How might we best rearrange things to avoid these hazards?

Ok. After a good bit of code reading, I think this isn't an actual bug
but an overzealous Assert(). I think it should be
Assert(IsTransactionBlock()) not Assert(IsTransactionState());

The reason for that is that when we do the AtEO(Sub)?Xact_Inval(), we've
already done a RecordTransactionAbort(true|false) and
CurrentTransactionState->state = TRANS_ABORT. So the visibility routines
have enough information to consider rows created by the aborted
transaction as invisible.

I am not really happy with the RelationReloadIndexInfo()s in
RelationClearRelation() when we're in an aborted state, especially as
the comment surrounding them are clearly out of date, but I don't see a
bug there anymore.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Craig Ringer
Date:
Subject: Re: System catalog vacuum issues
Next
From: Vlad Arkhipov
Date:
Subject: Re: System catalog vacuum issues