Re: mvcc catalo gsnapshots and TopTransactionContext - Mailing list pgsql-hackers

From Robert Haas
Subject Re: mvcc catalo gsnapshots and TopTransactionContext
Date
Msg-id CA+TgmoaevkKbAp=4QgWrX=pLscsROv4b0Gpuh1KNzwq-xX7TAA@mail.gmail.com
Whole thread Raw
In response to Re: mvcc catalo gsnapshots and TopTransactionContext  (Andres Freund <andres@2ndquadrant.com>)
Responses Re: mvcc catalo gsnapshots and TopTransactionContext  (Andres Freund <andres@2ndquadrant.com>)
List pgsql-hackers
On Fri, Aug 9, 2013 at 11:17 PM, Andres Freund <andres@2ndquadrant.com> wrote:
> On 2013-08-09 14:11:46 -0400, Tom Lane wrote:
>> Andres Freund <andres@2ndquadrant.com> writes:
>> > On 2013-08-08 09:27:24 -0400, Robert Haas wrote:
>> >> How can it be safe to try to read catalogs if the transaction is aborted?
>>
>> > Well. It isn't. At least not in general. The specific case triggered
>> > here though are cache invalidations being processed which can lead to
>> > the catalog being read (pretty crummy, but not easy to get rid
>> > of). That's actually safe since before we process the invalidations we
>> > have done:
>> > 1) CurrentTransactionState->state = TRANS_ABORT
>> > 2) RecordTransactionAbort(), marking the transaction as aborted in the
>> >   clog
>> > 3) marked subxacts as aborted
>> > 3) ProcArrayEndTransaction() (for toplevel ones)
>>
>> > Due to these any tqual stuff will treat the current (sub-)xact and it's
>> > children as aborted. So the catalog lookups will use the catalog in a
>> > sensible state.
>>
>> I don't have any faith in this argument.  You might be right that we'll
>> correctly see our own output rows as aborted, but that's barely the tip
>> of the iceberg of risk here.  Is it safe to take new locks in an aborted
>> transaction?  (What if we're already past the lock-release point in
>> the abort sequence?)
>
> Don't get me wrong. I find the idea of doing catalog lookup during abort
> horrid. But it's been that way for at least 10 years (I checked 7.4), so
> it has at least some resemblance of working.
> Today we do a good bit less than back then, for one we don't do a full
> cache reload during abort anymore, just for the index support
> infrastructure. Also, you've reduced the amount of lookups a bit with the
> relmapper introduction.
>
>> For that matter, given that we don't know what
>> exactly caused the transaction abort, how safe is it to do anything at
>> all --- we might for instance be nearly out of memory.  If the catalog
>> reading attempt itself fails, won't we be in an infinite loop of
>> transaction aborts?
>
> Looks like that's possible, yes. There seem to be quite some other
> opportunities for this to happen if you look at the amount of work done
> in AbortSubTransaction(). I guess it rarely happens because we
> previously release some memory...
>
>> I could probably think of ten more risks if I spent a few more minutes
>> at it.
>
> No need to convince me here. I neither could believe we were doing this,
> nor figure out why it even "works" for the first hour of looking at it.
>
>> Cache invalidation during abort should *not* lead to any attempt to
>> immediately revalidate the cache.  No amount of excuses will make that
>> okay.  I have not looked to see just what the path of control is in this
>> particular case, but we need to fix it, not paper over it.
>
> I agree, although that's easier said than done in the case of
> subtransactions. The problem we have there is that it's perfectly valid
> to still have references to a relation from the outer, not aborted,
> transaction. Those need to be valid for anybody looking at the relcache
> entry after we've processed the ROLLBACK TO/...
>
> I guess the fix is something like we do in the commit case, where we
> transfer invalidations to the parent transaction. If we then process
> local invalidations *after* we've cleaned up the subtransaction
> completely we should be fine. We already do an implicity
> CommandCounterIncrement() in CommitSubTransaction()...

Andres, are you (or is anyone) going to try to fix this assertion failure?

-- 
Robert Haas
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company



pgsql-hackers by date:

Previous
From: Peter Geoghegan
Date:
Subject: Re: Any reasons to not move pgstattuple to core?
Next
From: Andres Freund
Date:
Subject: Re: mvcc catalo gsnapshots and TopTransactionContext