Excerpts from Tom Lane's message of Mon Sep 26 13:26:37 -0300 2011:
>
> yamt@mwd.biglobe.ne.jp (YAMAMOTO Takashi) writes:
> >> Maybe, but I'd still like to see a test case, because I can't reproduce
> >> any such problem by preparing ROLLBACK in an aborted transaction.
>
> > reading GetTransactionSnapshot, it seems that the problem happens
> > only with IsolationUsesXactSnapshot() true.
>
> Hmm. I'm inclined to think that this demonstrates a bug in snapshot
> management, not so much in plancache. We have plancache doing
>
> PushActiveSnapshot(GetTransactionSnapshot());
>
> and then later
>
> PopActiveSnapshot();
>
> and at this point surely it is not plancache's fault if there is any
> remaining refcount for the snapshot. There is, though, because
> GetTransactionSnapshot saved a refcount in TopTransactionResourceOwner.
> I think it's snapmgr.c's responsibility to make sure that that's cleaned
> up, and it's not doing so.

Agreed.

> The place where that refcount normally gets dropped is
> AtEarlyCommit_Snapshot, but that isn't going to be called at all in
> aborted-transaction cleanup. Worse, if we just transposed it over to be
> called in a place in AbortTransaction comparable to where it's called
> during commit, that still wouldn't fix the problem, because when the
> ROLLBACK happens, we've already aborted the transaction.

... ouch.

> I think that AtEarlyCommit_Snapshot is misdesigned, and that far from
> being done "early" in commit/abort, it needs to be done "late", like
> somewhere not very long before the
> ResourceOwnerDelete(TopTransactionResourceOwner) calls. There is no
> very good reason to think that someone might not ask for a snapshot
> during commit processing.
>
> Alvaro, do you happen to remember why this got designed as an "early"
> transaction shutdown action, rather than delaying it as long as
> possible?

As far as I remember, the only principle was that it had to run before
ResourceOwner cleanup. Commit 7b640b0345dc4fbd39ff568700985b432f6afa07
introduced that "early" call; ResOwner support had been introduced 10
days before in 6bbef4e5383c99d93aa974e2c79d328cfbd1c4a9. I probably
just tried it out and noticed that resowner.c complained if I didn't
drop the refcount prior to its own cleanup.

I don't think I ever considered the scenario of calls in aborted
transactions.
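
To make the failure mode concrete, here is a toy model in plain C. It is
not PostgreSQL code: ToySnapshot, ToyOwner, toy_run and the other names
are made up for illustration, and the single-slot "owner" is a deliberate
simplification. It only mimics the shape of the problem described above:
the transaction snapshot is registered on the top-level owner the first
time it is requested, and the question is whether anything drops that
reference before the owner is deleted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Toy stand-ins; none of these are the real PostgreSQL structures. */
typedef struct ToySnapshot {
    int active_count;   /* references held by the active-snapshot stack */
    int regd_count;     /* references remembered by a resource owner */
} ToySnapshot;

typedef struct ToyOwner {
    ToySnapshot *remembered;    /* at most one snapshot, for brevity */
} ToyOwner;

typedef enum {
    CLEANUP_EARLY_COMMIT_ONLY,  /* drop the reference on the commit path only */
    CLEANUP_LATE                /* drop it just before owner deletion, always */
} CleanupPolicy;

static ToySnapshot xact_snapshot;
static bool registered_xact_snapshot = false;

/* First call registers the snapshot on the top-level owner (assumed behaviour
 * for the IsolationUsesXactSnapshot() case discussed in the thread). */
static ToySnapshot *toy_get_transaction_snapshot(ToyOwner *top)
{
    if (!registered_xact_snapshot) {
        xact_snapshot.regd_count++;
        top->remembered = &xact_snapshot;
        registered_xact_snapshot = true;
    }
    return &xact_snapshot;
}

static void toy_push_active(ToySnapshot *s) { s->active_count++; }

static void toy_pop_active(ToySnapshot *s)
{
    assert(s->active_count > 0);
    s->active_count--;
}

/* Drops the owner's reference, if one was registered. */
static void toy_drop_xact_snapshot(ToyOwner *top)
{
    if (registered_xact_snapshot) {
        xact_snapshot.regd_count--;
        top->remembered = NULL;
        registered_xact_snapshot = false;
    }
}

/* Returns the number of references still remembered at deletion time;
 * a nonzero result is the "leak" the owner would complain about. */
static int toy_owner_delete(ToyOwner *top)
{
    int leaked = (top->remembered != NULL) ? top->remembered->regd_count : 0;
    top->remembered = NULL;
    return leaked;
}

/* One transaction in which a snapshot is pushed and popped once, the way
 * the plancache calls quoted above do, followed by end-of-transaction. */
int toy_run(bool aborted, CleanupPolicy policy)
{
    ToyOwner top = { NULL };

    xact_snapshot.active_count = 0;
    xact_snapshot.regd_count = 0;
    registered_xact_snapshot = false;

    ToySnapshot *s = toy_get_transaction_snapshot(&top);
    toy_push_active(s);
    toy_pop_active(s);      /* balanced, yet the owner still holds a ref */

    if (policy == CLEANUP_LATE || !aborted)
        toy_drop_xact_snapshot(&top);

    return toy_owner_delete(&top);
}
```

Under these assumptions, toy_run(true, CLEANUP_EARLY_COMMIT_ONLY) returns 1,
which corresponds to the leftover refcount discussed above, while the other
three combinations of arguments return 0.
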
Shall I work on a fix? I expect you are plenty busy with commitfest
stuff, but please let me know otherwise.
--
Álvaro Herrera <alvherre@commandprompt.com>
The PostgreSQL Company - Command Prompt, Inc.
PostgreSQL Replication, Consulting, Custom Development, 24x7 support