Re: Snapshot management, final - Mailing list pgsql-patches

From Tom Lane
Subject Re: Snapshot management, final
Date
Msg-id 19801.1210109194@sss.pgh.pa.us
Whole thread Raw
In response to Re: Snapshot management, final  (Alvaro Herrera <alvherre@commandprompt.com>)
Responses Re: Snapshot management, final  (Tom Lane <tgl@sss.pgh.pa.us>)
Re: Snapshot management, final  (Alvaro Herrera <alvherre@commandprompt.com>)
Re: Snapshot management, final  (Alvaro Herrera <alvherre@commandprompt.com>)
List pgsql-patches
Alvaro Herrera <alvherre@commandprompt.com> writes:
> Simon Riggs wrote:
>> On Tue, 2008-04-22 at 15:49 -0400, Alvaro Herrera wrote:
>>> - Three CopySnapshot call sites remain outside snapmgr.c: DoCopy() on
>>> copy.c, ExplainOnePlan() on explain.c and _SPI_execute_plan() on spi.c.
>>> They are there because they grab the current ActiveSnapshot, modify it,
>>> and then use the resulting snapshot.  There is no corresponding
>>> FreeSnapshot, because it's not needed.
>>
>> Not needed? How can we be certain that the modified snapshot does not
>> outlive its original source?

> It's not CopySnapshot that's not needed, but FreeSnapshot.  The point
> here is that the snapshot will be freed automatically as soon as it is
> PopActiveSnapshot'd out of existance.  CopySnapshot creates a new,
> separate copy of the passed snapshot, and each of them will be freed
> (separately) as soon as their refcounts reach zero.

I looked over this patch and I think it still needs work.  I concur with
Simon's discomfort about the external CopySnapshot calls: they are
reference leaks waiting to happen.  As an example, you have the pattern

    snap = CopySnapshot(...);
    do some stuff;
    RegisterSnapshot();

but what if the "stuff" fails?  In most of these code paths there's
at least one palloc in between, so a failure is certainly possible.
In the current form of the patch, a failure might not cost more than
some leaked memory in TopTransactionContext, but it's still pretty
ugly.  I think it would be best if it were simply not possible to do
CopySnapshot without simultaneously putting the snap into either the
registered or active lists.

In the COPY and EXPLAIN cases I think you misunderstood the point
of the original code anyway: the modified snapshot is still the active
snapshot for the duration of the command.  So I think the right approach
for both of these is the equivalent of what you put into spi.c:

                snap = CopySnapshot(snapshot);
                if (!read_only)
                    snap->curcid = GetCurrentCommandId(false);
                PushActiveSnapshot(snap);

and then a Pop at the end.  It might be worth encapsulating the above
series as a single copy-modify-and-push subroutine in snapmgr.c,
so that you don't have to expose CopySnapshot publicly, nor expose
adjustment of a snap's curcid outside snapmgr.

Also, I don't think the subtransaction management is correct at all.
How can it handle cases where a subtransaction and a sub-sub-transaction
both take out reference counts on the same snap?  If the sub-sub-xact
aborts, you have no way to determine how many refcounts should go away.

I think it would work better to keep the subxact level indicators in
the ActiveSnapshotElt/RegdSnapshotElt list members.  This would mean
multiple RegdSnapshotElt members pointing at the same snapshot in
any case where two different subxact levels actually had refs to the
same snapshot.  The RegdSnapshotElt members would each count
registration counts for their own subtransaction, and the underlying
snapshot would just count how many RegdSnapshotElts were pointing at it.
This would allow the same amount of verification in subxact commit
as you have in xact commit, ie there should be no counts left for the
current subtransaction.

Also, I think that the whole snapshot-sharing mechanism is not working
as intended except for the serializable case; otherwise sequences
like
    x = RegisterSnapshot(GetTransactionSnapshot());
    y = RegisterSnapshot(GetTransactionSnapshot());
will result in x and y being separate copies.  Or are you assuming
that this just isn't worth optimizing?

Some minor points:

It seems odd that snapshot cleanup is late in main xact commit/abort and
early in subxact commit/abort.  Unless there's a really good reason for
that, keep the ordering of module cleanup calls consistent across the
various cases in xact.c.

GetSnapshotData's serializable parameter is obsolete and should be removed.

I'm not entirely comfortable with the reference counts being only uint16
wide.  int32 seems safer and it isn't really saving anything much.

Push/PopActiveSnap leaks the ActiveSnapshotElt.

PopActiveSnapshot should probably assert active_count > 0 before
decrementing, likewise in UnregisterSnapshot.

            regards, tom lane

pgsql-patches by date:

Previous
From: "Heikki Linnakangas"
Date:
Subject: Re: Verified fix for Bug 4137
Next
From: Tom Lane
Date:
Subject: Re: Snapshot management, final