Thread: WIP patch for latestCompletedXid method of computing snapshot xmax

WIP patch for latestCompletedXid method of computing snapshot xmax

From
Tom Lane
Date:
This patch implements Florian's idea about how to manage snapshot xmax
without the ugly and performance-losing tactic of taking XidGenLock and
ProcArrayLock at the same time.  I had to do a couple of slightly klugy
things to get bootstrap and prepared transactions to work, but on the
whole it seems at least as clean as the code we have now.  Comments?

            regards, tom lane



Attachment

Re: WIP patch for latestCompletedXid method of computing snapshot xmax

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> This patch implements Florian's idea about how to manage snapshot xmax
> without the ugly and performance-losing tactic of taking XidGenLock and
> ProcArrayLock at the same time.  I had to do a couple of slightly klugy
> things to get bootstrap and prepared transactions to work, but on the
> whole it seems at least as clean as the code we have now.  Comments?

Just that it will be fascinating to see what effects this has on the
benchmarks.


--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com

Re: WIP patch for latestCompletedXid method of computing snapshot xmax

From
Tom Lane
Date:
Gregory Stark <stark@enterprisedb.com> writes:
> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>> This patch implements Florian's idea about how to manage snapshot xmax
>> without the ugly and performance-losing tactic of taking XidGenLock and
>> ProcArrayLock at the same time.  I had to do a couple of slightly klugy
>> things to get bootstrap and prepared transactions to work, but on the
>> whole it seems at least as clean as the code we have now.  Comments?

> Just that it will be fascinating to see what effects this has on the
> benchmarks.

Yeah, I was hoping to get some benchmarks before deciding whether it's
worth the risk of pushing this into 8.3.  I'm off trying pgbench now,
but if anyone wants to try something more serious like DBT2 ...

            regards, tom lane

Re: WIP patch for latestCompletedXid method of computing snapshot xmax

From
"Florian G. Pflug"
Date:
Tom Lane wrote:
> This patch implements Florian's idea about how to manage snapshot xmax
> without the ugly and performance-losing tactic of taking XidGenLock and
> ProcArrayLock at the same time.  I had to do a couple of slightly klugy
> things to get bootstrap and prepared transactions to work, but on the
> whole it seems at least as clean as the code we have now.  Comments?

I'm a bit late, but still a few comments

1) I wonder why you made XidCacheRemoveRunningXids take the latestXid as
an extra argument, though - it should be able to figure that out on
it's own I'd have thought. Is that just for consistency, or did I miss
something?

2) I don't see how either the childXids array or the XID cache in the
    proc array could ever not be in ascending xid order. We do guarantee
    that a child xid is always greater than it's parent. This guarantee
    enables a few optimizations. First, as you say in the comments,
    finding the largest xid when committing becomes trivial. But more
    important, if we can assume that the proc array xid cache is always
    sorted, we can get ride of the exclusive lock during subxact abort.

    We will *always* remove the last n xids from the cache, so we just
    need to reset subxids.nxids, and not touch the actual array at all.
    (We might later set the now-unused entries to InvalidTransactionId,
    but thats only cosmetic).

Regarding 2) Removing subxids from the proc array cannnot effect xmin
    calculations, because the toplevel xid is always small than all it's
    children. So we only need to care about snapshots themselves.

    But they won't care much if they find an aborted sub xid in the
    proc array or now. If not, the xid is neither running nor committed,
    so it's assumes to have crashed. If it's in the snapshot, it's
    treated as in-progress.

So I believe the code could be simplified a bit if we just made it a rule
that both childXids and the cache in the proc array are always in ascending
xid order, and we could get rid of the  exclusive lock during subxact abort.

greetings, Florian Pflug


Re: WIP patch for latestCompletedXid method of computing snapshot xmax

From
Tom Lane
Date:
"Florian G. Pflug" <fgp@phlo.org> writes:
> 1) I wonder why you made XidCacheRemoveRunningXids take the latestXid as
> an extra argument, though - it should be able to figure that out on
> it's own I'd have thought. Is that just for consistency, or did I miss
> something?

Well, we were going to have to figure the latestXid in xact.c anyway in
the main-xact path, so I thought it was better to have it responsible
in both cases.  It's a judgment call though.

> 2) I don't see how either the childXids array or the XID cache in the
>     proc array could ever not be in ascending xid order. We do guarantee
>     that a child xid is always greater than it's parent.

It's the relationships among siblings that worry me.  It's likely that
we could guarantee this with a bit of care and some Asserts in xact.c's
xid-list manipulations, but it's not quite obvious that it must be so
--- for instance using an lcons instead of lappend might break it.

>     This guarantee
>     enables a few optimizations. First, as you say in the comments,
>     finding the largest xid when committing becomes trivial. But more
>     important, if we can assume that the proc array xid cache is always
>     sorted, we can get ride of the exclusive lock during subxact abort.

>     We will *always* remove the last n xids from the cache, so we just
>     need to reset subxids.nxids, and not touch the actual array at all.
>     (We might later set the now-unused entries to InvalidTransactionId,
>     but thats only cosmetic).

I'm not convinced that that works when the subxids cache has overflowed
earlier in the transaction.

In any case, the bottom line is that it's very unlikely that that code
costs enough to be worth optimizing.  When you think about the total
number of cycles that have been expended on each subtransaction
(especially one that has an xid), another comparison or two is lost in
the noise.  As long as the code isn't O(N^2) ... which it isn't AFAICS
... I would rather keep it simple and robust.

            regards, tom lane

Re: WIP patch for latestCompletedXid method of computing snapshot xmax

From
"Florian G. Pflug"
Date:
Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> 2) I don't see how either the childXids array or the XID cache in the proc
>> array could ever not be in ascending xid order. We do guarantee that a
>> child xid is always greater than it's parent.
>
> It's the relationships among siblings that worry me.  It's likely that we
> could guarantee this with a bit of care and some Asserts in xact.c's xid-list
> manipulations, but it's not quite obvious that it must be so --- for instance
> using an lcons instead of lappend might break it.

Hm.. AFAICT, a transaction can only ever have currently "live" subtransaction.
In other words, for any node in the transaction tree, all subtrees except for
one are always aborted.

Since we do guarantee that a node has an xid smaller than the of any node below,
it seems the only place we have to be careful at is AtSubCommit_childXids.

>
>> This guarantee enables a few optimizations. First, as you say in the
>> comments, finding the largest xid when committing becomes trivial. But more
>>  important, if we can assume that the proc array xid cache is always
>> sorted, we can get ride of the exclusive lock during subxact abort.
>
>> We will *always* remove the last n xids from the cache, so we just need to
>> reset subxids.nxids, and not touch the actual array at all. (We might later
>> set the now-unused entries to InvalidTransactionId, but thats only
>> cosmetic).
>
> I'm not convinced that that works when the subxids cache has overflowed
> earlier in the transaction.

It should. AFAICS, the subxid cache always shows the xids along the "path"
of the transaction tree that leads to the currently active subtransaction.
If it hasn't overflowed, that is. If it has, the list is cut off at some
point. But being cut-off will only make the number of elements we remove
smaller, but not change that we remove some numer of last elements.

> In any case, the bottom line is that it's very unlikely that that code costs
> enough to be worth optimizing.  When you think about the total number of
> cycles that have been expended on each subtransaction (especially one that
> has an xid), another comparison or two is lost in the noise.  As long as the
> code isn't O(N^2) ... which it isn't AFAICS ... I would rather keep it simple
> and robust.

The real win is that because of the guaranteed order, removing xids from the
cache gets less messy - it reduces to just changing subxids.nxids.Which in turn
allows us to only hold a *shared* proc array lock while doing the removal. To
quote my original mail

"Removing subxids from the proc array cannnot effect xmin
  calculations, because the toplevel xid is always small than all it's
  children. So we only need to care about snapshots themselves.

  But they won't care much if they find an aborted sub xid in the
  proc array or now. If not, the xid is neither running nor committed,
  so it's assumes to have crashed. If it's in the snapshot, it's
  treated as in-progress."

greetings, Florian Pflug

Re: WIP patch for latestCompletedXid method of computing snapshot xmax

From
Tom Lane
Date:
"Florian G. Pflug" <fgp@phlo.org> writes:
> This guarantee enables a few optimizations. First, as you say in the
> comments, finding the largest xid when committing becomes trivial. But more
> important, if we can assume that the proc array xid cache is always
> sorted, we can get ride of the exclusive lock during subxact abort.

No, we can't, because subxact abort still has to advance latestCompletedXid.

            regards, tom lane

Re: WIP patch for latestCompletedXid method of computing snapshot xmax

From
"Florian G. Pflug"
Date:
Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> This guarantee enables a few optimizations. First, as you say in the
>> comments, finding the largest xid when committing becomes trivial. But more
>> important, if we can assume that the proc array xid cache is always
>> sorted, we can get ride of the exclusive lock during subxact abort.
>
> No, we can't, because subxact abort still has to advance latestCompletedXid.

I don't think so. I didn't notice that when initially reading your patch -
but it seems that updating latestCompletedXid during subxact abort is actually
worsening performance (only very slightly, though).

The whole point of updating latestCompletedXid when aborting a toplevel xact
is to prevent the following corner case.
.) Some transaction commits
.) Only readonly transactions
    (all xmins = xmaxs = latestCompletedXid+1)
.) One large Transaction that Aborts
    (xid = GetNewTransactionId() = latestCompletedXid+1)
.) Only readonly transactions for a long period again.
    all xmins = xmaxs = latestCompletedXid+1

If the ABORT didn't update latestCompletedXid, than we'd not be able to remove
rows created by that one large transactions (which aborted), because all the
xmins would still be latestCompletedXid+1, which is not > the xid of the
aborted transaction.

So we updating latestCompletedXid is only *necessary* durings COMMITS - during
ABORTS it's just done for efficiency reasons.

But for subtransactions, there is no efficiency gain at all, because the
toplevel xid already bounds the xmin. In fact, updating latestCompletedXid
during subxact moves the snapshot's xmax unnecessary far into the future,
which leads to larger snasphots, meaning more cycles spent to scan them.

I do feel that I explained the idea rather badly though in my initial
response to your patch - I'll post (hopefully) better explanation to
the hackers list shortly.

greetings, Florian Pflug



Re: WIP patch for latestCompletedXid method of computing snapshot xmax

From
Gregory Stark
Date:
"Tom Lane" <tgl@sss.pgh.pa.us> writes:

> Gregory Stark <stark@enterprisedb.com> writes:
>> "Tom Lane" <tgl@sss.pgh.pa.us> writes:
>>> This patch implements Florian's idea about how to manage snapshot xmax
>>> without the ugly and performance-losing tactic of taking XidGenLock and
>>> ProcArrayLock at the same time.  I had to do a couple of slightly klugy
>>> things to get bootstrap and prepared transactions to work, but on the
>>> whole it seems at least as clean as the code we have now.  Comments?
>
>> Just that it will be fascinating to see what effects this has on the
>> benchmarks.
>
> Yeah, I was hoping to get some benchmarks before deciding whether it's
> worth the risk of pushing this into 8.3.  I'm off trying pgbench now,
> but if anyone wants to try something more serious like DBT2 ...

I ran some DBT2 tests but haven't been able to show any performance effects
either in average or worst-case response times.

I think that's for a few reasons:

1) This is only a dual-processor machine I'm playing with and we scale well on
   two processors already.

2) TPC-C doesn't have many read-only transactions

3) The deadlocks I found earlier cause enough noise in the response times to
   hide any subtler effects.

We may have to wait until the next time Sun runs their 1,000-process monster
Java benchmark to see if it helps there.

--
  Gregory Stark
  EnterpriseDB          http://www.enterprisedb.com