Thread: A small mistake in the initial latestCompletedXid idea

A small mistake in the initial latestCompletedXid idea

From

"Florian G. Pflug"

Date:

12 September 2007, 14:16:25

Hi

When I initially proposed to use the latest *committed* xid as the xmax instead
of ReadNewTransactionId(), I believed that this would cause tuples created by a
later aborted transaction not to be vacuumed until another transaction (with a
higher xid) commits later. The idea was therefore modified to store the latest
*completed* xid, instead of the latest committed one.

I just realized that my fear was unjustified. AFAICS, VACUUM will aways remove
tuples created by aborted transactions, even if the xid is >= OldestXmin.

Therefore, I suggest that we rename latestCompletedXid to latestCommittedXid,
and update it only on commits. Admittedly, this won't bring any measurable
performance benefit in itself (it will slightly reduce the average snapshot
size, though), but not doing so might stand in the way of possible future
optimizations in that area.

I'll submit a patch to the patches list shortly.

greetings, Florian Pflug

Re: A small mistake in the initial latestCompletedXid idea

From

Tom Lane

Date:

12 September 2007, 15:54:21

"Florian G. Pflug" <fgp@phlo.org> writes:
> Therefore, I suggest that we rename latestCompletedXid to latestCommittedXid,
> and update it only on commits. Admittedly, this won't bring any measurable
> performance benefit in itself (it will slightly reduce the average snapshot
> size, though), but not doing so might stand in the way of possible future
> optimizations in that area.

This is a bad idea.  As you say, it doesn't directly save anything,
and the downside is that it may result in RecentGlobalXmin not moving
forward.  Consider a situation where there's a long string of aborts and
nary a commit.  latestCommittedXid won't advance, therefore each new
transaction continues to compute xmin = xmax = latestCommittedXid+1,
and so the window between global xmin and the newest active XIDs gets
wider and wider.  That puts performance stress on pg_clog and
pg_subtrans buffers --- if it goes on long enough, we get into a context
swap storm caused by pg_subtrans buffer thrashing.  We need to be sure
that xmin/xmax move forward when XIDs exit the ProcArray, whether they
commit or not.

Your post made me think for awhile about whether we really need to
serialize aborts at all.  From a transactional correctness standpoint
I think maybe we don't, but again the difficulty is with xmin tracking.
If an aborting xact can remove its XID from ProcArray without locking,
then it is possible that two concurrent scans of ProcArray arrive at
different xmin values, which means that GetOldestXmin might deliver
an incorrectly large answer, and that's fatal.  (One of the possible
consequences is truncating pg_subtrans too soon, but I believe there
are other ones too.)

Subtransactions don't affect xmin, of course, so there may be an
argument here that we don't have to do this stuff for a subtransaction
abort.  But I remain unconvinced that optimizing subtransaction abort
will really buy a performance gain worth taking any risk for.
        regards, tom lane

Re: A small mistake in the initial latestCompletedXid idea

From

"Florian G. Pflug"

Date:

12 September 2007, 16:49:09

Tom Lane wrote:
> "Florian G. Pflug" <fgp@phlo.org> writes:
>> Therefore, I suggest that we rename latestCompletedXid to latestCommittedXid,
>> and update it only on commits. Admittedly, this won't bring any measurable
>> performance benefit in itself (it will slightly reduce the average snapshot
>> size, though), but not doing so might stand in the way of possible future
>> optimizations in that area.
> 
> This is a bad idea.  As you say, it doesn't directly save anything,
> and the downside is that it may result in RecentGlobalXmin not moving
> forward.  Consider a situation where there's a long string of aborts and
> nary a commit.  latestCommittedXid won't advance, therefore each new
> transaction continues to compute xmin = xmax = latestCommittedXid+1,
> and so the window between global xmin and the newest active XIDs gets
> wider and wider.  That puts performance stress on pg_clog and
> pg_subtrans buffers --- if it goes on long enough, we get into a context
> swap storm caused by pg_subtrans buffer thrashing.  We need to be sure
> that xmin/xmax move forward when XIDs exit the ProcArray, whether they
> commit or not.

Hm.. Ok.. I see your point.

Maybe we should then make your initial argument against this hold, by
adding a check for xid > latestCompletedXid into TransactionIdIsInProgress.
That is cheap, might save some unnecessary proc array scanning, and justifies
advancing latestCommittedXid during subxact abort (where your argument
above doesn't hold, as you say yourself later on).

> Your post made me think for awhile about whether we really need to
> serialize aborts at all.  From a transactional correctness standpoint
> I think maybe we don't, but again the difficulty is with xmin tracking.
Agreed - For transaction correctness, aborted and in-progress transactions
are similar enough that it doesn't matter much in which pot our snapshot
puts them. They cases where the difference matters rechecks with
TransactionIdIsInProgress anyway and/or waits on the xid's lock.

> If an aborting xact can remove its XID from ProcArray without locking,
> then it is possible that two concurrent scans of ProcArray arrive at
> different xmin values, which means that GetOldestXmin might deliver
> an incorrectly large answer, and that's fatal.  (One of the possible
> consequences is truncating pg_subtrans too soon, but I believe there
> are other ones too.)
I'm not yet sure if that deviation in the xmin calculations poses any
real risk, or not. After all, even the transaction ending up with the
larger xmin won't actually see xids between the smaller and the larger
xid as in-progress. I haven't yet been able to come up with either a
counterexample, or an argument that this is indeed safe.

> Subtransactions don't affect xmin, of course, so there may be an
> argument here that we don't have to do this stuff for a subtransaction
> abort.  But I remain unconvinced that optimizing subtransaction abort
> will really buy a performance gain worth taking any risk for.
That depends largely on the workload, I would think. If there is any
benefit, I'd expect to see it for workloads involving deeply nested
BEGIN/END/EXCEPTION blocks. Especially because we currently roll back
each sub-transaction seperatly AFAICS, meaning we might take that
exclusive lock many times in short succession.

greetings, Florian Pflug