Thread: Strangeness in xid allocation / snapshot setup

Strangeness in xid allocation / snapshot setup

From
Tom Lane
Date:
Hi Vadim,

I am trying to understand why GetSnapshotData() needs to acquire the
SInval spinlock before it calls ReadNewTransactionId, rather than after.
I see that you made it do so --- in the commit at
http://www.ca.postgresql.org/cgi/cvsweb.cgi/pgsql/src/backend/storage/ipc/shmem.c.diff?r1=1.41&r2=1.42
but I don't understand why the loss of concurrency is "necessary".
Since we are going to treat all xids >= xmax as in-progress anyway,
what's wrong with reading xmax before we acquire the SInval lock?

Also, it seems to me that in GetNewTransactionId(), it's important
for MyProc->xid to be set before releasing XidGenLockId, not after.
Otherwise there is a narrow window for failure:

1. Process A calls GetNewTransactionId.  It allocates an xid of, say,
1234, and increments nextXid to 1235.  Just after releasing the
XidGenLock spinlock, but before it can set its MyProc->xid, control
swaps away from it.

2. Process B gets to run.  It runs GetSnapshotData.  It sees nextXid =
1235, and it does not see xid = 1234 in any backend's proc->xid.
Therefore, B will assume xid 1234 has already terminated, when it
hasn't.

Isn't this broken?  The problem would be avoided if GetNewTransactionId
sets MyProc->xid before releasing the spinlock, since then after
GetSnapshotData has called ReadNewTransactionId, we know that all older
XIDs that are still active are recorded in proc structures.

Comments?
        regards, tom lane


RE: Strangeness in xid allocation / snapshot setup

From
"Mikheev, Vadim"
Date:
> I am trying to understand why GetSnapshotData() needs to acquire the
> SInval spinlock before it calls ReadNewTransactionId, rather than after.
> I see that you made it do so --- in the commit at
>
http://www.ca.postgresql.org/cgi/cvsweb.cgi/pgsql/src/backend/storage/ipc/sh
mem.c.diff?r1=1.41&r2=1.42
> but I don't understand why the loss of concurrency is "necessary".
> Since we are going to treat all xids >= xmax as in-progress anyway,
> what's wrong with reading xmax before we acquire the SInval lock?

AFAIR, I made so to prevent following:

1. Tx Old is running.
2. Tx S reads new transaction ID in GetSnapshotData() and swapped away  before SInval acquired.
3. Tx New gets new transaction ID, makes changes and commits.
4. Tx Old changes some row R changed by Tx New and commits.
5. Tx S gets snapshot data and now sees R changed by *both* Tx Old and  Tx New *but* does not see *other* changes made
byTx New =>  Tx S reads unconsistent data.
 

---------

As for issue below - I don't remember why I decided that
it's not important and will need in some time to remember.

> Also, it seems to me that in GetNewTransactionId(), it's important
> for MyProc->xid to be set before releasing XidGenLockId, not after.
> Otherwise there is a narrow window for failure:
> 
> 1. Process A calls GetNewTransactionId.  It allocates an xid of, say,
> 1234, and increments nextXid to 1235.  Just after releasing the
> XidGenLock spinlock, but before it can set its MyProc->xid, control
> swaps away from it.
> 
> 2. Process B gets to run.  It runs GetSnapshotData.  It sees nextXid =
> 1235, and it does not see xid = 1234 in any backend's proc->xid.
> Therefore, B will assume xid 1234 has already terminated, when it
> hasn't.
> 
> Isn't this broken?  The problem would be avoided if 
> GetNewTransactionId
> sets MyProc->xid before releasing the spinlock, since then after
> GetSnapshotData has called ReadNewTransactionId, we know that 
> all older
> XIDs that are still active are recorded in proc structures.
> 
> Comments?
> 
>             regards, tom lane
> 


Re: Strangeness in xid allocation / snapshot setup

From
Tom Lane
Date:
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
>> Since we are going to treat all xids >= xmax as in-progress anyway,
>> what's wrong with reading xmax before we acquire the SInval lock?

> AFAIR, I made so to prevent following:

> 1. Tx Old is running.
> 2. Tx S reads new transaction ID in GetSnapshotData() and swapped away
>    before SInval acquired.
> 3. Tx New gets new transaction ID, makes changes and commits.
> 4. Tx Old changes some row R changed by Tx New and commits.
> 5. Tx S gets snapshot data and now sees R changed by *both* Tx Old and
>    Tx New *but* does not see *other* changes made by Tx New =>
>    Tx S reads unconsistent data.

Hmm, but that doesn't seem to have anything to do with the way that
GetSnapshotData operates.  If Tx New has an XID >= xmax read by Tx S'
GetSnapshotData, then Tx New will be considered uncommitted by S no
matter which order we get the locks in; it hardly matters whether Tx New
manages to physically commit before we finish building the snapshot for
S.  On the other side of the coin, if Tx New's XID < xmax for S, then
*with the GetNewTransactionId change that I want* we can be sure that
Tx New will be seen running by S when it does get the SInval lock
(unless New has managed to finish before S gets the lock, in which case
it's perfectly reasonable for S to treat it as committed or aborted).

Anyway, it seems to me that the possibility of inconsistent data is
inherent in the way we handle updated rows in Read Committed mode ---
you can always get to see a row that was emitted by a transaction you
don't see the other effects of.
        regards, tom lane


Re: Strangeness in xid allocation / snapshot setup

From
"Vadim Mikheev"
Date:
> > 1. Tx Old is running.
> > 2. Tx S reads new transaction ID in GetSnapshotData() and swapped away
> >    before SInval acquired.
> > 3. Tx New gets new transaction ID, makes changes and commits.
> > 4. Tx Old changes some row R changed by Tx New and commits.
> > 5. Tx S gets snapshot data and now sees R changed by *both* Tx Old and
> >    Tx New *but* does not see *other* changes made by Tx New =>
> >    Tx S reads unconsistent data.
>
> Hmm, but that doesn't seem to have anything to do with the way that
> GetSnapshotData operates.  If Tx New has an XID >= xmax read by Tx S'
> GetSnapshotData, then Tx New will be considered uncommitted by S no
> matter which order we get the locks in; it hardly matters whether Tx New

You forget about Tx Old! The point is that changes made by Tx Old *over*
Tx New' changes effectively make those Tx New' changes *visible* to
Tx S!

And this is not good: Tx New inserts PK and corresponding FK and commits;
Tx Old changes some field in row with that FK and commits - now Tx S will
see
FK row *but not PK one* (and what if Tx S was serializable Tx run by
pd_dump...)

SInval lock prevents Tx Old from commit (xact.c:CommitTransaction())  in
points 2. - 4. above and so Tx Old' changes will not be visible to Tx S.

> manages to physically commit before we finish building the snapshot for
> S.  On the other side of the coin, if Tx New's XID < xmax for S, then
> *with the GetNewTransactionId change that I want* we can be sure that
> Tx New will be seen running by S when it does get the SInval lock
> (unless New has managed to finish before S gets the lock, in which case
> it's perfectly reasonable for S to treat it as committed or aborted).

And this is how it worked (MyProc->xid was updated while holding
XXXGenLockId) in varsup.c from version 1.21 (Jun 1999) till
version 1.36 (Mar 2001) when you occasionally moved it outside
of locked code part:

http://www.ca.postgresql.org/cgi/cvsweb.cgi/pgsql/src/backend/access/transam
/varsup.c.diff?r1=1.35&r2=1.36

> Anyway, it seems to me that the possibility of inconsistent data is
> inherent in the way we handle updated rows in Read Committed mode ---
> you can always get to see a row that was emitted by a transaction you
> don't see the other effects of.

If I correctly understand meaning of "emitted" then sentence above is not
correct:
set of rows to be updated can only be shortened by concurrent transactions.
Yes, changes can be made over changes from concurrent transactions but only
for rows from original set defined by query snapshot and only if
concurrently
updated rows (from that set) satisfy query qual => a row must satisfy
snapshot
*and* query qual = double satisfaction guaranteed -:))
And let's remember that this behaviour is required for current RI
constraints
implementation.

Vadim




Re: Strangeness in xid allocation / snapshot setup

From
Tom Lane
Date:
"Vadim Mikheev" <vmikheev@sectorbase.com> writes:
> You forget about Tx Old! The point is that changes made by Tx Old *over*
> Tx New' changes effectively make those Tx New' changes *visible* to
> Tx S!

Yes, but what's that got to do with the order of operations in
GetSnapshotData?  The scenario you describe can occur anyway.
Only if Tx Old is running in Read Committed mode, of course.
But if it is, then it's capable of deciding to update a row updated
by Tx New.  Whether Tx S's xmax value is before or after Tx New's ID
is not going to change the behavior of Tx Old.

> And this is how it worked (MyProc->xid was updated while holding
> XXXGenLockId) in varsup.c from version 1.21 (Jun 1999) till
> version 1.36 (Mar 2001) when you occasionally moved it outside
> of locked code part:

Okay, so that part was my error.  I've changed it back.

I'd still like to change GetSnapshotData to read the nextXid before
it acquires SInvalLock, though.  If we did that, it'd be safe to make
GetNewTransactionId be
SpinAcquire(XidGenLockId);xid = nextXid++;SpinAcquire(SInvalLockId);MyProc->xid =
xid;SpinRelease(SInvalLockId);SpinRelease(XidGenLockId);

which is really necessary if you want to avoid assuming that
TransactionIds can be fetched and stored atomically.

Two other changes I think are needed in this area:

* In AbortTransaction, the clearing of MyProc->xid and MyProc->xmin
should be moved down to after RecordTransactionAbort and surrounded
by acquire/release SInvalLock (to avoid atomic fetch/store assumption).

* In HeapTupleSatisfiesVacuum (new tqual.c routine I just made
yesterday, by extracting the tuple time qual checks from vacuum.c),
the order of checking for process status should be    TransactionIdIsInProgress    TransactionIdDidCommit
TransactionIdDidAbort
not the present    TransactionIdDidAbort    TransactionIdDidCommit    TransactionIdIsInProgress

The current way is wrong because if the other process is just in process
of committing, we can get

VACUUM                        other

TransactionIdDidAbort - no
TransactionIdDidCommit - no
                    RecordTransactionCommit();                    MyProc->xid = 0;

TransactionIdIsInProgress - no

whereupon vacuum decides that the other process crashed --- oops.  If
we test TransactionIdIsInProgress *first* in tqual, and xact.c records
commit or abort *before* clearing MyProc->xid, then we cannot have this
race condition where the xact is no longer considered in progress but
not seen to be committed/aborted either.

(Note: this bug is not a problem for existing VACUUM, since it can
never see any tuples from open transactions anyway.  But it will be
fatal for concurrent VACUUM.)

Comments?
        regards, tom lane


RE: Re: Strangeness in xid allocation / snapshot setup

From
"Mikheev, Vadim"
Date:
> > You forget about Tx Old! The point is that changes made by
> > Tx Old *over* Tx New' changes effectively make those Tx New'
> > changes *visible* to Tx S!
> 
> Yes, but what's that got to do with the order of operations in
> GetSnapshotData?  The scenario you describe can occur anyway.

Try to describe it step by step.

> Only if Tx Old is running in Read Committed mode, of course.
> But if it is, then it's capable of deciding to update a row updated
> by Tx New.  Whether Tx S's xmax value is before or after Tx New's ID
> is not going to change the behavior of Tx Old.

1.  I consider particular case when Tx S' xmax is before Tx New' ID.
1.1 For this case acquiring SInval lock before ReadNewTransactionId()   changes behavior of Tx Old: it postpones change
ofTx Old'   (and Tx New') MyProc->xid in xact.c:CommitTransaction(), so Tx S   will see Tx Old as running, ie Tx Old'
changeswill be invisible   to Tx S on the base of analyzing MyProc.xid-s, just like Tx New'   changes will be invisible
onthe base of analyzing next Tx ID.
 
2.  If you can find examples when current code is not able to provide   consistent snapshot of running (out of
interest)transactions   let's think how to fix code. Untill then my example shows why   we cannot move SInval lock
requestafter ReadNewTransactionId().
 

> I'd still like to change GetSnapshotData to read the nextXid before
> it acquires SInvalLock, though.  If we did that, it'd be safe to make
> GetNewTransactionId be
> 
>     SpinAcquire(XidGenLockId);
>     xid = nextXid++;
>     SpinAcquire(SInvalLockId);
>     MyProc->xid = xid;
>     SpinRelease(SInvalLockId);
>     SpinRelease(XidGenLockId);
> 
> which is really necessary if you want to avoid assuming that
> TransactionIds can be fetched and stored atomically.

To avoid that assumption one should add per MyProc spinlock.

Vadim


Re: Re: Strangeness in xid allocation / snapshot setup

From
Tom Lane
Date:
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
> 1.1 For this case acquiring SInval lock before ReadNewTransactionId()
>     changes behavior of Tx Old: it postpones change of Tx Old'
>     (and Tx New') MyProc->xid in xact.c:CommitTransaction(), so Tx S
>     will see Tx Old as running, ie Tx Old' changes will be invisible
>     to Tx S on the base of analyzing MyProc.xid-s, just like Tx New'
>     changes will be invisible on the base of analyzing next Tx ID.

Oh, now I get it: the point is to prevent Tx Old from exiting the set
of "still running" xacts as seen by Tx S.  Okay, it makes sense.
I'll try to add some documentation to explain it.

Given this, I'm wondering why we bother with having a separate
XidGenLock spinlock at all.  Why not eliminate it and use SInval
spinlock to lock GetNewTransactionId and ReadNewTransactionId?

What did you think about reordering the vacuum qual tests and
AbortTransaction sequence?

BTW, I'm starting to think that it would be really nice if we could
replace our spinlocks with not just a semaphore, but something that has
a notion of "shared" and "exclusive" lock requests.  For example,
if GetSnapshotData could use a shared lock on SInvalLock, it'd
improve concurrency.
        regards, tom lane


RE: Re: Strangeness in xid allocation / snapshot setup

From
"Mikheev, Vadim"
Date:
> Oh, now I get it: the point is to prevent Tx Old from exiting the set
> of "still running" xacts as seen by Tx S.  Okay, it makes sense.
> I'll try to add some documentation to explain it.

TIA! I had no time from '99 -:)

> Given this, I'm wondering why we bother with having a separate
> XidGenLock spinlock at all.  Why not eliminate it and use SInval
> spinlock to lock GetNewTransactionId and ReadNewTransactionId?

Reading all MyProc in GetSnashot may take long time - why disallow
new Tx to begin.

> What did you think about reordering the vacuum qual tests and
> AbortTransaction sequence?

Sorry, no time at the moment.

> BTW, I'm starting to think that it would be really nice if we could
> replace our spinlocks with not just a semaphore, but something that has
> a notion of "shared" and "exclusive" lock requests.  For example,
> if GetSnapshotData could use a shared lock on SInvalLock, it'd
> improve concurrency.

Yes, we already told about light lock manager (no deadlock detection etc).

Vadim


Re: Re: Strangeness in xid allocation / snapshot setup

From
Tom Lane
Date:
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
>> Given this, I'm wondering why we bother with having a separate
>> XidGenLock spinlock at all.  Why not eliminate it and use SInval
>> spinlock to lock GetNewTransactionId and ReadNewTransactionId?

> Reading all MyProc in GetSnashot may take long time - why disallow
> new Tx to begin.

Because we need to synchronize?  It bothers me that we're assuming
that fetching/storing XIDs is atomic.  There's no possibility at all
of going to 8-byte XIDs as long as the code is like this.

I doubt that a spinlock per PROC structure would be a better answer,
either; the overhead of getting and releasing each lock would be
nontrivial, considering the small number of instructions spent at
each PROC in these routines.
        regards, tom lane


RE: Re: Strangeness in xid allocation / snapshot setup

From
"Mikheev, Vadim"
Date:
> > Isn't spinlock just a few ASM instructions?... on most platforms...
> 
> If we change over to something that supports read vs write locking,
> it's probably going to be rather more than that ... right now, I'm
> pretty dissatisfied with the performance of our spinlocks under load.

We shouldn't use light locks everywhere. Updating/reading MyProc.xid
is very good place to use simple spinlocks... or even better mutexes.

Vadim


Re: Re: Strangeness in xid allocation / snapshot setup

From
Tom Lane
Date:
"Mikheev, Vadim" <vmikheev@SECTORBASE.COM> writes:
> Isn't spinlock just a few ASM instructions?... on most platforms...

If we change over to something that supports read vs write locking,
it's probably going to be rather more than that ... right now, I'm
pretty dissatisfied with the performance of our spinlocks under load.
        regards, tom lane


RE: Re: Strangeness in xid allocation / snapshot setup

From
"Mikheev, Vadim"
Date:
> >> Given this, I'm wondering why we bother with having a separate
> >> XidGenLock spinlock at all.  Why not eliminate it and use SInval
> >> spinlock to lock GetNewTransactionId and ReadNewTransactionId?
> 
> > Reading all MyProc in GetSnashot may take long time - why disallow
> > new Tx to begin.
> 
> Because we need to synchronize?  It bothers me that we're assuming
> that fetching/storing XIDs is atomic.  There's no possibility at all
> of going to 8-byte XIDs as long as the code is like this.
> 
> I doubt that a spinlock per PROC structure would be a better answer,
> either; the overhead of getting and releasing each lock would be
> nontrivial, considering the small number of instructions spent at
> each PROC in these routines.

Isn't spinlock just a few ASM instructions?... on most platforms...

Vadim