Thread: Re: [COMMITTERS] pgsql: Tune GetSnapshotData() during Hot Standby by avoiding loop
Re: [COMMITTERS] pgsql: Tune GetSnapshotData() during Hot Standby by avoiding loop
From
Simon Riggs
Date:
On Mon, 2010-04-19 at 10:36 +0300, Heikki Linnakangas wrote: > Simon Riggs wrote: > > Log Message: > > ----------- > > Tune GetSnapshotData() during Hot Standby by avoiding loop > > through normal backends. Makes code clearer also, since we > > avoid various Assert()s. Performance of snapshots taken > > during recovery no longer depends upon number of read-only > > backends. > > I think there's a little race condition there. > snapshot->takenDuringRecovery is set before acquiring ProcArrayLock, so > it's possible that by the time we acquire the lock, we're no longer in > recovery. So this can happen: > > 1. Backend A starts to take a snapshot, while we're still in recovery. > takenDuringRecovery is assigned true. > 2. Recovery ends, and a normal transaction X begins in backend B. > 3. A skips scanning ProcArray because takenDuringRecovery is true. > > The snapshot doesn't include X, so any changes done in that transaction > will not be visible to the snapshot while the transaction is still > running, but will be after it commits. > > Of course, it's highly improbable for 2. to happen, but it's there. The order of events is as you say, though I don't see the problem. The new xids will be beyond xmax and would be filtered out even if we did scan the procs, so they will be treated as running, which they are. Xmax will not have advanced since that relies on completed transactions, not started ones. -- Simon Riggs www.2ndQuadrant.com
Re: [COMMITTERS] pgsql: Tune GetSnapshotData() during Hot Standby by avoiding loop
From
Heikki Linnakangas
Date:
Simon Riggs wrote: > On Mon, 2010-04-19 at 10:36 +0300, Heikki Linnakangas wrote: >> Simon Riggs wrote: >>> Log Message: >>> ----------- >>> Tune GetSnapshotData() during Hot Standby by avoiding loop >>> through normal backends. Makes code clearer also, since we >>> avoid various Assert()s. Performance of snapshots taken >>> during recovery no longer depends upon number of read-only >>> backends. >> I think there's a little race condition there. >> snapshot->takenDuringRecovery is set before acquiring ProcArrayLock, so >> it's possible that by the time we acquire the lock, we're no longer in >> recovery. So this can happen: >> >> 1. Backend A starts to take a snapshot, while we're still in recovery. >> takenDuringRecovery is assigned true. >> 2. Recovery ends, and a normal transaction X begins in backend B. >> 3. A skips scanning ProcArray because takenDuringRecovery is true. >> >> The snapshot doesn't include X, so any changes done in that transaction >> will not be visible to the snapshot while the transaction is still >> running, but will be after it commits. >> >> Of course, it's highly improbable for 2. to happen, but it's there. > > The order of events is as you say, though I don't see the problem. The > new xids will be beyond xmax and would be filtered out even if we did > scan the procs, so they will be treated as running, which they are. Xmax > will not have advanced since that relies on completed transactions, not > started ones. Good point. However, it is theoretically possible that yet another transaction starts and commits in that same window, bumping xmax. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: [COMMITTERS] pgsql: Tune GetSnapshotData() during Hot Standby by avoiding loop
From
Simon Riggs
Date:
On Mon, 2010-04-19 at 11:37 +0300, Heikki Linnakangas wrote: > Simon Riggs wrote: > > On Mon, 2010-04-19 at 10:36 +0300, Heikki Linnakangas wrote: > >> Simon Riggs wrote: > >>> Log Message: > >>> ----------- > >>> Tune GetSnapshotData() during Hot Standby by avoiding loop > >>> through normal backends. Makes code clearer also, since we > >>> avoid various Assert()s. Performance of snapshots taken > >>> during recovery no longer depends upon number of read-only > >>> backends. > >> I think there's a little race condition there. > >> snapshot->takenDuringRecovery is set before acquiring ProcArrayLock, so > >> it's possible that by the time we acquire the lock, we're no longer in > >> recovery. So this can happen: > >> > >> 1. Backend A starts to take a snapshot, while we're still in recovery. > >> takenDuringRecovery is assigned true. > >> 2. Recovery ends, and a normal transaction X begins in backend B. > >> 3. A skips scanning ProcArray because takenDuringRecovery is true. > >> > >> The snapshot doesn't include X, so any changes done in that transaction > >> will not be visible to the snapshot while the transaction is still > >> running, but will be after it commits. > >> > >> Of course, it's highly improbable for 2. to happen, but it's there. > > > > The order of events is as you say, though I don't see the problem. The > > new xids will be beyond xmax and would be filtered out even if we did > > scan the procs, so they will be treated as running, which they are. Xmax > > will not have advanced since that relies on completed transactions, not > > started ones. > > Good point. However, it is theoretically possible that yet another > transaction starts and commits in that same window, bumping xmax. If the snapshotting backend (#1) froze at the exact point between one CPU instruction and the next, during which time recovery ends, two new sessions connect, two new write transactions start (#2 and #3), they begin to write data and so assign xids, then write WAL, then the #3 writes commit record into WAL, flushes disk (maybe), updates clog and then is granted procarraylock in exclusive mode before snapshotting backend attempts to do so, while #2 keeps running. Yes, it does seem theoretically possible, at that one exact point in time, never to be repeated. It doesn't seem to be something we should place highly on the list of events we need protection from, does it? -- Simon Riggs www.2ndQuadrant.com
Re: Re: [COMMITTERS] pgsql: Tune GetSnapshotData() during Hot Standby by avoiding loop
From
Robert Haas
Date:
On Mon, Apr 19, 2010 at 5:05 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > It doesn't seem to be something we should place highly on the list of > events we need protection from, does it? Since when do we not protect against race-conditions just because they're low likelihood? ...Robert
Re: Re: [COMMITTERS] pgsql: Tune GetSnapshotData() during Hot Standby by avoiding loop
From
Tom Lane
Date:
Robert Haas <robertmhaas@gmail.com> writes: > On Mon, Apr 19, 2010 at 5:05 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >> It doesn't seem to be something we should place highly on the list of >> events we need protection from, does it? > Since when do we not protect against race-conditions just because > they're low likelihood? Murphy's law says that the probability of any race condition happening in the field is orders of magnitude higher than you think. This has been proven true many times ... regards, tom lane
Re: Re: [COMMITTERS] pgsql: Tune GetSnapshotData() during Hot Standby by avoiding loop
From
Simon Riggs
Date:
On Mon, 2010-04-19 at 10:24 -0400, Tom Lane wrote: > Robert Haas <robertmhaas@gmail.com> writes: > > On Mon, Apr 19, 2010 at 5:05 AM, Simon Riggs <simon@2ndquadrant.com> wrote: > >> It doesn't seem to be something we should place highly on the list of > >> events we need protection from, does it? > > > Since when do we not protect against race-conditions just because > > they're low likelihood? > > Murphy's law says that the probability of any race condition happening > in the field is orders of magnitude higher than you think. This has > been proven true many times ... Choices are 1. Check RecoveryInProgress() once outside of lock, plus wild rumour of Murphy 2. Check RecoveryInProgress() before and after holding lock 3. Check RecoveryInProgress() while holding lock All of which perform better than 4. Revert patch -- Simon Riggs www.2ndQuadrant.com
Re: Re: [COMMITTERS] pgsql: Tune GetSnapshotData() during Hot Standby by avoiding loop
From
Heikki Linnakangas
Date:
Simon Riggs wrote: > On Mon, 2010-04-19 at 10:24 -0400, Tom Lane wrote: >> Robert Haas <robertmhaas@gmail.com> writes: >>> On Mon, Apr 19, 2010 at 5:05 AM, Simon Riggs <simon@2ndquadrant.com> wrote: >>>> It doesn't seem to be something we should place highly on the list of >>>> events we need protection from, does it? >>> Since when do we not protect against race-conditions just because >>> they're low likelihood? >> Murphy's law says that the probability of any race condition happening >> in the field is orders of magnitude higher than you think. This has >> been proven true many times ... Right. And some future code changes elsewhere could make it more likely, by the time we've forgotten all about this. > Choices are > > 1. Check RecoveryInProgress() once outside of lock, plus wild rumour of > Murphy > > 2. Check RecoveryInProgress() before and after holding lock > > 3. Check RecoveryInProgress() while holding lock 4. Check RecoveryInProgress() once outside of lock, and scan the ProcArray anyway, just in case. That's what we did before this patch. Document that takenDuringRecovery == true means that the snapshot was most likely taken during recovery, but there is some race conditions where takenDuringRecovery is true even though the snapshot was taken just after recovery finished. AFAICS all of the other current uses of takenDuringRecovery work fine with that. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com
Re: Re: [COMMITTERS] pgsql: Tune GetSnapshotData() during Hot Standby by avoiding loop
From
Simon Riggs
Date:
On Mon, 2010-04-19 at 17:44 +0300, Heikki Linnakangas wrote: > > Choices are > > > > 1. Check RecoveryInProgress() once outside of lock, plus wild rumour of > > Murphy > > > > 2. Check RecoveryInProgress() before and after holding lock > > > > 3. Check RecoveryInProgress() while holding lock > > 4. Check RecoveryInProgress() once outside of lock, and scan the > ProcArray anyway, just in case. That's what we did before this patch. > Document that takenDuringRecovery == true means that the snapshot was > most likely taken during recovery, but there is some race conditions > where takenDuringRecovery is true even though the snapshot was taken > just after recovery finished. AFAICS all of the other current uses of > takenDuringRecovery work fine with that. Checking RecoveryInProgress() is much cheaper than scanning the whole ProcArray, so (4) is definitely worse than 1-3. -- Simon Riggs www.2ndQuadrant.com
Re: Re: [COMMITTERS] pgsql: Tune GetSnapshotData() during Hot Standby by avoiding loop
From
Tom Lane
Date:
Simon Riggs <simon@2ndQuadrant.com> writes: > On Mon, 2010-04-19 at 17:44 +0300, Heikki Linnakangas wrote: >>> Choices are >>> 1. Check RecoveryInProgress() once outside of lock, plus wild rumour of >>> Murphy >>> 2. Check RecoveryInProgress() before and after holding lock >>> 3. Check RecoveryInProgress() while holding lock >> >> 4. Check RecoveryInProgress() once outside of lock, and scan the >> ProcArray anyway, just in case. That's what we did before this patch. >> Document that takenDuringRecovery == true means that the snapshot was >> most likely taken during recovery, but there is some race conditions >> where takenDuringRecovery is true even though the snapshot was taken >> just after recovery finished. AFAICS all of the other current uses of >> takenDuringRecovery work fine with that. > Checking RecoveryInProgress() is much cheaper than scanning the whole > ProcArray, so (4) is definitely worse than 1-3. If the lock we're talking about is an LWLock, #3 is okay. If it's a spinlock, not so much. regards, tom lane
Re: Re: [COMMITTERS] pgsql: Tune GetSnapshotData() during Hot Standby by avoiding loop
From
Simon Riggs
Date:
On Mon, 2010-04-19 at 10:55 -0400, Tom Lane wrote: > Simon Riggs <simon@2ndQuadrant.com> writes: > > On Mon, 2010-04-19 at 17:44 +0300, Heikki Linnakangas wrote: > >>> Choices are > >>> 1. Check RecoveryInProgress() once outside of lock, plus wild rumour of > >>> Murphy > >>> 2. Check RecoveryInProgress() before and after holding lock > >>> 3. Check RecoveryInProgress() while holding lock > >> > >> 4. Check RecoveryInProgress() once outside of lock, and scan the > >> ProcArray anyway, just in case. That's what we did before this patch. > >> Document that takenDuringRecovery == true means that the snapshot was > >> most likely taken during recovery, but there is some race conditions > >> where takenDuringRecovery is true even though the snapshot was taken > >> just after recovery finished. AFAICS all of the other current uses of > >> takenDuringRecovery work fine with that. > > > Checking RecoveryInProgress() is much cheaper than scanning the whole > > ProcArray, so (4) is definitely worse than 1-3. > > If the lock we're talking about is an LWLock, #3 is okay. If it's a > spinlock, not so much. Just committed the following patch to implement option #3. We test RecoveryInProgress() after the LWLockAcquire rather than before. -- Simon Riggs www.2ndQuadrant.com