Re: Reducing Transaction Start/End Contention - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: Reducing Transaction Start/End Contention |
Date | |
Msg-id | 200803260150.m2Q1ouX06621@momjian.us |
In response to | Reducing Transaction Start/End Contention ("Simon Riggs" <simon@2ndquadrant.com>) |
List | pgsql-hackers |
Added to TODO:

> * Consider transaction start/end performance improvements
>
>   http://archives.postgresql.org/pgsql-hackers/2007-07/msg00948.php
>   http://archives.postgresql.org/pgsql-hackers/2008-03/msg00361.php

---------------------------------------------------------------------------

Simon Riggs wrote:
> Jignesh Shah's scalability testing on Solaris has revealed further
> tuning opportunities surrounding the start and end of a transaction.
> Tuning that should be especially important since async commit is likely
> to allow much higher transaction rates than were previously possible.
>
> There is strong contention on the ProcArrayLock in Exclusive mode, with
> the top path being CommitTransaction(). This becomes clear as the number
> of connections increases, but it seems likely that the contention can be
> caused in a range of other circumstances. My thoughts on the causes of
> this contention are that the following 3 tasks contend with each other
> in the following way:
>
> CommitTransaction(): takes ProcArrayLock Exclusive
>     but only needs access to one ProcArray element
>
> waits for
>
> GetSnapshotData(): takes ProcArrayLock Shared
>     ReadNewTransactionId(): takes XidGenLock Shared
>
> which waits for
>
> GetNewTransactionId()
>     takes XidGenLock Exclusive
>     ExtendCLOG(): takes CLogControlLock Exclusive, WALInsertLock Exclusive
>         two possible places where I/O is required
>     ExtendSUBTRANS(): takes SubtransControlLock
>         one possible place where I/O is required
>     Avoids lock on ProcArrayLock: atomically updates one ProcArray element
>
>
> or more simply:
>
> CommitTransaction()       -- i.e. once per transaction
>     waits for
> GetSnapshotData()         -- i.e. once per SQL statement
>     which waits for
> GetNewTransactionId()     -- i.e. once per transaction
>
> This gives some goals for scalability improvements and some proposals.
> (1) and (2) are proposals for 8.3 tuning, the others are directions for
> further research.
>
>
> Goal: Reduce total time that GetSnapshotData() waits for
> GetNewTransactionId()
>
> 1. Increase size of Clog-specific BLCKSZ
> Clog currently uses BLCKSZ to define the size of clog buffers. This can
> be changed to use CLOG_BLCKSZ, which would then be set to 32768.
> This will naturally increase the amount of memory allocated to the clog,
> so we need not alter CLOG_BUFFERS above 8 if we do this (as previously
> suggested, with successful results). This will also reduce the number of
> ExtendCLOG() calls, which will probably reduce the overall contention
> also.
>
> 2. Perform ExtendCLOG() as a background activity
> Background process can look at the next transactionid once each cycle
> without holding any lock. If the xid is almost at the point where a new
> clog page would be allocated, then it will allocate one prior to the new
> page being absolutely required. Doing this as a background task would
> mean that we do not need to hold the XidGenLock in exclusive mode while
> we do this, which means that GetSnapshotData() and CommitTransaction()
> would also be less likely to block. Also, if any clog writes need to be
> performed when the page is moved forwards this would also be performed
> in the background.
>
> 3. Consider whether ProcArrayLock should use a new queued-shared lock
> mode that puts a maximum wait time on ExclusiveLock requests. It would
> be fairly hard to implement this well as a timer, but it might be
> possible to place a limit on queue length. i.e. allow Share locks to be
> granted immediately if a Shared holder already exists, but only if there
> is a queue of no more than N exclusive mode requests queued. This might
> prevent the worst cases of exclusive lock starvation.
>
> 4. Since shared locks are currently queued behind exclusive requests
> when they cannot be immediately satisfied, it might be worth
> reconsidering the way LWLockRelease works also. When we wake up the
> queue we only wake the Shared requests that are adjacent to the head of
> the queue. Instead we could wake *all* waiting Shared requestors.
>
> e.g. with a lock queue like this:
> (HEAD) S<-S<-X<-S<-X<-S<-X<-S
> Currently we would wake the 1st and 2nd waiters only.
>
> If we were to wake the 4th, 6th and 8th waiters also, then the queue
> would reduce in length very quickly, if we assume generally uniform
> service times. (If the head of the queue is X, then we wake only that
> one process and I'm not proposing we change that). That would mean queue
> jumping right? Well, that's what already happens in other circumstances,
> so there cannot be anything intrinsically wrong with allowing it, the
> only question is: would it help?
>
> We need not wake the whole queue, there may be some generally more
> beneficial heuristic. The reason for considering this is not to speed up
> Shared requests but to reduce the queue length and thus the waiting time
> for the Exclusive requestors. Each time a Shared request is dequeued, we
> effectively re-enable queue jumping, so a Shared request arriving during
> that point will actually jump ahead of Shared requests that were unlucky
> enough to arrive while an Exclusive lock was held. Worse than that, the
> new incoming Shared requests exacerbate the starvation, so the more
> non-adjacent groups of Shared lock requests there are in the queue, the
> worse the starvation of the exclusive requestors becomes. We are
> effectively randomly starving some shared locks as well as exclusive
> locks in the current scheme, based upon the state of the lock when they
> make their request. The situation is worst when the lock is heavily
> contended and the workload has a 50/50 mix of shared/exclusive requests,
> e.g. serializable transactions or transactions with lots of
> subtransactions.
>
>
> Goal: Reduce the total time that CommitTransaction() waits for
> GetSnapshotData()
>
> 5. Reduce the time that GetSnapshotData holds ProcArrayLock. To do
> this, we split the ProcArrayLock into multiple partitions (as suggested
> by Alvaro). There are comments in GetNewTransactionId() about having one
> spinlock per ProcArray entry. This would be too many and we could reduce
> contention by having one lock for each N ProcArray entries. Since we
> don't see too much contention with 100 users (default) it would seem
> sensible to make N ~ 120. Striped or contiguous? If we stripe the lock
> partitions then we will need multiple partitions however many users we
> have connected, whereas using contiguous ranges would allow one lock for
> low numbers of users and yet enough locks for higher numbers of users.
>
> 6. Reduce the number of times ProcArrayLock is taken in Exclusive mode.
> To do this, optimise group commit so that all of the actions for
> multiple transactions are executed together: flushing WAL, updating CLOG
> and updating ProcArray, whenever it is appropriate to do so. There's no
> point in having a group commit facility that optimises just one of those
> contention points when all 3 need to be considered. That needs to be
> done as part of a general overhaul of group commit. This would include
> making TransactionLogMultiUpdate() take CLogControlLock once for each
> page that it needs to access, which would also reduce contention from
> TransactionIdCommitTree().
>
> (1) and (2) can be patched fairly easily for 8.3. I have a prototype
> patch for (1) on the shelf already from 6 months ago.
>
> (3), (4) and (5) seem like changes that would require significant
> testing time to ensure we did it correctly, even though the patches
> might be fairly small. I'm thinking this is probably an 8.4 change, but
> I can get test versions out fairly quickly I think.
>
> (6) seems definitely an 8.4 change.
>
> --
>   Simon Riggs
>   EnterpriseDB   http://www.enterprisedb.com
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org

--
  Bruce Momjian  <bruce@momjian.us>        http://momjian.us
  EnterpriseDB                             http://postgres.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +
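
For readers following proposal (4) in the quoted message, the two wake-up policies can be compared with a toy queue model. This is only a sketch in Python, not PostgreSQL's C, and it is not the LWLock code: the function names (wake_adjacent_shared, wake_all_shared, rounds_to_drain) are invented for illustration. It assumes that no new requests arrive while the queue drains and that every wake-up round costs the same, which is the "generally uniform service times" assumption from the message.

```python
# Toy model of an LWLock wait queue: "S" = shared waiter, "X" = exclusive
# waiter, index 0 is the head. All names here are hypothetical.

def wake_adjacent_shared(queue):
    """Policy described as current behaviour: an exclusive head is woken
    alone; a shared head is woken together with the shared waiters
    directly behind it. Returns (1-based positions woken, remaining queue)."""
    if not queue:
        return [], []
    if queue[0] == "X":
        return [1], queue[1:]
    n = 0
    while n < len(queue) and queue[n] == "S":
        n += 1
    return list(range(1, n + 1)), queue[n:]


def wake_all_shared(queue):
    """Proposed variant: an exclusive head is still woken alone, but a
    shared head wakes every shared waiter anywhere in the queue."""
    if not queue:
        return [], []
    if queue[0] == "X":
        return [1], queue[1:]
    woken = [i + 1 for i, mode in enumerate(queue) if mode == "S"]
    remaining = [mode for mode in queue if mode == "X"]
    return woken, remaining


def rounds_to_drain(queue, policy):
    """Count wake-up rounds until the queue is empty (no new arrivals)."""
    rounds = 0
    while queue:
        _, queue = policy(queue)
        rounds += 1
    return rounds


example = list("SSXSXSXS")  # (HEAD) S<-S<-X<-S<-X<-S<-X<-S
print(wake_adjacent_shared(example)[0])
print(wake_all_shared(example)[0])
print(rounds_to_drain(example, wake_adjacent_shared),
      rounds_to_drain(example, wake_all_shared))
```

On the example queue the adjacent-only policy wakes waiters 1 and 2 in the first round and needs 7 rounds to drain, while the wake-all-shared variant wakes waiters 1, 2, 4, 6 and 8 at once and drains in 4 rounds. The model leaves out newly arriving Shared requests, which is where the starvation concern in the message comes from, so it shows the queue-length effect only.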