Re: Reducing Transaction Start/End Contention - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: Reducing Transaction Start/End Contention |
Date | |
Msg-id | 200803120023.m2C0NPw02466@momjian.us |
In response to | Re: Reducing Transaction Start/End Contention (Simon Riggs <simon@2ndquadrant.com>) |
Responses | Re: Reducing Transaction Start/End Contention |
List | pgsql-hackers |
Is this still a TODO?

---------------------------------------------------------------------------

Simon Riggs wrote:
> On Mon, 2007-07-30 at 20:20 +0100, Simon Riggs wrote:
>
> > Jignesh Shah's scalability testing on Solaris has revealed further
> > tuning opportunities surrounding the start and end of a transaction.
> > Tuning that should be especially important since async commit is likely
> > to allow much higher transaction rates than were previously possible.
> >
> > There is strong contention on the ProcArrayLock in Exclusive mode, with
> > the top path being CommitTransaction(). This becomes clear as the number
> > of connections increases, but it seems likely that the contention can be
> > caused in a range of other circumstances. My thoughts on the causes of
> > this contention are that the following 3 tasks contend with each other
> > in the following way:
> >
> > CommitTransaction(): takes ProcArrayLock Exclusive
> > but only needs access to one ProcArray element
> >
> > waits for
> >
> > GetSnapshotData(): ProcArrayLock Shared
> > ReadNewTransactionId(): XidGenLock Shared
> >
> > which waits for
> >
> > GetNewTransactionId()
> > takes XidGenLock Exclusive
> > ExtendCLOG(): takes CLogControlLock Exclusive, WALInsertLock Exclusive
> > two possible places where I/O is required
> > ExtendSubtrans(): takes SubtransControlLock
> > one possible place where I/O is required
> > Avoids lock on ProcArrayLock: atomically updates one ProcArray element
> >
> >
> > or more simply:
> >
> > CommitTransaction() -- i.e. once per transaction
> > waits for
> > GetSnapshotData() -- i.e. once per SQL statement
> > which waits for
> > GetNewTransactionId() -- i.e. once per transaction
> >
> > This gives some goals for scalability improvements and some proposals.
> > (1) and (2) are proposals for 8.3 tuning, the others are directions for
> > further research.
> >
> >
> > Goal: Reduce total time that GetSnapshotData() waits for
> > GetNewTransactionId()
>
> The latest patch for lazy xid allocation reduces the number of times
> GetNewTransactionId() is called by eliminating the call entirely for
> read-only transactions. That will reduce the number of waits and so will,
> for most real-world cases, increase the scalability of Postgres.
> Write-mostly workloads will be slightly less scalable, so we should
> expect our TPC-C numbers to be slightly worse than our TPC-E numbers.
>
> We should retest to see whether the bottleneck has been moved
> sufficiently to allow us to avoid doing techniques (1), (2), (3), (5) or
> (6) at all.
>
> > 1. Increase size of Clog-specific BLCKSZ
> > Clog currently uses BLCKSZ to define the size of clog buffers. This can
> > be changed to use CLOG_BLCKSZ, which would then be set to 32768.
> > This will naturally increase the amount of memory allocated to the clog,
> > so we need not alter CLOG_BUFFERS above 8 if we do this (as previously
> > suggested, with successful results). This will also reduce the number of
> > ExtendCLOG() calls, which will probably reduce the overall contention
> > also.
> >
> > 2. Perform ExtendCLOG() as a background activity
> > Background process can look at the next transactionid once each cycle
> > without holding any lock. If the xid is almost at the point where a new
> > clog page would be allocated, then it will allocate one prior to the new
> > page being absolutely required. Doing this as a background task would
> > mean that we do not need to hold the XidGenLock in exclusive mode while
> > we do this, which means that GetSnapshotData() and CommitTransaction()
> > would also be less likely to block. Also, if any clog writes need to be
> > performed when the page is moved forwards this would also be performed
> > in the background.
>
> > 3. Consider whether ProcArrayLock should use a new queued-shared lock
> > mode that puts a maximum wait time on ExclusiveLock requests. It would
> > be fairly hard to implement this well as a timer, but it might be
> > possible to place a limit on queue length. i.e. allow Share locks to be
> > granted immediately if a Shared holder already exists, but only if there
> > is a queue of no more than N exclusive mode requests queued. This might
> > prevent the worst cases of exclusive lock starvation.
>
> (4) is a general concern that remains valid.
>
> > 4. Since shared locks are currently queued behind exclusive requests
> > when they cannot be immediately satisfied, it might be worth
> > reconsidering the way LWLockRelease works also. When we wake up the
> > queue we only wake the Shared requests that are adjacent to the head of
> > the queue. Instead we could wake *all* waiting Shared requestors.
> >
> > e.g. with a lock queue like this:
> > (HEAD) S<-S<-X<-S<-X<-S<-X<-S
> > Currently we would wake the 1st and 2nd waiters only.
> >
> > If we were to wake the 3rd, 5th and 7th waiters also, then the queue
> > would reduce in length very quickly, if we assume generally uniform
> > service times. (If the head of the queue is X, then we wake only that
> > one process and I'm not proposing we change that). That would mean queue
> > jumping right? Well that's what already happens in other circumstances,
> > so there cannot be anything intrinsically wrong with allowing it, the
> > only question is: would it help?
> >
> > We need not wake the whole queue, there may be some generally more
> > beneficial heuristic. The reason for considering this is not to speed up
> > Shared requests but to reduce the queue length and thus the waiting time
> > for the Exclusive requestors.
> > Each time a Shared request is dequeued, we
> > effectively re-enable queue jumping, so a Shared request arriving during
> > that point will actually jump ahead of Shared requests that were unlucky
> > enough to arrive while an Exclusive lock was held. Worse than that, the
> > new incoming Shared requests exacerbate the starvation, so the more
> > non-adjacent groups of Shared lock requests there are in the queue, the
> > worse the starvation of the exclusive requestors becomes. We are
> > effectively randomly starving some shared locks as well as exclusive
> > locks in the current scheme, based upon the state of the lock when they
> > make their request. The situation is worst when the lock is heavily
> > contended and the workload has a 50/50 mix of shared/exclusive requests,
> > e.g. serializable transactions or transactions with lots of
> > subtransactions.
> >
> >
> > Goal: Reduce the total time that CommitTransaction() waits for
> > GetSnapshotData()
> >
> > 5. Reduce the time that GetSnapshotData holds ProcArrayLock. To do
> > this, we split the ProcArrayLock into multiple partitions (as suggested
> > by Alvaro). There are comments in GetNewTransactionId() about having one
> > spinlock per ProcArray entry. This would be too many and we could reduce
> > contention by having one lock for each N ProcArray entries. Since we
> > don't see too much contention with 100 users (default) it would seem
> > sensible to make N ~ 120. Striped or contiguous? If we stripe the lock
> > partitions then we will need multiple partitions however many users we
> > have connected, whereas using contiguous ranges would allow one lock for
> > low numbers of users and yet enough locks for higher numbers of users.
> >
> > 6. Reduce the number of times ProcArrayLock is taken in Exclusive mode.
> > To do this, optimise group commit so that all of the actions for
> > multiple transactions are executed together: flushing WAL, updating CLOG
> > and updating ProcArray, whenever it is appropriate to do so. There's no
> > point in having a group commit facility that optimises just one of those
> > contention points when all 3 need to be considered. That needs to be
> > done as part of a general overhaul of group commit. This would include
> > making TransactionLogMultiUpdate() take CLogControlLock once for each
> > page that it needs to access, which would also reduce contention from
> > TransactionIdCommitTree().
> >
> > (1) and (2) can be patched fairly easily for 8.3. I have a prototype
> > patch for (1) on the shelf already from 6 months ago.
> >
> > (3), (4) and (5) seem like changes that would require significant
> > testing time to ensure we did it correctly, even though the patches
> > might be fairly small. I'm thinking this is probably an 8.4 change, but
> > I can get test versions out fairly quickly I think.
> >
> > (6) seems definitely an 8.4 change.
>
> --
> Simon Riggs
> 2ndQuadrant http://www.2ndQuadrant.com
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 3: Have you checked our extensive FAQ?
>
> http://www.postgresql.org/docs/faq

--
Bruce Momjian <bruce@momjian.us> http://momjian.us
EnterpriseDB http://postgres.enterprisedb.com

+ If your life is a hard drive, Christ can be your backup. +
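
Proposal (4) in the quoted message compares two wake-up policies for a lock wait queue. The following is a minimal Python sketch of that comparison, not PostgreSQL code. It assumes a toy model: the queue is a string of S (Shared) and X (Exclusive) waiters with the head on the left, each call represents one wake-up pass after the current holders release the lock, and no new requests arrive while the queue drains. All function names are hypothetical.

```python
from typing import Callable, Tuple

Policy = Callable[[str], Tuple[str, str]]


def wake_adjacent_shared(queue: str) -> Tuple[str, str]:
    """Policy described as current behaviour: if the head is X, wake only it;
    otherwise wake the run of S waiters adjacent to the head.
    Returns (woken, remaining)."""
    if not queue:
        return "", ""
    if queue[0] == "X":
        return "X", queue[1:]
    n = 0
    while n < len(queue) and queue[n] == "S":
        n += 1
    return queue[:n], queue[n:]


def wake_all_shared(queue: str) -> Tuple[str, str]:
    """Proposed variant: if the head is X, wake only it (unchanged);
    otherwise wake every S waiter in the queue, leaving the X waiters
    in their original order."""
    if not queue:
        return "", ""
    if queue[0] == "X":
        return "X", queue[1:]
    woken = "".join(w for w in queue if w == "S")
    remaining = "".join(w for w in queue if w == "X")
    return woken, remaining


def passes_to_drain(queue: str, policy: Policy) -> int:
    """Count wake-up passes until the queue is empty (toy model only:
    no arrivals, uniform service time, one pass per release)."""
    passes = 0
    while queue:
        _, queue = policy(queue)
        passes += 1
    return passes


if __name__ == "__main__":
    example = "SSXSXSXS"  # the queue from the message, head on the left
    print(passes_to_drain(example, wake_adjacent_shared))
    print(passes_to_drain(example, wake_all_shared))
```

In this toy model the example queue empties in fewer passes under the wake-all-Shared policy, because the Exclusive waiters end up adjacent to each other. It says nothing about real LWLock behaviour under concurrent arrivals, which is the open question the message raises.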