Thread: Final Thoughts for 8.3 on LWLocking and Scalability
I've completed a review of all of the LWLocking in the backends. This is documented in the enclosed file. I would propose that we use this as comments in lwlock.h or in the README, if people agree.

A number of points emerge from that analysis:

1. The ProcArrayLock is acquired Exclusive-ly by only one remaining operation: XidCacheRemoveRunningXids(). Reducing things to that level is brilliant work, Florian and Tom. After analysis, I am still concerned: subxact abort could now be starved out by a large number of shared holders, and then, once the lock is acquired, we may experience starvation of shared requestors, as described in point (4) here: http://archives.postgresql.org/pgsql-hackers/2007-07/msg00948.php I no longer want to solve it in the way described there; I have a solution described in a separate post on -hackers. The original solution still seems valid, but if we can solve it another way, we should.

2. CountActiveBackends() searches the whole of the proc array, even though it could stop when it gets to commit_siblings. Stopping once the heuristic has been determined seems like the best thing to do. A small patch to implement this is attached.

3. ReceiveSharedInvalidMessages() takes a Shared lock on SInvalLock, then takes an Exclusive lock later in the same routine to perform SIDelExpiredDataEntries(). The latter routine examines data that it hasn't touched to see if it can delete anything. If it finds anything other than its own consumed message, it will only be because it beat another backend in the race to delete a message that backend just consumed. So most callers of SIDelExpiredDataEntries() will do nothing at all, after having queued for an X lock. I can't see the sense in that, but maybe there is some deeper purpose? ISTM that we should only attempt to clean the queue when it fills, during SIInsertDataEntry(), which it already does.
We want to avoid continually re-triggering postmaster signals, but we should do that anyway with a "yes-I-already-did-that" flag, rather than by eager cleaning of the queue, which merely defers a postmaster signal storm rather than preventing it.

4. WALWriteLock is acquired in Shared mode by the bgwriter when it runs GetLastSegSwitchTime(). All other callers are Exclusive lockers, so the Shared request will queue like everybody else. The WALWriteLock queue can be long, so the bgwriter can get stuck for much longer than bgwriter_delay when it makes this call; this happens only when archive_timeout > 0, so it has probably never shown up in any performance testing. XLogWrite takes info_lck as well, so we can move lastSegSwitchTime behind that lock instead. That way the bgwriter need never wait on I/O, just spin for access to info_lck. Minor change.

5. ReadNewTransactionId() is now called only by GetNextXidAndEpoch(), but I can't find a caller of that anywhere in core or contrib. Can those now be removed?

6. David Strong talked about doing some testing to see whether NUM_BUFFER_PARTITIONS should be increased above 16. We don't have any further information on that. Should we increase the value to 32 or 64? A minor increase seems safe and should provide the most gain without decreasing performance for lower numbers of CPUs.

7. VACUUM has many contention points within it, so HOT should avoid the annoyance of having to run VACUUM repeatedly on very small, heavily-updated tables.

I haven't further analysed the SLRU locks, since nothing much has changed there recently and they were already pretty efficient, IIRC.

I'm working on patches for 1-4. We've moved far in recent weeks, so it seems like we should finish the job.

Comments?

-- 
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
Attachment
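[Editor's note: the early-exit idea in point 2 above can be sketched roughly as follows. This is an illustration only, under assumed names: FakeProc and count_active_up_to are simplified stand-ins, not the real PGPROC structures or CountActiveBackends() code.]

```c
#include <assert.h>

/*
 * Sketch of the point-2 idea: once the count reaches commit_siblings,
 * the answer to the heuristic question ("are at least commit_siblings
 * backends active?") is already known, so scanning the rest of the
 * proc array is wasted work.
 */
typedef struct FakeProc
{
    int     pid;        /* 0 means an unused slot */
    int     is_active;  /* has an open transaction */
} FakeProc;

int
count_active_up_to(const FakeProc *procs, int nprocs, int commit_siblings)
{
    int     count = 0;
    int     i;

    for (i = 0; i < nprocs; i++)
    {
        if (procs[i].pid != 0 && procs[i].is_active)
        {
            count++;
            if (count >= commit_siblings)
                break;          /* heuristic satisfied; stop scanning */
        }
    }
    return count;
}
```

The caller only ever compares the result against commit_siblings, so stopping early changes no observable behaviour.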
On 9/11/07, Simon Riggs <simon@2ndquadrant.com> wrote:
> 5. ReadNewTransactionId() is only called now by GetNextXidAndEpoch(),
> but I can't find a caller of that anywhere in core or contrib. Can those
> now be removed?

GetNextXidAndEpoch() is needed for external modules to use 8-byte transaction ids at the user level. It is used by the txid module in Skytools/pgq.

Please keep it.

-- 
marko
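[Editor's note: the kind of user-level 8-byte id Marko describes combines the wraparound epoch with the 32-bit xid, which is why GetNextXidAndEpoch() exposes both. A minimal sketch, with an illustrative function name rather than the actual txid module code:]

```c
#include <stdint.h>

/*
 * 32-bit XIDs wrap around; combining the current wraparound epoch with
 * the xid yields a 64-bit value that never goes backwards, suitable
 * for user-level ordering of transactions.
 */
uint64_t
xid_to_txid(uint32_t epoch, uint32_t xid)
{
    return ((uint64_t) epoch << 32) | xid;
}
```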
On Tue, 2007-09-11 at 13:31 +0300, Marko Kreen wrote:
> GetNextXidAndEpoch() is needed for external modules to use
> 8-byte transaction ids user-level. Used by txid module in
> Skytools/pgq.
>
> Please keep it.

Guessed it might be you folks that needed it. I'll add a comment to that effect, so we don't forget it again.

-- 
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
Simon Riggs <simon@2ndquadrant.com> writes:
> 1. The ProcArrayLock is acquired Exclusive-ly by only one remaining
> operation: XidCacheRemoveRunningXids(). Reducing things to that level is
> brilliant work, Florian and Tom.

It would be brilliant if it were true, but it isn't. Better look again.

regards, tom lane
On Tue, 2007-09-11 at 10:21 -0400, Tom Lane wrote:
> Simon Riggs <simon@2ndquadrant.com> writes:
> > 1. The ProcArrayLock is acquired Exclusive-ly by only one remaining
> > operation: XidCacheRemoveRunningXids(). Reducing things to that level is
> > brilliant work, Florian and Tom.
>
> It would be brilliant if it were true, but it isn't. Better look again.

On the more detailed explanation, I should have said "in normal operation".

My analytical notes attached to the original post show ProcArrayLock is acquired exclusively during backend start, backend exit and while making a prepared (twophase) commit. So yes, it is locked Exclusively in other places, but they happen rarely, and they actually add/remove procs from the array, so it's unlikely anything can change there anyhow.

-- 
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
Simon Riggs wrote:
> On Tue, 2007-09-11 at 10:21 -0400, Tom Lane wrote:
>> Simon Riggs <simon@2ndquadrant.com> writes:
>>> 1. The ProcArrayLock is acquired Exclusive-ly by only one
>>> remaining operation: XidCacheRemoveRunningXids(). Reducing things
>>> to that level is brilliant work, Florian and Tom.
>> It would be brilliant if it were true, but it isn't. Better look
>> again.
>
> On the more detailed explanation, I say "in normal operation".
>
> My analytical notes attached to the original post show ProcArrayLock
> is acquired exclusively during backend start, exit and while making a
> prepared (twophase) commit. So yes, it is locked Exclusively in
> other places, but they happen rarely and they actually add/remove
> procs from the array, so its unlikely anything can change there
> anyhow.

Well, and during normal COMMIT and ABORT, which might happen rather frequently ;-)

I do agree, however, that XidCacheRemoveRunningXids() is the only site left where getting rid of it might be possible, and might bring measurable benefit for some workloads. With more effort, we might not even need it during ABORT, but I doubt that the effort would be worth it. While some (plpgsql-intensive) workloads might abort subxacts rather frequently, I doubt the same holds true for toplevel aborts.

I'm actually working on a patch to remove that lock from XidCacheRemoveRunningXids(), but I'm not yet completely sure that my approach is safe. Tom had some objections that I take rather seriously. We'll see ;-)

greetings, Florian Pflug
On Tue, 2007-09-11 at 19:32 +0200, Florian G. Pflug wrote:
> Well, and during normal COMMIT and ABORT, which might happen
> rather frequently ;-)

Agreed, that part of my assessment was not accurate...

-- 
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
Simon Riggs <simon@2ndquadrant.com> writes:
> I've completed a review of all of the LWlocking in the backends. This is
> documented in the enclosed file. I would propose that we use this as
> comments in lwlock.h or in the README, if people agree.

I don't think that putting this list in as documentation is a smart idea --- it would inevitably become out-of-date, and misleading documentation is worse than none. Parts of it are out of date already (which kinda proves my point, considering that very little new development has gone into the tree since September). Since anyone who's concerned about a particular lock can grep for uses of it pretty easily, I think we should just figure on them doing that.

> 2. CountActiveBackends() searches the whole of the proc array, even
> though it could stop when it gets to commit_siblings. Stopping once the
> heuristic has been determined seems like the best thing to do. A small
> patch to implement this is attached.

At the moment CountActiveBackends doesn't take the lock anymore, so I'm thinking that changing this is not worthwhile.

> [ sinval lock management needs redesign ]

Yup it does.

> 4. WALWriteLock is acquired in Shared mode by bgwriter when it runs
> GetLastSegSwitchTime(). All other callers are Exclusive lockers, so the
> Shared request will queue like everybody else. WALWriteLock queue length
> can be long, so the bgwriter can get stuck for much longer than
> bgwriter_delay when it makes this call; this happens only when
> archive_timeout > 0 so probably has never shown up in any performance
> testing. XLogWrite takes info_lck also, so we can move the
> lastSegSwitchTime behind that lock instead. That way bgwriter need never
> wait on I/O, just spin for access to info_lck. Minor change.

This seems like a possibly reasonable thing to do; did you ever write a patch for it?

> 5. ReadNewTransactionId() is only called now by GetNextXidAndEpoch(),
> but I can't find a caller of that anywhere in core or contrib.
> Can those
> now be removed?

No. It's needed by Slony.

> 6. David Strong talked about doing some testing to see if
> NUM_BUFFER_PARTITIONS should be increased above 16. We don't have any
> further information on that. Should we increase the value to 32 or 64?

Not without some testing.

regards, tom lane
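[Editor's note: the "yes-I-already-did-that" flag proposed in point 3 upthread could be sketched roughly as below. This is purely illustrative; FakeQueue and its fields are assumed stand-ins, not the actual sinval shared-memory structures.]

```c
/*
 * Idea from point 3: rather than eagerly cleaning the queue to avoid
 * repeated postmaster signals, remember that a signal has already been
 * sent and skip further ones until the queue has actually been cleaned.
 */
typedef struct FakeQueue
{
    int     used;           /* entries currently in the queue */
    int     threshold;      /* signal when used exceeds this */
    int     signal_sent;    /* the flag: have we already signalled? */
    int     signals;        /* how many signals actually went out */
} FakeQueue;

static void
queue_insert(FakeQueue *q)
{
    q->used++;
    if (q->used > q->threshold)
    {
        if (!q->signal_sent)
        {
            q->signals++;       /* stand-in for signalling the postmaster */
            q->signal_sent = 1;
        }
        /* without the flag, we would re-signal on every insert here */
    }
}

static void
queue_cleaned(FakeQueue *q)
{
    q->used = 0;
    q->signal_sent = 0;         /* re-arm the flag */
}
```

This defers nothing and prevents the signal storm: one signal per fill, however many inserts happen before the cleanup runs.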
On Wed, 2008-03-19 at 12:24 -0400, Tom Lane wrote:
> > [ sinval lock management needs redesign ]
>
> Yup it does.

I wrote a redesigned, simplified version of my earlier patch, enclosed here for discussion only; I'm not expecting this to be the final version. Comments at the top of the patch explain it. The basic idea is to identify a single backend to delete the sinval message queue, without the need for redesigning the postmaster to handle single-backend invalidation messages.

> > 4. WALWriteLock is acquired in Shared mode by bgwriter when it runs
> > GetLastSegSwitchTime(). All other callers are Exclusive lockers, so the
> > Shared request will queue like everybody else. WALWriteLock queue length
> > can be long, so the bgwriter can get stuck for much longer than
> > bgwriter_delay when it makes this call; this happens only when
> > archive_timeout > 0 so probably has never shown up in any performance
> > testing. XLogWrite takes info_lck also, so we can move the
> > lastSegSwitchTime behind that lock instead. That way bgwriter need never
> > wait on I/O, just spin for access to info_lck. Minor change.
>
> This seems like a possibly reasonable thing to do; did you ever write
> a patch for it?

No, but happy to do so.

-- 
Simon Riggs
2ndQuadrant  http://www.2ndQuadrant.com
PostgreSQL UK 2008 Conference: http://www.postgresql.org.uk
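[Editor's note: a minimal sketch of what that point-4 patch might look like. FakeXLogCtl and the SpinAcquire/SpinRelease helpers are assumed, simplified stand-ins for the real XLogCtl structure and s_lock primitives, not the actual PostgreSQL code.]

```c
#include <time.h>

/*
 * Point-4 idea: keep lastSegSwitchTime behind the info_lck spinlock
 * instead of WALWriteLock, so the bgwriter only ever spins briefly
 * rather than queueing behind WAL I/O.
 */
typedef struct FakeXLogCtl
{
    volatile int    info_lck;           /* stand-in spinlock */
    time_t          lastSegSwitchTime;  /* now protected by info_lck */
} FakeXLogCtl;

static void SpinAcquire(volatile int *lck) { while (*lck) { /* spin */ } *lck = 1; }
static void SpinRelease(volatile int *lck) { *lck = 0; }

/* Called from XLogWrite() when a segment-switch record is written. */
static void
note_seg_switch(FakeXLogCtl *ctl, time_t now)
{
    SpinAcquire(&ctl->info_lck);
    ctl->lastSegSwitchTime = now;
    SpinRelease(&ctl->info_lck);
}

/* Called from the bgwriter; no longer queues on WALWriteLock. */
static time_t
get_last_seg_switch_time(FakeXLogCtl *ctl)
{
    time_t      result;

    SpinAcquire(&ctl->info_lck);
    result = ctl->lastSegSwitchTime;
    SpinRelease(&ctl->info_lck);
    return result;
}
```

The bgwriter's read now holds only a spinlock for a few instructions, never waiting behind an exclusive WALWriteLock holder that is in the middle of a write/fsync.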