Re: Wait free LW_SHARED acquisition - v0.9 - Mailing list pgsql-hackers
From: Andres Freund
Subject: Re: Wait free LW_SHARED acquisition - v0.9
Date: 2014-10-14 (from Msg-id timestamp)
Msg-id: 20141014135843.GF9267@awork2.anarazel.de
In response to: Re: Wait free LW_SHARED acquisition - v0.9 (Merlin Moncure <mmoncure@gmail.com>)
Responses: Re: Wait free LW_SHARED acquisition - v0.9
List: pgsql-hackers
On 2014-10-14 08:40:49 -0500, Merlin Moncure wrote:
> On Fri, Oct 10, 2014 at 11:00 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2014-10-10 16:41:39 +0200, Andres Freund wrote:
> >> FWIW, the profile always looks like
> >> - 48.61% postgres  postgres            [.] s_lock
> >>    - s_lock
> >>       + 96.67% StrategyGetBuffer
> >>       + 1.19% UnpinBuffer
> >>       + 0.90% PinBuffer
> >>       + 0.70% hash_search_with_hash_value
> >> + 3.11% postgres  postgres            [.] GetSnapshotData
> >> + 2.47% postgres  postgres            [.] StrategyGetBuffer
> >> + 1.93% postgres  [kernel.kallsyms]   [k] copy_user_generic_string
> >> + 1.28% postgres  postgres            [.] hash_search_with_hash_value
> >> - 1.27% postgres  postgres            [.] LWLockAttemptLock
> >>    - LWLockAttemptLock
> >>       - 97.78% LWLockAcquire
> >>          + 38.76% ReadBuffer_common
> >>          + 28.62% _bt_getbuf
> >>          + 8.59% _bt_relandgetbuf
> >>          + 6.25% GetSnapshotData
> >>          + 5.93% VirtualXactLockTableInsert
> >>          + 3.95% VirtualXactLockTableCleanup
> >>          + 2.35% index_fetch_heap
> >>          + 1.66% StartBufferIO
> >>          + 1.56% LockReleaseAll
> >>          + 1.55% _bt_next
> >>          + 0.78% LockAcquireExtended
> >>       + 1.47% _bt_next
> >>       + 0.75% _bt_relandgetbuf
> >>
> >> to me. Now that's with the client count 496, but it's similar with lower
> >> counts.
> >>
> >> BTW, that profile *clearly* indicates we should make StrategyGetBuffer()
> >> smarter.
> >
> > Which is nearly trivial now that atomics are in. Check out the attached
> > WIP patch which eliminates the spinlock from StrategyGetBuffer() unless
> > there's buffers on the freelist.
>
> Is this safe?
>
> +     victim = pg_atomic_fetch_add_u32(&StrategyControl->nextVictimBuffer, 1);
>
> -     if (++StrategyControl->nextVictimBuffer >= NBuffers)
> +     buf = &BufferDescriptors[victim % NBuffers];
> +
> +     if (victim % NBuffers == 0)
> <snip>
>
> I don't think there's any guarantee you could keep nextVictimBuffer
> from wandering off the end.

Not sure what you mean? It'll cycle around at 2^32.
The code doesn't try to avoid nextVictimBuffer from going above
NBuffers. To avoid overrunning &BufferDescriptors I'm doing % NBuffers.
Yes, that'll have the disadvantage of the first buffers being slightly
more likely to be hit, but for that to be relevant you'd need
unrealistically large amounts of shared_buffers.

> You could buy a little safety by CAS'ing zero in instead, followed by
> atomic increment. However that's still pretty dodgy IMO and requires
> two atomic ops which will underperform the spin in some situations.
>
> I like Robert's idea to keep the spinlock but preallocate a small
> amount of buffers, say 8 or 16.

I think that's pretty much orthogonal. I *do* think it makes sense to
increment nextVictimBuffer in bigger steps. But the above doesn't
prohibit doing so - and it'll still be much better without the
spinlock. Same number of atomics, but no potential of spinning and no
potential of being put to sleep while holding the spinlock. This
callsite has a comparatively large likelihood of being put to sleep
while holding a spinlock because it can run for a very long time doing
nothing but spinlocking.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services