Re: Wait free LW_SHARED acquisition - v0.9 - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Wait free LW_SHARED acquisition - v0.9
Date
Msg-id 20141014135843.GF9267@awork2.anarazel.de
Whole thread Raw
In response to Re: Wait free LW_SHARED acquisition - v0.9  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: Wait free LW_SHARED acquisition - v0.9
List pgsql-hackers
On 2014-10-14 08:40:49 -0500, Merlin Moncure wrote:
> On Fri, Oct 10, 2014 at 11:00 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2014-10-10 16:41:39 +0200, Andres Freund wrote:
> >> FWIW, the profile always looks like
> >> -  48.61%      postgres  postgres              [.] s_lock
> >>    - s_lock
> >>       + 96.67% StrategyGetBuffer
> >>       + 1.19% UnpinBuffer
> >>       + 0.90% PinBuffer
> >>       + 0.70% hash_search_with_hash_value
> >> +   3.11%      postgres  postgres              [.] GetSnapshotData
> >> +   2.47%      postgres  postgres              [.] StrategyGetBuffer
> >> +   1.93%      postgres  [kernel.kallsyms]     [k] copy_user_generic_string
> >> +   1.28%      postgres  postgres              [.] hash_search_with_hash_value
> >> -   1.27%      postgres  postgres              [.] LWLockAttemptLock
> >>    - LWLockAttemptLock
> >>       - 97.78% LWLockAcquire
> >>          + 38.76% ReadBuffer_common
> >>          + 28.62% _bt_getbuf
> >>          + 8.59% _bt_relandgetbuf
> >>          + 6.25% GetSnapshotData
> >>          + 5.93% VirtualXactLockTableInsert
> >>          + 3.95% VirtualXactLockTableCleanup
> >>          + 2.35% index_fetch_heap
> >>          + 1.66% StartBufferIO
> >>          + 1.56% LockReleaseAll
> >>          + 1.55% _bt_next
> >>          + 0.78% LockAcquireExtended
> >>       + 1.47% _bt_next
> >>       + 0.75% _bt_relandgetbuf
> >>
> >> to me. Now that's with the client count 496, but it's similar with lower
> >> counts.
> >>
> >> BTW, that profile *clearly* indicates we should make StrategyGetBuffer()
> >> smarter.
> >
> > Which is nearly trivial now that atomics are in. Check out the attached
> > WIP patch which eliminates the spinlock from StrategyGetBuffer() unless
> > there's buffers on the freelist.
> 
> Is this safe?
> 
> + victim = pg_atomic_fetch_add_u32(&StrategyControl->nextVictimBuffer, 1);
> 
> - if (++StrategyControl->nextVictimBuffer >= NBuffers)
> + buf = &BufferDescriptors[victim % NBuffers];
> +
> + if (victim % NBuffers == 0)
> <snip>
> 
> I don't think there's any guarantee you could keep nextVictimBuffer
> from wandering off the end.

Not sure what you mean? It'll cycle around at 2^32. The code doesn't try
to avoid nextVictimBuffer from going above NBuffers. To avoid overrunning
&BufferDescriptors I'm doing % NBuffers.

Yes, that'll have the disadvantage of the first buffers being slightly
more likely to be hit, but for that to be relevant you'd need
unrealistically large amounts of shared_buffers.

> You could buy a little safety by CAS'ing
> zero in instead, followed by atomic increment.  However that's still
> pretty dodgy IMO and requires two atomic ops which will underperform
> the spin in some situations.
> 
> I like Robert's idea to keep the spinlock but preallocate a small
> amount of buffers, say 8 or 16.

I think that's pretty much orthogonal. I *do* think it makes sense to
increment nextVictimBuffer in bigger steps. But the above doesn't
prohibit doing so - and it'll still be be much better without the
spinlock. Same number of atomics, but no potential of spinning and no
potential of being put to sleep while holding the spinlock.

This callsite has a comparatively large likelihood of being put to sleep
while holding a spinlock because it can run for a very long time doing
nothing but spinlocking.

Greetings,

Andres Freund

-- Andres Freund                       http://www.2ndQuadrant.com/PostgreSQL Development, 24x7 Support, Training &
Services



pgsql-hackers by date:

Previous
From: Merlin Moncure
Date:
Subject: Re: Wait free LW_SHARED acquisition - v0.9
Next
From: Tom Lane
Date:
Subject: Re: Drop any statistics of table after it's truncated