Re: 9.2beta1, parallel queries, ReleasePredicateLocks, CheckForSerializableConflictIn in the oprofile - Mailing list pgsql-hackers

From: Merlin Moncure
Msg-id: CAHyXU0xbyCVchZ9MJdBQpNGuwa4-G6C-39A9FW02357RLdBOCw@mail.gmail.com
In response to: Robert Haas <robertmhaas@gmail.com>
List: pgsql-hackers
On Fri, Jun 1, 2012 at 3:40 PM, Robert Haas <robertmhaas@gmail.com> wrote:
> On Fri, Jun 1, 2012 at 3:16 PM, Florian Pflug <fgp@phlo.org> wrote:
>> Ok, now you've lost me. If the read() blocks, how on earth can a few
>> additional pins/unpins ever account for any meaningful execution time?
>>
>> It seems to me that once read() blocks we're talking about a delay on the
>> order of the scheduling granularity (i.e., milliseconds, in the best case),
>> while even in the worst case pinning a buffer shouldn't take more than
>> 1000 cycles (for comparison, I think a cache miss across all layers costs
>> a few hundred cycles). So there are at the very least 3 orders of magnitude
>> between those two...
>
> I'm not sure what you want me to say here.  s_lock shows up in the
> profile, and some of that is from PinBuffer.  I think any detectable
> number of calls to s_lock is a sign of Bad Things (TM).  I can't
> reproduce anything as severe as what the OP is seeing, but what does
> that prove?  In a couple years we'll have systems with 128 cores
> floating around, and things that are minor problems at 32 or even 64
> cores will be crippling at 128 cores.  IME, spinlock contention has a
> very sharp tipping point.  It's only a minor annoyance and then you
> hit some threshold number of cores and, bam, you're spending 70-90% of
> your time across all cores fighting over that one spinlock.
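
To make that tipping point concrete, here's a minimal test-and-set
spinlock in C11 (an illustrative sketch, not our actual s_lock):
every failed acquisition still takes the lock's cache line exclusive,
so with N spinning cores each handoff costs O(N) coherence traffic,
which is why the degradation is a cliff rather than a slope.

    #include <stdatomic.h>

    typedef struct { atomic_flag locked; } spinlock;
    /* initialize with ATOMIC_FLAG_INIT */

    static void spin_lock(spinlock *l)
    {
        /* Each failed test-and-set still dirties the cache line
         * holding 'locked', bouncing it between all spinning
         * cores -- contention degrades sharply, not gracefully. */
        while (atomic_flag_test_and_set_explicit(&l->locked,
                                                 memory_order_acquire))
            ;                           /* busy-wait */
    }

    static void spin_unlock(spinlock *l)
    {
        atomic_flag_clear_explicit(&l->locked, memory_order_release);
    }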

I think your approach, nailing buffers, is really the way to go.  It
nails buffers based on detected contention, which is very desirable --
uncontended spinlocks aren't broken and don't need to be fixed.  It
also doesn't add overhead in the general case, whereas a side-by-side
per-backend queue does.
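
Here's how I read the mechanism, as a bare C11 sketch (all names
hypothetical -- this is not the actual patch): detection and the fix
both live on the shared buffer header, so the first backend that hits
contention nails the buffer for everyone.

    #include <stdatomic.h>
    #include <stdbool.h>

    /* Hypothetical buffer descriptor; illustrative names only. */
    typedef struct
    {
        atomic_flag hdr_lock;    /* per-buffer header spinlock */
        int         refcount;    /* shared pin count, under hdr_lock */
        atomic_bool nailed;      /* set once pin contention is seen */
    } BufSketch;

    static void pin_buffer_sketch(BufSketch *buf)
    {
        /* A nailed buffer is guaranteed resident, so its shared
         * refcount no longer matters and we skip it entirely. */
        if (atomic_load(&buf->nailed))
            return;

        if (atomic_flag_test_and_set(&buf->hdr_lock))
        {
            /* The header lock was already held: that's the
             * contention signal.  Nail the buffer so later pins
             * stop fighting over it, then wait our turn once. */
            atomic_store(&buf->nailed, true);
            while (atomic_flag_test_and_set(&buf->hdr_lock))
                ;
        }
        buf->refcount++;
        atomic_flag_clear(&buf->hdr_lock);
        /* unpin is symmetric and likewise skips nailed buffers */
    }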

Another nice aspect is that you're not changing the lifetime of the
pin as the backend sees it, but storing the important state (the
interplay with usage_count is a nice touch) on the buffer itself --
you want to keep as little as possible on the backend-private side,
and your patch does that; it's more amenable to third-party
intervention (flush your buffers right now!) than extended pins.  It
exploits the fact that pins can overlap and that the reference count
is useless if the buffer is always in memory anyway.  It immediately
self-corrects when the first backend gripes, whereas a per-backend
solution will grind down as each backend independently determines
it's got a problem -- not pleasant if your workload is 'walking' a
set of buffers.
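
For contrast, the per-backend alternative would look roughly like
this (again, hypothetical names): pins hide in private slots, so
shared state understates the true pin count, a third party can't see
or revoke them, and every backend has to rediscover the hot buffer on
its own.

    #include <stdbool.h>

    #define LOCAL_PIN_SLOTS 16

    typedef struct
    {
        int  buffer_id;    /* which buffer this slot covers */
        int  local_pins;   /* pins hidden from the shared refcount */
        bool in_use;
    } LocalPinSlot;

    static LocalPinSlot my_pins[LOCAL_PIN_SLOTS];  /* per backend */

    /* "Flush your buffers right now!" can't be answered from shared
     * state alone: every backend must be signalled to drain its own
     * slots before the shared refcount is trustworthy again. */
    static void drain_local_pins(void)
    {
        for (int i = 0; i < LOCAL_PIN_SLOTS; i++)
            if (my_pins[i].in_use)
            {
                /* push local_pins back into the shared refcount */
                my_pins[i].in_use = false;
            }
    }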

Buffer pins aren't a cache: with a cache you are trying to mask a slow
operation (like a disk i/o) with a faster one such that the number of
slow operations is minimized.  Buffer pins however are very different
in that we only care about contention on the reference count (the
buffer itself is not locked!), which makes me suspicious that caching
type algorithms are the wrong place to be looking.  I think it comes
down to picking between your relatively complex but general
lock-displacement approach and a specific strategy based on known
bottlenecks.
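
A minimal sketch of that point about the reference count (my
illustration, nothing from the patch): even a fully lock-free counter
still serializes on its cache line, so the contention is the count
itself, not the locking around it -- which is why caching-style
answers don't obviously apply.

    #include <stdatomic.h>

    static atomic_int shared_refcount;   /* one hot cache line */

    static void pin_atomic(void)
    {
        /* Every fetch_add must own the cache line exclusively, so
         * with N cores pinning the same buffer the increments run
         * one at a time -- lock-free, but still serialized. */
        atomic_fetch_add_explicit(&shared_refcount, 1,
                                  memory_order_acq_rel);
    }

    static void unpin_atomic(void)
    {
        atomic_fetch_sub_explicit(&shared_refcount, 1,
                                  memory_order_acq_rel);
    }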

merlin

