Re: Clock sweep not caching enough B-Tree leaf pages? - Mailing list pgsql-hackers

From Andres Freund
Subject Re: Clock sweep not caching enough B-Tree leaf pages?
Date
Msg-id 20140416133533.GH17874@awork2.anarazel.de
In response to Re: Clock sweep not caching enough B-Tree leaf pages?  (Merlin Moncure <mmoncure@gmail.com>)
Responses Re: Clock sweep not caching enough B-Tree leaf pages?  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-hackers
On 2014-04-16 08:25:23 -0500, Merlin Moncure wrote:
> The downside of this approach was its complexity and the difficulty of
> testing its edge cases.  I would like to point out though that while i/o
> efficiency gains are nice, I think contention issues are the bigger
> fish to fry.

That's my feeling as well.

> 
> On Wed, Apr 16, 2014 at 8:14 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2014-04-16 07:55:44 -0500, Merlin Moncure wrote:
> >> What about:  9. Don't wait on locked buffer in the clock sweep.
> >
> > I don't think we do that? Or are you referring to locked buffer headers?
> 
> Right -- exactly.  I posted a patch for this a while back. It's quite
> trivial: implement a trylock variant of the buffer header lock macro
> and further guard the check with a non-locking test (which TAS()
> already does generally, but the idea is to avoid the cache line lock
> in likely cases of contention).  I believe this to be unambiguously
> better: even if it's self healing or unlikely, there is no good reason
> to jump into a spinlock fray or even request a contended cache line
> while holding a critical lock.

IIRC you had problems proving the benefits of that, right?

I think that's because buffer header locks are held so briefly that
it's really unlikely a reader ever observes the spinlock as locked. The
spinlock acquisition will have made the locker the exclusive owner of
the spinlock's cache line in the majority of cases, and as soon as that
happens the cache miss/transfer will take far longer than the lock is
held.
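
For reference, the trylock-with-pre-check idea quoted above can be
sketched roughly as follows. This is only an illustration using C11
atomics, not PostgreSQL's TAS() or its buffer header lock macros;
DemoBufHdr, demo_try_lock_hdr, demo_unlock_hdr and demo_find_unlocked
are hypothetical names invented for this sketch.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer header; not PostgreSQL's BufferDesc. */
typedef struct DemoBufHdr
{
    atomic_bool locked;       /* true while the header lock is held */
    int         usage_count;  /* unused here, shown for context only */
} DemoBufHdr;

/*
 * Try to take the header lock without spinning.
 * Returns true if the lock was acquired, false if it was already held.
 */
static bool
demo_try_lock_hdr(DemoBufHdr *hdr)
{
    /* Non-locking test first: a plain load does not ask for exclusive
     * ownership of the cache line. */
    if (atomic_load_explicit(&hdr->locked, memory_order_relaxed))
        return false;

    /* Atomic exchange returns the previous value; the lock is ours
     * only if that previous value was false. */
    return !atomic_exchange_explicit(&hdr->locked, true,
                                     memory_order_acquire);
}

static void
demo_unlock_hdr(DemoBufHdr *hdr)
{
    assert(atomic_load_explicit(&hdr->locked, memory_order_relaxed));
    atomic_store_explicit(&hdr->locked, false, memory_order_release);
}

/*
 * Walk the buffers starting at 'start' and return the index of the first
 * one whose header lock could be taken, skipping busy headers instead of
 * waiting on them. Returns -1 if every header was busy. On success the
 * caller holds the lock of the returned buffer.
 */
static int
demo_find_unlocked(DemoBufHdr *bufs, int nbufs, int start)
{
    for (int i = 0; i < nbufs; i++)
    {
        int idx = (start + i) % nbufs;

        if (demo_try_lock_hdr(&bufs[idx]))
            return idx;
    }
    return -1;
}
```

Whether the plain load in front of the exchange buys anything is exactly
the open question here: if the header is almost never seen locked, the
pre-check rarely triggers.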

I think this is the wrong level to optimize things. IMO there are two
possible solutions (that don't exclude each other):

* Perform the clock sweep in one process so there's a very fast way to
  get to a free buffer. Possibly in a partitioned way.

* Don't take a global exclusive lock while performing the clock
  sweep. Instead increase StrategyControl->nextVictimBuffer in chunks
  under an exclusive lock, and then scan the potential victim buffers in
  those chunks without a global lock held.
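
A minimal sketch of the second option, under stated assumptions: a
C11 atomic_flag spinlock stands in for the global strategy lock, usage
counts are plain atomic counters, and pins and buffer header locks are
ignored. DemoStrategy, demo_claim_chunk and demo_scan_chunk are
hypothetical names, not existing PostgreSQL functions.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical stand-in for the shared clock-sweep state. */
typedef struct DemoStrategy
{
    atomic_flag lock;         /* stands in for the global exclusive lock */
    uint32_t    next_victim;  /* next buffer index the sweep hands out */
    uint32_t    nbuffers;     /* total number of buffers */
} DemoStrategy;

/*
 * Claim chunk_size consecutive buffers. The global lock is held only long
 * enough to advance next_victim by a whole chunk. Returns the first index
 * of the claimed chunk.
 */
static uint32_t
demo_claim_chunk(DemoStrategy *s, uint32_t chunk_size)
{
    uint32_t start;

    assert(chunk_size > 0 && chunk_size <= s->nbuffers);

    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ;                       /* spin until the lock is free */
    start = s->next_victim;
    s->next_victim = (start + chunk_size) % s->nbuffers;
    atomic_flag_clear_explicit(&s->lock, memory_order_release);

    return start;
}

/*
 * Scan one claimed chunk with no global lock held. Buffers with a nonzero
 * usage count get it decremented; the first buffer found with a usage
 * count of zero is returned as the victim. Returns -1 if the chunk
 * contains no victim.
 */
static int
demo_scan_chunk(atomic_uint *usage, uint32_t nbuffers,
                uint32_t start, uint32_t chunk_size)
{
    for (uint32_t i = 0; i < chunk_size; i++)
    {
        uint32_t idx = (start + i) % nbuffers;
        unsigned cur = atomic_load(&usage[idx]);

        /* Decrement unless the count is (or becomes) zero. On a failed
         * compare-exchange, cur is reloaded with the current value. */
        while (cur > 0)
        {
            if (atomic_compare_exchange_weak(&usage[idx], &cur, cur - 1))
                break;
        }
        if (cur == 0)
            return (int) idx;
    }
    return -1;
}
```

A backend would call demo_claim_chunk() and then demo_scan_chunk() on the
result, claiming another chunk whenever the scan comes back with -1. The
chunk size trades lock traffic against how far concurrent backends get
spread around the ring.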
 

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services


