
Re: Clock sweep not caching enough B-Tree leaf pages? - Mailing list pgsql-hackers

From	Andres Freund
Subject	Re: Clock sweep not caching enough B-Tree leaf pages?
Date	April 16, 2014 13:35:42
Msg-id	20140416133533.GH17874@awork2.anarazel.de
In response to	Re: Clock sweep not caching enough B-Tree leaf pages? (Merlin Moncure <mmoncure@gmail.com>)
Responses	Re: Clock sweep not caching enough B-Tree leaf pages?
List	pgsql-hackers


On 2014-04-16 08:25:23 -0500, Merlin Moncure wrote:
> The downside of this approach was complexity, and edge cases that were
> difficult to test.  I would like to point out though that while i/o
> efficiency gains are nice, I think contention issues are the bigger
> fish to fry.

That's my feeling as well.

> 
> On Wed, Apr 16, 2014 at 8:14 AM, Andres Freund <andres@2ndquadrant.com> wrote:
> > On 2014-04-16 07:55:44 -0500, Merlin Moncure wrote:
> >> What about:  9. Don't wait on locked buffer in the clock sweep.
> >
> > I don't think we do that? Or are you referring to locked buffer headers?
> 
> Right -- exactly.  I posted patch for this a while back. It's quite
> trivial: implement a trylock variant of the buffer header lock macro
> and further guard the check with a non-locking test (which TAS()
> already does generally, but the idea is to avoid the cache line lock
> in likely cases of contention).  I believe this to be unambiguously
> better: even if it's self healing or unlikely, there is no good reason
> to jump into a spinlock fray or even request a contended cache line
> while holding a critical lock.

IIRC you had problems proving the benefits of that, right?

I think that's because buffer headers are locked for such short periods
that it's really unlikely to read a buffer header spinlock while it is
locked. Acquiring the spinlock will have made the locker the exclusive
owner of the spinlock's cache line in the majority of cases, and as soon
as that happens the cache miss/transfer will take far longer than the
lock is held.
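
For readers following along, the trylock idea being discussed (a plain
read of the lock word before attempting the atomic exchange, and skipping
the buffer if it looks locked) can be sketched roughly as below. This is a
minimal illustration using C11 atomics, not the patch that was posted and
not PostgreSQL's actual buffer header code: DemoBufHdr, demo_try_lock_hdr,
demo_unlock_hdr, demo_sweep and DEMO_MAX_PASSES are all hypothetical
names, and the sweep returns the victim unlocked purely to keep the
example short.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a buffer header; not PostgreSQL's BufferDesc. */
typedef struct DemoBufHdr
{
    atomic_int  locked;         /* 0 = free, 1 = held */
    int         usage_count;    /* protected by 'locked' */
} DemoBufHdr;

/* Arbitrary bound so the demo sweep always terminates. */
#define DEMO_MAX_PASSES 8

/*
 * Try to take the header lock without ever spinning.
 * The relaxed load is the "non-locking test": if the lock looks held we
 * return before issuing the atomic exchange, so we never ask for the
 * cache line in exclusive mode in that case.
 */
bool
demo_try_lock_hdr(DemoBufHdr *hdr)
{
    if (atomic_load_explicit(&hdr->locked, memory_order_relaxed) != 0)
        return false;
    return atomic_exchange_explicit(&hdr->locked, 1,
                                    memory_order_acquire) == 0;
}

void
demo_unlock_hdr(DemoBufHdr *hdr)
{
    atomic_store_explicit(&hdr->locked, 0, memory_order_release);
}

/*
 * Clock sweep that skips locked headers instead of waiting on them.
 * Returns the index of a buffer whose usage count was zero, or -1 if
 * none was found within DEMO_MAX_PASSES passes over the array.
 */
int
demo_sweep(DemoBufHdr *bufs, int nbufs, int *hand)
{
    for (int tries = 0; tries < nbufs * DEMO_MAX_PASSES; tries++)
    {
        int         this_buf = *hand;
        DemoBufHdr *hdr = &bufs[this_buf];

        *hand = (*hand + 1) % nbufs;

        if (!demo_try_lock_hdr(hdr))
            continue;           /* header busy: move on, don't spin */

        if (hdr->usage_count == 0)
        {
            demo_unlock_hdr(hdr);
            return this_buf;
        }
        hdr->usage_count--;
        demo_unlock_hdr(hdr);
    }
    return -1;
}
```

Whether the extra load buys anything in practice depends on how often a
header is actually observed locked, which is the point made above.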

I think this is the wrong level to optimize things. IMO there are two
possible solutions (that don't exclude each other):

* Perform the clock sweep in one process so there's a very fast way to
  get to a free buffer. Possibly in a partitioned way.

* Don't take a global exclusive lock while performing the clock
  sweep. Instead increase StrategyControl->nextVictimBuffer in chunks
  under an exclusive lock, and then scan the potential victim buffers in
  those chunks without a global lock held.
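
A rough sketch of the second option, to show the shape of it rather than
a real implementation: the global lock is held only long enough to
advance the clock hand by a whole chunk, and the scan of that chunk then
runs without it. Everything named Demo*/demo_* below is made up for
illustration, including the chunk size, the spinlock built on
atomic_flag, and the use of plain atomic usage counters in place of real
buffer headers. Actually claiming the victim (pinning it, rechecking its
state) is left out.

```c
#include <stdatomic.h>

#define DEMO_NBUFFERS   16      /* assumed pool size */
#define DEMO_CHUNK      4       /* assumed buffers reserved per lock hold */
#define DEMO_MAX_CHUNKS 64      /* arbitrary bound so the demo terminates */

/* Hypothetical stand-in for the shared strategy state. */
typedef struct DemoStrategy
{
    atomic_flag lock;           /* stands in for the global exclusive lock */
    int         next_victim;    /* clock hand, protected by 'lock' */
} DemoStrategy;

/*
 * Advance the clock hand by one chunk under the lock and return the
 * first buffer of the chunk the caller now has to scan.
 */
int
demo_reserve_chunk(DemoStrategy *s)
{
    int start;

    while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
        ;                       /* spin; the hold time is only a few instructions */
    start = s->next_victim;
    s->next_victim = (start + DEMO_CHUNK) % DEMO_NBUFFERS;
    atomic_flag_clear_explicit(&s->lock, memory_order_release);
    return start;
}

/*
 * Scan reserved chunks with no global lock held. Buffers with a positive
 * usage count are decremented; the first one found at zero is returned.
 * Returns -1 if nothing is found within DEMO_MAX_CHUNKS chunks.
 */
int
demo_get_victim(DemoStrategy *s, atomic_int *usage)
{
    for (int n = 0; n < DEMO_MAX_CHUNKS; n++)
    {
        int start = demo_reserve_chunk(s);

        for (int i = 0; i < DEMO_CHUNK; i++)
        {
            int buf = (start + i) % DEMO_NBUFFERS;
            int count = atomic_load(&usage[buf]);

            /* Decrement if positive; a failed CAS reloads 'count'. */
            while (count > 0 &&
                   !atomic_compare_exchange_weak(&usage[buf], &count,
                                                 count - 1))
                ;
            if (count == 0)
                return buf;
        }
    }
    return -1;
}
```

Concurrent callers get disjoint chunks from demo_reserve_chunk() until
the hand wraps around, so they mostly touch different buffers while
scanning.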

Greetings,

Andres Freund

-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services
