Re: StrategyGetBuffer optimization, take 2 - Mailing list pgsql-hackers

From	Robert Haas
Subject	Re: StrategyGetBuffer optimization, take 2
Date	August 17, 2013 15:55:15
Msg-id	CA+Tgmobr+WBoJoUFm5ju3cL9fcNqRQzyAPhHxjyTdxSRmZVpkw@mail.gmail.com
In response to	StrategyGetBuffer optimization, take 2 (Merlin Moncure <mmoncure@gmail.com>)
Responses	Re: StrategyGetBuffer optimization, take 2
List	pgsql-hackers

On Mon, Aug 5, 2013 at 11:49 AM, Merlin Moncure <mmoncure@gmail.com> wrote:
> *) What I think is happening:
> I think we are again getting burned by getting de-scheduled while
> holding the free list lock. I've been chasing this problem for a long
> time now (for example, see:
> http://postgresql.1045698.n5.nabble.com/High-SYS-CPU-need-advise-td5732045.html)
> but now I've got a reproducible case.  What is happening is this:
>
> 1. in RelationGetBufferForTuple (hio.c): fire LockRelationForExtension
> 2. call ReadBufferBI.  this goes down the chain until StrategyGetBuffer()
> 3. Lock free list, go into clock sweep loop
> 4. while holding clock sweep, hit 'hot' buffer, spin on it
> 5. get de-scheduled
> 6. now enter the 'hot buffer spin lock lottery'
> 7. more/more backends pile on, linux scheduler goes berserk, reducing
> chances of winning #6
> 8. finally win the lottery. lock released. everything back to normal.

This is an interesting theory, but where's the evidence?  I've seen
spinlock contention come from enough different places to be wary of
arguments that start with "it must be happening because...".

IMHO, the thing to do here is run perf record -g during one of the
trouble periods.  The performance impact is quite low.  You could
probably even set up a script that runs perf in five-minute intervals
and saves all of the perf.data files.  When one of these
spikes happens, grab the one that's relevant.
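
A rough sketch of such a capture script, in Python.  The directory name,
the file naming scheme, and the number of captures to keep are all
assumptions made for illustration; only "perf record -g" and the
five-minute interval come from the suggestion above.  It records
system-wide (-a), which normally needs root or a relaxed
perf_event_paranoid setting.

```python
import subprocess
import time
from pathlib import Path


def build_perf_command(output_path, seconds):
    """Return the argv for one system-wide, call-graph perf capture."""
    return [
        "perf", "record",
        "-a",                          # sample all CPUs
        "-g",                          # record call graphs
        "-o", str(output_path),        # write here instead of ./perf.data
        "--", "sleep", str(seconds),   # perf stops when sleep exits
    ]


def prune_old_captures(directory, keep):
    """Delete all but the newest `keep` captures; return the removed paths."""
    # Timestamped names sort chronologically, so a plain sort is enough.
    captures = sorted(Path(directory).glob("perf-*.data"))
    stale = captures[:-keep] if keep > 0 else captures
    for path in stale:
        path.unlink()
    return stale


def capture_forever(directory="perf-captures", seconds=300, keep=48):
    """Record back-to-back captures until interrupted (names are hypothetical)."""
    directory = Path(directory)
    directory.mkdir(parents=True, exist_ok=True)
    while True:
        stamp = time.strftime("%Y%m%dT%H%M%S")
        output = directory / f"perf-{stamp}.data"
        subprocess.run(build_perf_command(output, seconds), check=True)
        prune_old_captures(directory, keep)


if __name__ == "__main__":
    capture_forever()
```

With keep=48 this retains roughly the last four hours, so after a spike
the capture whose timestamp covers the bad period is still on disk.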

If you see that s_lock is where all the time is going, then you've
proved it's a PostgreSQL spinlock rather than something in the kernel
or a shared library.  If you can further see what's calling s_lock
(which should hopefully be possible with perf record -g), then you've got it
nailed dead to rights.
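
As a first pass over a saved capture, something like the following
Python sketch could total the overhead attributed to s_lock.  It
assumes a perf recent enough to accept --no-children, and it assumes
each summary line of "perf report --stdio" starts with a percentage and
ends with the symbol name; check that against your perf version's
actual output.  It does not show who called s_lock; for that, read the
call graphs in perf report directly.

```python
import subprocess


def perf_report_text(perf_data):
    """Run perf report on one capture and return its plain-text output."""
    cmd = [
        "perf", "report",
        "--stdio",            # plain text instead of the interactive TUI
        "--no-children",      # report self overhead only (assumes newer perf)
        "--sort", "symbol",   # one summary line per symbol
        "-i", str(perf_data),
    ]
    result = subprocess.run(cmd, check=True, capture_output=True, text=True)
    return result.stdout


def symbol_overhead(report_text, symbol="s_lock"):
    """Sum the leading percentages of summary lines that end with `symbol`."""
    total = 0.0
    for line in report_text.splitlines():
        fields = line.split()
        # Skip blanks, '#' header comments, and anything that is not a
        # "<percent>% ... <symbol>" summary line (e.g. call-chain rows).
        if len(fields) < 2 or fields[0].startswith("#"):
            continue
        if not fields[0].endswith("%") or fields[-1] != symbol:
            continue
        try:
            total += float(fields[0].rstrip("%"))
        except ValueError:
            continue
    return total
```

Usage would be along the lines of
symbol_overhead(perf_report_text("perf-captures/<file>.data")), where
the path is whichever capture covers the spike.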

...Robert
