Re: Spinlock performance improvement proposal - Mailing list pgsql-hackers

From Neil Padgett
Subject Re: Spinlock performance improvement proposal
Msg-id 3BB24034.4DE44D62@redhat.com
In response to Spinlock performance improvement proposal  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Spinlock performance improvement proposal
List pgsql-hackers
Tom Lane wrote:
> 
> Neil Padgett <npadgett@redhat.com> writes:
> > Initial results (top five -- if you would like a complete profile, let
> > me know):
> > Each sample counts as 1 samples.
> >   %   cumulative   self              self     total
> >  time   samples   samples    calls  T1/call  T1/call  name
> >  26.57  42255.02 42255.02                             FindLockCycleRecurse
> 
> Yipes.  It would be interesting to know more about the locking pattern
> of your benchmark --- are there long waits-for chains, or not?  The
> present deadlock detector was certainly written with an eye to "get it
> right" rather than "make it fast", but I wonder whether this shows a
> performance problem in the detector, or just too many executions because
> you're waiting too long to get locks.
> 
> > However, this seems to be a red herring. Removing the deadlock detector
> > had no effect. In fact, benchmarking showed removing it yielded no
> > improvement in transaction processing rate on uniprocessor or SMP
> > systems. Instead, it seems that the deadlock detector simply amounts to
> > "something to do" for the blocked backend while it waits for lock
> > acquisition.
> 
> Do you have any idea about the typical lock-acquisition delay in this
> benchmark?  Our docs advise trying to set DEADLOCK_TIMEOUT higher than
> the typical acquisition delay, so that the deadlock detector does not
> run unnecessarily.

Well, currently the runs are the typical pgbench runs. It was a handy
benchmark that was already available, and since it seems to be popular
I was hoping it would be useful for comparison. More benchmarks of
different types would of course be useful, though.

I think the large time consumed by the deadlock detector in the profile
is simply due to too many executions while waiting to acquire
contended locks. I agree that DEADLOCK_TIMEOUT seems to have been set
too low, since the profile output suggests that the deadlock detector
was running unnecessarily. But the deadlock detector isn't causing the
SMP performance hit right now, since the throughput is the same with it
in place or with it removed completely. I therefore didn't make any
attempt to tune DEADLOCK_TIMEOUT. As I mentioned before, it apparently
just gives the backend "something" to do while it waits for a lock.
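
To illustrate why a low DEADLOCK_TIMEOUT leads to many detector runs,
here is a minimal sketch. It assumes the detector runs once for every
lock wait that outlasts the timeout. The function name and the wait
times are hypothetical and are not taken from the benchmark.

```python
def detector_runs(wait_times_ms, deadlock_timeout_ms):
    """Count lock waits that outlast the timeout.

    Assumption: each such wait triggers exactly one deadlock check.
    """
    return sum(1 for wait in wait_times_ms if wait > deadlock_timeout_ms)


# Hypothetical lock-acquisition delays in milliseconds (not measured data).
waits = [5, 40, 120, 300, 900, 1500]

for timeout in (100, 1000, 2000):
    print(timeout, detector_runs(waits, timeout))
```

Under that assumption, a timeout above the typical acquisition delay
leaves only the unusually long waits to trigger a check.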

I'm thinking that running the deadlock detector unnecessarily has no
effect on performance because access to the shared memory is causing
some level of serialization. So one CPU (or two, or three, but not all)
is doing useful work, while the others are idle (that is to say, doing
no useful work). Whether they are idle spinning or idle running the
deadlock detector, the net throughput is still the same. (This might
also indicate that improving the lock design won't help here.) Of
course, another possibility is that you spend so long spinning simply
because you do spin (rather than sleep), and this wastes so much CPU
time that the backends doing useful work take longer to get things
done. Either is just speculation right now, without any data to back it
up.
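
The serialization argument can be put as a rough bound. The sketch
below is a simplified model, not a measurement: it assumes that every
transaction spends serial_s seconds under one global lock and
parallel_s seconds outside it. The function name and all numbers are
hypothetical.

```python
def max_throughput(n_cpus, serial_s, parallel_s):
    """Upper bound on transactions per second in a simplified model.

    Assumption: serial_s seconds of every transaction run under a single
    global lock, and parallel_s seconds run without it.
    """
    cpu_bound = n_cpus / (serial_s + parallel_s)   # all CPUs kept busy
    lock_bound = 1.0 / serial_s                    # one lock holder at a time
    return min(cpu_bound, lock_bound)


# Hypothetical per-transaction times in seconds.
for cpus in (1, 2, 4, 8):
    print(cpus, round(max_throughput(cpus, 0.008, 0.002), 1))
```

In this model, what a waiting CPU does (spinning or running the deadlock
detector) does not appear in the bound at all, which matches the first
explanation. It does not capture the second one, where spinning itself
slows down the lock holder.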

> 
> > For example, there has been some suggestion
> > that perhaps some component of the database is causing large lock
> > contention.
> 
> My thought as well.  I would certainly recommend that you use more than
> one test case while looking at these things.

Yes. That is another suggestion for a next step. Several cases might
serve to better expose the path causing the slowdown. I think that
several test cases of varying usage patterns, coupled with hold time
instrumentation (which can tell what routine acquired the lock and how
long it held it, and yield wait-for data in the analysis), are the right
way to go about attacking SMP performance. Any other thoughts?
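
As a rough idea of what hold time instrumentation could record, here is
a minimal sketch using an ordinary Python lock rather than the actual
spinlock code. The class name, the held_by method and the routine labels
are all hypothetical.

```python
import threading
import time
from collections import defaultdict
from contextlib import contextmanager


class InstrumentedLock:
    """Lock wrapper that records wait and hold time per acquiring routine."""

    def __init__(self):
        self._lock = threading.Lock()
        self._stats_guard = threading.Lock()   # protects self.stats
        self.stats = defaultdict(
            lambda: {"count": 0, "wait_s": 0.0, "hold_s": 0.0})

    @contextmanager
    def held_by(self, routine):
        requested = time.perf_counter()
        self._lock.acquire()
        acquired = time.perf_counter()
        try:
            yield
        finally:
            released = time.perf_counter()
            self._lock.release()
            with self._stats_guard:
                entry = self.stats[routine]
                entry["count"] += 1
                entry["wait_s"] += acquired - requested   # time spent blocked
                entry["hold_s"] += released - acquired    # time lock was held


def report(lock):
    """Return routine labels sorted by total hold time, longest first."""
    return sorted(lock.stats, key=lambda r: lock.stats[r]["hold_s"],
                  reverse=True)
```

A caller would wrap each critical section in
"with lock.held_by('SomeRoutine'):" and then inspect report(lock) to see
which routines hold the lock longest and which ones wait longest.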

Neil

-- 
Neil Padgett
Red Hat Canada Ltd.                       E-Mail:  npadgett@redhat.com
2323 Yonge Street, Suite #300, 
Toronto, ON  M4P 2C9

