Re: Wierd context-switching issue on Xeon - Mailing list pgsql-performance

From Dave Cramer
Subject Re: Wierd context-switching issue on Xeon
Date
Msg-id 1082331281.1557.47.camel@localhost.localdomain
In response to Re: Wierd context-switching issue on Xeon  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: Wierd context-switching issue on Xeon
List pgsql-performance
So the kernel/OS is irrelevant here? Does this happen on any dual Xeon?

What about hyperthreading? Does it still happen if HTT is turned off?

Dave
On Sun, 2004-04-18 at 17:47, Tom Lane wrote:
> After some further digging I think I'm starting to understand what's up
> here, and the really fundamental answer is that a multi-CPU Xeon MP box
> sucks for running Postgres.
>
> I did a bunch of oprofile measurements on a machine belonging to one of
> Josh's clients, using a test case that involved heavy concurrent access
> to a relatively small amount of data (little enough to fit into Postgres
> shared buffers, so that no I/O or kernel calls were really needed once
> the test got going).  I found that by nearly any measure --- elapsed
> time, bus transactions, or machine-clear events --- the spinlock
> acquisitions associated with grabbing and releasing the BufMgrLock took
> an unreasonable fraction of the time.  I saw about 15% of elapsed time,
> 40% of bus transactions, and nearly 100% of pipeline-clear cycles going
> into what is essentially two instructions out of the entire backend.
> (Pipeline clears occur when the cache coherency logic detects a memory
> write ordering problem.)
>
> I am not completely clear on why this machine-level bottleneck manifests
> as a lot of context swaps at the OS level.  I think what is happening is
> that because SpinLockAcquire is so slow, a process is much more likely
> than you'd normally expect to arrive at SpinLockAcquire while another
> process is also acquiring the spinlock.  This puts the two processes
> into a "lockstep" condition where the second process is nearly certain
> to observe the BufMgrLock as locked, and be forced to suspend itself,
> even though the time the first process holds the BufMgrLock is not
> really very long at all.
>
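A toy sketch of the mechanism described above, to make the "forced to suspend itself" step concrete. This is not PostgreSQL's actual spinlock or BufMgrLock code; the names toy_spinlock, toy_acquire and SPINS_BEFORE_YIELD are made up for illustration, and it assumes C11 atomics plus POSIX sched_yield(). A waiter spins on an atomic exchange (each attempt pulls the cache line over to its own CPU) and gives up its time slice once it has spun too long, which is where the context switches would come from.

```c
#include <assert.h>
#include <sched.h>
#include <stdatomic.h>
#include <stdbool.h>

#define SPINS_BEFORE_YIELD 100   /* hypothetical tuning value */

typedef struct {
    atomic_bool  locked;   /* true while some process holds the lock */
    atomic_ulong yields;   /* how often a waiter gave up its time slice */
} toy_spinlock;

static void toy_lock_init(toy_spinlock *l)
{
    atomic_init(&l->locked, false);
    atomic_init(&l->yields, 0);
}

/* One attempt: returns 1 if we got the lock, 0 if it was already held.
 * The exchange is a write, so every attempt, successful or not, needs
 * the cache line in exclusive state on the calling CPU. */
static int toy_try_acquire(toy_spinlock *l)
{
    return !atomic_exchange_explicit(&l->locked, true, memory_order_acquire);
}

/* Spin for a while, then yield the CPU and retry. Each yield is a
 * voluntary context switch visible at the OS level. */
static void toy_acquire(toy_spinlock *l)
{
    int spins = 0;

    while (!toy_try_acquire(l)) {
        if (++spins >= SPINS_BEFORE_YIELD) {
            atomic_fetch_add(&l->yields, 1);
            sched_yield();
            spins = 0;
        }
    }
}

static void toy_release(toy_spinlock *l)
{
    assert(atomic_load(&l->locked));   /* releasing an unheld lock is a bug */
    atomic_store_explicit(&l->locked, false, memory_order_release);
}
```

If moving the cache line between CPUs is slow, the window in which a second process finds the lock held gets wider, and the yields counter in a sketch like this would climb even though the holder does very little work.
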
> If you google for Xeon and "cache coherency" you'll find quite a bit of
> suggestive information about why this might be more true on the Xeon
> setup than others.  A couple of interesting hits:
>
> http://www.theinquirer.net/?article=10797
> says that Xeon MP uses a *slower* FSB than Xeon DP.  This would
> translate directly to more time needed to transfer a dirty cache line
> from one processor to the other, which is the basic operation that we're
> talking about here.
>
> http://www.aceshardware.com/Spades/read.php?article_id=30000187
> says that Opterons use a different cache coherency protocol that is
> fundamentally superior to the Xeon's, because dirty cache data can be
> transferred directly between two processor caches without waiting for
> main memory.
>
> So in the short term I think we have to tell people that Xeon MP is not
> the most desirable SMP platform to run Postgres on.  (Josh thinks that
> the specific motherboard chipset being used in these machines might
> share some of the blame too.  I don't have any evidence for or against
> that idea, but it's certainly possible.)
>
> In the long run, however, CPUs continue to get faster than main memory
> and the price of cache contention will continue to rise.  So it seems
> that we need to give up the assumption that SpinLockAcquire is a cheap
> operation.  In the presence of heavy contention it won't be.
>
> One thing we probably have got to do soon is break up the BufMgrLock
> into multiple finer-grain locks so that there will be less contention.
> However I am wary of doing this incautiously, because if we do it in a
> way that makes for a significant rise in the number of locks that have
> to be acquired to access a buffer, we might end up with a net loss.
>
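For what "multiple finer-grain locks" could look like, here is a minimal sketch that hashes a buffer id onto one of several independent locks. It is only an illustration of the general partitioning idea, not a proposal for the actual bufmgr; partition_for, PARTITION_BITS and the multiplicative hash constant are my own assumptions, and it uses plain C11 atomics rather than anything from the Postgres tree.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

#define PARTITION_BITS 4                      /* hypothetical: 16 partitions */
#define NUM_PARTITIONS (1u << PARTITION_BITS)

/* One lock word per partition; static storage starts out false (unlocked). */
static atomic_bool partition_locked[NUM_PARTITIONS];

/* Map a buffer id to a partition using a multiplicative hash and
 * keeping the top PARTITION_BITS bits of the 32-bit product. */
static unsigned partition_for(uint32_t buffer_id)
{
    uint32_t h = buffer_id * 2654435761u;
    return h >> (32 - PARTITION_BITS);
}

/* Take only the lock covering this buffer; accesses to buffers in
 * other partitions do not touch this lock word at all. */
static void partition_acquire(uint32_t buffer_id)
{
    atomic_bool *l = &partition_locked[partition_for(buffer_id)];

    while (atomic_exchange_explicit(l, true, memory_order_acquire)) {
        /* spin until the current holder releases */
    }
}

static void partition_release(uint32_t buffer_id)
{
    unsigned p = partition_for(buffer_id);

    assert(atomic_load(&partition_locked[p]));   /* must be held */
    atomic_store_explicit(&partition_locked[p], false, memory_order_release);
}
```

The caveat in the quoted paragraph still applies: this only helps if a typical buffer access needs one partition lock. If an operation has to take several of them, the extra acquisitions could cost more than the reduced contention saves.
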
> I think Neil Conway was looking into how the bufmgr might be
> restructured to reduce lock contention, but if he has come up with
> anything, he hasn't said exactly what.  Neil?
>
>             regards, tom lane
--
Dave Cramer
519 939 0336
ICQ # 14675561

