On Thu, 15 Sep 2005, Tom Lane wrote:
> One thing that did seem to help a little bit was padding the LWLocks
> to 32 bytes (by default they are 24 bytes each on x86_64) and ensuring
> the array starts on a 32-byte boundary. This ensures that we won't have
> any LWLocks crossing cache lines --- contended access to such an LWLock
> would probably incur the sort of large penalty seen above, because you'd
> be trading two cache lines back and forth not one. It seems that the
> important locks are not split that way in CVS tip, because the gain
> wasn't much, but I wonder whether some effect like this might explain
> some of the unexplainable performance changes we've noticed in the past
> (eg, in the dbt2 results). A seemingly unrelated small change in the
> size of other data structures in shared memory might move things around
> enough to make a performance-critical lock cross a cache line boundary.
What about padding the LWLock to 64 bytes on these architectures? Both P4
and Opteron have 64-byte cache lines, IIRC. This would ensure that a
cache line never holds two LWLocks.
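
A rough sketch of the arithmetic behind this, not the actual lwlock.c
code: DemoLWLock, DemoLWLockPadded, CACHE_LINE_SIZE and the helper
functions are all made-up names for illustration, the 24-byte payload
just stands in for the real struct, and the array is assumed to start
on a cache line boundary.

```c
#include <assert.h>
#include <stddef.h>

#define CACHE_LINE_SIZE 64      /* assumed line size (P4, Opteron) */

/* Hypothetical stand-in for an LWLock: an opaque 24-byte payload. */
typedef struct DemoLWLock
{
    char        opaque[24];
} DemoLWLock;

/* Pad each lock out to one full cache line. */
typedef union DemoLWLockPadded
{
    DemoLWLock  lock;
    char        pad[CACHE_LINE_SIZE];
} DemoLWLockPadded;

/* Does an object of 'size' bytes at byte 'offset' straddle two lines? */
static int
crosses_line(size_t offset, size_t size)
{
    assert(size > 0);
    return offset / CACHE_LINE_SIZE !=
        (offset + size - 1) / CACHE_LINE_SIZE;
}

/* Count elements that straddle a line in a line-aligned array. */
static int
count_crossing(size_t elem_size, size_t nelems)
{
    size_t      i;
    int         n = 0;

    for (i = 0; i < nelems; i++)
        if (crosses_line(i * elem_size, elem_size))
            n++;
    return n;
}

/* Do elements i and j of the array start in the same cache line? */
static int
starts_in_same_line(size_t elem_size, size_t i, size_t j)
{
    return (i * elem_size) / CACHE_LINE_SIZE ==
        (j * elem_size) / CACHE_LINE_SIZE;
}
```

With 24-byte elements, two of every eight locks straddle a 64-byte line
(the ones at offsets 48 and 120). Padding to 32 bytes removes the
straddling, but neighbouring locks still share a line. Padding to 64
bytes removes both, at the cost of more shared memory per lock.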
Gavin