Following some advice from Intel,
http://www.intel.com/cd/ids/developer/asmo-na/eng/technologies/threading/20469.htm?page=2
I'm looking at whether the LWLock data structures may be within the same
cache line.
Intel uses 128-byte cache lines on its high-end processors.
slru.c uses BUFFERALIGN, which relies on a value currently hardcoded in
pg_config_manual.h as
#define ALIGNOF_BUFFER 32
which seems to be the wrong setting for the Intel CPUs, possibly others.
In slru.c we have this code fragment:
/* Release shared lock, grab per-buffer lock instead */
LWLockRelease(shared->ControlLock);
LWLockAcquire(shared->buffer_locks[slotno],LW_EXCLUSIVE);
The purpose of this is to reduce contention by holding finer-grained
locks. ISTM what this does is drop one lock, then take another lock by
indexing into an array (buffer_locks) whose entries will all be in the
same cache line, then access the LWLock data structures, which again will
all be within the same cache line. ISTM that we have fine-grained
LWLocks, but not fine-grained cache lines. That means that all Clog and
Subtrans locks would be affected, since we have 8 of each.
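To put rough numbers on that, here is a minimal sketch of the arithmetic.
The names and the per-lock sizes are hypothetical (I have not measured
sizeof(LWLock) here), and it assumes the array starts on a cache line
boundary and that the line size is the 128 bytes quoted above.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 128-byte cache lines, the figure quoted in this mail. */
#define DEMO_CACHE_LINE_SIZE 128

/*
 * Number of cache lines spanned by an array of `count` elements of
 * `elem_size` bytes each, assuming the array itself starts on a cache
 * line boundary. Hypothetical helper, not part of the backend.
 */
static size_t
demo_lines_spanned(size_t elem_size, size_t count, size_t line_size)
{
    size_t total = elem_size * count;

    assert(line_size > 0);
    /* Round up to a whole number of cache lines. */
    return (total + line_size - 1) / line_size;
}
```

With 8 locks of a hypothetical 16 bytes each,
demo_lines_spanned(16, 8, DEMO_CACHE_LINE_SIZE) comes out as 1, so all 8
locks would share one line. Only when each lock occupies a full 128 bytes
does the array spread over 8 lines.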
For other global LWLocks, the same thing applies, so BufMgrLock and many
other locks are effectively all the same from the cache's perspective.
...and BTW, what is MMCacheLock? Is that an attempt at padding already?
It looks like padding out the LWLock struct would ensure that each of
those was in a separate cache line?
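Something like the following is what I have in mind, as a rough sketch
only. DemoLWLock is a made-up stand-in, not the real LWLock definition,
and DEMO_CACHE_LINE_SIZE and demo_align_up are likewise hypothetical
names. Note that padding alone is not enough: the start of the array
would also need to be aligned to a cache line boundary.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumption: 128-byte cache lines, the figure quoted in this mail. */
#define DEMO_CACHE_LINE_SIZE 128

/* Hypothetical stand-in for the lock struct; fields are illustrative. */
typedef struct DemoLWLock
{
    volatile int mutex;      /* protects the fields below */
    int          exclusive;  /* number of exclusive holders (0 or 1) */
    int          shared;     /* number of shared holders */
    void        *head;       /* head of the wait queue */
    void        *tail;       /* tail of the wait queue */
} DemoLWLock;

/* Pad each lock out to a full cache line so no two locks share one. */
typedef union DemoLWLockPadded
{
    DemoLWLock lock;
    char       pad[DEMO_CACHE_LINE_SIZE];
} DemoLWLockPadded;

/*
 * Round addr up to the next multiple of alignment, which must be a
 * power of two. This would be applied to the start of the lock array.
 */
static uintptr_t
demo_align_up(uintptr_t addr, uintptr_t alignment)
{
    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (addr + alignment - 1) & ~(alignment - 1);
}
```

The cost is memory: every lock then takes a full cache line, so this
probably only makes sense if the number of LWLocks stays modest.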
Any views?
Best Regards, Simon Riggs