Further reduction of bufmgr lock contention - Mailing list pgsql-hackers

I've been looking into Gavin Hamill's recent report of poor performance
with PG 8.1 on an 8-way IBM PPC64 box.  strace'ing backends shows a lot
of semop() calls, indicating blocking at the LWLock or lmgr-lock levels,
but not a lot of select() delays, suggesting we don't have too much of a
problem at the hardware spinlock level.  A typical breakdown of
different kernel call types is
    566 _llseek
     10 brk
     10 gettimeofday
      4 mmap
      4 munmap
    562 read
      4 recv
      8 select
   3014 semop
     12 send
      1 time
      3 write
 

(I'm hoping to get some oprofile results to confirm there's nothing
strange going on at the hardware level, but no luck yet on getting
oprofile to work on Debian/PPC64 ... anyone know anything about suitable
kernels to use for that?)

Instrumenting LWLockAcquire (with a patch I had developed last fall,
but just now got around to cleaning up and committing to CVS) shows
that the contention is practically all for the BufMappingLock:

$ grep ^PID postmaster.log | sort +9nr | head -20
PID 23820 lwlock 0: shacq 2446470 exacq 6154 blk 12755
PID 23823 lwlock 0: shacq 2387597 exacq 4297 blk 9255
PID 23824 lwlock 0: shacq 1678694 exacq 4433 blk 8692
PID 23826 lwlock 0: shacq 1221221 exacq 3224 blk 5893
PID 23821 lwlock 0: shacq 1892453 exacq 1665 blk 5766
PID 23835 lwlock 0: shacq 2390685 exacq 1453 blk 5511
PID 23822 lwlock 0: shacq 1669419 exacq 1615 blk 4926
PID 23830 lwlock 0: shacq 1039468 exacq 1248 blk 2946
PID 23832 lwlock 0: shacq 698622 exacq 397 blk 1818
PID 23836 lwlock 0: shacq 544472 exacq 530 blk 1300
PID 23839 lwlock 0: shacq 497505 exacq 46 blk 885
PID 23842 lwlock 0: shacq 305281 exacq 1 blk 720
PID 23840 lwlock 0: shacq 317554 exacq 226 blk 355
PID 23840 lwlock 2: shacq 0 exacq 2872 blk 7
PID 23835 lwlock 2: shacq 0 exacq 3434 blk 6
PID 23835 lwlock 1: shacq 0 exacq 1452 blk 4
PID 23822 lwlock 1: shacq 0 exacq 1614 blk 3
PID 23820 lwlock 2: shacq 0 exacq 3582 blk 2
PID 23821 lwlock 1: shacq 0 exacq 1664 blk 2
PID 23830 lwlock 1: shacq 0 exacq 1247 blk 2

These numbers show that our rewrite of the bufmgr has done a great job
of cutting down the amount of potential contention --- most of the
traffic on this lock is shared rather than exclusive acquisitions ---
but it seems that if you have enough CPUs it's still not good enough.
(My best theory as to why Gavin is seeing better performance from a
dual Opteron is simply that 2 processors will have 1/4th as much
contention as 8 processors.)
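
To put the blk counts in proportion to the total traffic on each lock, the per-backend lines above can be parsed and reduced to a blocked fraction. This is only a minimal sketch: the LWLockStat struct and both function names are hypothetical, and it assumes that blk counts acquisitions that had to wait and that every line has exactly the layout shown in the log excerpt.

```c
#include <stdio.h>

/* One line of the LWLock instrumentation output (hypothetical struct). */
typedef struct {
    int  pid;
    int  lock_id;
    long shacq;   /* shared acquisitions */
    long exacq;   /* exclusive acquisitions */
    long blk;     /* assumed: acquisitions that had to block */
} LWLockStat;

/*
 * Parse "PID <pid> lwlock <id>: shacq <n> exacq <n> blk <n>".
 * Returns 1 on success, 0 if the line does not match that layout.
 */
static int parse_lwlock_stat(const char *line, LWLockStat *out)
{
    return sscanf(line, "PID %d lwlock %d: shacq %ld exacq %ld blk %ld",
                  &out->pid, &out->lock_id,
                  &out->shacq, &out->exacq, &out->blk) == 5;
}

/* Fraction of all acquisitions that blocked; 0.0 if there were none. */
static double blocked_fraction(const LWLockStat *s)
{
    long total = s->shacq + s->exacq;
    return total > 0 ? (double) s->blk / (double) total : 0.0;
}
```

Applied to the first line of the listing (PID 23820, lwlock 0), this gives 12755 / (2446470 + 6154), or roughly 0.5% of acquisitions blocking.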

I have an idea about how to improve matters: I think we could break the
buffer tag to buffer mapping hashtable into multiple partitions based on
some hash value of the buffer tags, and protect each partition under a
separate LWLock, similar to what we did with the lmgr lock table not
long ago.  Anyone have a comment on this strategy, or a better idea?
        regards, tom lane

