On Mon, Jul 30, 2012 at 10:43 AM, Kevin Grittner
<Kevin.Grittner@wicourts.gov> wrote:
> node distances:
> node 0 1 2 3
> 0: 10 11 11 11
> 1: 11 10 11 11
> 2: 11 11 10 11
> 3: 11 11 11 10
>
> When considering a hardware purchase, it might be wise to pay close
> attention to how "far" a core may need to go to get to the most
> "distant" RAM.

I think zone_reclaim_mode gets turned on automatically when the ratio of
remote to local node cost is high. If the inter-node costs stayed the
same and the intra-node costs dropped by half, zone reclaim would likely
get turned on at boot time.

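To make the "ratio" point concrete, here is a minimal sketch that parses a
distance table like the one quoted above and checks it against a threshold.
It assumes the kernel's boot-time rule of enabling zone reclaim when some
node distance exceeds RECLAIM_DISTANCE; the value 20 used here
(ASSUMED_RECLAIM_DISTANCE) is illustrative only, since the real constant
depends on kernel version and architecture. The function names are
hypothetical.

```python
from typing import List

# Illustrative threshold only; check the RECLAIM_DISTANCE of your kernel.
ASSUMED_RECLAIM_DISTANCE = 20


def parse_distances(text: str) -> List[List[int]]:
    """Parse rows like '0: 10 11 11 11' from `numactl --hardware` output."""
    matrix = []
    for line in text.splitlines():
        if ":" not in line:
            continue  # skip the 'node 0 1 2 3' header and blank lines
        head, values = line.split(":", 1)
        if not head.strip().isdigit():
            continue  # skip labels such as 'node distances:'
        matrix.append([int(v) for v in values.split()])
    return matrix


def max_remote_ratio(matrix: List[List[int]]) -> float:
    """Largest remote/local distance ratio seen from any node."""
    return max(
        row[j] / row[i]
        for i, row in enumerate(matrix)
        for j in range(len(row))
        if j != i
    )


def would_enable_zone_reclaim(
    matrix: List[List[int]], reclaim_distance: int = ASSUMED_RECLAIM_DISTANCE
) -> bool:
    """True if any off-diagonal (remote) distance exceeds the threshold."""
    return any(
        d > reclaim_distance
        for i, row in enumerate(matrix)
        for j, d in enumerate(row)
        if j != i
    )
```

For the quoted 4-node table the worst remote/local ratio is 1.1, far below
the threshold. Because these tables express local access as 10, halving the
local cost relative to remote roughly corresponds to remote entries of 22,
which would cross a threshold of 20.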
I had something similar on a 48-core system, but if I recall correctly
the matrix was 8x8 and the cost differential was much higher.

The symptoms I saw were that a very hard-working db on a 128G machine,
with about 95G in use as OS/kernel cache, would slow to a crawl with
kswapd working very hard (I think it was kswapd) after a period of 1 to
3 weeks. Note that actual swap-in and swap-out activity, as reported by
vmstat, wasn't all that high. The same performance hit happened on a
similar machine used as a file server after a similar warm-up period.

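One way to check for this pattern (heavy kswapd page scanning with little
real swap I/O) is to diff two snapshots of /proc/vmstat. This is a sketch,
not what was done above: the helper names are hypothetical, and since
counter names vary by kernel (older kernels split kswapd scanning per zone,
e.g. pgscan_kswapd_normal), it matches on the prefix.

```python
from typing import Callable, Dict


def parse_vmstat(text: str) -> Dict[str, int]:
    """Parse /proc/vmstat-style 'name value' lines into a dict."""
    counters = {}
    for line in text.splitlines():
        parts = line.split()
        if len(parts) == 2 and parts[1].isdigit():
            counters[parts[0]] = int(parts[1])
    return counters


def reclaim_vs_swap(before: str, after: str) -> Dict[str, int]:
    """Deltas of kswapd page scanning vs. actual swap I/O between snapshots."""
    b, a = parse_vmstat(before), parse_vmstat(after)

    def delta(pred: Callable[[str], bool]) -> int:
        # Sum the increase of every counter whose name matches pred.
        return sum(a[k] - b.get(k, 0) for k in a if pred(k))

    return {
        "kswapd_scanned": delta(lambda k: k.startswith("pgscan_kswapd")),
        "swapped_in": delta(lambda k: k == "pswpin"),
        "swapped_out": delta(lambda k: k == "pswpout"),
    }
```

In practice you would read /proc/vmstat, wait a minute or so, read it again,
and pass both strings in. A large kswapd_scanned delta next to small
swapped_in/swapped_out deltas matches the symptom described.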
The real danger here is that the misbehavior can take a long time to
show up, and from what I read at the time, the performance gain from
zone_reclaim_mode = 1 was minimal for a file or db server. It was more
worthwhile for a large virtual machine farm, with a lot of processes
chopped into sections small enough to fit in one node's memory and
needing little access to another node's memory. Anything that relies on
the OS cache is unlikely to benefit from zone_reclaim_mode = 1.
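A quick way to see which setting a server is actually running with is to
read /proc/sys/vm/zone_reclaim_mode, where 0 means zone reclaim is disabled.
The sketch below is a hypothetical helper that does just that and flags a
nonzero value for a cache-heavy db or file server; the path is a parameter
so it can be pointed at a test file.

```python
from pathlib import Path

ZONE_RECLAIM_PATH = "/proc/sys/vm/zone_reclaim_mode"


def zone_reclaim_mode(path: str = ZONE_RECLAIM_PATH) -> int:
    """Read the current zone_reclaim_mode value (0 means disabled)."""
    return int(Path(path).read_text().strip())


def check_for_cache_heavy_server(path: str = ZONE_RECLAIM_PATH) -> str:
    """Advice for a db/file server that depends on the OS cache."""
    mode = zone_reclaim_mode(path)
    if mode == 0:
        return "zone_reclaim_mode is 0: the OS cache can use memory on any node"
    return (
        f"zone_reclaim_mode is {mode}: consider vm.zone_reclaim_mode = 0 "
        "for a server that relies on the OS cache"
    )
```

If it reports a nonzero value, `sysctl -w vm.zone_reclaim_mode=0` changes it
at runtime, and a `vm.zone_reclaim_mode = 0` line in /etc/sysctl.conf keeps
it across reboots.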