On Mon, Jan 13, 2014 at 03:24:38PM -0800, Josh Berkus wrote:
> On 01/13/2014 02:26 PM, Mel Gorman wrote:
> > Really?
> >
> > zone_reclaim_mode is often a complete disaster unless the workload is
> > partitioned to fit within NUMA nodes. On older kernels enabling it would
> > sometimes cause massive stalls. I'm actually very surprised to hear it
> > fixes anything and would be interested in hearing more about what sort
> > of circumstances would convince you to enable that thing.
>
> So the problem with the default setting is that it pretty much isolates
> all FS cache for PostgreSQL to whichever socket the postmaster is
> running on, and makes the other FS cache unavailable. This means that,
> for example, if you have two memory banks, then only one of them is
> available for PostgreSQL filesystem caching ... essentially cutting your
> available cache in half.

No matter what default NUMA allocation policy we set, there will be
an application for which that behaviour is wrong. As such, we've had
tools for setting application specific NUMA policies for quite a few
years now. e.g.:

$ man 8 numactl
....
    --interleave=nodes, -i nodes
        Set a memory interleave policy. Memory will be allocated using
        round robin on nodes. When memory cannot be allocated on the
        current interleave target fall back to other nodes. Multiple
        nodes may be specified on --interleave, --membind and
        --cpunodebind.

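As a rough sketch only: the wrapper name run_interleaved and the data
directory below are made up for illustration, and whether interleaving
actually suits a given workload is something to measure rather than
assume. Starting the postmaster under an interleave policy could look
something like this:

```shell
# Hypothetical helper: run any command with its memory allocations
# interleaved round-robin across all NUMA nodes.
run_interleaved() {
    numactl --interleave=all "$@"
}

# Example usage (the data directory path is an assumption):
#   numactl --hardware      # inspect nodes and per-node memory first
#   run_interleaved pg_ctl start -D /var/lib/pgsql/data
```

The policy is inherited by child processes, so backends forked from a
postmaster started this way would run under the same interleave policy.
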
Cheers,
Dave.
--
Dave Chinner
david@fromorbit.com