Re: optimal hardware for postgres? - Mailing list pgsql-general

From Marco Colombo
Subject Re: optimal hardware for postgres?
Msg-id 1114511653.12081.42.camel@Frodo.esi
In response to Re: optimal hardware for postgres?  (William Yu <wyu@talisys.com>)
List pgsql-general
On Tue, 2005-04-26 at 01:32 -0700, William Yu wrote:
> Linux 2.6 does have NUMA support. But whether it's actually a win for
> Postgres is debatable due to the architecture.
>
> First let's take a look at how NUMA makes this run faster in a 2x
> Opteron system. The idea is that the processes running on CPU0 can
> access memory attached to that CPU a lot faster than memory attached to
> CPU1. So in a NUMA-aware system, a process running on CPU0 can request
> that all its memory be located in memory bank0.
[...]
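
On Linux, the allocation idea quoted above can be approximated without a NUMA library, because the default memory policy places a page on the node of the CPU that first touches it. The sketch below pins the process to one CPU and then touches a buffer page by page. `pin_and_touch` is a hypothetical helper written for illustration (Linux-only); it is not something PostgreSQL itself does, and the kernel may still fall back to a remote node if local memory runs short.

```python
import os

def pin_and_touch(cpu: int, nbytes: int) -> bytearray:
    """Pin the calling process to one CPU, then allocate and touch a buffer.

    Under Linux's default (local) memory policy, a page is placed on the
    NUMA node of the CPU that first touches it, so touching the buffer
    after pinning tends to keep it in memory local to `cpu`.
    """
    os.sched_setaffinity(0, {cpu})      # pid 0 means "the calling process"
    buf = bytearray(nbytes)             # zero-filled allocation
    page = os.sysconf("SC_PAGE_SIZE")   # system page size in bytes
    # Write one byte per page so every page is touched after pinning
    # (the zero-fill may already have done this; it is harmless to repeat).
    for i in range(0, nbytes, page):
        buf[i] = 1
    return buf

if __name__ == "__main__":
    cpu = min(os.sched_getaffinity(0))  # pick a CPU we are allowed to use
    buf = pin_and_touch(cpu, 16 * 1024 * 1024)
    print(f"pinned to CPU {cpu}, touched {len(buf)} bytes")
```

For a whole program, `numactl --cpunodebind=0 --membind=0 <command>` expresses the same binding explicitly.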

This is only part of the truth. You should compare it with real SMP
solutions. The idea is that CPU0 can access directly attached memory
faster than it could on an SMP system, given roughly equivalent
technology, of course. So NUMA has a fast path and a slow path, while SMP
has only one, uniform, medium path. The whole point is where the SMP path
lies relative to the other two.
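
To make the comparison concrete, let t_L and t_R be NUMA's local and remote access times, t_S the uniform SMP access time, and f the fraction of accesses that hit local memory (these symbols are introduced here for illustration only). A NUMA-unaware workload with memory spread evenly over two nodes has f of about 1/2.

```latex
\bar{t}_{\mathrm{NUMA}}(f) = f\,t_L + (1 - f)\,t_R, \qquad t_L < t_R

\bar{t}_{\mathrm{NUMA}}(f) < t_S
  \iff f\,t_L + (1 - f)\,t_R < t_S
  \iff f > \frac{t_R - t_S}{t_R - t_L}
```

So the closer t_S is to t_L, the closer the break-even fraction gets to 1; the closer t_S is to t_R, the closer it gets to 0.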

If it's close to the fast (local) path in NUMA, then NUMA won't pay off
(performance-wise) unless the application is NUMA-aware _and_
NUMA-friendly (which depends on how the application is written, assuming
the underlying problem _can_ be solved in a NUMA-friendly way).

If the SMP path is close to the slow (remote) path in NUMA (for example,
because SMP systems have to keep the caches coherent and obey memory
barriers and locks), then NUMA has little to lose for NUMA-unaware or
NUMA-unfriendly applications (the worst case), and a lot to gain when
NUMA-aware optimizations kick in.
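
The two cases can be checked with a few lines of arithmetic. The latencies below are made-up round numbers (not measurements of any real machine), and the local-access fractions 0.5 (NUMA-unaware, memory spread evenly over two nodes) and 0.9 (NUMA-aware) are assumptions chosen for illustration.

```python
def numa_avg(t_local: float, t_remote: float, f_local: float) -> float:
    """Expected NUMA access time when a fraction f_local of accesses is local."""
    return f_local * t_local + (1.0 - f_local) * t_remote

def speedups(t_local: float, t_remote: float, t_smp: float) -> dict:
    """SMP time divided by expected NUMA time; a value > 1 means NUMA is faster."""
    return {label: t_smp / numa_avg(t_local, t_remote, f)
            for label, f in (("unaware", 0.5), ("aware", 0.9))}

# Hypothetical latencies in ns: 80 local, 140 remote; the SMP path is placed
# either near the local path (85) or near the remote path (135).
for name, t_smp in (("SMP close to local path", 85.0),
                    ("SMP close to remote path", 135.0)):
    s = speedups(80.0, 140.0, t_smp)
    print(f"{name}: unaware {s['unaware']:.2f}x, aware {s['aware']:.2f}x")
```

With these numbers the unaware case runs at about 0.77x of SMP speed in the first scenario and about 1.23x in the second, while the aware case goes from roughly break-even (0.99x) to 1.57x.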

I've read some articles that mention SUMO (sufficiently uniform memory
organization), the name AMD reportedly uses to describe its NUMA design,
which seems to imply that its worst case is "sufficiently" close to the
usual SMP timing.

There are other interesting issues in SMP scaling, on the software side.
Scaling to N > 8 CPUs might force partitioning at the software level
anyway, in order to reduce the number of locks, both as software objects
(reducing software complexity) and as hardware events (reducing time
spent in useless synchronization). See:

http://www.bitmover.com/llnl/smp.pdf
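
One common form of software-level partitioning is lock striping: instead of one global lock that every thread contends on, the data is split into partitions, each with its own lock. The sketch below is a generic illustration of that idea, not code from the paper linked above; `StripedCounter` and `worker` are hypothetical names. In CPython the GIL prevents a real parallel speedup here, so the point is only the structure: threads touching different stripes never wait on each other's lock.

```python
import threading

class StripedCounter:
    """Per-key counters split into stripes, each guarded by its own lock."""

    def __init__(self, n_stripes: int = 8) -> None:
        self._locks = [threading.Lock() for _ in range(n_stripes)]
        self._counts = [dict() for _ in range(n_stripes)]

    def _stripe(self, key) -> int:
        # Map a key to one partition; the same key always hits the same stripe.
        return hash(key) % len(self._locks)

    def incr(self, key, delta: int = 1) -> None:
        i = self._stripe(key)
        with self._locks[i]:  # only this partition is locked
            d = self._counts[i]
            d[key] = d.get(key, 0) + delta

    def get(self, key) -> int:
        i = self._stripe(key)
        with self._locks[i]:
            return self._counts[i].get(key, 0)

def worker(counter: StripedCounter, tid: int) -> None:
    # Each thread spreads 1000 increments over 32 keys.
    for n in range(1000):
        counter.incr(f"key-{(tid * 7 + n) % 32}")

c = StripedCounter()
threads = [threading.Thread(target=worker, args=(c, t)) for t in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
total = sum(c.get(f"key-{k}") for k in range(32))
print(total)  # 4 threads x 1000 increments = 4000
```

More stripes mean fewer collisions between threads, at the cost of more lock objects and more work for operations that must visit every partition.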

This also affects ccNUMA, of course; I'm not saying NUMA avoids this in
any way. But it's a price _both_ have to pay, which moves their numbers
towards the worst case anyway (and so makes the worst case not so bad).
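
A simple way to see the last point: if both architectures pay the same fixed synchronization cost s on every contended access (a simplifying assumption introduced here, with made-up latencies), the ratio between the slow and fast paths, (t_slow + s) / (t_fast + s), shrinks toward 1 as s grows.

```python
def worst_case_ratio(t_fast: float, t_slow: float, overhead: float) -> float:
    """Slow-path cost over fast-path cost when both pay the same extra overhead."""
    return (t_slow + overhead) / (t_fast + overhead)

# Hypothetical numbers: 80 ns fast path, 140 ns slow path, plus a shared
# per-access synchronization cost (locks, barriers, coherence traffic).
for s in (0.0, 50.0, 200.0):
    print(f"overhead {s:5.0f} ns -> slow/fast = {worst_case_ratio(80.0, 140.0, s):.2f}")
```

With these numbers the ratio falls from 1.75 with no overhead to 1.46 at 50 ns and 1.21 at 200 ns.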

.TM.
--
      ____/  ____/   /
     /      /       /                   Marco Colombo
    ___/  ___  /   /                  Technical Manager
   /          /   /                      ESI s.r.l.
 _____/ _____/  _/                      Colombo@ESI.it

