Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions - Mailing list pgsql-hackers
From | Kevin Grittner |
---|---|
Subject | Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions |
Date | |
Msg-id | CACjxUsO16g4kZzikRxMXmYhMMQUa93ZEBTEXHiKDvR2mjonfDw@mail.gmail.com |
In response to | Re: [HACKERS] Re: [GSOC 17] Eliminate O(N^2) scaling from rw-conflict tracking in serializable transactions ("Mengxing Liu" <liu-mx15@mails.tsinghua.edu.cn>) |
List | pgsql-hackers |
On Wed, Mar 15, 2017 at 11:35 AM, Mengxing Liu
<liu-mx15@mails.tsinghua.edu.cn> wrote:

>> On a NUMA machine It is not at all unusual to see bifurcated results
>> -- with each run coming in very close to one number or a second
>> number, often at about a 50/50 rate, with no numbers falling
>> anywhere else. This seems to be based on where the processes and
>> memory allocations happen to land.
>>
>
> Do you mean that for a NUMA machine, there usually exists two
> different results of its performance?
> Just two? Neither three nor four?

In my personal experience, I have often seen two timings that each
run randomly matched; I have not seen nor heard of more, but that
doesn't mean it can't happen. ;-)

> At first, I will compile and install PostgreSQL by myself and try
> the profile tools (perf or oprofile).

perf is newer, and generally better if you can use it. Don't try to
use either on HP hardware -- the BIOS uses some of the same hardware
registers that other manufacturers leave for use of profilers; an HP
machine is likely to freeze or reboot if you try to run either of
those profilers under load.

> Then I will run one or two benchmarks using different config,
> where I may need your help to ensure that my tests are close to the
> practical situation.

Yeah, we should talk about OS and PostgreSQL configuration before you
run any benchmarks. Neither tends to come configured as I would run
a production system.

> PS: Disable NUMA in BIOS means that CPU can use its own memory
> controller when accessing local memory to reduce hops.

NUMA means that each CPU chip directly controls some of the RAM
(possibly with other, non-CPU controllers for some RAM). The
question is whether the BIOS or the OS controls the memory
allocation. The OS looks at what processes are on what cores and
tries to use "nearby" memory for allocations.
This can be pessimal if the amount of RAM that is under contention is
less than the size of one memory segment, since all CPU chips need to
ask the one managing that RAM for each access. In such a case, you
actually get best performance using a cpuset which just uses one CPU
package and the memory segments directly managed by that CPU package.
Without the cpuset you may actually see better performance for this
workload by letting the BIOS interleave allocations, which spreads
the RAM allocations around to memory managed by all CPUs, and no one
CPU becomes the bottleneck. The access is still not uniform, but you
dodge a bottleneck -- albeit less efficiently than using a custom
cpuset.

> On the contrary, "enable" means UMA.

No. Think about this: regardless of this BIOS setting each RAM
package is directly connected to one CPU package, which functions as
its memory controller. Most of the RAM used by PostgreSQL is for
disk buffers -- shared buffers in shared memory and OS cache.
Picture processes running on different CPU packages accessing a
single particular shared buffer. Also picture processes running on
different CPU packages using the same spinlock at the same time. No
BIOS setting can avoid the communications and data transfer among the
8 CPU packages, and the locking of the cache lines.

> Therefore, I think Tony is right, we should disable this setting.
>
> I got the information from here.
> http://frankdenneman.nl/2010/12/28/node-interleaving-enable-or-disable/

Ah, that explains it. There is no such thing as "disabling NUMA" --
you can have the BIOS force interleaving, or you can have the BIOS
leave the NUMA memory assignment to the OS. I was assuming that by
"disabling NUMA" you meant to have BIOS control it through
interleaving. You meant the opposite -- disabling the BIOS override
of OS NUMA control. I agree that we should leave NUMA scheduling to
the OS.
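The single-package cpuset idea mentioned above can be approximated from user space. The following is a minimal Python sketch, assuming Linux with sysfs mounted. The helper names (`parse_cpulist`, `node_cpus`, `pin_to_node`) are hypothetical and not part of PostgreSQL or any library. It only restricts CPU placement; a real cpuset also restricts which memory nodes may be used, which this sketch does not attempt.

```python
import os


def parse_cpulist(text):
    """Parse a Linux cpulist string such as "0-3,8-11" into a set of CPU ids."""
    cpus = set()
    for part in text.strip().split(","):
        if not part:
            continue
        if "-" in part:
            first, last = part.split("-")
            cpus.update(range(int(first), int(last) + 1))
        else:
            cpus.add(int(part))
    return cpus


def node_cpus(node=0):
    """Return the CPUs that belong to one NUMA node, as reported by sysfs."""
    # Assumption: the kernel exposes per-node CPU lists at this sysfs path.
    path = f"/sys/devices/system/node/node{node}/cpulist"
    with open(path) as f:
        return parse_cpulist(f.read())


def pin_to_node(node=0, pid=0):
    """Restrict a process (pid 0 means the caller) to the CPUs of one node."""
    cpus = node_cpus(node)
    os.sched_setaffinity(pid, cpus)  # Linux-only call
    return cpus
```

Child processes inherit the affinity mask, so calling `pin_to_node(0)` in a launcher before starting the benchmark would keep the whole process tree on one package. Whether memory then also stays on that node depends on the OS allocation policy, so treat this as a rough stand-in for a properly configured cpuset.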
There are, however, some non-standard OS configuration options that
allow NUMA to behave better with PostgreSQL than the defaults allow.
We will need to tune a little.

The author of that article seems to be assuming that the usage will
be with applications like word processing, spreadsheets, or browsers
-- where the OS can place all the related processes on a single CPU
package and all (or nearly all) memory allocations can be made from
associated memory -- yielding a fairly uniform and fast access when
the BIOS override is disabled. On a database product which wants to
use all the cores and almost all of the memory, with heavy contention
on shared memory, the situation is very different. Shared resource
access time is going to be non-uniform no matter what you do. The
difference is that the OS can still make *process local* allocations
from nearby memory segments, while BIOS cannot.

--
Kevin Grittner
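The bifurcated benchmark timings described near the top of this message are easy to miss if only an overall mean is reported. The following is a minimal Python sketch of one way to check for them, assuming run times are collected as plain numbers. The function names (`split_bimodal`, `summarize`) are hypothetical, and splitting at the largest gap is a simple heuristic chosen for illustration, not a method taken from this thread.

```python
def split_bimodal(timings):
    """Split timings into two groups at the largest gap between sorted values."""
    values = sorted(timings)
    if len(values) < 2:
        raise ValueError("need at least two timings")
    # Gap between each pair of neighbouring values in sorted order.
    gaps = [b - a for a, b in zip(values, values[1:])]
    cut = gaps.index(max(gaps)) + 1
    return values[:cut], values[cut:]


def summarize(timings):
    """Report the mean of each group and the share of runs in the faster one."""
    low, high = split_bimodal(timings)
    return {
        "low_mean": sum(low) / len(low),
        "high_mean": sum(high) / len(high),
        "low_share": len(low) / (len(low) + len(high)),
    }
```

If the two group means are far apart relative to the spread within each group, and the share is near one half, the runs probably fall into the two-valued pattern described above, and a single averaged figure would be misleading.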