Thread: postgres processes spending most of their time in the kernel
I have a moderately loaded postgres server running 7.2beta4 (i wanted to
try out the live vacuum) that turns out to spend the majority of its cpu
time in kernel land. With only a handful of running processes, postgres
induces tens of thousands of context switches per second. Practically the
only thing postgres does with all this CPU time is semop() in a tight
loop. Here is a snippet of strace:

[pid 11410] 0.000064 <... semop resumed> ) = 0
[pid 11409] 0.000020 <... semop resumed> ) = 0
[pid 11410] 0.000024 semop(1179648, 0xbfffe658, 1 <unfinished ...>
[pid 11409] 0.000027 semop(1179648, 0xbfffe488, 1 <unfinished ...>
[pid 11407] 0.000027 semop(1179648, 0xbfffe8b8, 1 <unfinished ...>
[pid 11409] 0.000022 <... semop resumed> ) = 0
[pid 11406] 0.000018 <... semop resumed> ) = 0
[pid 11409] 0.000023 semop(1179648, 0xbfffe468, 1 <unfinished ...>
[pid 11406] 0.000026 semop(1179648, 0xbfffe958, 1) = 0
[pid 11406] 0.000057 semop(1179648, 0xbfffe9f8, 1 <unfinished ...>
[pid 11408] 0.000037 <... semop resumed> ) = 0
[pid 11408] 0.000029 semop(1179648, 0xbfffe4d8, 1) = 0
[pid 11411] 0.000038 <... semop resumed> ) = 0
[pid 11408] 0.000023 semop(1179648, 0xbfffe4d8, 1 <unfinished ...>
[pid 11411] 0.000026 semop(1179648, 0xbfffe498, 1) = 0
[pid 11407] 0.000040 <... semop resumed> ) = 0
[pid 11411] 0.000024 semop(1179648, 0xbfffe658, 1 <unfinished ...>
[pid 11407] 0.000027 semop(1179648, 0xbfffe8a8, 1) = 0
[pid 11410] 0.000038 <... semop resumed> ) = 0
[pid 11407] 0.000024 semop(1179648, 0xbfffe918, 1 <unfinished ...>
[pid 11410] 0.000026 semop(1179648, 0xbfffe618, 1) = 0
[pid 11410] 0.000058 semop(1179648, 0xbfffe6a8, 1 <unfinished ...>
[pid 11409] 0.000024 <... semop resumed> ) = 0
[pid 11409] 1.214166 semop(1179648, 0xbfffe428, 1) = 0
[pid 11406] 0.000063 <... semop resumed> ) = 0
[pid 11406] 0.000031 semop(1179648, 0xbfffe9f8, 1) = 0
[pid 11406] 0.000051 semop(1179648, 0xbfffe8f8, 1 <unfinished ...>

Performance on this database kind of sucks.
Since there is little or no block I/O, I assume this is because postgres
is wasting its CPU allocations. Does anyone else see this? Is there a
config option to tune the locking behavior? Any other workarounds?

The machine is a 2-way x86 running Linux 2.4. I brought this up on
linux-kernel and they don't seem to think it is the scheduler's problem.

-jwb

"Jeffrey W. Baker" <jwbaker@acm.org> writes: > I have a moderately loaded postgres server running 7.2beta4 (i wanted to > try out the live vacuum) that turns out to spend the majority of its cpu > time in kernel land. With only a handful of running processes, postgres > induces tens of thousands of context switches per second. Practically the > only thing postgres does with all this CPU time is semop() in a tight > loop. It sounds like you have a build that's using SysV semaphores in place of test-and-set instructions. That should not happen on x86 hardware, since we have assembly TAS code for x86. Please look at your port header file (src/include/pg_config_os.h symlink) and src/include/storage/s_lock.h to figure out why it's misbuilt. regards, tom lane
On Fri, 28 Dec 2001, Tom Lane wrote:

> "Jeffrey W. Baker" <jwbaker@acm.org> writes:
> > I have a moderately loaded postgres server running 7.2beta4 (i wanted to
> > try out the live vacuum) that turns out to spend the majority of its cpu
> > time in kernel land. With only a handful of running processes, postgres
> > induces tens of thousands of context switches per second. Practically the
> > only thing postgres does with all this CPU time is semop() in a tight
> > loop.
>
> It sounds like you have a build that's using SysV semaphores in place of
> test-and-set instructions. That should not happen on x86 hardware,
> since we have assembly TAS code for x86. Please look at your port
> header file (src/include/pg_config_os.h symlink) and
> src/include/storage/s_lock.h to figure out why it's misbuilt.

Well, it seems that one of __i386__ or __GNUC__ isn't set at compile time.
I'm using GCC on i386 so I don't see how that is possible. It should be
safe for me to simply define these two things in pg_config.h, I suspect.

-jwb

"Jeffrey W. Baker" <jwbaker@acm.org> writes: >> It sounds like you have a build that's using SysV semaphores in place of >> test-and-set instructions. That should not happen on x86 hardware, >> since we have assembly TAS code for x86. Please look at your port >> header file (src/include/pg_config_os.h symlink) and >> src/include/storage/s_lock.h to figure out why it's misbuilt. > Well, it seems that one of __i386__ or __GNUC__ isn't set at compile time. > I'm using GCC on i386 so I don't see how that is possible. I don't either. > It should be > safe for me to simply define these two things in pg_config.h, I suspect. That is not a solution. If it's broken for you then it's likely to be broken for other people. We need to figure out what went wrong and provide a permanent fix. What gcc version are you running, exactly, and what symbols does it predefine? (I seem to recall that there's a way to find that out, though I'm not recalling how at the moment. Anyone?) regards, tom lane
> What gcc version are you running, exactly, and what symbols does it
> predefine? (I seem to recall that there's a way to find that out,
> though I'm not recalling how at the moment. Anyone?)

pgsql/src/tools/ccsym shows compiler symbols.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 853-3000
  +  If your life is a hard drive,     |  830 Blythe Avenue
  +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
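As a complement to ccsym, the two symbols in question can also be checked directly from C. The sketch below assumes, following the reading of s_lock.h reported earlier in the thread, that both __GNUC__ and __i386__ must be predefined for the x86 assembly path to be chosen; the function names are hypothetical. On GCC versions that support it, running gcc -dM -E - < /dev/null lists every predefined macro in one go.

```c
/* Sketch only: function names are hypothetical. Each function reports
 * whether the compiler building this file predefines the given symbol. */
static int has_gnuc(void)
{
#ifdef __GNUC__
    return 1;
#else
    return 0;
#endif
}

static int has_i386(void)
{
#ifdef __i386__
    return 1;
#else
    return 0;
#endif
}

/* Assumption taken from the thread: the x86 TAS code is only compiled
 * when both symbols are defined. Returns 1 in that case, 0 otherwise. */
static int x86_gcc_tas_expected(void)
{
    return has_gnuc() && has_i386();
}
```

Printing the three return values from a small main(), built with the same compiler and flags as the PostgreSQL build, would show which symbol is missing. Note that a 64-bit x86 compiler does not define __i386__ unless it is targeting 32-bit code, so a 0 there is expected on such a system.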