Thread: Pgsql taking a *lot* of CPU time (unkillable).
Hello,

I'm currently running PostgreSQL 7.4.6 under NetBSD 2.0 (Release), but with a custom kernel. I can start it, and it performs normally, i.e. I can access my databases and such. Now I'm primarily using it with the GNUCash PostgreSQL backend. After I've finished using it, and leaving it to itself for a while, it starts to consume all CPU time for, apparently, no good reason (because it's not doing anything).

I started it thusly:

/usr/pkg/bin/postmaster -i -D /usr/pkg/pgsql/data/

And the following output appeared:

LOG: database system was interrupted at 2005-01-14 13:52:42 CET
LOG: checkpoint record is at 0/2329AA0
LOG: redo record is at 0/2329AA0; undo record is at 0/0; shutdown FALSE
LOG: next transaction ID: 46264; next OID: 142900
LOG: database system was not properly shut down; automatic recovery in progress
LOG: record with zero length at 0/2329AE0
LOG: redo is not required
LOG: database system is ready

ps auxww | grep pgsql shows:

pgsql 15786 94.8 0.3 4380 568 p2 R+ 4:13PM 5:13.13 /usr/pkg/bin/postmaster -i -D /usr/pkg/pgsql/data/ (postgres)
pgsql 24309 0.0 0.0 5368 4 p2 IW+ 4:13PM 0:00.01 postmaster: stats buffer process (postgres)
pgsql 25177 0.0 0.0 4420 4 p2 IW+ 4:13PM 0:00.01 postmaster: stats collector process (postgres)
pgsql 29008 0.0 0.0 0 0 p2 ZW+ - 0:00.00 (postgres)

Top gives:

15786 pgsql 64 0 4380K 568K RUN 5:56 93.80% 93.80% postgres

Now, the program won't respond to kill, or to a ctrl+c on the command line; I have to kill it with -9.

I've tried to run it with a higher debug level, but this does not give any useful information. Of course, while performing queries with gnucash, a lot of query information is shown, but after quitting it and leaving postmaster to itself, only a repeating sequence of the following is shown:

DEBUG: proc_exit(0)
DEBUG: shmem_exit(0)
DEBUG: exit(0)
DEBUG: child process (PID 24738) exited with exit code 0

With varying PIDs. After a while, this stops, and everything hangs as described above.

The only thing to remark is that it does not seem to happen when running with -d 5 (but I'm not really sure).

As said, I'm running NetBSD 2.0, with my own kernel, on the i386 platform. I hope this gives someone enough information to make a guess about the cause, although I realise the problem is quite vague.

Berteun
Berteun Damman <berteun@gmail.com> writes:
> After I've finished using it, and leaving it to itself for a while, it
> starts to consume all CPU time for, apparently, no good reason
> (because it's not doing anything).

Would you attach to the process with a debugger and get a stack trace?

$ gdb /usr/pkg/bin/postgres PID-of-process
gdb> bt
gdb> q

Probably should repeat this a few times to get a clear sense of where it's looping.

regards, tom lane
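For anyone repeating the attach-and-backtrace step above, it could be scripted roughly as follows. This is only a sketch under stated assumptions: the function name `sample_backtraces`, the `GDB` variable, and the sample count and delay are made up for illustration, and it assumes the installed gdb accepts `-batch` and `-x` and leaves the process running when it detaches.

```shell
#!/bin/sh
# Sketch: take several backtraces of an already-spinning process.
# GDB, the sample count, and the delay are illustrative, not from the thread.
GDB=${GDB:-gdb}

sample_backtraces() {
    binary=$1        # executable, e.g. /usr/pkg/bin/postgres
    pid=$2           # PID of the spinning process
    count=${3:-5}    # how many samples to take
    delay=${4:-2}    # seconds to wait between samples

    # gdb reads its commands from this file: a single "bt".
    cmds=$(mktemp) || return 1
    echo bt > "$cmds"

    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== sample $i ==="
        # -batch: exit after the command file has run
        # -x:     read gdb commands from the given file
        "$GDB" -batch -x "$cmds" "$binary" "$pid" 2>&1
        i=$((i + 1))
        [ "$i" -le "$count" ] && sleep "$delay"
    done
    rm -f "$cmds"
}
```

With the PID from the ps output earlier in the thread, a call might look like `sample_backtraces /usr/pkg/bin/postgres 15786 10 1`. Frames that show up in most samples are the likely location of the loop.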
On Sat, 15 Jan 2005 13:15:36 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Would you attach to the process with a debugger and get a stack trace?
>
> $ gdb /usr/pkg/bin/postgres PID-of-process
> gdb> bt
> gdb> q
>
> Probably should repeat this a few times to get a clear sense of where
> it's looping.

I think it has a locking problem:

#0 0x483bbb2e in pthread__lock_ras_end () from /usr/lib/libpthread.so.0
Error accessing memory address 0x483bbb26: Operation not permitted.

And the other time:

#0 0x483bbb31 in pthread__lock_ras_end () from /usr/lib/libpthread.so.0

And again an accessing error.

Does this indicate an error in NetBSD's pthreading library?

Berteun
Berteun Damman <berteun@gmail.com> writes:
> On Sat, 15 Jan 2005 13:15:36 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Would you attach to the process with a debugger and get a stack trace?

> I think it has a locking problem:
> #0 0x483bbb2e in pthread__lock_ras_end () from /usr/lib/libpthread.so.0
> Error accessing memory address 0x483bbb26: Operation not permitted.
> And the other time:
> #0 0x483bbb31 in pthread__lock_ras_end () from /usr/lib/libpthread.so.0
> And again an accessing error.

> Does this indicate an error in NetBSD's pthreading library?

Not necessarily --- it just means that gdb is confused and can't find the stacked return addresses :-(.

One thing to check is whether you have the most up-to-date available version of gdb. Also, I'd suggest trying it a dozen or two times in hopes of catching it when it's not inside libpthread.

Another trick I've sometimes had success with is to kill the process in such a way that it produces a core dump (kill -ABRT should do this), and then gdb the core dump file instead of the live process. gdb seems to handle that a bit differently and sometimes you can get a stack trace one way when you couldn't get it the other way.

If none of that works, I'd suggest asking for help from the NetBSD hackers; they may know some special way of finding out the call stack. But we aren't going to be able to get far if we can't figure out what it's doing.

regards, tom lane
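The core-dump route described above might be scripted along these lines. Again this is a hedged sketch, not a tested recipe for NetBSD 2.0: the function name `backtrace_from_core` and the `GDB` variable are hypothetical, core dumps have to be permitted by the shell's resource limits, and the core file's name and location depend on the system, so the path used in the usage note is an assumption.

```shell
#!/bin/sh
# Sketch: print a backtrace from a core file instead of a live process.
# GDB and the function name are illustrative, not from the thread.
GDB=${GDB:-gdb}

backtrace_from_core() {
    binary=$1   # executable that produced the core, e.g. /usr/pkg/bin/postgres
    core=$2     # path to the core file (name and location vary by system)

    if [ ! -r "$core" ]; then
        echo "no readable core file at $core" >&2
        return 1
    fi

    # gdb reads its commands from this file: a single "bt".
    cmds=$(mktemp) || return 1
    echo bt > "$cmds"

    # With an executable and a core file as arguments, gdb examines the dump.
    "$GDB" -batch -x "$cmds" "$binary" "$core"
    status=$?
    rm -f "$cmds"
    return "$status"
}
```

A possible sequence would be `kill -ABRT 15786`, then `backtrace_from_core /usr/pkg/bin/postgres /usr/pkg/pgsql/data/postgres.core`, where the core file path is a guess to be replaced with wherever the dump actually lands.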
On Sat, 15 Jan 2005 16:25:34 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> You don't need to reproduce the bug from scratch each time. What I
> meant was, once it seems to be spinning, repeatedly attach to it with
> gdb and see if you can get a backtrace. If not, just quit gdb and try
> again.

Oh, I was unclear there. The problem is that the process gets killed by gdb (apparently); in any case, it does not run anymore after I've attached gdb. I'll continue on the NetBSD mailing list.

Berteun