Thread: Pgsql taking a *lot* of CPU time (unkillable).

Pgsql taking a *lot* of CPU time (unkillable).

From: Berteun Damman
Hello,

I'm currently running PostgreSQL 7.4.6 under NetBSD 2.0 (Release), but
with a custom kernel. I can start it, and it performs normally, i.e. I
can access my databases and such. Now I'm primarily using it with the
GnuCash PostgreSQL backend.

After I've finished using it and left it to itself for a while, it
starts to consume all CPU time for, apparently, no good reason
(because it's not doing anything).

I started it thusly:
/usr/pkg/bin/postmaster -i -D /usr/pkg/pgsql/data/

And the following output appeared:

LOG:  database system was interrupted at 2005-01-14 13:52:42 CET
LOG:  checkpoint record is at 0/2329AA0
LOG:  redo record is at 0/2329AA0; undo record is at 0/0; shutdown FALSE
LOG:  next transaction ID: 46264; next OID: 142900
LOG:  database system was not properly shut down; automatic recovery in progress
LOG:  record with zero length at 0/2329AE0
LOG:  redo is not required
LOG:  database system is ready

ps auxww | grep pgsql shows:
pgsql    15786 94.8  0.3  4380   568 p2 R+    4:13PM   5:13.13 /usr/pkg/bin/postmaster -i -D /usr/pkg/pgsql/data/ (postgres)
pgsql    24309  0.0  0.0  5368     4 p2 IW+   4:13PM   0:00.01 postmaster: stats buffer process    (postgres)
pgsql    25177  0.0  0.0  4420     4 p2 IW+   4:13PM   0:00.01 postmaster: stats collector process    (postgres)
pgsql    29008  0.0  0.0     0     0 p2 ZW+        -   0:00.00 (postgres)

Top gives:
15786 pgsql     64    0  4380K  568K RUN        5:56 93.80% 93.80% postgres

Now the program won't respond to kill or to a ctrl+c on the command
line; I have to kill it with kill -9.

I've tried running it with a higher debug level, but this does not give
any useful information. Of course, while I'm performing queries with
GnuCash a lot of query information is shown, but after I quit it and
leave the postmaster to itself, only a repeating sequence of the
following is shown:

DEBUG:  proc_exit(0)
DEBUG:  shmem_exit(0)
DEBUG:  exit(0)
DEBUG:  child process (PID 24738) exited with exit code 0

The PID varies each time. After a while, this stops, and everything
hangs as described above. The only thing to remark is that it does not
seem to happen when running with -d 5 (but I'm not really sure).

As said, I'm running NetBSD 2.0, with my own kernel, on the i386
platform. I hope this gives someone enough information to make a guess
about the cause, although I realise the problem is quite vague.

Berteun

Re: Pgsql taking a *lot* of CPU time (unkillable).

From: Tom Lane
Berteun Damman <berteun@gmail.com> writes:
> After I've finished using it and left it to itself for a while, it
> starts to consume all CPU time for, apparently, no good reason
> (because it's not doing anything).

Would you attach to the process with a debugger and get a stack trace?

    $ gdb /usr/pkg/bin/postgres PID-of-process
    gdb> bt
    gdb> q

Probably should repeat this a few times to get a clear sense of where
it's looping.

            regards, tom lane
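
The repeated attach suggested above can be scripted. What follows is only a minimal sketch, not something posted in the thread: it assumes a gdb that accepts the -batch and -ex options, and the function name sample_backtraces and the GDB, BINARY and INTERVAL variables are made up for illustration. The binary path is the one from the message above.

```shell
#!/bin/sh
# Sketch: attach to a spinning process several times and print each backtrace.
# Assumptions: gdb supports -batch and -ex; names below are illustrative only.
GDB="${GDB:-gdb}"
BINARY="${BINARY:-/usr/pkg/bin/postgres}"

sample_backtraces() {
    pid="$1"          # PID of the process that is spinning
    count="${2:-5}"   # how many samples to take (default 5)
    i=1
    while [ "$i" -le "$count" ]; do
        echo "=== sample $i ==="
        # -batch makes gdb exit (and detach) after running the commands;
        # -ex bt prints the call stack once attached.
        "$GDB" -batch -ex bt "$BINARY" "$pid" 2>&1
        i=$((i + 1))
        sleep "${INTERVAL:-1}"
    done
}
```

Calling `sample_backtraces 15786 12` would take a dozen samples of the process shown in the ps output earlier. Frames that keep appearing across samples are the likely location of the loop.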

Re: Pgsql taking a *lot* of CPU time (unkillable).

From: Berteun Damman
On Sat, 15 Jan 2005 13:15:36 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> Would you attach to the process with a debugger and get a stack trace?
>
>         $ gdb /usr/pkg/bin/postgres PID-of-process
>         gdb> bt
>         gdb> q
>
> Probably should repeat this a few times to get a clear sense of where
> it's looping.

I think it has a locking problem:
#0  0x483bbb2e in pthread__lock_ras_end () from /usr/lib/libpthread.so.0
Error accessing memory address 0x483bbb26: Operation not permitted.

And the other time:
#0  0x483bbb31 in pthread__lock_ras_end () from /usr/lib/libpthread.so.0
And again an accessing error.

Does this indicate an error in NetBSD's pthreading library?

Berteun

Re: Pgsql taking a *lot* of CPU time (unkillable).

From: Tom Lane
Berteun Damman <berteun@gmail.com> writes:
> On Sat, 15 Jan 2005 13:15:36 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Would you attach to the process with a debugger and get a stack trace?

> I think it has a locking problem:
> #0  0x483bbb2e in pthread__lock_ras_end () from /usr/lib/libpthread.so.0
> Error accessing memory address 0x483bbb26: Operation not permitted.

> And the other time:
> #0  0x483bbb31 in pthread__lock_ras_end () from /usr/lib/libpthread.so.0
> And again an accessing error.

> Does this indicate an error in NetBSD's pthreading library?

Not necessarily --- it just means that gdb is confused and can't find
the stacked return addresses :-(.  One thing to check is whether you
have the most up-to-date available version of gdb.  Also, I'd suggest
trying it a dozen or two times in hopes of catching it when it's not
inside libpthread.

Another trick I've sometimes had success with is to kill the process in
such a way that it produces a core dump (kill -ABRT should do this),
and then gdb the core dump file instead of the live process.  gdb seems
to handle that a bit differently and sometimes you can get a stack trace
one way when you couldn't get it the other way.

If none of that works, I'd suggest asking for help from the NetBSD
hackers; they may know some special way of finding out the call stack.
But we aren't going to be able to get far if we can't figure out what
it's doing.

            regards, tom lane
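
The core-dump route described above could look roughly like this. It is a sketch under assumptions, not a tested NetBSD recipe: core dumps must be enabled (for example with ulimit -c unlimited) in the shell that starts the postmaster, the location of the core file is something you have to supply yourself, and the function name backtrace_from_core and the GDB and BINARY variables are invented for illustration.

```shell
#!/bin/sh
# Sketch: force a core dump with SIGABRT, then read the stack from the core.
# Assumptions: core dumps are enabled for the target process, and the caller
# knows where the core file will be written.
GDB="${GDB:-gdb}"
BINARY="${BINARY:-/usr/pkg/bin/postgres}"

backtrace_from_core() {
    pid="$1"    # PID of the spinning process
    core="$2"   # path where the core file is expected to appear
    kill -ABRT "$pid" || return 1
    # Give the kernel a few seconds to finish writing the core file.
    tries=0
    while [ ! -s "$core" ] && [ "$tries" -lt 10 ]; do
        sleep 1
        tries=$((tries + 1))
    done
    if [ ! -s "$core" ]; then
        echo "no core file at $core" >&2
        return 1
    fi
    # Passing a core file instead of a PID makes gdb inspect the dump.
    "$GDB" -batch -ex bt "$BINARY" "$core"
}
```

The core file is usually written to the process's working directory, which for the postmaster is normally the data directory, so the second argument would be a path under /usr/pkg/pgsql/data/. The exact file name depends on the system and is not given in the thread.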

Re: Pgsql taking a *lot* of CPU time (unkillable).

From: Berteun Damman
On Sat, 15 Jan 2005 16:25:34 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> You don't need to reproduce the bug from scratch each time.  What I
> meant was, once it seems to be spinning, repeatedly attach to it with
> gdb and see if you can get a backtrace.  If not, just quit gdb and try
> again.

Oh, I was unclear there. The problem is that the process apparently
gets killed by gdb; at any rate, it no longer runs after I've attached
gdb.

I'll continue this on the NetBSD mailing list.

Berteun