Thread: Simple query showing 270 hours of CPU time

Simple query showing 270 hours of CPU time

From
Dan Harris
Date:
Today, I looked at 'top' on my PG server and saw a pid that reported 270 hours
of CPU time.  Considering this is a very simple query, I was surprised to say
the least.  I was about to just kill the pid, but I figured I'd try and see
exactly what it was stuck doing for so long.

Here's the strace summary as run for a few second sample:

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------
  97.25    0.671629          92      7272           semop
   1.76    0.012171         406        30           recvfrom
   0.57    0.003960          66        60           gettimeofday
   0.36    0.002512          28        90           sendto
   0.05    0.000317          10        32           lseek
   0.01    0.000049           1        48           select
------ ----------- ----------- --------- --------- ----------------
100.00    0.690638                  7532           total

Here's the query:

select id from eventkeywords where word = '00003322'

If I run the query manually, it completes in about 500ms, which is very reasonable.

There are 408563 rows in this table.  I just noticed there is no index on word (
there should be! ).  Would this have caused the problem?

This is 8.0.12

Linux sunrise 2.6.15-26-amd64-server #1 SMP Fri Sep 8 20:33:15 UTC 2006 x86_64
GNU/Linux

Any idea what might have set it into this loop?

-Dan

Re: Simple query showing 270 hours of CPU time

From
Tom Lane
Date:
Dan Harris <fbsd@drivefaster.net> writes:
> Here's the strace summary as run for a few second sample:

> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>   97.25    0.671629          92      7272           semop
>    1.76    0.012171         406        30           recvfrom
>    0.57    0.003960          66        60           gettimeofday
>    0.36    0.002512          28        90           sendto
>    0.05    0.000317          10        32           lseek
>    0.01    0.000049           1        48           select
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.690638                  7532           total

> Here's the query:
> select id from eventkeywords where word = '00003322'

How sure are you that (a) that's really what it's doing and (b) you are
not observing multiple executions of the query?  There are no recvfrom
calls in the inner loops of the backend AFAIR, so this looks to me like
the execution of 30 different queries.  The number of semops is
distressingly high, but that's a contention issue not an
amount-of-runtime issue.  I think you're looking at a backend that has
simply executed one heckuva lot of queries on behalf of its client,
and that inquiring into what the client thinks it's doing might be the
first order of business.

            regards, tom lane

Re: Simple query showing 270 hours of CPU time

From
PFC
Date:
> Today, I looked at 'top' on my PG server and saw a pid that reported 270
> hours of CPU time.  Considering this is a very simple query, I was
> surprised to say the least.  I was about to just kill the pid, but I
> figured I'd try and see exactly what it was stuck doing for so long.

    If you are using connection pooling, or if your client keeps the
connections for a long time, this backend could be very old...
    With PHP's persistent connections, for instance, backends restart when
you restart the webserver, which isn't usually very often.

Re: Simple query showing 270 hours of CPU time

From
Dan Harris
Date:
Tom Lane wrote:
> Dan Harris <fbsd@drivefaster.net> writes:
>> Here's the strace summary as run for a few second sample:
>
>> % time     seconds  usecs/call     calls    errors syscall
>> ------ ----------- ----------- --------- --------- ----------------
>>   97.25    0.671629          92      7272           semop
>>    1.76    0.012171         406        30           recvfrom
>>    0.57    0.003960          66        60           gettimeofday
>>    0.36    0.002512          28        90           sendto
>>    0.05    0.000317          10        32           lseek
>>    0.01    0.000049           1        48           select
>> ------ ----------- ----------- --------- --------- ----------------
>> 100.00    0.690638                  7532           total
>
>> Here's the query:
>> select id from eventkeywords where word = '00003322'
>
> How sure are you that (a) that's really what it's doing and (b) you are
> not observing multiple executions of the query?  There are no recvfrom
> calls in the inner loops of the backend AFAIR, so this looks to me like
> the execution of 30 different queries.  The number of semops is
> distressingly high, but that's a contention issue not an
> amount-of-runtime issue.

You were absolutely right.  This is one connection that is doing a whole lot of
( slow ) queries.  I jumped the gun on this and assumed it was a single query
taking this long.  Sorry to waste time and bandwidth.

Since you mentioned the number of semops is distressingly high, does this
indicate a tuning problem?  The machine has 64GB of RAM and as far as I can tell
about 63GB is all cache.  I wonder if this is a clue to an undervalued
memory-related setting somewhere?

-Dan

Re: Simple query showing 270 hours of CPU time

From
Tom Lane
Date:
Dan Harris <fbsd@drivefaster.net> writes:
> Since you mentioned the number of semops is distressingly high, does this
> indicate a tuning problem?

More like an old-version problem.  We've done a lot of work on
concurrent performance since 8.0.x, and most likely you are hitting
one of the bottlenecks that have been solved since then.

            regards, tom lane