Thread: high CPU usage for stats collector in 8.2

high CPU usage for stats collector in 8.2

From
Darcy Buskermolen
Date:
I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats collector 
on an 8.2.3 box  investigation has lead me to belive that the stats file is 
written a lot more often that once every 500ms  the following shows this 
behavior.

PostgreSQL 8.2.3 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 
20060404 (Red Hat 3.4.6-3)

I ran a 
time for i in `psql -qtc "select generate_series(1,10000)"`; do psql  -qtc 
"select 1 from test where msg_id=$i" ; done

which took
real    1m23.288s
user    0m24.142s
sys     0m21.536s


to execute, during which time I ran a strace on the stats collector which 
produces the following output.  From this it looks like the stats file is 
getting rewritten for each connection teardown, not just every 500ms.

Process 10061 attached - interrupt to quit
Process 10061 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------68.14   28.811963          17   1663827
write18.22   7.701885         123     62808     12793 poll11.31    4.783082         365     13101           rename 0.58
  0.246169           5     50006           recvfrom 0.57    0.241073          18     13101           open 0.43
0.182816         14     13101           munmap 0.18    0.076176           6     13101           mmap 0.17    0.072746
       6     13101           close 0.14    0.060483           5     13101           setitimer 0.10    0.041344
3     13101     12793 rt_sigreturn 0.09    0.039240           3     13101           fstat 0.06    0.024041           2
  13110           getppid
 
------ ----------- ----------- --------- --------- ----------------
100.00   42.281018               1894559     25586 total

As you can see rename was called more than the theroitcal 167  times for 500ms 
slices that elapsed during the test

Compared to  PostgreSQL 8.1.8 on x86_64-pc-linux-gnu, compiled by GCC 
gcc-4.0.gcc-opt (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5) 
[All be it this is slower hardware..]

time for i in `psql -qtc "select generate_series(1,10000)"`; do psql  -qtc 
"select 1 from test where msg_id=$i" ; done

which took
real    9m25.380s
user    6m51.254s
sys     1m47.687s
(and therefor should be about 1130 stat write cycles)

and yielded the following strace

% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------93.64   20.422006         334     61212
select3.49    0.760963           7    110192           read 1.82    0.396654          19     21128           write 0.64
  0.139679         126      1112           rename 0.27    0.057970          52      1112           open 0.06
0.012177         11      1112           munmap 0.04    0.008901           8      1112           mmap 0.03    0.006402
       6      1112           close 0.02    0.004282           4      1112           fstat
 
------ ----------- ----------- --------- --------- ----------------
100.00   21.809034                199204           total



During this run the stats collector does not even show and CPU usage according 
to top.


both 8.1 and 8.2 have the following postgresql.conf parameters

stats_command_string =  off
stats_start_collector = on
stats_block_level = on
stats_row_level = on



-- 
Darcy Buskermolen
Command Prompt, Inc.
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
PostgreSQL solutions since 1997
http://www.commandprompt.com/


Re: high CPU usage for stats collector in 8.2

From
"Joshua D. Drake"
Date:
Darcy Buskermolen wrote:
> I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats collector 
> on an 8.2.3 box  investigation has lead me to belive that the stats file is 
> written a lot more often that once every 500ms  the following shows this 
> behavior.

Any thoughts on the below?

> 
> PostgreSQL 8.2.3 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 
> 20060404 (Red Hat 3.4.6-3)
> 
> I ran a 
> time for i in `psql -qtc "select generate_series(1,10000)"`; do psql  -qtc 
> "select 1 from test where msg_id=$i" ; done
> 
> which took
> real    1m23.288s
> user    0m24.142s
> sys     0m21.536s
> 
> 
> to execute, during which time I ran a strace on the stats collector which 
> produces the following output.  From this it looks like the stats file is 
> getting rewritten for each connection teardown, not just every 500ms.
> 
> Process 10061 attached - interrupt to quit
> Process 10061 detached
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  68.14   28.811963          17   1663827           write
>  18.22    7.701885         123     62808     12793 poll
>  11.31    4.783082         365     13101           rename
>   0.58    0.246169           5     50006           recvfrom
>   0.57    0.241073          18     13101           open
>   0.43    0.182816          14     13101           munmap
>   0.18    0.076176           6     13101           mmap
>   0.17    0.072746           6     13101           close
>   0.14    0.060483           5     13101           setitimer
>   0.10    0.041344           3     13101     12793 rt_sigreturn
>   0.09    0.039240           3     13101           fstat
>   0.06    0.024041           2     13110           getppid
> ------ ----------- ----------- --------- --------- ----------------
> 100.00   42.281018               1894559     25586 total
> 
> As you can see rename was called more than the theroitcal 167  times for 500ms 
> slices that elapsed during the test
> 
> Compared to  PostgreSQL 8.1.8 on x86_64-pc-linux-gnu, compiled by GCC 
> gcc-4.0.gcc-opt (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5) 
> [All be it this is slower hardware..]
> 
> time for i in `psql -qtc "select generate_series(1,10000)"`; do psql  -qtc 
> "select 1 from test where msg_id=$i" ; done
> 
> which took
> real    9m25.380s
> user    6m51.254s
> sys     1m47.687s
> (and therefor should be about 1130 stat write cycles)
> 
> and yielded the following strace
> 
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  93.64   20.422006         334     61212           select
>   3.49    0.760963           7    110192           read
>   1.82    0.396654          19     21128           write
>   0.64    0.139679         126      1112           rename
>   0.27    0.057970          52      1112           open
>   0.06    0.012177          11      1112           munmap
>   0.04    0.008901           8      1112           mmap
>   0.03    0.006402           6      1112           close
>   0.02    0.004282           4      1112           fstat
> ------ ----------- ----------- --------- --------- ----------------
> 100.00   21.809034                199204           total
> 
> 
> 
> During this run the stats collector does not even show and CPU usage according 
> to top.
> 
> 
> both 8.1 and 8.2 have the following postgresql.conf parameters
> 
> stats_command_string =  off
> stats_start_collector = on
> stats_block_level = on
> stats_row_level = on
> 
> 
> 


-- 
     === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997            http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



Possible Bug: high CPU usage for stats collector in 8.2

From
"Joshua D. Drake"
Date:
Darcy Buskermolen wrote:
> I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats collector 
> on an 8.2.3 box  investigation has lead me to belive that the stats file is 
> written a lot more often that once every 500ms  the following shows this 
> behavior.
> 
> PostgreSQL 8.2.3 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 
> 20060404 (Red Hat 3.4.6-3)
> 
> I ran a 
> time for i in `psql -qtc "select generate_series(1,10000)"`; do psql  -qtc 
> "select 1 from test where msg_id=$i" ; done
> 
> which took
> real    1m23.288s
> user    0m24.142s
> sys     0m21.536s
> 
> 
> to execute, during which time I ran a strace on the stats collector which 
> produces the following output.  From this it looks like the stats file is 
> getting rewritten for each connection teardown, not just every 500ms.
> 
> Process 10061 attached - interrupt to quit
> Process 10061 detached
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  68.14   28.811963          17   1663827           write
>  18.22    7.701885         123     62808     12793 poll
>  11.31    4.783082         365     13101           rename
>   0.58    0.246169           5     50006           recvfrom
>   0.57    0.241073          18     13101           open
>   0.43    0.182816          14     13101           munmap
>   0.18    0.076176           6     13101           mmap
>   0.17    0.072746           6     13101           close
>   0.14    0.060483           5     13101           setitimer
>   0.10    0.041344           3     13101     12793 rt_sigreturn
>   0.09    0.039240           3     13101           fstat
>   0.06    0.024041           2     13110           getppid
> ------ ----------- ----------- --------- --------- ----------------
> 100.00   42.281018               1894559     25586 total
> 
> As you can see rename was called more than the theroitcal 167  times for 500ms 
> slices that elapsed during the test
> 
> Compared to  PostgreSQL 8.1.8 on x86_64-pc-linux-gnu, compiled by GCC 
> gcc-4.0.gcc-opt (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5) 
> [All be it this is slower hardware..]
> 
> time for i in `psql -qtc "select generate_series(1,10000)"`; do psql  -qtc 
> "select 1 from test where msg_id=$i" ; done
> 
> which took
> real    9m25.380s
> user    6m51.254s
> sys     1m47.687s
> (and therefor should be about 1130 stat write cycles)
> 
> and yielded the following strace
> 
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  93.64   20.422006         334     61212           select
>   3.49    0.760963           7    110192           read
>   1.82    0.396654          19     21128           write
>   0.64    0.139679         126      1112           rename
>   0.27    0.057970          52      1112           open
>   0.06    0.012177          11      1112           munmap
>   0.04    0.008901           8      1112           mmap
>   0.03    0.006402           6      1112           close
>   0.02    0.004282           4      1112           fstat
> ------ ----------- ----------- --------- --------- ----------------
> 100.00   21.809034                199204           total
> 
> 
> 
> During this run the stats collector does not even show and CPU usage according 
> to top.
> 
> 
> both 8.1 and 8.2 have the following postgresql.conf parameters
> 
> stats_command_string =  off
> stats_start_collector = on
> stats_block_level = on
> stats_row_level = on
> 
> 
> 


-- 
     === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997            http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



Re: Possible Bug: high CPU usage for stats collector in 8.2

From
"Joshua D. Drake"
Date:
Joshua D. Drake wrote:
> Darcy Buskermolen wrote:
>> I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats collector 
>> on an 8.2.3 box  investigation has lead me to belive that the stats file is 
>> written a lot more often that once every 500ms  the following shows this 
>> behavior.

I have just done a test separate from darcy on my workstation:


jd@scratch:~/82$ strace -c -p 16130
Process 16130 attached - interrupt to quit
Process 16130 detached
% time     seconds  usecs/call     calls    errors syscall
------ ----------- ----------- --------- --------- ----------------50.00    0.167103           3     60264     10225
poll28.31   0.094600           9     10398           rename16.89    0.056457           5     10398           open 2.48
 0.008278           0     50023           recvfrom 1.14    0.003804           0     10398           setitimer 0.53
0.001781          0     20796           write 0.43    0.001432           0     10398           close 0.21    0.000690
       0     10398           munmap 0.02    0.000057           0     10398           mmap 0.00    0.000000           0
  10398           fstat 0.00    0.000000           0     10398     10225 rt_sigreturn 0.00    0.000000           0
10414          getppid
 
------ ----------- ----------- --------- --------- ----------------
100.00    0.334202                224681     20450 total


Query:

time for i in `bin/psql -p8000 -d postgres -c "select
generate_series(1,10000)"`; do bin/psql -p8000 -d postgres -qc "select 1
from pg_database where datname = 'postgres'"; done;

Time:

real    2m5.077s
user    0m28.414s
sys     0m39.762s


PostgreSQL 8.2.0, Ubuntu Edgy 64bit.

Seems like something is extremely wonky here.

Joshua D. Drake






>>
>> PostgreSQL 8.2.3 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 
>> 20060404 (Red Hat 3.4.6-3)
>>
>> I ran a 
>> time for i in `psql -qtc "select generate_series(1,10000)"`; do psql  -qtc 
>> "select 1 from test where msg_id=$i" ; done
>>
>> which took
>> real    1m23.288s
>> user    0m24.142s
>> sys     0m21.536s
>>
>>
>> to execute, during which time I ran a strace on the stats collector which 
>> produces the following output.  From this it looks like the stats file is 
>> getting rewritten for each connection teardown, not just every 500ms.
>>
>> Process 10061 attached - interrupt to quit
>> Process 10061 detached
>> % time     seconds  usecs/call     calls    errors syscall
>> ------ ----------- ----------- --------- --------- ----------------
>>  68.14   28.811963          17   1663827           write
>>  18.22    7.701885         123     62808     12793 poll
>>  11.31    4.783082         365     13101           rename
>>   0.58    0.246169           5     50006           recvfrom
>>   0.57    0.241073          18     13101           open
>>   0.43    0.182816          14     13101           munmap
>>   0.18    0.076176           6     13101           mmap
>>   0.17    0.072746           6     13101           close
>>   0.14    0.060483           5     13101           setitimer
>>   0.10    0.041344           3     13101     12793 rt_sigreturn
>>   0.09    0.039240           3     13101           fstat
>>   0.06    0.024041           2     13110           getppid
>> ------ ----------- ----------- --------- --------- ----------------
>> 100.00   42.281018               1894559     25586 total
>>
>> As you can see rename was called more than the theroitcal 167  times for 500ms 
>> slices that elapsed during the test
>>
>> Compared to  PostgreSQL 8.1.8 on x86_64-pc-linux-gnu, compiled by GCC 
>> gcc-4.0.gcc-opt (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5) 
>> [All be it this is slower hardware..]
>>
>> time for i in `psql -qtc "select generate_series(1,10000)"`; do psql  -qtc 
>> "select 1 from test where msg_id=$i" ; done
>>
>> which took
>> real    9m25.380s
>> user    6m51.254s
>> sys     1m47.687s
>> (and therefor should be about 1130 stat write cycles)
>>
>> and yielded the following strace
>>
>> % time     seconds  usecs/call     calls    errors syscall
>> ------ ----------- ----------- --------- --------- ----------------
>>  93.64   20.422006         334     61212           select
>>   3.49    0.760963           7    110192           read
>>   1.82    0.396654          19     21128           write
>>   0.64    0.139679         126      1112           rename
>>   0.27    0.057970          52      1112           open
>>   0.06    0.012177          11      1112           munmap
>>   0.04    0.008901           8      1112           mmap
>>   0.03    0.006402           6      1112           close
>>   0.02    0.004282           4      1112           fstat
>> ------ ----------- ----------- --------- --------- ----------------
>> 100.00   21.809034                199204           total
>>
>>
>>
>> During this run the stats collector does not even show and CPU usage according 
>> to top.
>>
>>
>> both 8.1 and 8.2 have the following postgresql.conf parameters
>>
>> stats_command_string =  off
>> stats_start_collector = on
>> stats_block_level = on
>> stats_row_level = on
>>
>>
>>
> 
> 


-- 
     === The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive  PostgreSQL solutions since 1997            http://www.commandprompt.com/

Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
PostgreSQL Replication: http://www.commandprompt.com/products/



Re: Possible Bug: high CPU usage for stats collector in 8.2

From
Darcy Buskermolen
Date:
On Wednesday 28 February 2007 14:31, Joshua D. Drake wrote:
> Joshua D. Drake wrote:
> > Darcy Buskermolen wrote:
> >> I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats
> >> collector on an 8.2.3 box  investigation has lead me to belive that the
> >> stats file is written a lot more often that once every 500ms  the
> >> following shows this behavior.

And just so that noone thinks it's linux specific, I tried the same test on a 
FreeBSD box with 8.2 on it and got the following.  Ignore the time portion of 
the results, but what is important is the number of calls occuring in the 
timeframe (given this test took about 38:30, it should have resulted in about 
4620 calls to rename)

% time     seconds                      usecs/call      calls      err syscall
------ ----------- ----------- --------- --------- ----------------
149.23    1170797349.233781   464000015      2519188 2            write57.71     452768432.575458      2036010019
71515      20238  poll17.59     138017435.049569      -1944030463  20766                    getppid
 
-136.71  -1072628999.112811   -1254412922  20317                    rename
-136.71  -1072629000.340448   -1254412960  20317                    close
-136.71  -1072629001.605166   -1254412996  20317                    setitimer
-146.81  -1151850347.801977   -861500230    20316                     open
-146.81  -1151850349.967150   -861500320    20316                     fstat
-226.75  -1779034289.1666093 -639263959    50831                     recvfrom
------ ----------- ----------- --------- --------- ----------------

real    38m29.897s
user    0m40.641s
sys     0m47.487s

....


>
> I have just done a test separate from darcy on my workstation:
>
>
> jd@scratch:~/82$ strace -c -p 16130
> Process 16130 attached - interrupt to quit
> Process 16130 detached
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  50.00    0.167103           3     60264     10225 poll
>  28.31    0.094600           9     10398           rename
>  16.89    0.056457           5     10398           open
>   2.48    0.008278           0     50023           recvfrom
>   1.14    0.003804           0     10398           setitimer
>   0.53    0.001781           0     20796           write
>   0.43    0.001432           0     10398           close
>   0.21    0.000690           0     10398           munmap
>   0.02    0.000057           0     10398           mmap
>   0.00    0.000000           0     10398           fstat
>   0.00    0.000000           0     10398     10225 rt_sigreturn
>   0.00    0.000000           0     10414           getppid
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.334202                224681     20450 total
>
>
> Query:
>
> time for i in `bin/psql -p8000 -d postgres -c "select
> generate_series(1,10000)"`; do bin/psql -p8000 -d postgres -qc "select 1
> from pg_database where datname = 'postgres'"; done;
>
> Time:
>
> real    2m5.077s
> user    0m28.414s
> sys     0m39.762s
>
>
> PostgreSQL 8.2.0, Ubuntu Edgy 64bit.
>
> Seems like something is extremely wonky here.
>
> Joshua D. Drake
>
> >> PostgreSQL 8.2.3 on x86_64-redhat-linux-gnu, compiled by GCC gcc (GCC)
> >> 3.4.6 20060404 (Red Hat 3.4.6-3)
> >>
> >> I ran a
> >> time for i in `psql -qtc "select generate_series(1,10000)"`; do psql 
> >> -qtc "select 1 from test where msg_id=$i" ; done
> >>
> >> which took
> >> real    1m23.288s
> >> user    0m24.142s
> >> sys     0m21.536s
> >>
> >>
> >> to execute, during which time I ran a strace on the stats collector
> >> which produces the following output.  From this it looks like the stats
> >> file is getting rewritten for each connection teardown, not just every
> >> 500ms.
> >>
> >> Process 10061 attached - interrupt to quit
> >> Process 10061 detached
> >> % time     seconds  usecs/call     calls    errors syscall
> >> ------ ----------- ----------- --------- --------- ----------------
> >>  68.14   28.811963          17   1663827           write
> >>  18.22    7.701885         123     62808     12793 poll
> >>  11.31    4.783082         365     13101           rename
> >>   0.58    0.246169           5     50006           recvfrom
> >>   0.57    0.241073          18     13101           open
> >>   0.43    0.182816          14     13101           munmap
> >>   0.18    0.076176           6     13101           mmap
> >>   0.17    0.072746           6     13101           close
> >>   0.14    0.060483           5     13101           setitimer
> >>   0.10    0.041344           3     13101     12793 rt_sigreturn
> >>   0.09    0.039240           3     13101           fstat
> >>   0.06    0.024041           2     13110           getppid
> >> ------ ----------- ----------- --------- --------- ----------------
> >> 100.00   42.281018               1894559     25586 total
> >>
> >> As you can see rename was called more than the theroitcal 167  times for
> >> 500ms slices that elapsed during the test
> >>
> >> Compared to  PostgreSQL 8.1.8 on x86_64-pc-linux-gnu, compiled by GCC
> >> gcc-4.0.gcc-opt (GCC) 4.0.3 (Ubuntu 4.0.3-1ubuntu5)
> >> [All be it this is slower hardware..]
> >>
> >> time for i in `psql -qtc "select generate_series(1,10000)"`; do psql 
> >> -qtc "select 1 from test where msg_id=$i" ; done
> >>
> >> which took
> >> real    9m25.380s
> >> user    6m51.254s
> >> sys     1m47.687s
> >> (and therefor should be about 1130 stat write cycles)
> >>
> >> and yielded the following strace
> >>
> >> % time     seconds  usecs/call     calls    errors syscall
> >> ------ ----------- ----------- --------- --------- ----------------
> >>  93.64   20.422006         334     61212           select
> >>   3.49    0.760963           7    110192           read
> >>   1.82    0.396654          19     21128           write
> >>   0.64    0.139679         126      1112           rename
> >>   0.27    0.057970          52      1112           open
> >>   0.06    0.012177          11      1112           munmap
> >>   0.04    0.008901           8      1112           mmap
> >>   0.03    0.006402           6      1112           close
> >>   0.02    0.004282           4      1112           fstat
> >> ------ ----------- ----------- --------- --------- ----------------
> >> 100.00   21.809034                199204           total
> >>
> >>
> >>
> >> During this run the stats collector does not even show and CPU usage
> >> according to top.
> >>
> >>
> >> both 8.1 and 8.2 have the following postgresql.conf parameters
> >>
> >> stats_command_string =  off
> >> stats_start_collector = on
> >> stats_block_level = on
> >> stats_row_level = on

-- 


Darcy Buskermolen
The PostgreSQL company, Command Prompt Inc.
http://www.commandprompt.com/


Re: Possible Bug: high CPU usage for stats collector in 8.2

From
Tom Lane
Date:
Darcy Buskermolen <darcyb@commandprompt.com> writes:
> I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats
> collector on an 8.2.3 box  investigation has lead me to belive that the
> stats file is written a lot more often that once every 500ms  the
> following shows this behavior.

Looks like someone broke the usage of PGSTAT_STAT_INTERVAL between 8.1
and 8.2 ... it's waiting 500 microseconds, not the intended 500
milliseconds.
        regards, tom lane


Re: Possible Bug: high CPU usage for stats collector in 8.2

From
Andrew Dunstan
Date:
Tom Lane wrote:
> Darcy Buskermolen <darcyb@commandprompt.com> writes:
>   
>> I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats
>> collector on an 8.2.3 box  investigation has lead me to belive that the
>> stats file is written a lot more often that once every 500ms  the
>> following shows this behavior.
>>     
>
> Looks like someone broke the usage of PGSTAT_STAT_INTERVAL between 8.1
> and 8.2 ... it's waiting 500 microseconds, not the intended 500
> milliseconds.
>   

Good catch. I am also a bit dubious about this code:
       input_fd.fd = pgStatSock;       input_fd.events = POLLIN | POLLERR;       input_fd.revents = 0;
       if (poll(&input_fd, 1, PGSTAT_SELECT_TIMEOUT * 1000) < 0)       {           if (errno == EINTR)
continue;          ereport(ERROR,                   (errcode_for_socket_access(),                    errmsg("poll()
failedin statistics collector: %m")));       }
 
       got_data = (input_fd.revents != 0);


AIUI you are not supposed to put POLLERR in the events field. We should 
probably be setting POLLIN | POLLPRI,  and we should also probably check 
exactly what event was returned in revents.

cheers

andrew





Re: Possible Bug: high CPU usage for stats collector in 8.2

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> Good catch. I am also a bit dubious about this code:

>         input_fd.fd = pgStatSock;
>         input_fd.events = POLLIN | POLLERR;
>         input_fd.revents = 0;

Hm.  The Single Unix Spec saith that POLLERR is ignored in the events
field, but it is not clear to me that older systems might not treat it
as a condition bit.  For instance on HPUX the poll man page says only
    The condition flags POLLERR, POLLHUP, and POLLNVAL are always set in    revents if the conditions they indicate are
truefor the specified    file descriptor, whether or not these flags are set in events.
 

> AIUI you are not supposed to put POLLERR in the events field. We should 
> probably be setting POLLIN | POLLPRI,  and we should also probably check 
> exactly what event was returned in revents.

We don't need to check what was returned because the action is the same
either way: do a recv().  I'm not seeing the point of setting POLLPRI.
        regards, tom lane


Re: Possible Bug: high CPU usage for stats collector in 8.2

From
Andrew Dunstan
Date:
Tom Lane wrote:
> Andrew Dunstan <andrew@dunslane.net> writes:
>   
>> Good catch. I am also a bit dubious about this code:
>>     
>
>   
>>         input_fd.fd = pgStatSock;
>>         input_fd.events = POLLIN | POLLERR;
>>         input_fd.revents = 0;
>>     
>
> Hm.  The Single Unix Spec saith that POLLERR is ignored in the events
> field, but it is not clear to me that older systems might not treat it
> as a condition bit.  For instance on HPUX the poll man page says only
>
>      The condition flags POLLERR, POLLHUP, and POLLNVAL are always set in
>      revents if the conditions they indicate are true for the specified
>      file descriptor, whether or not these flags are set in events.
>   


yeah.

http://www.opengroup.org/onlinepubs/009695399/functions/poll.html says:

"An error has occurred on the device or stream. This flag is only valid 
in the /revents/ bitmask; it shall be ignored in the /events/ member."

>   
>> AIUI you are not supposed to put POLLERR in the events field. We should 
>> probably be setting POLLIN | POLLPRI,  and we should also probably check 
>> exactly what event was returned in revents.
>>     
>
> We don't need to check what was returned because the action is the same
> either way: do a recv().  I'm not seeing the point of setting POLLPRI.
>
>             
>   

Maybe none in this case - I guess we're not sending priority data.

This might all be OK - I just noticed it as I was looking at the 
problem, so I though I'd mention it. I have no doubt your fix is the 
correct one, though.


cheers

andrew



Re: Possible Bug: high CPU usage for stats collector in 8.2

From
Tom Lane
Date:
Andrew Dunstan <andrew@dunslane.net> writes:
> This might all be OK - I just noticed it as I was looking at the 
> problem, so I though I'd mention it.

I'm inclined to leave that code alone unless someone can point to a
platform where setting POLLERR in events actually causes a problem.
The pgstat code was modeled on libpq, which has been passing POLLERR
ever since its poll() support was written (for 7.4), so I think we
have lots of evidence that this way works everywhere, and none that
the other one does.
        regards, tom lane


Re: Possible Bug: high CPU usage for stats collector in 8.2

From
Darcy Buskermolen
Date:
On Thursday 01 March 2007 11:57, Tom Lane wrote:
> Darcy Buskermolen <darcyb@commandprompt.com> writes:
> > I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats
> > collector on an 8.2.3 box  investigation has lead me to belive that the
> > stats file is written a lot more often that once every 500ms  the
> > following shows this behavior.
>
> Looks like someone broke the usage of PGSTAT_STAT_INTERVAL between 8.1
> and 8.2 ... it's waiting 500 microseconds, not the intended 500
> milliseconds.

I can confirm that rev 1.140.2.3 of pgstat.c does fix this issue.

The stats collector CPU usage has dropped from inexcess of 95% to 5%

Thanks.

>
>             regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org

-- 


Darcy Buskermolen
The PostgreSQL company, Command Prompt Inc.
http://www.commandprompt.com/


Re: Possible Bug: high CPU usage for stats collector in 8.2

From
Larry Rosenman
Date:
On Fri, 2 Mar 2007, Darcy Buskermolen wrote:

> On Thursday 01 March 2007 11:57, Tom Lane wrote:
>> Darcy Buskermolen <darcyb@commandprompt.com> writes:
>>> I'm observing high CPU usage (95%) of a 2.6GHz opteron by the stats
>>> collector on an 8.2.3 box  investigation has lead me to belive that the
>>> stats file is written a lot more often that once every 500ms  the
>>> following shows this behavior.
>>
>> Looks like someone broke the usage of PGSTAT_STAT_INTERVAL between 8.1
>> and 8.2 ... it's waiting 500 microseconds, not the intended 500
>> milliseconds.
>
> I can confirm that rev 1.140.2.3 of pgstat.c does fix this issue.
>
> The stats collector CPU usage has dropped from inexcess of 95% to 5%
>
> Thanks.
>
Any guess on when we'd see an 8.2.4?  I have a business reason for asking.

Thanks!

-- 
Larry Rosenman                     http://www.lerctr.org/~ler
Phone: +1 512-248-2683                 E-Mail: ler@lerctr.org
US Mail: 430 Valona Loop, Round Rock, TX 78681-3893


Re: Possible Bug: high CPU usage for stats collector in 8.2

From
Jim Nasby
Date:
On Mar 2, 2007, at 4:14 PM, Larry Rosenman wrote:
> Any guess on when we'd see an 8.2.4?  I have a business reason for  
> asking.

If history is any guide, probably when a 'reasonably large fix' to  
something comes around, though reviewing early 8.1 release history  
might prove enlightening...
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)




Re: Possible Bug: high CPU usage for stats collector in 8.2

From
Tom Lane
Date:
Jim Nasby <decibel@decibel.org> writes:
> On Mar 2, 2007, at 4:14 PM, Larry Rosenman wrote:
>> Any guess on when we'd see an 8.2.4?  I have a business reason for  
>> asking.

> If history is any guide, probably when a 'reasonably large fix' to  
> something comes around,

Well, what with Bruce on vacation for the next ten days, and tremendous
pressure to review various people's random 8.3 patches, you should not
expect to see any new back-branch releases in March unless the sky
starts falling on us.  I'm not real happy about that, because we've
accumulated several moderately significant bug fixes in the 8.2 branch
since 8.2.3, but I don't currently see a sufficient case for expending
a day on a new release...
        regards, tom lane