Thread: Stats collector on rampage (8.2.3)

Stats collector on rampage (8.2.3)

From
Gaetano Mendola
Date:
-----BEGIN PGP SIGNED MESSAGE-----
Hash: SHA1

Hi all,
it seems that the stats collector on my box is using more CPU than
it did in the past.

This is what I'm observing:

CPU usage for the stat process: 25% flat

$ psql -c "select version()"                                               version
- -------------------------------------------------------------------------------------------------------PostgreSQL
8.2.3on i686-redhat-linux-gnu, compiled by GCC gcc (GCC) 3.4.6 20060404 (Red Hat 3.4.6-3)
 
(1 row)


$ strace -tt -p 10773
[...]
09:47:37.867655 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
09:47:37.867738 write(3, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 823) = 823
09:47:37.867820 close(3)                = 0
09:47:37.867862 munmap(0xb7ced000, 4096) = 0
09:47:37.867906 rename("global/pgstat.tmp", "global/pgstat.stat") = 0
09:47:37.868188 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.868245 recv(7, "\1\0\0\0\320\3\0\0\rg\0\0\16\0\0\0\0\0\0\0\1\0\0\0\347"..., 1000, 0) = 976
09:47:37.868317 setitimer(ITIMER_REAL, {it_interval={0, 0}, it_value={0, 500}}, NULL) = 0
09:47:37.868372 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.868428 recv(7, "\1\0\0\0\320\3\0\0\rg\0\0\16\0\0\0\0\0\0\0\0\0\0\0g\n\0"..., 1000, 0) = 976
09:47:37.868501 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.868559 recv(7, "\1\0\0\0\320\3\0\0\rg\0\0\16\0\0\0\0\0\0\0\0\0\0\0i\n\0"..., 1000, 0) = 976
09:47:37.868629 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.868687 recv(7, "\1\0\0\0\4\3\0\0\rg\0\0\v\0\0\0\0\0\0\0\0\0\0\0^\n\0\0"..., 1000, 0) = 772
09:47:37.868757 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.868815 recv(7, "\1\0\0\0\240\0\0\0\0\0\0\0\2\0\0\0\0\0\0\0\0\0\0\0\354"..., 1000, 0) = 160
09:47:37.868886 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.868944 recv(7, "\1\0\0\0\240\0\0\0\rg\0\0\2\0\0\0\1\0\0\0\0\0\0\0<\n\0"..., 1000, 0) = 160
09:47:37.869012 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.869070 recv(7, "\1\0\0\0l\1\0\0\0\0\0\0\5\0\0\0\0\0\0\0\0\0\0\0\354\4\0"..., 1000, 0) = 364
09:47:37.869141 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.869198 recv(7, "\1\0\0\0\300\2\0\0\rg\0\0\n\0\0\0\1\0\0\0\0\0\0\0007\n"..., 1000, 0) = 704
09:47:37.869267 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.869329 recv(7, "\1\0\0\0(\1\0\0\rg\0\0\4\0\0\0\1\0\0\0\0\0\0\0/\n\0\0\0"..., 1000, 0) = 296
09:47:37.869398 poll([{fd=7, events=POLLIN|POLLERR, revents=POLLIN}], 1, 2000) = 1
09:47:37.869456 recv(7, "\1\0\0\0\320\3\0\0\rg\0\0\16\0\0\0\0\0\0\0\1\0\0\0\347"..., 1000, 0) = 976
09:47:37.869524 --- SIGALRM (Alarm clock) @ 0 (0) ---
09:47:37.869575 sigreturn()             = ? (mask now [])
09:47:37.869659 getppid()               = 10768
09:47:37.869702 open("global/pgstat.tmp", O_WRONLY|O_CREAT|O_TRUNC|O_LARGEFILE, 0666) = 3
09:47:37.869775 fstat64(3, {st_mode=S_IFREG|0600, st_size=0, ...}) = 0
09:47:37.869871 mmap2(NULL, 4096, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0xb7ced000
09:47:37.869928 write(3, "\226\274\245\1D\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
09:47:37.870252 write(3, ";\n\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
[...]

and doing the statistics on the system calls:


$ time strace -c -p 10773
Process 10773 attached - interrupt to quit
Process 10773 detached
% time     seconds  usecs/call     calls    errors syscall
- ------ ----------- ----------- --------- --------- ----------------62.70    4.980721          16    307851
write25.72   2.043299         156     13058      3039 poll 9.61    0.763046         248      3078           rename 0.64
  0.050992          17      3079           open 0.49    0.038819           4     10019           recv 0.26    0.020469
        7      3078           munmap 0.13    0.010344           3      3078           close 0.12    0.009425
3     3079           mmap2 0.11    0.008353           3      3079           setitimer 0.09    0.007114           2
3079     3039 sigreturn 0.07    0.005923           2      3079           fstat64 0.06    0.004734           2      3079
         getppid
 
- ------ ----------- ----------- --------- --------- ----------------
100.00    7.943239                358636      6078 total

real    0m16.313s
user    0m1.428s
sys     0m3.802s


so instead of 32 or such rename it did the rename 3K times.

To solve the problem is it possible to kill that process? (will it be respawned?)


Regards
Gaetano Mendola











-----BEGIN PGP SIGNATURE-----
Version: GnuPG v1.4.6 (GNU/Linux)
Comment: Using GnuPG with Mozilla - http://enigmail.mozdev.org

iD8DBQFHJZ9I7UpzwH2SGd4RAqwrAJ9vyt5fd1rdEu+uTnef6QpFYVBbhwCePB69
b0jA7Ko85TyEfMqAmVVRy/w=
=ax+l
-----END PGP SIGNATURE-----


Re: Stats collector on rampage (8.2.3)

From
Magnus Hagander
Date:
Gaetano Mendola wrote:
> Hi all,
> it seems that the stats collector on my box is using more CPU than
> it did in the past.

This is a known bug that was fixed in 8.2.4, so you need to upgrade.

//Magnus



Re: Stats collector on rampage (8.2.3)

From
hubert depesz lubaczewski
Date:
On Mon, Oct 29, 2007 at 09:52:24AM +0100, Gaetano Mendola wrote:
> it seems that the stats collector on my box is using more CPU than
> it did in the past.

it's well known bug, and it was fixed in 8.2.4:
http://www.postgresql.org/docs/current/interactive/release-8-2-4.html
...
Prevent the statistics collector from writing to disk too frequently
(Tom)
...

best regards,

depesz

-- 
quicksil1er: "postgres is excellent, but like any DB it requires a
highly paid DBA.  here's my CV!" :)
http://www.depesz.com/ - blog dla ciebie (i moje CV)


Re: Stats collector on rampage (8.2.3)

From
Gaetano Mendola
Date:
hubert depesz lubaczewski wrote:
> On Mon, Oct 29, 2007 at 09:52:24AM +0100, Gaetano Mendola wrote:
>> it seems that the stats collector on my box is using more CPU than
>> it did in the past.
> 
> it's well known bug, and it was fixed in 8.2.4:
> http://www.postgresql.org/docs/current/interactive/release-8-2-4.html
> ...
> Prevent the statistics collector from writing to disk too frequently
> (Tom)
> ...

I saw that, upgrading the DB at this very moment is not doable, killing
that process will the postmaster respawn another one? BTW I discover that
it was triggered by the time change due the daylight saving.

Regards
Gaetano Mendola


Re: Stats collector on rampage (8.2.3)

From
Gaetano Mendola
Date:
hubert depesz lubaczewski wrote:
> > On Mon, Oct 29, 2007 at 09:52:24AM +0100, Gaetano Mendola wrote:
>> >> it seems that the stats collector on my box is using more CPU than
>> >> it did in the past.
> >
> > it's well known bug, and it was fixed in 8.2.4:
> > http://www.postgresql.org/docs/current/interactive/release-8-2-4.html
> > ...
> > Prevent the statistics collector from writing to disk too frequently
> > (Tom)
> > ...

I saw that, upgrading the DB at this very moment is not doable, killing
that process will the postmaster respawn another one? BTW I discover that
it was triggered by the time change due the daylight saving.

Regards
Gaetano Mendola


Re: Stats collector on rampage (8.2.3)

From
Magnus Hagander
Date:
Gaetano Mendola wrote:
> hubert depesz lubaczewski wrote:
>> On Mon, Oct 29, 2007 at 09:52:24AM +0100, Gaetano Mendola wrote:
>>> it seems that the stats collector on my box is using more CPU than
>>> it did in the past.
>> it's well known bug, and it was fixed in 8.2.4:
>> http://www.postgresql.org/docs/current/interactive/release-8-2-4.html
>> ...
>> Prevent the statistics collector from writing to disk too frequently
>> (Tom)
>> ...
> 
> I saw that, upgrading the DB at this very moment is not doable, killing
> that process will the postmaster respawn another one? BTW I discover that
> it was triggered by the time change due the daylight saving.

IIRC, it will. You need to change postgresql.conf and disable the stats
collector. If you do that, it won't be started.

Shouldn't be trigged by DST.

//Magnus


Re: Stats collector on rampage (8.2.3)

From
Andrew Dunstan
Date:

Gaetano Mendola wrote:
> hubert depesz lubaczewski wrote:
>   
>>> On Mon, Oct 29, 2007 at 09:52:24AM +0100, Gaetano Mendola wrote:
>>>       
>>>>> it seems that the stats collector on my box is using more CPU than
>>>>> it did in the past.
>>>>>           
>>> it's well known bug, and it was fixed in 8.2.4:
>>> http://www.postgresql.org/docs/current/interactive/release-8-2-4.html
>>> ...
>>> Prevent the statistics collector from writing to disk too frequently
>>> (Tom)
>>> ...
>>>       
>
> I saw that, upgrading the DB at this very moment is not doable, killing
> that process will the postmaster respawn another one? BTW I discover that
> it was triggered by the time change due the daylight saving.
>
>
>   

you do realize that this upgrade wouldn't require a dump/restore, don't 
you? It would be close to instantaneous.

cheers

andrew


Re: Stats collector on rampage (8.2.3)

From
Gaetano Mendola
Date:
Magnus Hagander wrote:
> Gaetano Mendola wrote:
>> hubert depesz lubaczewski wrote:
>>> On Mon, Oct 29, 2007 at 09:52:24AM +0100, Gaetano Mendola wrote:
>>>> it seems that the stats collector on my box is using more CPU than
>>>> it did in the past.
>>> it's well known bug, and it was fixed in 8.2.4:
>>> http://www.postgresql.org/docs/current/interactive/release-8-2-4.html
>>> ...
>>> Prevent the statistics collector from writing to disk too frequently
>>> (Tom)
>>> ...
>> I saw that, upgrading the DB at this very moment is not doable, killing
>> that process will the postmaster respawn another one? BTW I discover that
>> it was triggered by the time change due the daylight saving.
> 
> IIRC, it will. You need to change postgresql.conf and disable the stats
> collector. If you do that, it won't be started.
> 
> Shouldn't be trigged by DST.

The high cpu usage started at that time, may be that is another problem of some
applications stuck on it. I will investigate further.

Gaetano


Re: Stats collector on rampage (8.2.3)

From
Gaetano Mendola
Date:
Andrew Dunstan wrote:
> 
> 
> Gaetano Mendola wrote:
>> hubert depesz lubaczewski wrote:
>>  
>>>> On Mon, Oct 29, 2007 at 09:52:24AM +0100, Gaetano Mendola wrote:
>>>>      
>>>>>> it seems that the stats collector on my box is using more CPU than
>>>>>> it did in the past.
>>>>>>           
>>>> it's well known bug, and it was fixed in 8.2.4:
>>>> http://www.postgresql.org/docs/current/interactive/release-8-2-4.html
>>>> ...
>>>> Prevent the statistics collector from writing to disk too frequently
>>>> (Tom)
>>>> ...
>>>>       
>>
>> I saw that, upgrading the DB at this very moment is not doable, killing
>> that process will the postmaster respawn another one? BTW I discover that
>> it was triggered by the time change due the daylight saving.
>>
>>
>>   
> 
> you do realize that this upgrade wouldn't require a dump/restore, don't
> you? It would be close to instantaneous.

Sure I do, for me it's even easier than that, I use DRBD so I can just shut
down one node upgrade it and then upgrade the other one; however upgrade the
DB means for me "touch" an entire satellite trasmission platform, I have to
schedule a platform maintenance for that...

Regards
Gaetano Mendola