Re: BUG #7494: WAL replay speed depends heavily on the shared_buffers size - Mailing list pgsql-bugs

From Andres Freund
Subject Re: BUG #7494: WAL replay speed depends heavily on the shared_buffers size
Date
Msg-id 201208170931.43034.andres@2ndquadrant.com
Whole thread Raw
In response to Re: BUG #7494: WAL replay speed depends heavily on the shared_buffers size  (Valentine Gogichashvili <valgog@gmail.com>)
List pgsql-bugs
Hi Valentine,

On Thursday, August 16, 2012 05:05:11 PM Valentine Gogichashvili wrote:
> Hello again,
>
> now I have restarted the slave with shared_buffers set to 32GB, and now the
> recovery process is at 100% CPU:

> $ strace -c -f -p 27076
> Process 27076 attached - interrupt to quit
> Process 27076 detached
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  44.44    0.000240           0      5976           lseek
>  38.15    0.000206           0      5522           getppid
>  17.41    0.000094           1       172           read
>   0.00    0.000000           0         4         1 open
>   0.00    0.000000           0         2           close
>   0.00    0.000000           0         1           stat
>   0.00    0.000000           0       159       159 unlink
> ------ ----------- ----------- --------- --------- ----------------
> 100.00    0.000540                 11836       160 total

> $ strace -c -f -p 36450
> Process 36450 attached - interrupt to quit
> Process 39336 attached
> Process 39337 attached
> Process 39337 detached
> Process 39338 attached
> Process 39338 detached
> Process 39339 attached
> Process 39339 detached
> Process 39336 detached
> Process 39359 attached
> Process 39360 attached
> Process 39360 detached
> Process 39361 attached
> Process 39361 detached
> Process 39362 attached
> Process 39362 detached
> Process 39359 detached
> Process 40168 attached
> Process 40169 attached
> Process 40169 detached
> Process 40170 attached
> Process 40170 detached
> Process 40171 attached
> Process 40171 detached
> Process 40168 detached
> Process 36450 detached
> % time     seconds  usecs/call     calls    errors syscall
> ------ ----------- ----------- --------- --------- ----------------
>  88.21    0.668159       31817        21         9 wait4
>   4.69    0.035553           6      5970           read
>   1.92    0.014572           0    120565           lseek
>   1.91    0.014474           5      2957      2954 unlink
>   1.55    0.011766           8      1545           write
>   1.27    0.009582           0    109274           getppid
>   0.31    0.002340           3       762           fcntl
>   0.03    0.000245          20        12           clone
>   0.02    0.000163           1       219           rt_sigprocmask
>   0.02    0.000162           1       154        10 open
>   0.01    0.000092           0       216           mmap
>   0.01    0.000084           1       133        42 stat
>   0.01    0.000066           6        12           getrlimit
>   0.01    0.000046           0       178           close
>
Those straces don't indicate contention in the kernel to me.

Could you get a profile with 'perf' or if youre more familiar with it with
'oprofile'. With the former its just something like:
perf record -F 10000 -p 27076
perf report > somefile

And then send in somefile.

Greetings,

Andres

pgsql-bugs by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: BUG #6067: In PL/pgsql, EXISTS(SELECT ... INTO...) fails
Next
From: Andres Freund
Date:
Subject: Re: BUG #7494: WAL replay speed depends heavily on the shared_buffers size