Thread: Strange situation on slave server

Strange situation on slave server

From
Condor
Date:
Hello,

I have master - slave replication hot standby. Both server are linux
slackware64 current with postgresql 9.4.4.

Today when I logged to check some things on slave server I see on top
memory taken 26%. That was strange for me and I restart server.
Well, I checked after 40 minute and I see again replication process is
little growing. For 10 min about 1.0.


postgres 14162  0.0  0.7 6514656 190684 ?      S    11:50   0:00
/usr/bin/postgres -D /var/lib/postgres/database -F
postgres 14163  0.4  5.4 6517856 1341268 ?     Ss   11:50   0:26  \_
postgres: startup process   recovering 00000001000005E50000005E
postgres 14166  0.1  5.4 6517876 1335980 ?     Ss   11:50   0:09  \_
postgres: checkpointer process
postgres 14168  0.0  0.2 6517732 53276 ?       Ss   11:50   0:01  \_
postgres: writer process
postgres 14180  0.0  0.0  29888  2864 ?        Ss   11:50   0:01  \_
postgres: stats collector process
postgres 14183  0.2  0.0 6522488 5584 ?        Ss   11:50   0:15  \_
postgres: wal receiver process   streaming 5E5/5EE36000


Today our system is little bit average but slave server receive one wal
file every minute. As you can see checkpointer process and recovering is
5.4% and until I write the email,
I check them again and now is 6.0


postgres 14162  0.0  0.7 6514656 190684 ?      S    11:50   0:00
/usr/bin/postgres -D /var/lib/postgres/database -F
postgres 14163  0.4  6.0 6517856 1499868 ?     Ss   11:50   0:30  \_
postgres: startup process   recovering 00000001000005E500000072
postgres 14166  0.1  6.0 6517876 1497064 ?     Ss   11:50   0:10  \_
postgres: checkpointer process
postgres 14168  0.0  0.2 6517732 53276 ?       Ss   11:50   0:01  \_
postgres: writer process
postgres 14180  0.0  0.0  29888  2864 ?        Ss   11:50   0:01  \_
postgres: stats collector process
postgres 14183  0.2  0.0 6522488 5584 ?        Ss   11:50   0:18  \_
postgres: wal receiver process   streaming 5E5/72B49718


I have free memory:
Mem:  24634464k total, 14674532k used,  9959932k free,   324108k buffers
Swap:  6589196k total,     1872k used,  6587324k free, 11819140k cached


I try to use strace on process:

# strace -p 14163

lseek(9, 0, SEEK_END)                   = 381943808
lseek(23, 0, SEEK_END)                  = 376832
lseek(5, 3563520, SEEK_SET)             = 3563520
read(5,
"~\320\5\0\1\0\0\0\0`6C\345\5\0\0\344\7\0\0\0\0\0\0\0\0\0\0\333I\f\0"...,
8192) = 8192
read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
temporarily unavailable)
lseek(9, 0, SEEK_END)                   = 381943808
lseek(5, 3571712, SEEK_SET)             = 3571712
read(5,
"~\320\5\0\1\0\0\0\0\2006C\345\5\0\0^\6\0\0\0\0\0\0Ja.\0\20\0\220h"...,
8192) = 8192
read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
temporarily unavailable)
lseek(10, 0, SEEK_END)                  = 186392576
read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
temporarily unavailable)
lseek(11, 0, SEEK_END)                  = 182566912
read(3, 0x7ffd2d7a60d0, 16)             = -1 EAGAIN (Resource
temporarily unavailable)
poll([{fd=3, events=POLLIN}], 1, 5000)  = ? ERESTART_RESTARTBLOCK
(Interrupted by signal)
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=14183,
si_uid=1000} ---
write(4, "\0", 1)                       = 1
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system
call)
read(3, "\0", 16)                       = 1
read(6, 0x7ffd2d7a611f, 1)              = -1 EAGAIN (Resource
temporarily unavailable)
lseek(5, 3579904, SEEK_SET)             = 3579904
read(5,
"~\320\5\0\1\0\0\0\0\2406C\345\5\0\0\232\5\0\0\0\0\0\0\0\0\0\0*\231\1\0"...,
8192) = 8192
read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
temporarily unavailable)
lseek(12, 0, SEEK_END)                  = 203612160
read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
temporarily unavailable)
lseek(13, 0, SEEK_END)                  = 331071488
read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
temporarily unavailable)
lseek(14, 0, SEEK_END)                  = 193331200
read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
temporarily unavailable)
lseek(15, 0, SEEK_END)                  = 271171584
read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
temporarily unavailable)
lseek(16, 0, SEEK_END)                  = 187580416
read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
temporarily unavailable)
lseek(17, 0, SEEK_END)                  = 193257472
read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
temporarily unavailable)
lseek(18, 0, SEEK_END)                  = 277381120
read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
temporarily unavailable)
lseek(19, 0, SEEK_END)                  = 199884800
read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
temporarily unavailable)
lseek(20, 0, SEEK_END)                  = 193396736
read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
temporarily unavailable)
lseek(21, 0, SEEK_END)                  = 204808192
read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
temporarily unavailable)
lseek(22, 0, SEEK_END)                  = 193355776
read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
temporarily unavailable)
read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
temporarily unavailable)
lseek(9, 0, SEEK_END)                   = 381943808
lseek(23, 0, SEEK_END)                  = 376832
read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
temporarily unavailable)
lseek(9, 0, SEEK_END)                   = 381943808
read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
temporarily unavailable)
lseek(24, 0, SEEK_END)                  = 374423552
read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
temporarily unavailable)
lseek(25, 0, SEEK_END)                  = 1777664
read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
temporarily unavailable)
read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
temporarily unavailable)
lseek(24, 0, SEEK_END)                  = 374423552
read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
temporarily unavailable)
read(3, 0x7ffd2d7a60d0, 16)             = -1 EAGAIN (Resource
temporarily unavailable)
poll([{fd=3, events=POLLIN}], 1, 5000)  = ? ERESTART_RESTARTBLOCK
(Interrupted by signal)
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=14183,
si_uid=1000} ---
write(4, "\0", 1)                       = 1
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system
call)


and continue...
That is the slave server only for mirroring database no any connections
and query ever run on him.

My question is this normal ?



Regards,
Hristo Simeonov



Re: Strange situation on slave server

From
Condor
Date:
On 01-07-2015 13:53, Condor wrote:
> Hello,
>
> I have master - slave replication hot standby. Both server are linux
> slackware64 current with postgresql 9.4.4.
>
> Today when I logged to check some things on slave server I see on top
> memory taken 26%. That was strange for me and I restart server.
> Well, I checked after 40 minute and I see again replication process is
> little growing. For 10 min about 1.0.
>
>
> postgres 14162  0.0  0.7 6514656 190684 ?      S    11:50   0:00
> /usr/bin/postgres -D /var/lib/postgres/database -F
> postgres 14163  0.4  5.4 6517856 1341268 ?     Ss   11:50   0:26  \_
> postgres: startup process   recovering 00000001000005E50000005E
> postgres 14166  0.1  5.4 6517876 1335980 ?     Ss   11:50   0:09  \_
> postgres: checkpointer process
> postgres 14168  0.0  0.2 6517732 53276 ?       Ss   11:50   0:01  \_
> postgres: writer process
> postgres 14180  0.0  0.0  29888  2864 ?        Ss   11:50   0:01  \_
> postgres: stats collector process
> postgres 14183  0.2  0.0 6522488 5584 ?        Ss   11:50   0:15  \_
> postgres: wal receiver process   streaming 5E5/5EE36000
>
>
> Today our system is little bit average but slave server receive one
> wal file every minute. As you can see checkpointer process and
> recovering is 5.4% and until I write the email,
> I check them again and now is 6.0
>
>
> postgres 14162  0.0  0.7 6514656 190684 ?      S    11:50   0:00
> /usr/bin/postgres -D /var/lib/postgres/database -F
> postgres 14163  0.4  6.0 6517856 1499868 ?     Ss   11:50   0:30  \_
> postgres: startup process   recovering 00000001000005E500000072
> postgres 14166  0.1  6.0 6517876 1497064 ?     Ss   11:50   0:10  \_
> postgres: checkpointer process
> postgres 14168  0.0  0.2 6517732 53276 ?       Ss   11:50   0:01  \_
> postgres: writer process
> postgres 14180  0.0  0.0  29888  2864 ?        Ss   11:50   0:01  \_
> postgres: stats collector process
> postgres 14183  0.2  0.0 6522488 5584 ?        Ss   11:50   0:18  \_
> postgres: wal receiver process   streaming 5E5/72B49718
>
>
> I have free memory:
> Mem:  24634464k total, 14674532k used,  9959932k free,   324108k
> buffers
> Swap:  6589196k total,     1872k used,  6587324k free, 11819140k cached
>
>
> I try to use strace on process:
>
> # strace -p 14163
>
> lseek(9, 0, SEEK_END)                   = 381943808
> lseek(23, 0, SEEK_END)                  = 376832
> lseek(5, 3563520, SEEK_SET)             = 3563520
> read(5,
> "~\320\5\0\1\0\0\0\0`6C\345\5\0\0\344\7\0\0\0\0\0\0\0\0\0\0\333I\f\0"...,
> 8192) = 8192
> read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(9, 0, SEEK_END)                   = 381943808
> lseek(5, 3571712, SEEK_SET)             = 3571712
> read(5,
> "~\320\5\0\1\0\0\0\0\2006C\345\5\0\0^\6\0\0\0\0\0\0Ja.\0\20\0\220h"...,
> 8192) = 8192
> read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(10, 0, SEEK_END)                  = 186392576
> read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(11, 0, SEEK_END)                  = 182566912
> read(3, 0x7ffd2d7a60d0, 16)             = -1 EAGAIN (Resource
> temporarily unavailable)
> poll([{fd=3, events=POLLIN}], 1, 5000)  = ? ERESTART_RESTARTBLOCK
> (Interrupted by signal)
> --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=14183,
> si_uid=1000} ---
> write(4, "\0", 1)                       = 1
> rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system
> call)
> read(3, "\0", 16)                       = 1
> read(6, 0x7ffd2d7a611f, 1)              = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(5, 3579904, SEEK_SET)             = 3579904
> read(5,
> "~\320\5\0\1\0\0\0\0\2406C\345\5\0\0\232\5\0\0\0\0\0\0\0\0\0\0*\231\1\0"...,
> 8192) = 8192
> read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(12, 0, SEEK_END)                  = 203612160
> read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(13, 0, SEEK_END)                  = 331071488
> read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(14, 0, SEEK_END)                  = 193331200
> read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(15, 0, SEEK_END)                  = 271171584
> read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(16, 0, SEEK_END)                  = 187580416
> read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(17, 0, SEEK_END)                  = 193257472
> read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(18, 0, SEEK_END)                  = 277381120
> read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(19, 0, SEEK_END)                  = 199884800
> read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(20, 0, SEEK_END)                  = 193396736
> read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(21, 0, SEEK_END)                  = 204808192
> read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(22, 0, SEEK_END)                  = 193355776
> read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
> temporarily unavailable)
> read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(9, 0, SEEK_END)                   = 381943808
> lseek(23, 0, SEEK_END)                  = 376832
> read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(9, 0, SEEK_END)                   = 381943808
> read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(24, 0, SEEK_END)                  = 374423552
> read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(25, 0, SEEK_END)                  = 1777664
> read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
> temporarily unavailable)
> read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
> temporarily unavailable)
> lseek(24, 0, SEEK_END)                  = 374423552
> read(6, 0x7ffd2d7a672f, 1)              = -1 EAGAIN (Resource
> temporarily unavailable)
> read(3, 0x7ffd2d7a60d0, 16)             = -1 EAGAIN (Resource
> temporarily unavailable)
> poll([{fd=3, events=POLLIN}], 1, 5000)  = ? ERESTART_RESTARTBLOCK
> (Interrupted by signal)
> --- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=14183,
> si_uid=1000} ---
> write(4, "\0", 1)                       = 1
> rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system
> call)
>
>
> and continue...
> That is the slave server only for mirroring database no any
> connections and query ever run on him.
>
> My question is this normal ?
>
>
>
> Regards,
> Hristo Simeonov


Hello again,
sry I up the mail again, here is my new statistic:

USER       PID %CPU %MEM    VSZ   RSS TTY      STAT START   TIME COMMAND
postgres 14162  0.0  0.7 6514656 188168 ?      S    Jul01   0:00
/usr/bin/postgres -D /var/lib/postgres/database -F
postgres 14163  0.4 26.1 6518012 6437776 ?     Ss   Jul01  10:04
postgres: startup process   recovering 00000001000005F10000007A
postgres 14166  0.1 25.9 6518516 6396756 ?     Ss   Jul01   3:00
postgres: checkpointer process
postgres 14168  0.0 15.0 6517872 3708012 ?     Ss   Jul01   0:23
postgres: writer process
postgres 14180  0.0  0.0  29888  1992 ?        Ss   Jul01   0:13
postgres: stats collector process
postgres 14183  0.2  0.0 6522488 4064 ?        Ss   Jul01   5:36
postgres: wal receiver process   streaming 5F1/7A854300


Last wal file was received 10 minutes ago.

Free mem now:
Mem:  24634464k total, 19669032k used,  4965432k free,   350428k buffers
Swap:  6589196k total,     2704k used,  6586492k free, 17825832k cached


Any one have idea why the hell the checkpointer or writer process in
idle state take so much memory ? Is that normal ?

Cheers,
Hristo