why would postgres be throttling a streaming replication slot'ssending? - Mailing list pgsql-general

From Chris Withers
Subject why would postgres be throttling a streaming replication slot'ssending?
Date
Msg-id 1a8891b4-10ca-6545-79bc-95becb8c17d4@withers.org
Whole thread Raw
In response to Re: finding out what's generating WALs  (Achilleas Mantzios <achill@matrix.gatewaynet.com>)
Responses Re: why would postgres be throttling a streaming replication slot'ssending?
List pgsql-general
On 11/12/2018 14:48, Achilleas Mantzios wrote:
> On 11/12/18 4:00 μ.μ., Chris Withers wrote:
>>
>> I'm looking after a multi-tenant PG 9.4 cluster, and we've started 
>> getting alerts for the number of WALs on the server.
>> It'd be great to understand what's generating all that WAL and what's 
>> likely to be causing any problems.\
> 
> Regarding you wals in pg_wal,  a good threshold could be anything more 
> than a e.g. 10% increase from wal_keep_segments with a trend to go up. 
> If this number goes up chances are something bad is happening.

wal_keep_segments is 0 here, so I went hunting in pg_replication_slots 
and found that it's the barman slot that's behind on the WAL.

strace'ing on the barman receiving side shows it waiting on the socket, 
so no problem on that side.

on the sending side, it's a little confusing, since the server is 
basically idle, no cpu or disk activity, and yet strace'ing the sending 
process attached to the other end of the socket shows time being spent 
waiting on a poll which, while it includes the socket being sent to, 
also includes a bunch of pipes. I've attached a chunk of the output 
below in the hope that someone on this list could offer an explanation 
as to what might cause the WAL to be trickling over this port rather 
than catching up as fast as it can?

cheers,

Chris

strace output for the streaming_barman slot servicing the barman 
replication slot:

--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=3577, si_uid=26} ---
write(4, "\0", 1)                       = 1
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
read(3, "\0", 16)                       = 1
read(6, 0x7fffffffdcd7, 1)              = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(10, 0x7fffffffdcaf, 1, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
read(5, 
"\3\1\0\0\7\326\355\2\343\0\0\0@\n\0\0\0\17\1\230\313\4\0\0)\347\372l\0\0\0\0"..., 
4272) = 4272
sendto(10, "d\0\0\20\315w\0\0\4\313\230\1\17P\0\0\4\313\230\1 
\0\0\2\37\321\343\345\306\20\3\1"..., 4302, 0, NULL, 0) = 4302
read(3, 0x7fffffffdc90, 16)             = -1 EAGAIN (Resource 
temporarily unavailable)
poll([{fd=10, events=POLLIN}, {fd=3, events=POLLIN}, {fd=6, 
events=POLLIN}], 3, 26076) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=3577, si_uid=26} ---
write(4, "\0", 1)                       = 1
rt_sigreturn({mask=[]})                 = -1 EINTR (Interrupted system call)
read(3, "\0", 16)                       = 1
read(6, 0x7fffffffdcd7, 1)              = -1 EAGAIN (Resource 
temporarily unavailable)
recvfrom(10, 0x7fffffffdcaf, 1, 0, NULL, NULL) = -1 EAGAIN (Resource 
temporarily unavailable)
read(5, "~\320\5\0\3\0\0\0\0 
\1\230\313\4\0\0\3\0\0\0\0\0\0\0\323\316\314\0\0\0\0\0"..., 8192) = 8192
sendto(10, "d\0\0 \35w\0\0\4\313\230\1 
\0\0\0\4\313\230\1@\0\0\2\37\321\343\361\376\226~\320"..., 8222, 0, 
NULL, 0) = 8222
read(3, 0x7fffffffdc90, 16)             = -1 EAGAIN (Resource 
temporarily unavailable)
poll([{fd=10, events=POLLIN}, {fd=3, events=POLLIN}, {fd=6, 
events=POLLIN}], 3, 25275) = ? ERESTART_RESTARTBLOCK (Interrupted by signal)
--- SIGUSR1 {si_signo=SIGUSR1, si_code=SI_USER, si_pid=3577, si_uid=26} ---

Interesting numbers:
- process 3577 is the wal writer process
- fd 10 is the socket, 3 and 6 are pipes
- time is visibly spent sitting on that poll
- the network is only seeing 2Mbit/sec, which is nothing

cheers,

Chris


pgsql-general by date:

Previous
From: Moreno Andreo
Date:
Subject: REVOKE to an user that doesn't exist
Next
From: Adrian Klaver
Date:
Subject: Re: REVOKE to an user that doesn't exist