Walsender waiting on SnapbuildSync - Mailing list pgsql-general

From Brent Kerby
Subject Walsender waiting on SnapbuildSync
Msg-id CAH8WVsjqRzVNSAaM68PMWt2s+4gcntAh7JpiSwFhAHY=WSRc3g@mail.gmail.com
List pgsql-general
On Postgres 10.3 (on AWS RDS), I am running logical decoding using the test_decoding output plugin, and every few minutes I am seeing pauses in the stream, unrelated to any large transactions. About once every hour or two, the pause is long enough that the database disconnects my client due to exceeding wal_sender_timeout (30 seconds -- the RDS default value); after reconnecting it is able to make progress again. My client is using the streaming replication protocol via pgjdbc (with a status interval of 1 second). What I'm seeing is that during such a pause, the server is not sending any data to the client:

- pg_stat_replication.sent_lsn stops advancing
- My client is blocking in a call to PGReplicationStream.read()
- pg_stat_activity shows that the walsender process has a wait_event of 'SnapbuildSync'.

In this scenario, it makes sense that the client would be timed out: pgjdbc only sends feedback to the server at the beginning of a call to PGReplicationStream.read(), so if a single call blocks for a long time without receiving any data from the server, the client stops sending feedback and the server eventually times it out.
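
One way to keep feedback flowing while no data arrives is to poll instead of blocking. The following is a minimal sketch, not pgjdbc code: ReplicationSource is a hypothetical interface whose two methods are modeled on PGReplicationStream.readPending() and forceUpdateStatus(), and the interval, sleep, and poll-count parameters are illustrative. Whether this would avoid the disconnect here is an untested assumption, since a walsender stuck in SnapbuildSync may not be processing feedback at all.

```java
import java.nio.ByteBuffer;
import java.util.function.Consumer;

final class PollingReader {
    /** Hypothetical stand-in for the two PGReplicationStream methods used below. */
    public interface ReplicationSource {
        ByteBuffer readPending();   // returns null when no message is buffered
        void forceUpdateStatus();   // sends a status update to the server now
    }

    private PollingReader() {}

    /**
     * Polls the source up to maxPolls times without ever blocking in a read,
     * sending a status update whenever statusIntervalMillis has elapsed.
     * Returns the number of messages handed to the handler.
     */
    public static int poll(ReplicationSource source,
                           Consumer<ByteBuffer> handler,
                           long statusIntervalMillis,
                           long idleSleepMillis,
                           int maxPolls) {
        long lastStatusNanos = System.nanoTime();
        int received = 0;
        for (int i = 0; i < maxPolls; i++) {
            ByteBuffer msg = source.readPending();
            if (msg != null) {
                handler.accept(msg);
                received++;
            } else {
                try {
                    // Nothing pending: back off briefly instead of blocking.
                    Thread.sleep(idleSleepMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                    return received;
                }
            }
            long now = System.nanoTime();
            if (now - lastStatusNanos >= statusIntervalMillis * 1_000_000L) {
                // Feedback goes out on schedule even if no data has arrived.
                source.forceUpdateStatus();
                lastStatusNanos = now;
            }
        }
        return received;
    }
}
```

In a real client the loop would run until shutdown rather than for a fixed number of polls, and the source would be a thin adapter around the actual replication stream.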

My question is: why might the server be spending so much time waiting on SnapbuildSync? The docs describe this event as follows:

"IO / SnapbuildSync / Waiting for a serialized historical catalog snapshot to reach stable storage."

I gather that this is related to statistics collection, but I don't understand why a walsender process would wait on such an event, or why the wait would take so long. Any ideas?

Another thing is that when these pauses occur they are always in between transactions, i.e., after the client has received a COMMIT message but before receiving the next BEGIN. And the transactions before and after are generally normally-sized ones (at most a few kilobytes of WAL), so this doesn't appear to be related to issues with large transactions that have been discussed in the past.
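
For anyone wanting to check the same pattern in their own stream, here is a rough sketch of how such pauses could be located from client-side logs. It assumes each decoded message was recorded with its arrival time, and that transaction boundaries appear as lines starting with BEGIN and COMMIT, as in test_decoding's text output. The StreamGapLog and Event names and the threshold are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

final class StreamGapLog {
    /** One decoded message plus the client-side time at which it arrived. */
    static final class Event {
        final long arrivedMillis;
        final String text;

        Event(long arrivedMillis, String text) {
            this.arrivedMillis = arrivedMillis;
            this.text = text;
        }
    }

    private StreamGapLog() {}

    /**
     * Describes every gap of at least thresholdMillis between consecutive
     * messages, noting whether it falls after a COMMIT and before a BEGIN.
     */
    static List<String> findGaps(List<Event> events, long thresholdMillis) {
        List<String> gaps = new ArrayList<>();
        for (int i = 1; i < events.size(); i++) {
            Event prev = events.get(i - 1);
            Event next = events.get(i);
            long gap = next.arrivedMillis - prev.arrivedMillis;
            if (gap < thresholdMillis) {
                continue;
            }
            boolean betweenTransactions =
                prev.text.startsWith("COMMIT") && next.text.startsWith("BEGIN");
            gaps.add(gap + " ms " + (betweenTransactions ? "between transactions" : "elsewhere"));
        }
        return gaps;
    }
}
```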

- Brent
