Hi all,
I noticed that when synchronous_commit=off we're not waking up the wal sender
latch in xact.c:RecordTransactionCommit, which leads to ugly delays of approx 7
seconds (1 + replication_timeout/10) with default settings.
Given that we're flushing the wal to disk much sooner, this appears to be a bad
idea - especially as this may happen even under load if we ever reach the
'caughtup' state.
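To spell out where the ~7 seconds comes from, here is a minimal sketch of the arithmetic. It assumes replication_timeout is at its default of 60 s; the helper name worst_case_delay_ms is made up for illustration and is not anything in the tree.

```c
/* Assumption: replication_timeout is at its default of 60 s (60000 ms). */
#define DEFAULT_REPLICATION_TIMEOUT_MS 60000L

/*
 * Hypothetical helper, not a PostgreSQL function: worst-case time (in ms)
 * before a sleeping walsender notices new WAL when nobody sets its latch,
 * following the formula quoted above: 1 s + replication_timeout / 10.
 */
static long
worst_case_delay_ms(long replication_timeout_ms)
{
	return 1000L + replication_timeout_ms / 10;
}
```

Calling worst_case_delay_ms(DEFAULT_REPLICATION_TIMEOUT_MS) gives 7000 ms, i.e. the 7 seconds mentioned above; lowering replication_timeout shrinks the delay but doesn't remove it.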
I wonder why the WalSndWakeup isn't done like:
diff --git a/src/backend/access/transam/xlog.c b/src/backend/access/transam/xlog.c
index ecb71b6..7a3224b 100644
--- a/src/backend/access/transam/xlog.c
+++ b/src/backend/access/transam/xlog.c
@@ -1906,6 +1906,10 @@ XLogWrite(XLogwrtRqst WriteRqst, bool flexible, bool xlog_switch)
 			xlogctl->LogwrtRqst.Flush = LogwrtResult.Flush;
 		SpinLockRelease(&xlogctl->info_lck);
 	}
+
+	/* the walsender wasn't woken up in xact.c */
+	if(max_wal_senders > 1 && synchronous_commit == SYNCHRONOUS_COMMIT_OFF)
+		WalSndWakeup();
 }
Doing that for the synchronous_commit=off case can imo be considered a bugfix,
but I wonder why we ever wake the senders somewhere else?
The only argument I can see for doing it at places like StartTransactionCommit
is that that's the place after which the data will be visible on the client. I
think that's a non-argument though, because if wal is flushed to disk outside of
a commit there normally is enough data to make it worthwhile.
Doing the above results in a very noticeable reduction in lagginess and even a
noticeable reduction in cpu-usage spikes on a busy replication test setup.
Greetings,
Andres
-- 
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services