Hi
Do you think it's reasonable to want to COMMIT a particular transaction on a master node, and then immediately run a read-only query on a hot standby node that is guaranteed to see that transaction?
A friend of mine who works with a different RDBMS technology that can do that asked me how to achieve this with Postgres, and I suggested waiting for the standby's pg_last_xlog_replay_location() to be >= the master's pg_current_xlog_location() after COMMIT, which might involve some looping and sleeping.
As a quick weekend learning exercise/hack I recently went looking into how we could support $SUBJECT. I discovered we already report the apply progress back to the master, and the synchronous waiting facility seemed to be all ready to support this. In fact it seemed a little too easy so something tells me it must be wrong! But anyway, please see the attached toy POC patch which does that.
The next problem is that the master can be waiting quite a long time for a reply from the remote walreceiver containing the desired apply LSN: in the best case it learns of apply progress from replies to subsequent unrelated records (which might be very soon on a busy system but still involves waiting for the next transaction's WAL flush), and in the worst case it needs to wait for wal_receiver_status_interval (10 seconds by default), which makes for a long COMMIT delay. I was thinking that the solution to that may be to teach StartupLOG to signal the walreceiver after it updates XLogCtl->lastReplayedEndRecPtr, which should cause walrcv_receive to be interrupted and return early, and then walreceiver could send a reply if it sees that lastReplayedEndRecPtr has moved. Maybe that would generate an unacceptably high frequency of signals, and maybe there is a better form of IPC for this. Without introducing any new IPC, the walreceiver could instead simply report apply progress to the master whenever it sees that the apply LSN has moved after its regular NAPTIME_PER_CYCLE wait (100ms), but that would still introduces bogus latency. A quick and dirty way to see that on top of the attached patch is to set requestReply = true in WalReceiverMain to force a send after every nap.
I can see that using synchronous_commit = apply in the practice might prove difficult: how does a client know which node is the synchronous standby? Perhaps those sorts of practical problems are the reason no one has done or wanted this.
Thoughts?
Thanks for reading!
--