AFAIK the commit on the master happens only after it receives an ack from the slave. This is how synchronous replication ensures that the slave is 'in sync'.
If that is the case, then why does PG find it impossible to resync the crashed server with the new primary after a crash?
Other products offering similar technology do not have this issue.
In my opinion this is quite a serious limitation of PG replication. Every time the primary crashes and the business continues by promoting the standby to be the new primary, the crashed server has to be reinitialized before replication can be set up again.
I want to understand how PG sync replication works. This is what I know (assuming two-node sync replication):
1 - Application issues commit.
2 - PG commits the transaction locally on the primary server.
3 - At this stage the application has not got the commit indication back.
4 - PG transmits the transaction from the local to the remote server.
5 - Remote server sends back acknowledgement.
6 - The app gets commit ack back.
So this means that between step 2 and step 6 the app is not aware that the transaction has already been committed on the primary. This is the reason why, if the server crashes between step 2 and step 6 and the remote takes over as the new primary, the crashed server cannot restart as a standby, and the only option is to recreate the db from the remote server (which is now acting as the primary).
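To make the scenario concrete, here is a toy Python model of the six steps as I understand them. It is only a sketch of my mental picture, not how PG is actually implemented: the Node class, the sync_commit function and the crash_after_local_commit flag are all made-up names, and each node's log is reduced to a plain list of transaction ids.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a server; 'wal' is just a list of txids."""
    name: str
    wal: list = field(default_factory=list)


def sync_commit(primary, standby, txid, crash_after_local_commit=False):
    """Walk through steps 1-6; return True if the app receives the commit ack."""
    # Step 2: the primary commits the transaction locally.
    primary.wal.append(txid)

    # Step 3: the app is still waiting. A crash here falls in the
    # window between step 2 and step 6.
    if crash_after_local_commit:
        return False

    # Step 4: the transaction is transmitted to the remote server.
    standby.wal.append(txid)

    # Step 5: the remote acknowledges (modelled as always succeeding).
    # Step 6: the app gets the commit ack back.
    return True


primary = Node("primary")
standby = Node("standby")

acked_1 = sync_commit(primary, standby, txid=1)
acked_2 = sync_commit(primary, standby, txid=2, crash_after_local_commit=True)

# Transactions the crashed primary holds that the promoted standby never saw.
diverged = [t for t in primary.wal if t not in standby.wal]
print(acked_1, acked_2, diverged)
```

In this model the crashed server ends up holding transaction 2, which the promoted standby never received and the app was never told about. That leftover transaction is what I assume prevents the old primary from simply rejoining as a standby.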
Am I correct in my understanding?
One more question: in step 5, does the remote harden the transaction to disk before sending back the ACK, or does it merely receive the transaction in the log buffer and then send the ACK to the local server?