Thread: BUG #14044: Queries immediately conflict with recovery when recovery_min_apply_delay is used
From: josnyder@yelp.com
The following bug has been logged on the website:

Bug reference:      14044
Logged by:          Josh Snyder
Email address:      josnyder@yelp.com
PostgreSQL version: 9.5.1
Operating system:   Ubuntu 14.04 (Trusty)
Description:

I encountered an issue using a hot standby together with recovery_min_apply_delay. For context, we would like to use a small recovery_min_apply_delay (on the order of seconds) to make bugs in our application's handling readily apparent. We are trying to avoid situations where a feature works by coincidence only because replication delay is low.

In this case, recovery_min_apply_delay was 5 seconds. max_standby_streaming_delay and max_standby_archive_delay are both 180 minutes. Approximately 180 minutes after recovery_min_apply_delay was set (and Postgres restarted), our application began to experience errors of the form:

    ERROR: canceling statement due to conflict with recovery
    DETAIL: User query might have needed to see row versions that must be removed

The queries in question were SELECTs that typically returned in less than a millisecond, and were executed outside of a transaction. Under these circumstances, I expected the replica to allow these queries to run for up to 180 minutes before cancelling them due to recovery conflict.

A comment in xlog.c appears to explain this behavior:

     * We only advance XLogReceiptTime when we obtain fresh
     * WAL from walreceiver and observe that we had already
     * processed everything before the most recent "chunk"
     * that it flushed to disk. In steady state where we are
     * keeping up with the incoming data, XLogReceiptTime will
     * be updated on each cycle. When we are behind,
     * XLogReceiptTime will not advance, so the grace time
     * allotted to conflicting queries will decrease.
     */

Based on this comment, it appears that using recovery_min_apply_delay on a hot standby is inadvisable. As far as I can tell, this incompatibility is not documented anywhere. Calling out the incompatibility in documentation would help.
But it is also unclear to the novice reader (such as myself) why we elect not to update XLogReceiptTime if there is outstanding WAL data to process. Or, semi-equivalently, why should "grace time allotted to conflicting queries" ever fall below max_standby_streaming_delay - <current delay>?
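
To illustrate the behavior described above, here is a minimal Python sketch of the timing arithmetic. It is a simplified model, not PostgreSQL's code. It assumes the cancellation deadline is the last recorded receipt time plus max_standby_streaming_delay, and that with recovery_min_apply_delay set the standby is always "behind", so the receipt time stays at its value from the restart. The function names and the restart timestamp are hypothetical.

```python
from datetime import datetime, timedelta

# Value taken from the report: max_standby_streaming_delay = 180 minutes.
MAX_STANDBY_STREAMING_DELAY = timedelta(minutes=180)


def grace_remaining(now, receipt_time, max_delay=MAX_STANDBY_STREAMING_DELAY):
    """Time a conflicting query may still run before it is cancelled.

    Assumed model: deadline = receipt_time + max_delay, clamped at zero.
    """
    deadline = receipt_time + max_delay
    return max(deadline - now, timedelta(0))


# Hypothetical restart time; only the differences matter.
restart = datetime(2016, 1, 1, 0, 0, 0)

# Assumption: replay always trails receipt by recovery_min_apply_delay,
# so the receipt timestamp is never advanced after the restart.
stuck_receipt_time = restart

if __name__ == "__main__":
    for minutes in (0, 90, 179, 180, 181):
        now = restart + timedelta(minutes=minutes)
        print(minutes, "min after restart: grace =",
              grace_remaining(now, stuck_receipt_time))
```

Under these assumptions the grace time shrinks linearly and reaches zero 180 minutes after the restart, which matches the point at which the short SELECTs started being cancelled immediately. If the receipt time instead advanced with each new chunk of WAL, the grace time would stay close to the full 180 minutes.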