On Fri, Jul 25, 2025 at 9:57 AM Jon Zeppieri <zeppieri@gmail.com> wrote:
Thanks for the response, Nick. I'm curious why the situation you describe wouldn't also lead to write_lag and flush_lag being high. If the problem is simply keeping up with the primary, wouldn't you expect all three lag times to be elevated?
No. Write and flush are quick and simple: they just put the WAL onto the local disk. Replay involves a lot more work, as we have to parse the WAL and apply the changes, which means doing a lot of I/O across many files. Still, *hours* to me indicates more than just a lot of extra traffic. Check that recovery_min_apply_delay is still 0, then log onto the replica and see what's going on with regard to open transactions and locks, since a long-running query there can hold up replay.
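If it helps, here is a rough sketch of those checks as a small Python script. The SQL uses only standard views and functions (pg_stat_activity, pg_locks, pg_last_wal_replay_lsn() and friends), but the check names and the run_checks helper are made up for illustration, and it assumes you already have a DB-API connection open to the replica.

```python
# Hypothetical helper: names below are illustrative, not part of PostgreSQL.
# Each entry is a plain SQL statement meant to be run on the replica.
CHECKS = {
    # Should come back as 0 unless a delay was configured on purpose.
    "apply_delay": "SHOW recovery_min_apply_delay",
    # How far replay trails what has been received, and when the last
    # replayed transaction was committed on the primary.
    "replay_position": """
        SELECT pg_is_in_recovery(),
               pg_last_wal_receive_lsn(),
               pg_last_wal_replay_lsn(),
               pg_last_xact_replay_timestamp()
    """,
    # Open transactions on the replica, oldest first.
    "open_transactions": """
        SELECT pid, backend_type, state, xact_start,
               now() - xact_start AS xact_age,
               wait_event_type, wait_event, query
        FROM pg_stat_activity
        WHERE xact_start IS NOT NULL
        ORDER BY xact_start
    """,
    # Lock requests that are currently waiting.
    "ungranted_locks": """
        SELECT locktype, relation::regclass, mode, pid
        FROM pg_locks
        WHERE NOT granted
    """,
}


def run_checks(conn):
    """Run every check on a DB-API connection and return rows keyed by check name."""
    results = {}
    cur = conn.cursor()
    try:
        for name, sql in CHECKS.items():
            cur.execute(sql)
            results[name] = cur.fetchall()
    finally:
        cur.close()
    return results
```

Here conn would be whatever your driver hands back (for example psycopg2.connect(...), which is just an assumption about your setup). The same statements can of course be pasted straight into psql on the replica instead.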