Every now and again, I will encounter an unexplained long-running query.
It’s a head-scratcher, because the query that has been sitting there for 20 minutes (not blocked by anything) completes in about 500ms when I run it independently.
I can only assume the problem query ran against the table(s) at a time when they were in need of a VACUUM ANALYZE. I’m guessing here that the tables had seen a fair amount of change and simply had out-of-date statistics.
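As a rough sanity check after the fact (the table name below is just a placeholder), something like this shows when a table was last analysed and how much has changed since:

    -- When were statistics last refreshed, and how many rows have
    -- been modified since? ("big_table" is a placeholder.)
    SELECT relname,
           last_analyze,
           last_autoanalyze,
           n_mod_since_analyze   -- rows changed since last ANALYZE (9.4+)
    FROM   pg_stat_user_tables
    WHERE  relname = 'big_table';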
How can I avoid this?
The autovacuum daemon is doing its thing, but there is always going to be an opportunity for a query to sneak in against a table that has recently seen heavy change but has not yet been analysed.
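(For what it’s worth, the per-table autovacuum settings can be tightened so the daemon reacts sooner on tables that see heavy churn; the numbers here are purely illustrative:)

    -- Illustrative only: trigger autoanalyze after ~5% of the table
    -- changes (plus 1000 rows) instead of the default 10% + 50 rows.
    ALTER TABLE big_table SET (
        autovacuum_analyze_scale_factor = 0.05,
        autovacuum_analyze_threshold    = 1000
    );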
On the application side, we can explicitly issue a VACUUM ANALYZE after each bulk operation, and often that is precisely what happens.
But I am keenly aware that this cannot be performed inside a transaction, which means there is always a (small) window in which a query can still run against stale statistics.
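In other words, the pattern ends up looking something like this (a sketch; the table and column names are placeholders):

    BEGIN;
    -- ... bulk insert / update / delete here ...
    UPDATE big_table SET processed = true WHERE batch_id = 42;
    COMMIT;

    -- Issued afterwards, outside any transaction block,
    -- since VACUUM cannot run inside one:
    VACUUM ANALYZE big_table;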
Are there any other best practices that can mitigate this kind of problem?
It’s rare, sure, but I don’t like sweeping issues like this under the rug.
I’m on PG 9.6; perhaps there have been planner or statistics improvements in later releases that would reduce the incidence of these (rare) issues.
Any advice appreciated, thanks.