Thread: Avoiding out of date statistics / planner

Avoiding out of date statistics / planner

From
Tim Kane
Date:
Every now and again, I will encounter an unexplained long-running query.

It’s a head scratcher moment, because this query that is still running for 20 minutes (not blocking) can be run independently in about 500ms

I can only assume that the problem query ran against the table(s) at a time when it was perhaps in need of a vacuum analyze...  I’m guessing here, that the table had seen some amount of change and simply had out of date statistics.

How can I avoid this?
The auto-vacuum daemon is doing it’s thing, but there is always going to be an opportunity for a query to sneak in against a table that has recently seen large change, but not yet been analysed.

On the application side, we can explicitly issue a VACUUM ANALYZE after each bulk operation - and often that is precisely what happens..

But - I am keenly aware that this cannot be performed within a transaction. That means there is always a (small) window in which a query can still execute in this scenario.

Are there any other best practices that can mitigate this kind of problem?

It’s rare, sure - but I don’t like sweeping these under the rug.

I’m on PG 9.6.. perhaps there are planner improvements since then that might reduce the incidence of these (rare) issues.

Any advice appreciated, thanks.

Re: Avoiding out of date statistics / planner

From
Tom Lane
Date:
Tim Kane <tim.kane@gmail.com> writes:
> Every now and again, I will encounter an unexplained long-running query.
> It’s a head scratcher moment, because this query that is still running for
> 20 minutes (not blocking) can be run independently in about 500ms

Without some kind of context (like, have you been doing something to
the table(s) involved that would drastically change their statistics)
it's hard to comment on this.  It's not obvious from the info
provided that this is a bad-plan issue rather than something else.

> On the application side, we can explicitly issue a VACUUM ANALYZE after
> each bulk operation - and often that is precisely what happens..
> But - I am keenly aware that this cannot be performed within a transaction.

Plain ANALYZE can be, and that's all you need if the problem is to
update stats.

            regards, tom lane



Re: Avoiding out of date statistics / planner

From
Michael Lewis
Date:
It may also be worth noting that it is possible to make autovacuum/analyze more aggressive, perhaps only on the tables that see large changes in data that might result in a statistics issue. If you could share a query, explain analyze output, and pseudo code or at least description of what sort of bulk operations are being done, then more insight could be offered.

Re: Avoiding out of date statistics / planner

From
Tomas Vondra
Date:
On Wed, Feb 12, 2020 at 10:23:22AM -0700, Michael Lewis wrote:
>It may also be worth noting that it is possible to make autovacuum/analyze
>more aggressive, perhaps only on the tables that see large changes in data
>that might result in a statistics issue. If you could share a query,
>explain analyze output, and pseudo code or at least description of what
>sort of bulk operations are being done, then more insight could be offered.

Another thing you can do is deploy auto_explain, and log explain plan
for long-runnning queries. That won't fix the root cause, but it will
help you with confirming the root cause - you'll see the query plan,
which should give you enough context.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



Re: Avoiding out of date statistics / planner

From
Tim Kane
Date:
That looks very useful indeed. Thanks Tomas

On Wed, Feb 12, 2020 at 8:32 PM Tomas Vondra <tomas.vondra@2ndquadrant.com> wrote:
On Wed, Feb 12, 2020 at 10:23:22AM -0700, Michael Lewis wrote:
>It may also be worth noting that it is possible to make autovacuum/analyze
>more aggressive, perhaps only on the tables that see large changes in data
>that might result in a statistics issue. If you could share a query,
>explain analyze output, and pseudo code or at least description of what
>sort of bulk operations are being done, then more insight could be offered.

Another thing you can do is deploy auto_explain, and log explain plan
for long-runnning queries. That won't fix the root cause, but it will
help you with confirming the root cause - you'll see the query plan,
which should give you enough context.

regards

--
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services