Re: analyze causes query planner to choose suboptimal plan for aselect query in separate transaction - Mailing list pgsql-general

From Martin Kováčik
Subject Re: analyze causes query planner to choose suboptimal plan for aselect query in separate transaction
Date
Msg-id CABp97hP+m6NQwr=4u=AH2baRn--LVCAWA9b+Ybu=zigcOGMBXg@mail.gmail.com
Whole thread Raw
In response to Re: analyze causes query planner to choose suboptimal plan for a select query in separate transaction  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-general
Thanks Tom,

that is a great explanation. The default plan, which overestimates number of rows is much better than the plan (which is based on wrong stats) that underestimates it. This is true for my particular case but may be bad for other cases.

The schema is created before the tests begin so the suggestion to create required tables inside a transaction isn't feasible for my case. For the build server I'm going with Michael's suggestion to disable autovacuum. Instead I'll do manual vacuuming/analyzing before build starts.

For the production use case I'll do my own analyze after bulk insert as you suggested.

Initially I thought that inside a transaction stats are automatically updated after the inserts because the query ran much faster than the query which was based on statistics with no rows. I did not realize the planner chooses a "default" plan when there are no stats.

Martin Kováčik
CEO
redByte, s.r.o.
+421 904 236 791
kovacik@redbyte.eu, www.redbyte.eu


On Thu, Apr 25, 2019 at 9:46 PM Tom Lane <tgl@sss.pgh.pa.us> wrote:
Martin Kováčik <kovacik@redbyte.eu> writes:
> To illustrate my situation let's consider my tests look like this:

> BEGIN;

> -- A: insert data for the test

> -- B: i'll refer to this point later

> -- C: select(s)

> ROLLBACK;

> Everything is fine, until autovacuum (analyze) runs when the test is at
> point B. After that the query planner at point C chooses wrong plan and the
> query takes a long time to complete, blocking one CPU core for a long time.
> It seems like the planner statistics inside running transaction are
> affected by analyze task running outside of the transaction.

Yup, they are.  However, you're already at risk of a pretty bad plan for
a case like this, since (by assumption) the stats before you did the
insert at step A are radically different from what they should be after
the insert.

The standard recommendation, when you need the results of a data change
to be understood by the planner immediately, is to do your own ANALYZE:

BEGIN;

-- A: insert data for the test

ANALYZE test_table;

-- C: select(s)

ROLLBACK;

This should protect step C against seeing any irrelevant stats, because

(a) once your transaction has done an ANALYZE, autovacuum shouldn't
think the stats are out of date, and

(b) even if it does, your transaction is now holding
ShareUpdateExclusiveLock on the table so auto-ANALYZE can't get
that lock to do a fresh ANALYZE.


Another thing you could do is not even allow the test table to exist
outside your transaction:

BEGIN;

CREATE TABLE test_table (...);

-- A: insert data for the test

ANALYZE test_table;  -- this is now somewhat optional

-- C: select(s)

ROLLBACK;

Then there's nothing for auto-ANALYZE to get its hands on.  If you're
satisfied with the planner's default behavior in the absence of any
stats, you could omit the post-insertion ANALYZE in this case.  But
I'm not sure that that would represent a test that has much to do with
production situations.

                        regards, tom lane

pgsql-general by date:

Previous
From: Adrian Klaver
Date:
Subject: Re: analyze causes query planner to choose suboptimal plan for aselect query in separate transaction
Next
From: Martin Kováčik
Date:
Subject: Re: analyze causes query planner to choose suboptimal plan for aselect query in separate transaction