Thread: Trying to understand Stats/Query planner issue

Trying to understand Stats/Query planner issue

From
"Strange, John W"
Date:

I have a question on how the analyzer works in this type of scenario.

 

We calculate some results and COPY them into some partitioned tables, which we then query with some SELECTs to aggregate the data back out into reports.  Every once in a while the aggregation step picks a bad plan due to the stats on the tables that were just populated.  Updating the stats and rerunning the query seems to solve the problem.  This only happens if we enable nested loop query plans.

 

Which leads me to a few questions:

 

Assumption: the stats aren’t getting created fast enough, so the query planner picks a bad plan because we query the tables shortly after they are populated, and it incorrectly decides to use a nested loop on a large set of results.

 

- if there are no stats on the table how does the query planner identify the best query plan?

- we have tried really aggressive auto_analyze settings, down to .001, so that basically any insert should get the analyze running, with no luck.

- will an analyze block on update to the statistics tables, which makes me wonder if we are updating too often?
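
As a rough sketch of why a very low setting may still not help: assuming the ".001" above refers to autovacuum_analyze_scale_factor (the post does not name the parameter), PostgreSQL documents the auto-analyze trigger as the base threshold plus the scale factor times the table's row count, compared against the rows changed since the last ANALYZE. The function names below are hypothetical; the default values (50 and 0.1) are the documented defaults for autovacuum_analyze_threshold and autovacuum_analyze_scale_factor.

```python
def analyze_trigger_threshold(reltuples: float,
                              base_threshold: int = 50,
                              scale_factor: float = 0.1) -> float:
    """Number of changed rows a table must exceed before auto-analyze
    considers it: base threshold + scale factor * estimated row count."""
    return base_threshold + scale_factor * reltuples


def needs_auto_analyze(changed_tuples: float,
                       reltuples: float,
                       base_threshold: int = 50,
                       scale_factor: float = 0.1) -> bool:
    """True when the rows changed since the last ANALYZE exceed the threshold."""
    threshold = analyze_trigger_threshold(reltuples, base_threshold, scale_factor)
    return changed_tuples > threshold


# Hypothetical 10-million-row table with the scale factor lowered to 0.001:
# a 20,000-row load qualifies the table, a 5,000-row load does not.
print(needs_auto_analyze(20_000, 10_000_000, scale_factor=0.001))
print(needs_auto_analyze(5_000, 10_000_000, scale_factor=0.001))
```

Even when this check passes, the autovacuum launcher only visits each database periodically (autovacuum_naptime, one minute by default), so crossing the threshold does not mean the statistics are refreshed right away.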

 

The other option is just to analyze each table involved in the query after the insert, but that seems a bit counterproductive.

 

Thoughts?

_______________________________________________________________________________________________
| John W. Strange | Vice President | Global Commodities Technology
| J.P. Morgan | 700 Louisiana, 11th Floor | T: 713-236-4122 | C: 281-744-6476 | F: 713 236-3333
| john.w.strange@jpmchase.com | jpmorgan.com

 


Re: Trying to understand Stats/Query planner issue

From
Tom Lane
Date:
"Strange, John W" <john.w.strange@jpmorgan.com> writes:
> I have a question on how the analyzer works in this type of scenario.
> We calculate some results and COPY INTO some partitioned tables, which we use some selects to aggregate the data back
> out into reports.  Every once in a while the aggregation step picks a bad plan due to stats on the tables that were
> just populated.   Updating the stats and rerunning the query seems to solve the problem, this only happens if we enable
> nested loop query plans.

Well, even if auto-analyze launches instantly after you commit the
insertions (which it won't), it's going to take time to scan the table
and then commit the updates to pg_statistic.  So there is always going
to be some window where queries will get planned with obsolete
information.  If you're inserting enough data to materially change the
statistics of a table, and you need to query that table right away,
doing a manual ANALYZE rather than waiting for auto-analyze is
recommended.
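
A minimal sketch of the load-then-ANALYZE pattern described above, written in Python against a generic DB-API style cursor. The helper names and the table names in the usage note are hypothetical, and table names are assumed to be trusted, valid identifiers, since they are interpolated into the statement without quoting.

```python
from typing import Callable, Iterable, List


def analyze_statements(tables: Iterable[str]) -> List[str]:
    """Build one ANALYZE statement per table name.

    Assumption: names come from trusted code, not user input,
    because they are placed into the SQL text as-is.
    """
    return [f"ANALYZE {table};" for table in tables]


def load_then_analyze(cursor, tables: Iterable[str],
                      load: Callable[[object], None]) -> None:
    """Run the caller's bulk load, then ANALYZE every table it populated.

    `cursor` is any object with an execute(sql) method (DB-API style);
    `load` is a callable that performs the COPY using that cursor.
    """
    load(cursor)
    for statement in analyze_statements(tables):
        cursor.execute(statement)
```

For example, calling load_then_analyze(cur, ["results_a", "results_b"], my_load) would run my_load(cur) and then issue "ANALYZE results_a;" and "ANALYZE results_b;" on the same cursor, so the reporting queries that follow are planned against fresh statistics rather than whatever auto-analyze has managed to collect so far.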

> The other option is just to analyze each table involved in the query after the insert, but that seems a bit
> counterproductive.

Why would you think that?  This type of scenario is exactly why ANALYZE
isn't deprecated as a user command.

            regards, tom lane