Thread: Trying to understand Stats/Query planner issue
I have a question on how the analyzer works in this type of scenario.
We calculate some results and COPY data INTO some partitioned tables, then use some SELECTs to aggregate the data back out into reports. Every once in a while the aggregation step picks a bad plan due to stats on the tables that were just populated. Updating the stats and rerunning the query seems to solve the problem; this only happens if we enable nested loop query plans.
Which leads me to a few questions:
Assumption: the stats aren't getting created fast enough, so because we query the tables shortly after they are populated, the query planner picks a bad plan and incorrectly decides to use a nested loop on a large set of results.
- If there are no stats on the table, how does the query planner identify the best query plan?
- We have tried really aggressive auto-analyze settings, down to a scale factor of 0.001 so that basically any insert triggers an analyze, with no luck.
- Will an ANALYZE block on updates to the statistics tables? This makes me wonder if we are updating too often.
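For what it's worth, one way to check whether the stats really are stale at the moment the aggregation runs is to look at the statistics collector view for the tables just loaded (the "reports_%" partition naming here is a made-up example):

```sql
-- Check when the tables we just populated were last analyzed,
-- either manually or by autovacuum.
SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname LIKE 'reports_%'
ORDER BY relname;
```

If last_autoanalyze predates the COPY when the bad plan appears, that would confirm the planner was working from obsolete statistics.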
The other option is just to ANALYZE each table involved in the query after the insert, but that seems a bit counterproductive.
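In case it clarifies the option being weighed, this is roughly what an explicit analyze-after-load step would look like (the table, file, and column names are hypothetical):

```sql
-- Hypothetical load step: populate the partition, then analyze it
-- before any aggregation queries run against it.
COPY reports_2011_q3 FROM '/tmp/results.csv' WITH (FORMAT csv);
ANALYZE reports_2011_q3;

-- The aggregation that follows now plans against fresh statistics.
SELECT region, sum(amount)
FROM reports_2011_q3
GROUP BY region;
```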
Thoughts?
_______________________________________________________________________________________________
| John W. Strange | Vice President | Global Commodities Technology
| J.P. Morgan | 700 Louisiana, 11th Floor | T: 713-236-4122 | C: 281-744-6476 | F: 713 236-3333
| john.w.strange@jpmchase.com | jpmorgan.com
"Strange, John W" <john.w.strange@jpmorgan.com> writes:
> We calculate some results and COPY INTO some partitioned tables, which
> we use some selects to aggregate the data back out into reports. Every
> once in a while the aggregation step picks a bad plan due to stats on
> the tables that were just populated. Updating the stats and rerunning
> the query seems to solve the problem; this only happens if we enable
> nested loop query plans.

Well, even if auto-analyze launches instantly after you commit the insertions (which it won't), it's going to take time to scan the table and then commit the updates to pg_statistic. So there is always going to be some window where queries will get planned with obsolete information. If you're inserting enough data to materially change the statistics of a table, and you need to query that table right away, doing a manual ANALYZE rather than waiting for auto-analyze is recommended.

> The other option is just to analyze each table involved in the query
> after the insert, but that seems a bit counterproductive.

Why would you think that? This type of scenario is exactly why ANALYZE isn't deprecated as a user command.

			regards, tom lane
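The original question notes that the bad plans only show up when nested-loop plans are enabled. As a session-local stopgap while statistics are known to be stale, the nested-loop strategy can be discouraged for just the aggregation query rather than globally (the query itself is a made-up example):

```sql
-- Discourage nested-loop joins only for this session, run the
-- aggregation, then restore the default planner behavior.
SET enable_nestloop = off;

SELECT region, sum(amount)
FROM reports_2011_q3
GROUP BY region;

RESET enable_nestloop;
```

This doesn't fix the underlying stale-stats window the way a manual ANALYZE does; it only blunts the worst-case plan choice in the meantime.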