A costing analysis tool - Mailing list pgsql-hackers
From: Kevin Grittner
Subject: A costing analysis tool
Msg-id: s34d55ef.014@gwmta.wicourts.gov
List: pgsql-hackers
I'm looking at trying to fix some clear flaws in costing which cause some of our real-world queries to choose sub-optimal plans under PostgreSQL. It's clear that there needs to be a tool to analyze the accuracy of costing for a variety of queries, both to direct any efforts to fix problems and to test for possible costing regressions. As far as I can tell, no such tool currently exists. If I've missed something, please let me know, even if it's ad hoc or incomplete. Note that I'm talking about a tool strictly to check the accuracy of the estimated costs of plans chosen by the planner, nothing else.

I'm at the rough planning stage, and would appreciate any feedback on my thoughts before I start actual development. Considerations, in no particular order (but numbered for easy reference):

(1) Most of the time, on a well-configured database, PostgreSQL chooses a plan which performs very well. Many test cases need to cover these normal cases, both to set a baseline and to catch any regression.

(2) A large database must be created for these tests, since many issues don't show up in small tables. The same data must be generated in every database, so results are comparable and reproducible.

(3) Developers should be able to easily add test cases, either for their own use or to contribute to the community.

(4) The same test query should be easy to run in different permutations of the following:
(a) with no data cached before the run, or as fully cached as possible;
(b) with various enable_xxx settings on or off;
(c) with or without significant dead space in the tables/indexes.

(5) The tool needs to be able to run in a variety of OS environments. At a minimum, this means some way to pick up configuration information specifying how to start and stop the back end, and how to flush the system cache.

(6) The relative costs of various plans shift dramatically when C asserts are enabled. To avoid misleading results, the tool should warn the user when run on a build configured with --enable-cassert, and all results from such an environment should be conspicuously identified as such.

(7) I envision a process to create a test database, populate it, run a series of test cases with EXPLAIN ANALYZE, capture the results, parse the results and store them in a database, analyze the results to find means and standard deviations both overall and for each type of plan, and report summaries and outliers -- with references to the test cases. The primary statistic of interest is actual time divided by cost. This seems like it would be of interest overall, and within the permutations mentioned above for a single query.

A reasonable set of test cases would illuminate where costing adjustments would provide the most benefit for the least risk. By reining in the most extreme outliers, we could allow the planner to recognize the best plan among the options it is considering with greater accuracy. This tool could also be used as a regression test, to ensure that a costing adjustment didn't distort the cost of something which had been accurately costed.

So, what do you think?

-Kevin
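As one possible shape for the permutation runs in point (4), here is a minimal sketch; the flag list, cache-state names, and function name are all illustrative assumptions, not anything from the post, and a real tool would cover more of the enable_xxx settings:

```python
import itertools

# Hypothetical sketch of the point (4) permutations: each test query would be
# run once per combination of cache state and on/off planner settings.
FLAGS = ["enable_seqscan", "enable_indexscan"]  # illustrative subset
CACHE_STATES = ["cold", "warm"]


def run_permutations():
    """Yield (cache_state, [SET statements]) for every combination."""
    for cache in CACHE_STATES:
        for values in itertools.product(("on", "off"), repeat=len(FLAGS)):
            # One SET statement per planner flag for this permutation.
            settings = [f"SET {flag} = {val};" for flag, val in zip(FLAGS, values)]
            yield cache, settings
```

With two flags and two cache states this enumerates eight runs per query; adding a dead-space dimension for point (4)(c) would be another factor in the product.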
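The core statistic in point (7) -- actual time divided by estimated cost, with means, standard deviations, and outliers -- could be computed roughly as follows. This is a sketch under assumptions: the regex matches the `cost=..N` / `actual time=..N` node lines that EXPLAIN ANALYZE prints, and the function and threshold names are made up for illustration:

```python
import re
import statistics

# Pull the upper-bound estimated cost and upper-bound actual time out of each
# plan-node line of EXPLAIN ANALYZE text output.
NODE_RE = re.compile(
    r"cost=\d+\.\d+\.\.(?P<cost>\d+\.\d+).*?"
    r"actual time=\d+\.\d+\.\.(?P<actual>\d+\.\d+)"
)


def time_cost_ratios(explain_output):
    """Return actual-time / estimated-cost ratios for each node found."""
    ratios = []
    for line in explain_output.splitlines():
        m = NODE_RE.search(line)
        if m:
            cost = float(m.group("cost"))
            if cost > 0:
                ratios.append(float(m.group("actual")) / cost)
    return ratios


def summarize(ratios, outlier_sigmas=2.0):
    """Mean, stddev, and indices of ratios beyond N standard deviations."""
    mean = statistics.mean(ratios)
    sd = statistics.pstdev(ratios)
    outliers = [i for i, r in enumerate(ratios)
                if sd and abs(r - mean) > outlier_sigmas * sd]
    return mean, sd, outliers
```

Grouping the ratios by node type before calling summarize() would give the per-plan-type breakdown described above.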