Re: A costing analysis tool - Mailing list pgsql-hackers
From | Bruce Momjian |
---|---|
Subject | Re: A costing analysis tool |
Date | |
Msg-id | 200510130127.j9D1Rks06511@candle.pha.pa.us Whole thread Raw |
In response to | A costing analysis tool ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>) |
List | pgsql-hackers |
Have you looked at the TODO list to see our previous ideas on tuning diagnotics? --------------------------------------------------------------------------- Kevin Grittner wrote: > I'm looking at trying to fix some clear flaws in costing which cause > of our real-world queries to choose sub-optimal plans under PostgreSQL. > It's clear that there needs to be a tool to analyze the accuracy of > costing for a variety of queries, both to direct any efforts to fix > problems and to test for possible costing regressions. As far as I can > tell, no such tool currently exists. If I've missed something, please > let me know, even if it's ad hoc or incomplete. > > Note that I'm talking about a tool strictly to check the accuracy of > the estimated costs of plans chosen by the planner, nothing else. > > I'm at the rough planning stage, and would appreciate any feedback on > my thoughts before I start actual development. Considerations, in no > particular order (but numbered for easy reference): > > (1) Most of the time, on a well-configured database, PostgreSQL > chooses a plan which performs very well. Many test cases need to cover > these normal cases, both to set a baseline and to catch any regression. > > (2) A large database must be created for these tests, since many > issues don't show up in small tables. The same data must be generated > in every database, so results are comparable and reproducable. > > (3) Developers should be able to easily add test cases, either for > their own use or contributed to the community. > > (4) The same test query should be able to easily run in different > permutations of the following: > > (a) With no data cached before the run, or as fully cached as possible. > (b) With various enable_xxx settings on or off. > (c) With or without significant dead space in the tables/indexes. > > (5) The tool needs to be able to run in a variety of OS environments. > At a minimum, this means some way to pick up configuration information > to specify how to start and stop the back end, and how to flush the > system cache. > > (6) The relative costs of various plans shifts dramatically when C > asserts are enabled. To avoid misleading results, the tool should warn > the user when run on a build configured with --enable-cassert, and all > results from such an environment should be conspicuously identified as > such. > > (7) I envision a process to create a test database, populate it, run a > series of test cases with EXPLAIN ANALYZE, capture the results, parse > the results and store them in a database, analyze the results to find > means and standard deviations both overall and for each type of plan, > and report summaries and outliers -- with references to the test cases. > The primary statistic of interest is actual time divided by cost. This > seems like it would be of interest overall, and within the permutations > mentioned above for a single query. > > A reasonable set of test cases would illuminate where costing > adjustments would provide the most benefit for the least risk. By > reining in the most extreme outliers, we could allow the planner to > recognize the best plan among the options it is considering with > greater accuracy. This tool could be used as a regression test to > ensure that a costing adjustment didn't distort the cost of something > which had been accurately costed. > > So, what do you think? > > -Kevin > > > ---------------------------(end of broadcast)--------------------------- > TIP 2: Don't 'kill -9' the postmaster > -- Bruce Momjian | http://candle.pha.pa.us pgman@candle.pha.pa.us | (610) 359-1001+ If your life is a hard drive, | 13 Roberts Road + Christ can be your backup. | Newtown Square, Pennsylvania19073
pgsql-hackers by date: