Re: A costing analysis tool - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: A costing analysis tool
Date
Msg-id 200510130127.j9D1Rks06511@candle.pha.pa.us
Whole thread Raw
In response to A costing analysis tool  ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>)
List pgsql-hackers
Have you looked at the TODO list to see our previous ideas on tuning
diagnotics?

---------------------------------------------------------------------------

Kevin Grittner wrote:
> I'm looking at trying to fix some clear flaws in costing which cause
> of our real-world queries to choose sub-optimal plans under PostgreSQL.
> It's clear that there needs to be a tool to analyze the accuracy of
> costing for a variety of queries, both to direct any efforts to fix
> problems and to test for possible costing regressions.  As far as I can
> tell, no such tool currently exists.  If I've missed something, please
> let me know, even if it's ad hoc or incomplete.
> 
> Note that I'm talking about a tool strictly to check the accuracy of
> the estimated costs of plans chosen by the planner, nothing else.
> 
> I'm at the rough planning stage, and would appreciate any feedback on
> my thoughts before I start actual development.  Considerations, in no
> particular order (but numbered for easy reference):
> 
> (1)  Most of the time, on a well-configured database, PostgreSQL
> chooses a plan which performs very well.  Many test cases need to cover
> these normal cases, both to set a baseline and to catch any regression.
> 
> (2)  A large database must be created for these tests, since many
> issues don't show up in small tables.  The same data must be generated
> in every database, so results are comparable and reproducable.
> 
> (3)  Developers should be able to easily add test cases, either for
> their own use or contributed to the community.
> 
> (4)  The same test query should be able to easily run in different
> permutations of the following:
> 
>   (a)  With no data cached before the run, or as fully cached as possible.
>   (b)  With various enable_xxx settings on or off.
>   (c)  With or without significant dead space in the tables/indexes.
> 
> (5)  The tool needs to be able to run in a variety of OS environments.
> At a minimum, this means some way to pick up configuration information
> to specify how to start and stop the back end, and how to flush the
> system cache.
> 
> (6)  The relative costs of various plans shifts dramatically when C
> asserts are enabled.  To avoid misleading results, the tool should warn
> the user when run on a build configured with --enable-cassert, and all
> results from such an environment should be conspicuously identified as
> such.
> 
> (7)  I envision a process to create a test database, populate it, run a
> series of test cases with EXPLAIN ANALYZE, capture the results, parse
> the results and store them in a database, analyze the results to find
> means and standard deviations both overall and for each type of plan,
> and report summaries and outliers -- with references to the test cases.
> The primary statistic of interest is actual time divided by cost.  This
> seems like it would be of interest overall, and within the permutations
> mentioned above for a single query.
> 
> A reasonable set of test cases would illuminate where costing
> adjustments would provide the most benefit for the least risk.  By
> reining in the most extreme outliers, we could allow the planner to
> recognize the best plan among the options it is considering with
> greater accuracy.  This tool could be used as a regression test to
> ensure that a costing adjustment didn't distort the cost of something
> which had been accurately costed.
> 
> So, what do you think?
> 
> -Kevin
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: Don't 'kill -9' the postmaster
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


pgsql-hackers by date:

Previous
From: Bruce Momjian
Date:
Subject: Re: How TODO prevent PQfnumber() from lowercasing?
Next
From: Christopher Kings-Lynne
Date:
Subject: Re: Comments on columns in the pg_catalog tables/views