Re: Idea for the statistics collector - Mailing list pgsql-general
| From | Doug Fields |
|---|---|
| Subject | Re: Idea for the statistics collector |
| Date | |
| Msg-id | 5.1.0.14.2.20020621105749.02b99b78@pop.pexicom.com |
| In response to | Re: Idea for the statistics collector (Jan Wieck <JanWieck@Yahoo.com>) |
| Responses | Re: Idea for the statistics collector |
| List | pgsql-general |
>Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > Martijn van Oosterhout wrote:
> > >> Firstly, I was only thinking of going for the basic nodes (Index Scan,
> > >> Seq Scan, Distinct). Other types have far more variables. Secondly,
> > >> even if you only count, it's useful. For example, if it tells you that
> > >> the planner is off by a factor of 10 more than 75% of the time, that's
> > >> useful information independent of what the actual variables are.
> >
> > And reduce the usefulness even more ;-). As a pure stats-gathering
> > exercise it might be worth doing, but not if you only log the failure
> > cases. How will you know how well you are doing if you take a
> > biased-by-design sample?

Personally, given that at least once or twice a day someone asks about performance, "why isn't my index being used," and the like, I think doing this would be a great idea. Perhaps not in the full-fledged way, but as a sort of "ANALYZE log," wherein the optimizer's estimate for a query and the actual result of that query are logged, for every query. This could, of course, be enabled or disabled per connection, per table (like OIDs), or on whatever other basis makes life easiest for the developers.

Then, when the next ANALYZE is run, it could do its usual analysis and apply some additional heuristics based on what it learns from the "ANALYZE log," possibly to do several things:

1) Automatically increase or decrease the SET STATISTICS target, for example increasing it as a table grows larger while its number of distinct values grows less than linearly with size (e.g., 50 or 60 groups in a 1,000,000-row table certainly warrant a higher SET STATISTICS; I do this on my own tables).

2) Add a value to the statistics table, call it "index_heuristic" or "random_page_adjustment_heuristic," which does nothing when it is 1 but otherwise scales the relative cost of an index scan versus a sequential scan by that factor - and never let it change by more than a few percent per ANALYZE.

3) Flag really bad query estimates in a second log (or maybe the regular log) - analyze the estimates and flag anything more than two or three standard deviations out.

I suggest all this in the name of usability and self-maintainability. Unfortunately, I don't have the wherewithal to actually assist in development.

Another possibility is to make "enable_seqscan" default to OFF (I did my optimizing a while ago, so the parameter names are fading), so that if there's an index it will be used, regardless - as this seems to be what the great majority of people expect to happen. And/or add this to a FAQ, and let us all reply "see http://.../indexfaq.html." :)

Rough sketches of some of these ideas follow below my sig.

Cheers,

Doug
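To make the "ANALYZE log" concrete, here's a rough sketch of what might be recorded per plan node. To be clear, no such table exists in PostgreSQL; the name analyze_log and all of its columns are made up for illustration. (EXPLAIN ANALYZE already prints estimated versus actual row counts for one query at a time; this log would just accumulate that raw material.)

```sql
-- Hypothetical "ANALYZE log": one row per plan node, pairing the
-- planner's row estimate with what the executor actually produced.
-- This table is an illustration only; nothing like it exists today.
CREATE TABLE analyze_log (
    logged_at   timestamptz DEFAULT now(),  -- when the query ran
    relname     name,                       -- relation the node scanned
    node_type   text,                       -- 'Seq Scan', 'Index Scan', ...
    est_rows    bigint,                     -- planner's estimate
    actual_rows bigint                      -- rows actually returned
);
```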
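For point (1), the knob such a heuristic would turn already exists as the per-column statistics target; here's what doing it by hand looks like (the table and column names are made up):

```sql
-- Raise the statistics target for a skewed column; the proposal is that
-- ANALYZE could do this automatically when the log shows this column's
-- estimates are consistently off. The default target is 10.
ALTER TABLE orders ALTER COLUMN status SET STATISTICS 100;
ANALYZE orders;  -- rebuild the statistics with the larger sample
```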
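Point (2) amounts to a slowly self-tuning version of the random_page_cost setting, and the seq-scan switch I was reaching for above is enable_seqscan. Both can already be set by hand, per session:

```sql
SHOW random_page_cost;       -- 4.0 by default
SET random_page_cost = 2.0;  -- make index scans look relatively cheaper
SET enable_seqscan = off;    -- strongly discourage (not forbid) seq scans
```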
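And point (3) could be a single pass over the hypothetical analyze_log above: measure the estimation error as ln(actual/estimate), so over- and under-estimates count symmetrically, and flag anything more than three standard deviations from the mean. A sketch, again against the made-up table:

```sql
-- Flag plan nodes whose row estimate was unusually bad.
SELECT l.relname, l.node_type, l.est_rows, l.actual_rows
  FROM analyze_log l,
       (SELECT avg(ln(actual_rows::float8 / est_rows))    AS mu,
               stddev(ln(actual_rows::float8 / est_rows)) AS sigma
          FROM analyze_log
         WHERE est_rows > 0 AND actual_rows > 0) s
 WHERE l.est_rows > 0 AND l.actual_rows > 0
   AND abs(ln(l.actual_rows::float8 / l.est_rows) - s.mu) > 3 * s.sigma;
```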