Re: A costing analysis tool - Mailing list pgsql-hackers
| From | Kevin Grittner |
|---|---|
| Subject | Re: A costing analysis tool |
| Date | |
| Msg-id | s34e9be4.028@gwmta.wicourts.gov |
| In response to | A costing analysis tool ("Kevin Grittner" <Kevin.Grittner@wicourts.gov>) |
| Responses | Re: A costing analysis tool |
| List | pgsql-hackers |
Thanks for the well wishes. It sounds like you were addressing a slightly different problem -- more ambitious than what I propose to tackle as a first step. If I understand you, you were trying to develop your own predictive costing formulas based on plans. I'm merely talking about a tool to evaluate the relative accuracy of the predictions generated by PostgreSQL. So for the purposes of the proposed tool, the PostgreSQL estimates are central. The ultimate goal is to be able to spot where the current calculations are least accurate, so that any adjustments can be made where they are most needed.

You will notice that my working assumptions start with the observation that most of the time, PostgreSQL does very well. I have certainly found that to be the case, compared to a commercial product running the same queries against the same data. I'm operating on the assumption that relatively minor adjustments to current techniques can take off what rough edges there are.

That said, there's certainly overlap between your effort and what I'm going to be developing. Do you have anything from your work which might save me some time?

Your point regarding a convenient way for people to submit results from diverse environments, with a nice web presentation of collated results, is well taken. I'm not sure my "off-the-cuff" estimate from an earlier post allows enough time for that -- I'll have to be sure to include something at least adequate and expandable in the design. I probably won't try for anything too fancy. I'm at my best on frameworky internal sorts of things. There's a chance that I may be able to talk my client into putting a web app guy on this for a few days to make it pretty. You never know.

Would it make more sense for my client to host something like that on their servers, or would it be more appropriate to have something which would install on a postgresql.org server? If the latter, what sorts of technologies are supported? (My client is partial to Cocoon.)

-Kevin

>>> Martijn van Oosterhout <kleptog@svana.org> 10/13/05 2:41 PM >>>
On Thu, Oct 13, 2005 at 01:52:10PM -0500, Kevin Grittner wrote:
> Thanks, Josh, for the feedback.
>
> It sounds as though you are more focused on picking up costing
> problems which happen during production -- which is clearly
> valuable, but addresses a somewhat different set of needs than
> I was looking at. That said, it seems like there is potential to share
> significant code between the two techniques. We'll have to see if
> we can work that out.

Firstly, I really hope you get further with this than I did a while ago when I attempted it. It's certainly a worthy goal.

Secondly, while checking for problems in production systems is good, it's not going to help with fixing the cost model. For that you need raw data.

My basic plan was to set up tables of different sizes and attempt to run queries such as:

- Index Scan on each table with different types of keys and coverage.
- Seq Scan
- Nested loop, etc...

I did reach the point where I was wishing I could just give PostgreSQL the plan and tell it to execute it. :)

The point of the exercise is to be able to derive correlations so you could calculate the actual costs from the plan. For example, run a nested loop with an inner index scan once, twice, three times, etc., so we can actually *see* what the cache effects are.

I got stuck on working out how to force the optimiser to produce the plan I want. I didn't try too hard though. The enable_xxx options should be enough, hopefully.
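[For concreteness, a minimal sketch of the kind of harness being discussed: build test tables of known size, use the enable_* settings to steer the planner toward a particular plan shape, and capture EXPLAIN ANALYZE output so the estimated cost and row counts can be compared with the actual times and rows. Table names, sizes, and the query are illustrative assumptions, not part of any existing tool.]

```sql
-- Hypothetical test setup: two tables of known, controlled sizes.
CREATE TABLE bench_parent AS
    SELECT g AS id, md5(g::text) AS pad
    FROM generate_series(1, 100000) g;
CREATE INDEX bench_parent_id ON bench_parent (id);
ANALYZE bench_parent;

CREATE TABLE bench_child AS
    SELECT g AS parent_id, g % 50 AS grp, md5(g::text) AS pad
    FROM generate_series(1, 1000000) g;
CREATE INDEX bench_child_parent ON bench_child (parent_id);
ANALYZE bench_child;

-- Steer the planner toward a nested loop with an inner index scan,
-- then compare its estimates against what actually happened.
SET enable_seqscan = off;
SET enable_hashjoin = off;
EXPLAIN ANALYZE
SELECT count(*)
FROM bench_parent p
JOIN bench_child c ON c.parent_id = p.id
WHERE p.id < 1000;
RESET enable_seqscan;
RESET enable_hashjoin;
```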
Of course you want to run it with different numbers of shared buffers to see how they affect the results. And then you ideally want the results for several different machines, different disk subsystems, memory types, etc., placed on a nice web page so other people can run correlations on the data themselves. This is essentially what you already came up with. Note that for these purposes the actual estimates by PostgreSQL are irrelevant. However, I strongly suggest finding a way of collating the results publicly from lots of people, because digging for correlations is something lots of people can hammer on and is really hard to program.

Hope this helps,
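[As a rough illustration of collating such results, one could record each run together with the settings in effect, so data gathered on different machines and configurations can be compared side by side. The table, column names, host label, and numbers below are placeholder assumptions for the sketch, not anything that exists today.]

```sql
-- Store each measurement alongside the configuration it was taken under.
CREATE TABLE cost_samples (
    run_at          timestamptz DEFAULT now(),
    host            text,
    shared_buffers  text,
    query_label     text,
    est_cost        numeric,    -- planner's estimated total cost
    est_rows        bigint,     -- planner's estimated row count
    actual_ms       numeric,    -- measured run time
    actual_rows     bigint      -- rows actually returned
);

-- Placeholder values for illustration only.
INSERT INTO cost_samples
    (host, shared_buffers, query_label, est_cost, est_rows, actual_ms, actual_rows)
VALUES
    ('test-box-a', current_setting('shared_buffers'),
     'nestloop-inner-indexscan', 4000.0, 1000, 25.0, 999);

-- The ratio of actual to estimated rows (or times) across many submitted
-- runs is one way to spot where the model is consistently off.
SELECT query_label, shared_buffers,
       avg(actual_rows::numeric / nullif(est_rows, 0)) AS row_est_ratio
FROM cost_samples
GROUP BY query_label, shared_buffers;
```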