Re: Index Tuning Features - Mailing list pgsql-hackers
From | Simon Riggs |
---|---|
Subject | Re: Index Tuning Features |
Date | |
Msg-id | 1160559701.2659.1114.camel@holly Whole thread Raw |
In response to | Index Tuning Features (Simon Riggs <simon@2ndquadrant.com>) |
Responses |
Re: Index Tuning Features
|
List | pgsql-hackers |
Thanks everybody for comments so far; this will be a useful discussion. On Tue, 2006-10-10 at 18:56 -0400, Robert Treat wrote: > On Tuesday 10 October 2006 12:06, Tom Lane wrote: > > > Similar in usage to an EXPLAIN, the RECOMMEND command would return a > > > list of indexes that need to be added to get the cheapest plan for a > > > particular query (no explain plan result though). > > > > Both of these seem to assume that EXPLAIN results, without EXPLAIN > > ANALYZE results to back them up, are sufficient for tuning. I find > > this idea a bit dubious, particularly for cases of "marginal" indexes. > > > > While I agree with Tom that generally EXPLAIN is not enough for tuning, I also > know that when your dealing with queries that have run times in multiples of > hours (and the corresponding hour long index builds) EXPLAIN ANALYZE just > isn't an option. Anything that can be done to wheedle down your choices > before you have to run EXPLAIN ANALYZE is a bonus. IMHO you need EXPLAIN, EXPLAIN ANALYZE and RECOMMEND As Robert points out, using EA can make tuning take a long time and that is the critical factor when you have a whole database/app to tune. This discussion helps me to make explicit what my thoughts had been on what an ideal index tuning process is: 1. Recommendation: Use RECOMMEND to get an 80/20 setting for a statement. As Peter suggests a "user-space" tool, I also imagine a tool that would automatically run RECOMMEND on all SQL statements in a workload and come up with proposals for additional indexes. We would have a first cut index design in minutes rather than days. 2. Evaluation: We can then create the potential indexes as Virtual ones and then re-run EXPLAINs to model how a whole workload would behave. We can begin to prune low-impact indexes out of the mix at this stage. Again, this can be done automatically. 3. Implementation: We re-create the new indexes as real indexes (perhaps concurrently) 4. Correction: We then run the workload and then use existing tools to spot the statements causing the most problems and manually assess them using EXPLAIN ANALYZE. Manually postulate new Virtual indexes and re-model the workload again as (2) Steps (3) and (4) have both been improved for 8.2. Steps (1) and (2) are completely new steps for 8.3 The above process can be performed without tool support, but its clear that further automation will help greatly here. I foresee that the development of both server-side and tools will take more than one release. Discussion of tool support can begin once we have agreed server-side capability. With that as a backdrop, further comments are: On Tue, 2006-10-10 at 19:15 -0400, Tom Lane wrote: > Robert Treat <xzilla@users.sourceforge.net> writes: > > Anything that can be done to wheedle down your choices > > before you have to run EXPLAIN ANALYZE is a bonus. > > Fair enough, but I prefer Peter's suggestion of attaching the > hypothetical index definitions to EXPLAIN itself, rather than making > bogus catalog entries. Something along the line of > > EXPLAIN <statement> > ASSUMING INDEX fooi ON foo .... > [ ASSUMING INDEX ... ] I do like this, though for step (2) above we would need to attach the appropriate indexes to each of the SQL statements prior to execution. Doing this for a single SQL statement is fine, but doing that for a whole workload of 1000s of statements is not very practical, hence an externally declarative approach seems better. I can imagine many other declarative approaches other than the one I proposed; it just seems pretty neat to me to use almost exactly the same syntax for a virtual index as for a real index. As I mentioned, ideally this would not be a full-strength catalog object, but I was thinking towards implementation also. Another possibility would be to use a local pg_virtual_indexes table. On Tue, 2006-10-10 at 18:06 +0200, Peter Eisentraut wrote: > Simon Riggs wrote: > > > - RECOMMEND command > > > > Similar in usage to an EXPLAIN, the RECOMMEND command would return a > > list of indexes that need to be added to get the cheapest plan for a > > particular query (no explain plan result though). > > This functionality also seems useful, but maybe it should be the job of > a user-space tool? So from above, Yes, I see a user-space tool also, but not instead. The RECOMMEND command is the minimal server functionality required to enable an (external) automated tuning support tool to be developed. Possible architectures for this functionality include both user-space and server-space options. Much thinking has been done on this in the DB research community, with the general consensus being its easier to extend the planner to cope with postulation that it is to create an external postulation tool that acts (accurately) like the planner. "DB2 advisor: An optimizer smart enough to recommend its own indexes." Gary Valentin, Michael Zuliani, Daniel C. Zilio, Guy M. Lohnman, and Alan Skelley. In The 16th International Conference on Data Engineering (ICDE'00), San Diego, CA. IEEE Computer Society, February 2000. A wonderful summary of which is available here, but not sure if the full paper is publicly available for free. http://www.andrew.cmu.edu/user/ngm/15-823/summaries/08.pdf DB2 Design Advisor: Integrated Automatic Physical Database Design "DB2 Design Advisor: Integrated Automatic Physical Database Design" Zilio et al which is available at http://www.vldb.org/conf/2004/IND4P1.PDF#search=%22db2%20design% 20advisor%22 On Tue, 2006-10-10 at 19:15 -0400, Tom Lane wrote: > > Specifically, multi-column indexes are not considered very heavily in > > RECOMMEND. > > That seems like a bad idea as well --- multicol indexes are exactly the > sort of thing a novice DBA might fail to consider. If you're going to > do this then you should consider all cases. Calculating all index cases would follow the combinatorial explosion of sum(N!/(r!(N-r)!)) though we can argue about exactly what N is in this case. So we have the same problem as the main optimiser: exhaustive search is not practical, so we must find a heuristic that allows us to limit the search space so RECOMMEND doesn't run for too long. The "no multi-col indexes except FKs" is just a proposed heuristic, so happy to debate exactly what that heuristic should be. (There are various research papers available with proposed heuristics). Multi-col indexes are also subject to over-fitting, since RECOMMEND would be liable to return (for example) 7-column indexes as the best choice for a single query, which would be bad overall. I'd been thinking about this for some time: the virtual index concept fills in the gaps so that taken together, RECOMMEND and virtual indexes provide a reasonable toolset for both limiting search space and yet allowing more complex ideas to be tested. -- Simon Riggs EnterpriseDB http://www.enterprisedb.com
pgsql-hackers by date: