Thread: automated index suggestor -- request for comment
The manual is pretty sparse on advice regarding indices. Plenty of good feature documentation, but not much about when and where an index is appropriate (except a suggestion that multi-column indices should be avoided). Of course, the ultimate arbiter of which indices are used is the planner/optimizer. If i could somehow convince the optimizer to consider indices that don't yet exist, it could tell me which would give the greatest benefit should i add them. So, i'm writing for two reasons. First, i want to gauge interest in this tool. Is this something that people would find useful? Second, i am looking to solicit some advice. Is this project even feasible? If so, where would be the best place to start? My assumption has been that i would need to hack into the current code for determining index paths, and spoof it somehow, but is that possible without actually creating the indices? Any and all feedback welcome. -johnnnnnnnnnn
On Fri, 2002-12-13 at 03:22, johnnnnnn wrote: > The manual is pretty sparse on advice regarding indices. Plenty of > good feature documentation, but not much about when and where an index > is appropriate (except a suggestion that multi-column indices should > be avoided). > > Of course, the ultimate arbiter of which indices are used is the > planner/optimizer. If i could somehow convince the optimizer to > consider indices that don't yet exist, it could tell me which would > give the greatest benefit should i add them. the generated index names should be self-explaining or else we would have to change EXPLAIN output code as well, just to tell what the actual index definition was. That could become the EXPLAIN SPECULATE command ? > So, i'm writing for two reasons. First, i want to gauge interest in > this tool. Is this something that people would find useful? Sure it would be helpful. > Second, i am looking to solicit some advice. Is this project even > feasible? As tom recently wrote on this list, no statistics is _gathered_ base on existence of indexes, so pretending that they are there should be limited just to planner changes plus a way to tell the planner to do it. > If so, where would be the best place to start? My assumption > has been that i would need to hack into the current code for > determining index paths, and spoof it somehow, but is that possible > without actually creating the indices? Either with or without real indexes, it's all just code ;) In worst case you could generate the entries in pg_class table without building the actual index and then drop or rollback when the explain is ready. Of course you could just determine all possibly useful indexes and generate then anyhow an then drop them if they were not used ;) -- Hannu Krosing <hannu@tm.ee>
Hannu Krosing <hannu@tm.ee> writes: > That could become the EXPLAIN SPECULATE command ? [ snicker... ] Seriously, it wouldn't be hard to inject a slew of phony index definitions into the planner to see what it comes up with. You just have to cons up an IndexOptInfo record, the planner will be none the wiser. The tricky part is deciding which indexes are even worth expending planner cycles on. ("Make 'em all" doesn't seem very practical when you consider multi-column or functional indexes.) Also, I don't see any reasonable way to automatically suggest partial indexes; certainly not on the basis of individual queries. The big boys approach this sort of problem with "workload analysis" tools, which start from a whole collection of sample queries not just one. I don't think EXPLAIN applied to individual queries can hope to produce similarly useful results. regards, tom lane
I cc'b back to list, hope this is ok? On Fri, 2002-12-13 at 14:56, george young wrote: > On 13 Dec 2002 12:05:36 +0000 > Hannu Krosing <hannu@tm.ee> wrote: > > > On Fri, 2002-12-13 at 03:22, johnnnnnn wrote: > > > > In worst case you could generate the entries in pg_class table without > > building the actual index and then drop or rollback when the explain is > > ready. > > > --> Of course you could just determine all possibly useful indexes and <-- > --> generate then anyhow an then drop them if they were not used ;) <-- > > Why not! At least for selects, this seems like the ideal. For insert > and update, you have to deal with updating the superfluous indexes -- > does the planner include index updating in its work estimates? Probably not - the work should be almost the same (modulo cached status of index pages) for any plan. At least I think we don't optimize the plan for different index access patterns for updating indexes. > > For queries > that use functions in the where clause, you'd have to parse enough to know > to include indexes on the functions (I know-- the last time I said "all I > have to do is parse ..." I was really sorry later...). -- Hannu Krosing <hannu@tm.ee>
On Fri, Dec 13, 2002 at 09:49:53AM -0500, Tom Lane wrote: > Hannu Krosing <hannu@tm.ee> writes: > > That could become the EXPLAIN SPECULATE command ? > > [ snicker... ] Seriously, it wouldn't be hard to inject a slew of > phony index definitions into the planner to see what it comes up > with. You just have to cons up an IndexOptInfo record, the planner > will be none the wiser. That's good news. The easier it is, the more likely i am to actually get it working and available to people. > The tricky part is deciding which indexes are even worth expending > planner cycles on. ("Make 'em all" doesn't seem very practical when > you consider multi-column or functional indexes.) Agreed. But for a first development iteration, "Make 'em all" could certainly include the combinatorial explosion of all single- and multi-column indices. It might be slow as a dog, but it would exist. > The big boys approach this sort of problem with "workload analysis" > tools, which start from a whole collection of sample queries not > just one. I don't think EXPLAIN applied to individual queries can > hope to produce similarly useful results. Again, agreed. My intent was to start with something simple which could only deal with one query at a time, and then build a more robust tool from that point. That said, i wasn't planning on grafting onto the EXPLAIN syntax, but rather creating a new SUGGEST command, which could take a query or eventually a workload file. The other option was to decouple it from pg proper and have an independent application to live in contrib/ or gborg. -johnnnnnnnnnnn
On Fri, Dec 13, 2002 at 05:00:32PM +0000, Hannu Krosing wrote: > On Fri, 2002-12-13 at 14:56, george young wrote: > > > Of course you could just determine all possibly useful indexes > > > and generate then anyhow an then drop them if they were not used > > > ;) > > > > Why not! At least for selects, this seems like the ideal. For > > insert and update, you have to deal with updating the superfluous > > indexes -- does the planner include index updating in its work > > estimates? Well, i had a few reasons i didn't want to *actually* create the indices: 1- Disk space. If it's evaluating all indices, including multi-column indices, that ends up being a significant space drain. 2- Time. Creating indices can take a while for big tables (again, moreso for multi-column indices). 3- Usability on running systems. If i can eliminate actual index creation, it won't tie up disk access on systems that are already dealing with high load. > At least I think we don't optimize the plan for different index > access patterns for updating indexes. I don't think that's the case either, which makes it more difficult to estimate negative cost of index creation. Not sure how i'll deal with that except by (for now) ignoring it. -johnnnnnnnnnnn
On Thu, 2002-12-12 at 21:22, johnnnnnn wrote: > The manual is pretty sparse on advice regarding indices. Plenty of > good feature documentation, but not much about when and where an index > is appropriate (except a suggestion that multi-column indices should > be avoided). > > Of course, the ultimate arbiter of which indices are used is the > planner/optimizer. If i could somehow convince the optimizer to > consider indices that don't yet exist, it could tell me which would > give the greatest benefit should i add them. > > So, i'm writing for two reasons. First, i want to gauge interest in > this tool. Is this something that people would find useful? > > Second, i am looking to solicit some advice. Is this project even > feasible? If so, where would be the best place to start? My assumption > has been that i would need to hack into the current code for > determining index paths, and spoof it somehow, but is that possible > without actually creating the indices? Isn't this what a DBA (or, heck, even a modestly bright developer) does during transactional analysis? You know what the INSERTs and statements-that-have-WHERE-clauses are, and, hopefully, approximately how often per day (or week) each should execute. Then, *you* make the decision about which single-key and multi-key indexes should be created, based upon a) the cardinality of each table b) the frequency each query (includes UPDATE & DELETE) is run c) how often INSERT statements occur Thus, for example, an OLTP database will have a significantly different mix of indexes than, say, a "reporting" database... -- +---------------------------------------------------------------+ | Ron Johnson, Jr. mailto:ron.l.johnson@cox.net | | Jefferson, LA USA http://members.cox.net/ron.l.johnson | | | | "My advice to you is to get married: If you find a good wife, | | you will be happy; if not, you will become a philosopher." | | Socrates | +---------------------------------------------------------------+