Thread: Index Tuning Features
For 8.3, I'd like to add the following two related features to assist with Index Tuning and usability:

- Virtual Indexes

An index which exists only in the catalog, so it is visible to the planner but not the executor. This is useful where a specific SQL query is being hand-tuned, allowing very specific options to be selected. Virtual indexes would be seen by the planner only when performing an EXPLAIN and when enable_virtual_index = on (default: off, USERSET). Normal SQL statements would ignore them completely, whatever enable_virtual_index is set to. It would not be possible to have both a virtual and a real index defined identically at the same time. (If facilities existed to make temporary tables exist only for a single backend, rather than requiring catalog access, then that implementation route would also work here, but until that does, simple catalog updates seem fine.)

SQL: CREATE [VIRTUAL] [UNIQUE] INDEX ...

- RECOMMEND command

Similar in usage to an EXPLAIN, the RECOMMEND command would return a list of indexes that need to be added to get the cheapest plan for a particular query (no explain plan result, though). At planning time, all possible single-column indexes would be assumed to exist, plus each group of columns that makes up a multi-column foreign key would be assumed to make a multi-column index. (PKs always exist, remember.) We track whether hypothetical indexes exist on the plan, so once the cheapest plan has been decided we can report what they are (if any). Hypothetical indexes last only for the duration of planning - no catalog changes are made.

The command will return one row per selected index (there can be more than one for a complex query); the first column gives the list of indexed columns, the second column shows the SQL required to create that index. Virtual indexes will be noted, though treated identically to hypothetical indexes.

The changes to do this would not be very invasive to the planner and would mainly involve adding additional fields to the planner data structures, some additional branching code, and command changes/additions.

Overall we need both of these new features: RECOMMEND covers many cases in an easy-to-use form, while VIRTUAL indexes cover the rest of the search space of possible new indexes for specific cases. There's a host of other little tweaky bits we might imagine to enhance this capability further, but this seems to cover the basic requirements.

Specifically, multi-column indexes are not considered very heavily in RECOMMEND. This is deliberate because a) we don't have good multi-column interaction stats (though we might have for 8.3?), b) it greatly increases the run time of exhaustive searching, and c) because we have bitmap index interaction the usefulness of multi-column indexes is much reduced anyhow, so the cost/benefit is not good.

Comments? (I'll do a summary of feedback tomorrow.)

-- 
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
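An illustrative sketch of how the two proposed commands might be used together in a session; the syntax follows the proposal above, but the table names and the RECOMMEND output format are invented, since the proposal does not pin them down:

    -- Hand-tune one query with a planner-only index (proposed syntax)
    CREATE VIRTUAL INDEX orders_cust_idx ON orders (customer_id);
    SET enable_virtual_index = on;
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;

    -- Ask the planner which indexes would give the cheapest plan
    RECOMMEND SELECT o.id, i.sku
        FROM orders o JOIN items i ON i.order_id = o.id
       WHERE o.customer_id = 42;
    --  indexed columns  | create statement
    -- ------------------+---------------------------------------------
    --  items(order_id)  | CREATE INDEX items_order_id_idx
    --                   |     ON items (order_id);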
Simon Riggs wrote:
> For 8.3, I'd like to add the following two related features to assist
> with Index Tuning and usability:
>
> - Virtual Indexes

This seems useful, but I'm not sure we need a catalog object for that. It might be sufficient to declare these hypothetical indexes within the EXPLAIN command. That is after all the only place where they are applied.

> - RECOMMEND command
>
> Similar in usage to an EXPLAIN, the RECOMMEND command would return a
> list of indexes that need to be added to get the cheapest plan for a
> particular query (no explain plan result though).

This functionality also seems useful, but maybe it should be the job of a user-space tool?

-- 
Peter Eisentraut
http://developer.postgresql.org/~petere/
Simon Riggs <simon@2ndquadrant.com> writes:
> - Virtual Indexes
> An index which only exists in the catalog, so is visible to the planner
> but not the executor.

Say what? What would that possibly be useful for, other than crashing any bit of code that failed to know about it?

> - RECOMMEND command
> Similar in usage to an EXPLAIN, the RECOMMEND command would return a
> list of indexes that need to be added to get the cheapest plan for a
> particular query (no explain plan result though).

Both of these seem to assume that EXPLAIN results, without EXPLAIN ANALYZE results to back them up, are sufficient for tuning. I find this idea a bit dubious, particularly for cases of "marginal" indexes.

> Specifically, multi-column indexes are not considered very heavily in
> RECOMMEND.

That seems like a bad idea as well --- multicol indexes are exactly the sort of thing a novice DBA might fail to consider. If you're going to do this then you should consider all cases.

regards, tom lane
Peter Eisentraut wrote:
> Simon Riggs wrote:
>> For 8.3, I'd like to add the following two related features to assist
>> with Index Tuning and usability:
>>
>> - Virtual Indexes
>
> This seems useful, but I'm not sure we need a catalog object for that.
> It might be sufficient to declare these hypothetical indexes within the
> EXPLAIN command. That is after all the only place where they are
> applied.
>
>> - RECOMMEND command
>>
>> Similar in usage to an EXPLAIN, the RECOMMEND command would return a
>> list of indexes that need to be added to get the cheapest plan for a
>> particular query (no explain plan result though).
>
> This functionality also seems useful, but maybe it should be the job of
> a user-space tool?

In this same vein, I thought it would be interesting if we added a suggestion to EXPLAIN ANALYZE... Something like:

    Your estimated number of rows appears to be off. Have you run ANALYZE lately?

Sincerely,

Joshua D. Drake

-- 
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
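For context, the mismatch such a warning would key on is already visible in EXPLAIN ANALYZE output, which prints estimated and actual row counts side by side; the table and numbers below are invented for illustration:

    EXPLAIN ANALYZE SELECT * FROM foo WHERE bar = 7;
    --  Seq Scan on foo  (cost=0.00..155.00 rows=3 width=8)
    --                   (actual time=0.021..41.500 rows=58210 loops=1)
    --    Filter: (bar = 7)
    -- rows=3 estimated vs. rows=58210 actual: the kind of divergence
    -- that could trigger a "have you run ANALYZE lately?" suggestion.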
On Tue, Oct 10, 2006 at 06:06:09PM +0200, Peter Eisentraut wrote:
> Simon Riggs wrote:
> > For 8.3, I'd like to add the following two related features to assist
> > with Index Tuning and usability:
> >
> > - Virtual Indexes
>
> This seems useful, but I'm not sure we need a catalog object for that.
> It might be sufficient to declare these hypothetical indexes within the
> EXPLAIN command. That is after all the only place where they are
> applied.

If you wanted to try multiple scenarios, that might become a pain. I guess it depends on how verbose the syntax was...

> > - RECOMMEND command
> >
> > Similar in usage to an EXPLAIN, the RECOMMEND command would return a
> > list of indexes that need to be added to get the cheapest plan for a
> > particular query (no explain plan result though).
>
> This functionality also seems useful, but maybe it should be the job of
> a user-space tool?

I think it makes the most sense to have this in core, though I guess an argument could be made for having it be separate from the backend. But it'd have to be easy to call from an external tool, such as pgAdmin, which means it probably needs to speak libpq.

-- 
Jim Nasby jim@nasby.net
EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
On Tuesday 10 October 2006 12:06, Tom Lane wrote:
> > Similar in usage to an EXPLAIN, the RECOMMEND command would return a
> > list of indexes that need to be added to get the cheapest plan for a
> > particular query (no explain plan result though).
>
> Both of these seem to assume that EXPLAIN results, without EXPLAIN
> ANALYZE results to back them up, are sufficient for tuning. I find
> this idea a bit dubious, particularly for cases of "marginal" indexes.

While I agree with Tom that generally EXPLAIN is not enough for tuning, I also know that when you're dealing with queries that have run times in multiples of hours (and the corresponding hour-long index builds), EXPLAIN ANALYZE just isn't an option. Anything that can be done to whittle down your choices before you have to run EXPLAIN ANALYZE is a bonus.

-- 
Robert Treat
Build A Brighter LAMP :: Linux Apache {middleware} PostgreSQL
Robert Treat <xzilla@users.sourceforge.net> writes:
> Anything that can be done to whittle down your choices
> before you have to run EXPLAIN ANALYZE is a bonus.

Fair enough, but I prefer Peter's suggestion of attaching the hypothetical index definitions to EXPLAIN itself, rather than making bogus catalog entries. Something along the line of

    EXPLAIN <statement>
        ASSUMING INDEX fooi ON foo ....
        [ ASSUMING INDEX ... ]

although this exact syntax probably doesn't work unless we're willing to make ASSUMING a fully reserved word :-(

I have some vague recollection that this idea has been discussed before...

regards, tom lane
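Filled out with concrete (invented) index definitions, Tom's sketch would let a session compare scenarios one statement at a time; it also illustrates Jim Nasby's point upthread about verbosity, since the whole clause must be repeated for every statement tested:

    -- Scenario 1: a single hypothetical index
    EXPLAIN SELECT * FROM foo WHERE a = 1 AND b = 2
        ASSUMING INDEX fooi ON foo (a);

    -- Scenario 2: two hypothetical indexes at once
    EXPLAIN SELECT * FROM foo WHERE a = 1 AND b = 2
        ASSUMING INDEX fooi ON foo (a)
        ASSUMING INDEX fooj ON foo (b);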
> Simon Riggs <simon@2ndquadrant.com> writes:
>> - RECOMMEND command
>>
>> Similar in usage to an EXPLAIN, the RECOMMEND command would return a
>> list of indexes that need to be added to get the cheapest plan for a
>> particular query (no explain plan result though).
>
> Both of these seem to assume that EXPLAIN results, without EXPLAIN
> ANALYZE results to back them up, are sufficient for tuning. I find
> this idea a bit dubious, particularly for cases of "marginal" indexes.

I think the idea of "virtual indexes" is pretty interesting, but ultimately a lesser solution to a more fundamental issue, and that would be "hands-on" control over the planner. Estimating the effect of an index on a query "prior" to creating the index is a great idea; how that is done is something different from building consensus that it should be done.

Another thing that this brings up is "hints" to a query. Over the years, I have run into situations where the planner wasn't great. It would be nice to try forcing different strategies on the planner and see if performance can be improved.
Mark,

> Another thing that this brings up is "hints" to a query. Over the years,
> I have run into situations where the planner wasn't great. It would be
> nice to try forcing different strategies on the planner and see if
> performance can be improved.

See discussion on -performance.

-- 
--Josh
Josh Berkus
PostgreSQL @ Sun
San Francisco
On 10/10/06, Mark Woodward <pgsql@mohawksoft.com> wrote:
> I think the idea of "virtual indexes" is pretty interesting, but
> ultimately a lesser solution to a more fundamental issue, and that would
> be "hands-on" control over the planner. Estimating the effect of an index
> on a query "prior" to creating the index is a great idea; how that is
> done is something different from building consensus that it should be
> done.
>
> Another thing that this brings up is "hints" to a query. Over the years,
> I have run into situations where the planner wasn't great. It would be
> nice to try forcing different strategies on the planner and see if
> performance can be improved.

You can do this by setting the enable_"access_method" type parameters.

-- 
regards,
Jaime Casanova

"Programming today is a race between software engineers striving to
build bigger and better idiot-proof programs and the universe trying
to produce bigger and better idiots. So far, the universe is winning."
Richard Cook
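The parameters Jaime refers to are the planner method GUCs (enable_seqscan, enable_nestloop, and friends). A typical experiment confines the change to one transaction so nothing leaks into other sessions; the query itself is a stand-in:

    BEGIN;
    SET LOCAL enable_seqscan = off;   -- make sequential scans look expensive
    SET LOCAL enable_nestloop = off;  -- likewise nested-loop joins
    EXPLAIN ANALYZE
        SELECT * FROM orders WHERE customer_id = 42;
    ROLLBACK;  -- SET LOCAL ends with the transaction either way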
On Tue, 2006-10-10 at 20:17 -0400, Mark Woodward wrote:
> Another thing that this brings up is "hints" to a query. Over the years,
> I have run into situations where the planner wasn't great. It would be
> nice to try forcing different strategies on the planner and see if
> performance can be improved.

/*+ Not on this thread, p-l-e-a-s-e */

-- 
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
> > Another thing that this brings up is "hints" to a query. Over the
> > years, I have run into situations where the planner wasn't great. It
> > would be nice to try forcing different strategies on the planner and
> > see if performance can be improved.
>
> You can do this by setting the enable_"access_method" type parameters.

No, not generally. Usual problems include join order and wrong index, not only wrong access method.

Andreas
Thanks everybody for comments so far; this will be a useful discussion.

On Tue, 2006-10-10 at 18:56 -0400, Robert Treat wrote:
> On Tuesday 10 October 2006 12:06, Tom Lane wrote:
> > > Similar in usage to an EXPLAIN, the RECOMMEND command would return a
> > > list of indexes that need to be added to get the cheapest plan for a
> > > particular query (no explain plan result though).
> >
> > Both of these seem to assume that EXPLAIN results, without EXPLAIN
> > ANALYZE results to back them up, are sufficient for tuning. I find
> > this idea a bit dubious, particularly for cases of "marginal" indexes.
>
> While I agree with Tom that generally EXPLAIN is not enough for tuning,
> I also know that when you're dealing with queries that have run times in
> multiples of hours (and the corresponding hour-long index builds),
> EXPLAIN ANALYZE just isn't an option. Anything that can be done to
> whittle down your choices before you have to run EXPLAIN ANALYZE is a
> bonus.

IMHO you need EXPLAIN, EXPLAIN ANALYZE and RECOMMEND. As Robert points out, using EA can make tuning take a long time, and that is the critical factor when you have a whole database/app to tune.

This discussion helps me make explicit what my thoughts had been on what an ideal index tuning process is:

1. Recommendation: Use RECOMMEND to get an 80/20 setting for a statement. As Peter suggests a "user-space" tool, I also imagine a tool that would automatically run RECOMMEND on all SQL statements in a workload and come up with proposals for additional indexes. We would have a first-cut index design in minutes rather than days.

2. Evaluation: We can then create the potential indexes as virtual ones and re-run EXPLAINs to model how a whole workload would behave. We can begin to prune low-impact indexes out of the mix at this stage. Again, this can be done automatically.

3. Implementation: We re-create the new indexes as real indexes (perhaps concurrently).

4. Correction: We then run the workload and use existing tools to spot the statements causing the most problems, and manually assess them using EXPLAIN ANALYZE. Manually postulate new virtual indexes and re-model the workload again as in (2).

Steps (3) and (4) have both been improved for 8.2. Steps (1) and (2) are completely new steps for 8.3.

The above process can be performed without tool support, but it's clear that further automation will help greatly here. I foresee that the development of both server-side capability and tools will take more than one release. Discussion of tool support can begin once we have agreed the server-side capability.

With that as a backdrop, further comments are:

On Tue, 2006-10-10 at 19:15 -0400, Tom Lane wrote:
> Robert Treat <xzilla@users.sourceforge.net> writes:
> > Anything that can be done to whittle down your choices
> > before you have to run EXPLAIN ANALYZE is a bonus.
>
> Fair enough, but I prefer Peter's suggestion of attaching the
> hypothetical index definitions to EXPLAIN itself, rather than making
> bogus catalog entries. Something along the line of
>
>     EXPLAIN <statement>
>         ASSUMING INDEX fooi ON foo ....
>         [ ASSUMING INDEX ... ]

I do like this, though for step (2) above we would need to attach the appropriate indexes to each of the SQL statements prior to execution. Doing this for a single SQL statement is fine, but doing it for a whole workload of thousands of statements is not very practical, hence an externally declarative approach seems better.
I can imagine many other declarative approaches other than the one I proposed; it just seems pretty neat to me to use almost exactly the same syntax for a virtual index as for a real index. As I mentioned, ideally this would not be a full-strength catalog object, but I was thinking towards implementation also. Another possibility would be to use a local pg_virtual_indexes table.

On Tue, 2006-10-10 at 18:06 +0200, Peter Eisentraut wrote:
> Simon Riggs wrote:
>
> > - RECOMMEND command
> >
> > Similar in usage to an EXPLAIN, the RECOMMEND command would return a
> > list of indexes that need to be added to get the cheapest plan for a
> > particular query (no explain plan result though).
>
> This functionality also seems useful, but maybe it should be the job of
> a user-space tool?

So from above: yes, I see a user-space tool also, but not instead. The RECOMMEND command is the minimal server functionality required to enable an (external) automated tuning support tool to be developed.

Possible architectures for this functionality include both user-space and server-space options. Much thinking has been done on this in the DB research community, with the general consensus being that it is easier to extend the planner to cope with postulation than it is to create an external postulation tool that acts (accurately) like the planner.

"DB2 Advisor: An Optimizer Smart Enough to Recommend Its Own Indexes." Gary Valentin, Michael Zuliani, Daniel C. Zilio, Guy M. Lohman, and Alan Skelley. In The 16th International Conference on Data Engineering (ICDE'00), San Diego, CA. IEEE Computer Society, February 2000. A wonderful summary is available here, though I'm not sure the full paper is publicly available for free:
http://www.andrew.cmu.edu/user/ngm/15-823/summaries/08.pdf

"DB2 Design Advisor: Integrated Automatic Physical Database Design." Zilio et al., available at:
http://www.vldb.org/conf/2004/IND4P1.PDF

On Tue, 2006-10-10 at 19:15 -0400, Tom Lane wrote:
> > Specifically, multi-column indexes are not considered very heavily in
> > RECOMMEND.
>
> That seems like a bad idea as well --- multicol indexes are exactly the
> sort of thing a novice DBA might fail to consider. If you're going to
> do this then you should consider all cases.

Calculating all index cases would follow the combinatorial explosion of sum(N!/(r!(N-r)!)), though we can argue about exactly what N is in this case. So we have the same problem as the main optimiser: exhaustive search is not practical, so we must find a heuristic that limits the search space so RECOMMEND doesn't run for too long. The "no multi-column indexes except FKs" rule is just a proposed heuristic, so I'm happy to debate exactly what that heuristic should be. (There are various research papers available with proposed heuristics.) Multi-column indexes are also subject to over-fitting, since RECOMMEND would be liable to return (for example) 7-column indexes as the best choice for a single query, which would be bad overall.

I'd been thinking about this for some time: the virtual index concept fills in the gaps so that, taken together, RECOMMEND and virtual indexes provide a reasonable toolset for both limiting the search space and allowing more complex ideas to be tested.

-- 
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
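To put an illustrative number on that explosion: the sum counts every non-empty subset of the N indexable columns,

    sum over r = 1..N of N!/(r!(N-r)!)  =  2^N - 1

so N = 10 columns already yields 1023 candidate indexes for one table. If column order within each multi-column index also matters, the count grows to the sum over r of N!/(N-r)!, which for N = 10 is nearly ten million ordered candidates.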
> The above process can be performed without tool support, but it's clear
> that further automation will help greatly here. I foresee that the
> development of both server-side capability and tools will take more
> than one release. Discussion of tool support can begin once we have
> agreed the server-side capability.

If it came to automated tools, wouldn't it fit in this discussion to give some performance requirement limits to the RECOMMEND tool? In a workload, not all queries are real-time or high priority, and a lesser-impact index can sometimes help enough to meet the requirements, compared to a high-impact index which would make the query fly. Example: inserting into a table must be real-time; reporting can be taken offline...

So it would be nice to have a recommendation tool which can take into account the performance requirements of the individual queries, possibly making the right compromises to meet all requirements for all queries.

Cheers,
Csaba.
> On 10/10/06, Mark Woodward <pgsql@mohawksoft.com> wrote:
>> I think the idea of "virtual indexes" is pretty interesting, but
>> ultimately a lesser solution to a more fundamental issue, and that
>> would be "hands-on" control over the planner. Estimating the effect of
>> an index on a query "prior" to creating the index is a great idea; how
>> that is done is something different from building consensus that it
>> should be done.
>>
>> Another thing that this brings up is "hints" to a query. Over the
>> years, I have run into situations where the planner wasn't great. It
>> would be nice to try forcing different strategies on the planner and
>> see if performance can be improved.
>
> You can do this by setting the enable_"access_method" type parameters.

Here's your hammer; all your problems are now nails.

The enable_xxx settings are OK for simple queries gone wrong, but if you have a more complex query, any one of those settings may help or hinder different parts of the query, and then you are left choosing which of them helps more than it hurts the overall query.

Being able to alter the query plan would help in areas where there are data patterns in a database that the ANALYZE command can't pick up because it is not designed to. Imagine you have a street-map database ordered by zip, street, number. The primary order is zip code, the secondary order is street. There is a relationship of number to street, and of zip to street. The analyzer, at least the last time I checked, does not recognize these relationships. So a search by street and number would probably use a sequential scan rather than the street index.
"Mark Woodward" <pgsql@mohawksoft.com> writes: > The analyzer, at least the last time I checked, does not recognize these > relationships. The analyzer is imperfect but arguing from any particular imperfection is weak because someone will just come back and say we should work on that problem -- though I note nobody's actually volunteering to do so whereas they appear to be for hints. I think the stronger argument is to say that there are some statistical properties that the analyzer _cannot_ be expected to figure out. Either because a) they're simply too complex to ever expect to be able to find automatically, b) too expensive to make it worthwhile in the general case, or c) because of some operational issue such as the data changing frequently enough that the analyzes that would be necessaryto keep the statistics up to date would become excessively expensive or even be impossible to perform rapidlyenough. The people arguing that hints themselves are of negative benefit are taking the argument far too far. I've never heard an Oracle DBA gripe about having to fix hints on an upgrade; they're usually the first ones to suggest hinting a poorly written query. In fact Oracle is going in the opposite direction of even relying on hints internally. Its plan stability feature depends on generating and storing hints internally associated with every query. The argument against hints is usually that the effort would be better spent elsewhere, not that hints are inherently a bad idea. We already have enable_* parameters and they are absolutely necessary for testing and experimenting to understand whether the planner is incorrect and where it has gone wrong. Hints are just a more precisely targeted version of these. There have been plenty of instances on this list where people posted 20-30 line query plans with several joins of each type where the enable_* parameters were too coarse grained to use effectively. -- Gregory Stark EnterpriseDB http://www.enterprisedb.com
> > "Mark Woodward" <pgsql@mohawksoft.com> writes: > >> The analyzer, at least the last time I checked, does not recognize these >> relationships. > > The analyzer is imperfect but arguing from any particular imperfection is > weak > because someone will just come back and say we should work on that problem > -- > though I note nobody's actually volunteering to do so whereas they appear > to > be for hints. > > I think the stronger argument is to say that there are some statistical > properties that the analyzer _cannot_ be expected to figure out. Either > because > > a) they're simply too complex to ever expect to be able to find > automatically, > > b) too expensive to make it worthwhile in the general case, or > > c) because of some operational issue such as the data changing frequently > enough that the analyzes that would be necessary to keep the statistics > up > to date would become excessively expensive or even be impossible to > perform rapidly enough. Well, from a purely data domain standpoint, it is impossible to charactize the exact nature of a data set without enough information to recreate it. Anything less must be designed for a fixed set of assumptions. There is no way that every specific trend can be covered by a fixed number of assumptions. The argument that all we need is better statistics completely misses the point. There will *always* be a number cases where the planner will not work optimally. I would say that a "simpler" planner with better hints will always be capable of creating a better query plan.
"Mark Woodward" <pgsql@mohawksoft.com> writes: > I would say that a "simpler" planner with better hints > will always be capable of creating a better query plan. This is demonstrably false: all you need is an out-of-date hint, and you can have a worse plan. The argument against hints is not about whether someone could knock together a crappy hint facility and be able to get some use out of it. It is about how much work it would take to design a *good* hint facility that makes it easy to maintain hints that are robust in the face of data and query changes. If someone were to sit down and design and build such a thing, it'd very likely get accepted into core Postgres --- but personally, I think the equivalent amount of effort would be better spent on improving the planner and the statistics. As Josh already noted, Oracle-like hints are pretty likely to get rejected ... not only because of doubts about their true usefulness, but out of fear of falling foul of some Oracle patent or other. regards, tom lane
> "Mark Woodward" <pgsql@mohawksoft.com> writes: >> I would say that a "simpler" planner with better hints >> will always be capable of creating a better query plan. > > This is demonstrably false: all you need is an out-of-date hint, and > you can have a worse plan. That doesn't make it false, it makes it higher maintenance. Hints are understood to require maintenance. > > The argument against hints is not about whether someone could knock > together a crappy hint facility and be able to get some use out of it. > It is about how much work it would take to design a *good* hint facility > that makes it easy to maintain hints that are robust in the face of data > and query changes. If someone were to sit down and design and build > such a thing, it'd very likely get accepted into core Postgres --- but > personally, I think the equivalent amount of effort would be better > spent on improving the planner and the statistics. While it is always true that something can be improved, there comes a point where work outweighs benefits. I can't say that the planner is at that point, but I think that isn't even an issue. The notion of hints would probably one of the biggest steps toward improving the planner. Like I said, it is inarguable that there will always be queries that the planner can not execute efficiently based on the statistics gathered by analze. Since that number must be greater than zero, some methodology to deal with it should be created. > > As Josh already noted, Oracle-like hints are pretty likely to get > rejected ... not only because of doubts about their true usefulness, > but out of fear of falling foul of some Oracle patent or other. Well, if it would get rejected if it looked like Oracle, assuming you would probably be one of the people rejecting it, what do you envision as not being rejected?
On Wed, Oct 11, 2006 at 12:40:42PM -0400, Gregory Stark wrote:
> poorly written query. In fact Oracle is going in the opposite direction
> of even relying on hints internally. Its plan stability feature depends
> on generating and storing hints internally associated with every query.

But IBM, whose DB2 planner and optimiser is generally regarded as way better than Oracle's (at least by anyone I know who's used both), doesn't like hints. The IBM people all say the same thing Tom has said before: that the work to design the thing correctly is better spent making the planner and optimiser parts smarter and cheaper, because out of that work you also manage not to have the DBA accidentally mess things up by simple-minded rule-based hints. (Note that I'm not trying to wade into the actual argument; I'm just pointing out that even the biggest industry people don't agree on this point.)

A

-- 
Andrew Sullivan | ajs@crankycanuck.ca
A certain description of men are for getting out of debt, yet are
against all taxes for raising money to pay it off.
--Alexander Hamilton
On Wed, Oct 11, 2006 at 03:27:19PM -0400, Mark Woodward wrote:
> improving the planner. Like I said, it is inarguable that there will
> always be queries that the planner cannot execute efficiently based on
> the statistics gathered by analyze. Since that number must be greater
> than zero, some methodology to deal with it should be created.

Just because I'm one of those statistics true believers: what sort of information do you think it is possible for the DBA to take into consideration, when building a hint, that could not in principle be gathered efficiently by a statistics system? It seems to me that you're claiming that DBAs can have magic knowledge. While I would be delighted to learn that my thumb-in-the-air guesses in the past had turned out to be due to my deep knowledge of my data, I'm instead unhappily confessing that what I really, really wanted when I made those guesses was better knowledge, based on some analysis of the data.

A

-- 
Andrew Sullivan | ajs@crankycanuck.ca
The whole tendency of modern prose is away from concreteness.
--George Orwell
On Oct 11, 2006, at 3:00 PM, Andrew Sullivan wrote:
> On Wed, Oct 11, 2006 at 12:40:42PM -0400, Gregory Stark wrote:
>> poorly written query. In fact Oracle is going in the opposite
>> direction of even relying on hints internally. Its plan stability
>> feature depends on generating and storing hints internally associated
>> with every query.
>
> But IBM, whose DB2 planner and optimiser is generally regarded as way
> better than Oracle's (at least by anyone I know who's used both),
> doesn't like hints. The IBM people all say the same thing Tom has
> said before: that the work to design the thing correctly is better
> spent making the planner and optimiser parts smarter and cheaper,
> because out of that work you also manage not to have the DBA
> accidentally mess things up by simple-minded rule-based hints. (Note
> that I'm not trying to wade into the actual argument; I'm just
> pointing out that even the biggest industry people don't agree on
> this point.)

DBAs can mess things up already if they misuse the tools they are provided. Like 'rm', which is there but should _RARELY_ be used on database datafiles. The argument that people _could_ use them in a bad way is silly. Of course they could use them in a bad way; that's not an _argument_. Everyone agrees people can be stupid.

However, the planner will never be perfect. I would like to see 1 out of every 500,000 queries actually benefit from a hint system (which means that 499,999 of the queries were planned perfectly well by the planner). To fix my one query that is crucially important to my business, it is a much more sane approach to hint the system to change its plan than it is to have to upgrade my binaries.

// Theo Schlossnagle
// CTO -- http://www.omniti.com/~jesus/
// OmniTI Computer Consulting, Inc. -- http://www.omniti.com/
Simon,

The University of North Carolina (I think?) did some nice work on not only hypothetical indexes, but hypothetical materialized views (as well as really-materialized view planner selection). Have you looked at that work? I think I forwarded the paper & code to Jonah at one point ...

-- 
--Josh
Josh Berkus
PostgreSQL @ Sun
San Francisco
Mark,

First off, I'm going to request that you (and other people) stop hijacking Simon's thread on hypothetical indexes. Hijacking threads is an effective way to get your ideas rejected out of hand, just because the people whose thread you hijacked are angry with you. So please observe the thread split, thanks.

> Well, if it would get rejected if it looked like Oracle, assuming you
> would probably be one of the people rejecting it, what do you envision
> as not being rejected?

Something "better than Oracle".

Since you're the one who wants hints, that's kind of up to you to define. Write a specification and make a proposal.

-- 
--Josh
Josh Berkus
PostgreSQL @ Sun
San Francisco
Andrew Sullivan wrote:
> Just because I'm one of those statistics true believers: what sort of
> information do you think it is possible for the DBA to take into
> consideration, when building a hint, that could not in principle be
> gathered efficiently by a statistics system? It seems to me that
> you're claiming that DBAs can have magic knowledge.

One example is the table of addresses clustered by zip code, with indexes on State, City, County, etc. The current statistics system sees no correlation between these fields (since the alphabetical ordering of cities and the numbering of postal codes are quite different). This makes the planner under-use the indexes, because it sees no correlation and therefore overestimates the number of pages read and the random accesses needed. However, since San Francisco, CA data happens to be tightly packed on a few pages (it shares the same few zip codes), few pages are needed and mostly sequential access could be used when querying SF data -- yet the optimizer guesses that most pages in the table may be hit, so it often ignores the indexes.

Now I'm not saying that a more advanced statistics system couldn't one day be written that sees these patterns in the data -- but it doesn't seem likely in the near term. DBA-based hints could be a useful interim workaround.
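The effect Ron describes shows up in the planner's per-column physical-order correlation statistic, which the pg_stats view exposes. A sketch with an invented table and invented numbers (the correlation column itself is real):

    -- Table physically loaded in zip-code order, then ANALYZEd
    SELECT attname, correlation
      FROM pg_stats
     WHERE tablename = 'addresses' AND attname IN ('zip', 'city');
    --  attname | correlation
    -- ---------+-------------
    --  zip     |       0.998   -- near-perfect physical ordering
    --  city    |       0.041   -- looks random to the planner, though
    --                          -- each city's rows are tightly packed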
Tom Lane <tgl@sss.pgh.pa.us> writes:
> Robert Treat <xzilla@users.sourceforge.net> writes:
> > Anything that can be done to whittle down your choices
> > before you have to run EXPLAIN ANALYZE is a bonus.
>
> Fair enough, but I prefer Peter's suggestion of attaching the
> hypothetical index definitions to EXPLAIN itself, rather than making
> bogus catalog entries. Something along the line of

I do like avoiding the bogus catalog entries and attaching the declarations to the EXPLAIN itself. One advantage of that is that I can see extending it to handle "IGNORING INDEX foo" as well, which may be just as important.

One disadvantage is that it doesn't let you gather any statistics related to the new index to see what the plan would really be. "But indexes don't influence statistics," I can hear already from the chorus. But the reason we have indexes not affecting planning is precisely because we don't want to require an analyze after creating an index before it's used. Which these bogus entries would resolve.

If we had the ability to create bogus indexes, it would kill two birds with one stone. You could use that as the facility for noting which multi-column combinations are interesting. You would create your proposed index, then run ANALYZE and EXPLAIN to your heart's content. When you have it set up just so, you REINDEX your index and you're set.

We already have these "bogus" indexes, incidentally; we just create the index with indisvalid=f.

-- 
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
> Mark, > > First off, I'm going to request that you (and other people) stop hijacking > Simon's thread on hypothetical indexes. Hijacking threads is an > effective way to get your ideas rejected out of hand, just because the > people whose thread you hijacked are angry with you. > > So please observe the thread split, thanks. > >> Well, if it would get rejected if it looked like Oracle, assuming you >> would probably be one of the people rejecting it, what do you envision >> as not being rejected? > > Something "better than Oracle". > > Since you're the one who wants hints, that's kind of up to you to define. > Write a specification and make a proposal. > What is the point of writing a proposal if there is a threat of "will be rejected" if one of the people who would do the rejection doesn't at least outline what would be acceptable?
>> >> Since you're the one who wants hints, that's kind of up to you to
>> >> define. Write a specification and make a proposal.
>>
>> What is the point of writing a proposal if there is a threat of "will
>> be rejected" if one of the people who would do the rejection doesn't
>> at least outline what would be acceptable?

Oh come on, Mark, you have been here long enough to know how this works. You define what you would like to see and submit it for feedback. -Hackers submit feedback, you refine, and the cycle continues until either -hackers determine it just isn't going to happen (packages), the would-be hacker gives up, or a workable plan comes out of the discussion.

Joshua D. Drake

-- 
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Greg Stark <gsstark@mit.edu> writes:
> You would create your proposed index, then run ANALYZE and EXPLAIN to
> your heart's content. When you have it set up just so, you REINDEX your
> index and you're set.

And when you realize you don't want it after all ... you need an exclusive lock on the table to drop it. (Yes, you would, see relcache load.) The advantage of keeping this idea all inside EXPLAIN is that there's guaranteed to be no interference with anything else.

> We already have these "bogus" indexes, incidentally; we just create the
> index with indisvalid=f.

Au contraire, that is something completely different. indisvalid=f is really the exact opposite: it's not there to the planner, and it is there to the executor.

As for the statistics business: really, we use the presence of an index as a hint to gather certain kinds of stats about its underlying table. If we had (ahem) statistical hints, then we could gather appropriate data with or without a real associated index. That sort of feature would have additional uses, i.e. being able to estimate selectivities more accurately for expressions that might not have anything to do with any of the indexes on a table.

regards, tom lane
>>> >>> Since you're the one who wants hints, that's kind of up to you to
>>> >>> define. Write a specification and make a proposal.
>>>
>>> What is the point of writing a proposal if there is a threat of "will
>>> be rejected" if one of the people who would do the rejection doesn't
>>> at least outline what would be acceptable?
>>
>> Oh come on, Mark, you have been here long enough to know how this works.

Exactly. IMHO, it is a frustrating environment. PostgreSQL is a great system, and while I completely respect the individuals involved, I think the "management", for lack of a better term, is difficult.
Mark Woodward wrote:
>>>> Since you're the one who wants hints, that's kind of up to you to
>>>> define. Write a specification and make a proposal.
>>>
>>> What is the point of writing a proposal if there is a threat of "will
>>> be rejected" if one of the people who would do the rejection doesn't
>>> at least outline what would be acceptable?
>>
>> Oh come on, Mark, you have been here long enough to know how this works.
>
> Exactly. IMHO, it is a frustrating environment. PostgreSQL is a great
> system, and while I completely respect the individuals involved, I think
> the "management", for lack of a better term, is difficult.

Well, that is the nature of FOSS development. If you think we are bad... I could easily list half a dozen that are worse ;) A couple of them much larger than us.

Joshua D. Drake

-- 
=== The PostgreSQL Company: Command Prompt, Inc. ===
Sales/Support: +1.503.667.4564 || 24x7/Emergency: +1.800.492.2240
Providing the most comprehensive PostgreSQL solutions since 1997
http://www.commandprompt.com/
Donate to the PostgreSQL Project: http://www.postgresql.org/about/donate
Clinging to sanity, pgsql@mohawksoft.com ("Mark Woodward") mumbled into her beard:
>> Mark,
>>
>> First off, I'm going to request that you (and other people) stop
>> hijacking Simon's thread on hypothetical indexes. Hijacking
>> threads is an effective way to get your ideas rejected out of hand,
>> just because the people whose thread you hijacked are angry with
>> you.
>>
>> So please observe the thread split, thanks.
>>
>>> Well, if it would get rejected if it looked like Oracle, assuming
>>> you would probably be one of the people rejecting it, what do you
>>> envision as not being rejected?
>>
>> Something "better than Oracle".
>>
>> Since you're the one who wants hints, that's kind of up to you to
>> define. Write a specification and make a proposal.
>
> What is the point of writing a proposal if there is a threat of
> "will be rejected" if one of the people who would do the rejection
> doesn't at least outline what would be acceptable?

If your proposal is merely "let's do something like Oracle," it should be obvious why that would be rejected. There is considerable legal danger in slavish emulation. Further, since PostgreSQL isn't Oracle, slavish emulation wouldn't work anyway.

If a proposal is too fuzzy to be considered a source of a specification, it should be obvious that it would be rejected.

If you have an idea clear enough to turn into a meaningful proposal, put it in for the usual "to and fro"; that generally leads to enormous improvements.

I'm not sure what a good hinting system ought to look like; what I *do* know is that a fuzzy proposal won't be much good.

-- 
(format nil "~S@~S" "cbbrowne" "acm.org")
http://linuxfinances.info/info/postgresql.html
The quickest way to a man's heart is through his chest, with an axe.
"Mark Woodward" <pgsql@mohawksoft.com> writes: > What is the point of writing a proposal if there is a threat of "will be > rejected" if one of the people who would do the rejection doesn't at least > outline what would be acceptable? FWIW, I said some things about what I'd consider a good design in that other hints thread on pgsql-performance. regards, tom lane
* Andrew Sullivan:
> Just because I'm one of those statistics true believers: what sort of
> information do you think it is possible for the DBA to take into
> consideration, when building a hint, that could not in principle be
> gathered efficiently by a statistics system?

Some statistics are very hard to gather from a sample, e.g. the number of distinct values in a column.

-- 
Florian Weimer <fweimer@bfk.de>
BFK edv-consulting GmbH http://www.bfk.de/
Durlacher Allee 47 tel: +49-721-96201-1
D-76131 Karlsruhe fax: +49-721-96201-99
On Wed, 2006-10-11 at 19:18 -0400, Mark Woodward wrote:
> > Since you're the one who wants hints, that's kind of up to you to
> > define. Write a specification and make a proposal.
>
> What is the point of writing a proposal if there is a threat of "will be
> rejected" if one of the people who would do the rejection doesn't at
> least outline what would be acceptable?

The general theme of other discussions has been that the best approach is to provide additional information in a general declarative form. Further details on that have not yet been proposed.

A hint touches a single SQL statement, so decorating 1000s of statements with exact tips about what to do is both time-consuming and eventually inaccurate. Yet after all that work, the planner still doesn't know why you thought the hint was the right thing to do, and so the 1001st query will perform poorly.

AFAICS hints are a legacy code-compatibility issue, not something truly desirable in the long run. Once you introduce them you must honour them across 10+ years of releases, and then you remove any chance of improved optimisations speeding up applications in the future. Support for such tricks is possibly a different issue from encouraging their use; if we did support them, I would welcome the day when enable_hints = off is the default and would discourage their general use where possible. We may be following other products in some ways, so that gives us an opportunity to learn from both the useful lessons and the mistakes. Deciding which is which is the hard part, IMHO.

The *right* place, IMHO, for planner information is to decorate the tables, columns and relationships so that *every* SQL statement can pick that up. If the world changes, you make one change and all your SQL benefits. As the analyzers improve, you may be able to just remove those declarations entirely, but generally I imagine the DB designer will for many years know things that cannot be determined by an analyzer.

Some might say this is an EndUserDeveloper v DBA v InternalsHacker issue, and I might agree, but I would side with the DBAs on this.

I'm not aware of any research specifically in that area - though I know many proposals have been made for various kinds of learning optimizer. That's dandy, but you'll still need an infrastructure to support what has been learned and use it to override the more general analyzer info. So a manual declarative approach seems like the first step in that direction.

So, I'm interested to hear various possible declarative approaches and will assist where I can with that.

/*+ we might be able to use some special functions to do this, rather than more SQL */

-- 
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
> Clinging to sanity, pgsql@mohawksoft.com ("Mark Woodward") mumbled into
> her beard:
>> What is the point of writing a proposal if there is a threat of
>> "will be rejected" if one of the people who would do the rejection
>> doesn't at least outline what would be acceptable?
>
> If your proposal is merely "let's do something like Oracle," it should
> be obvious why that would be rejected. There is considerable legal
> danger in slavish emulation. Further, since PostgreSQL isn't Oracle,
> slavish emulation wouldn't work anyway.

I don't actually like Oracle's hinting system.

> If a proposal is too fuzzy to be considered a source of a
> specification, it should be obvious that it would be rejected.

Well, "fuzzy" isn't a bad place to start gathering information for an eventual proposal.

> If you have an idea clear enough to turn into a meaningful proposal,
> put it in for the usual "to and fro"; that generally leads to enormous
> improvements.

Absolutely.

> I'm not sure what a good hinting system ought to look like; what I
> *do* know is that a fuzzy proposal won't be much good.

That is sort of the stumbling block. None of us "knows" what it should look like, but leaving the topic as "if you want it, go do the work and submit a patch" isn't going to get it done. First we should decide whether it is, in fact, something that ought to happen; then, if so, we should think about what it should be. Again, what would be the point of writing a proposal if there is *no* consensus on what would be acceptable?
On Wed, Oct 11, 2006 at 03:08:42PM -0700, Ron Mayer wrote:
> One example is the table of addresses clustered by zip code, with
> indexes on State, City, County, etc.

No.

> Now I'm not saying that a more advanced statistics system couldn't one
> day be written that sees these patterns in the data -- but it doesn't
> seem likely in the near term. DBA-based hints could be a useful interim
> workaround.

Some others in the hints thread seem to be suggesting additional ways of teaching the optimiser what to do. _That_ seems to me to be a good idea (but I don't think that qualifies as what people usually think of as hints). A sufficiently general system of hints sprinkled on the SQL is a lot of work, and doesn't seem to me to be a whole lot easier than working out how to make second-order relationship discovery (of the sort you're talking about) cheaper and automatic. Certainly, there's plenty of statistics math kicking around that allows one to discover such relationships, and such discovery has the benefit of not being, by definition, a way to work around the optimiser.

A

-- 
Andrew Sullivan | ajs@crankycanuck.ca
Users never remark, "Wow, this software may be buggy and hard
to use, but at least there is a lot of code underneath."
--Damien Katz
On Thu, Oct 12, 2006 at 08:34:45AM +0200, Florian Weimer wrote:
> Some statistics are very hard to gather from a sample, e.g. the number
> of distinct values in a column.

Then how can the DBA know it, either? The problem with this sort of argument is always that people are claiming some special knowledge is available to the DBA. If it's true that the DBA really _can_ know this stuff, then there must be some way to learn it. Which means that you can, in principle, figure out ways to communicate that to the optimizer.

I like the suggestion, though, that there be ways to codify known relationships in the system in such a way that the optimizer can learn to use that information. _That_ seems to me to be a big improvement, because it can be taken into consideration along with relationships that emerge from the statistics, that the DBA may not know about.

A

-- 
Andrew Sullivan | ajs@crankycanuck.ca
In the future this spectacle of the middle classes shocking the avant-
garde will probably become the textbook definition of Postmodernism.
--Brad Holland
Simon Riggs <simon@2ndquadrant.com> writes:
> The *right* place, IMHO, for planner information is to decorate the
> tables, columns and relationships so that *every* SQL statement can pick
> that up. If the world changes, you make one change and all your SQL
> benefits. As the analyzers improve, you may be able to just remove those
> declarations entirely, but generally I imagine the DB designer will for
> many years know things that cannot be determined by an analyzer.

Not to say this isn't a good idea -- I think it's a great idea. But note that it doesn't solve some of the use cases of hints. Consider something like:

    WHERE NOT radius_authenticate(suspected_hacker)

or

    WHERE NOT verify_pk_signature(document_we_have_no_reason_to_doubt)

There's no way you can decorate the radius_authenticate or verify_pk_signature functions with any hint that would know when you're using them on a value you expect them to fail or succeed on. In some cases you pass data you expect to succeed 99.9% of the time, and in others data you expect to fail. Only the author of the query knows what kind of value he's passing and how selective the resulting expression is.

And while people seem to be worried about OLTP queries, this is one area where I actually think of DSS queries first. OLTP queries are usually relatively simple and get optimized well. Also, OLTP queries only have to be "fast enough", not optimal, so the planner usually does an adequate job. DSS queries are often dozens of lines of plan -- this is where enable_* is insufficient to test the query, and it's where the planner often goes wrong. And it's where an incremental speed difference can make a big difference with a report that takes 8 hours or 4 hours. Often these queries are ad-hoc queries that will never be run again anyway.

-- 
Gregory Stark
EnterpriseDB http://www.enterprisedb.com
On Thu, Oct 12, 2006 at 08:50:04AM -0400, Greg Stark wrote:
> Not to say this isn't a good idea -- I think it's a great idea. But note
> that it doesn't solve some of the use cases of hints. Consider something
> like:
>
>     WHERE NOT radius_authenticate(suspected_hacker)
>
> or
>
>     WHERE NOT verify_pk_signature(document_we_have_no_reason_to_doubt)

We currently construct histograms for data in columns; there's no particular reason why we can't do the same for functions. In a similar vein, I don't see a reason why you couldn't enable a stats-gathering mode where function calls would be instrumented to collect information about:

- time of execution
- distribution of outputs

which could then be used by the planner. Or more directly:

    CREATE HISTOGRAM FOR FUNCTION verify_pk_signature(document)
        AS ( true = 99, false = 1 );

(Perhaps DECLARE is the better phrase?)

Have a nice day,

-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
Mark Woodward wrote:
> Exactly. IMHO, it is a frustrating environment. PostgreSQL is a great
> system, and while I completely respect the individuals involved, I think
> the "management", for lack of a better term, is difficult.

'Course, you're welcome to fork the project as well if your style and/or priorities are different from the postgresql core team's. If your approach is that much less frustrating, your project would gain that much more momentum from developers joining you. If more developers like your style and/or priorities, they'll migrate to your project.

I think Bizgres, Mammoth, EnterpriseDB, RedHat DB, and Gentoo's occasional bizarro patches are all proofs that it can work, as well as proofs that it's difficult.
On Thu, 2006-10-12 at 15:06 +0200, Martijn van Oosterhout wrote:
> On Thu, Oct 12, 2006 at 08:50:04AM -0400, Greg Stark wrote:
> > Not to say this isn't a good idea -- I think it's a great idea. But
> > note that it doesn't solve some of the use cases of hints. Consider
> > something like:
> >
> >     WHERE NOT radius_authenticate(suspected_hacker)
> >
> > or
> >
> >     WHERE NOT verify_pk_signature(document_we_have_no_reason_to_doubt)
>
> We currently construct histograms for data in columns; there's no
> particular reason why we can't do the same for functions. In a similar
> vein, I don't see a reason why you couldn't enable a stats-gathering
> mode where function calls would be instrumented to collect information
> about:
>
> - time of execution
> - distribution of outputs
>
> which could then be used by the planner. Or more directly:
>
>     CREATE HISTOGRAM FOR FUNCTION verify_pk_signature(document)
>         AS ( true = 99, false = 1 );
>
> (Perhaps DECLARE is the better phrase?)

The CREATE OPERATOR command already has a RESTRICT = res_proc clause which provides the ability to attach selectivity functions onto an operator. So this is already possible if you turn radius_authenticate() into an operator. The function parameters are passed to the selectivity function, so you can use that to steer the selectivity. Perhaps this should be allowed on the CREATE FUNCTION command when a procedure returns boolean.

Greg is right, though: there are times when the default selectivity won't match what we know to be the case. His example of a function which might normally be expected to return 99.9% true being used to evaluate a list of suspected attempts, where the return might well be 20% true, is a good one.

-- 
Simon Riggs
EnterpriseDB http://www.enterprisedb.com
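A minimal sketch of the mechanism Simon describes, with invented names; the stand-in function body and the choice of the built-in eqsel estimator are illustrative only (a purpose-built estimator would be a C function with the standard selectivity-function signature):

    -- A boolean function we want the planner to estimate selectivity for
    CREATE FUNCTION radius_auth(username text, server text)
    RETURNS boolean AS
    $$ SELECT true $$   -- stand-in body; the real check is external
    LANGUAGE sql STABLE;

    -- Wrap it in an operator so a RESTRICT estimator can be attached
    CREATE OPERATOR === (
        LEFTARG   = text,
        RIGHTARG  = text,
        PROCEDURE = radius_auth,
        RESTRICT  = eqsel   -- built-in equality-selectivity estimator
    );

    -- The planner now consults eqsel when costing:
    --   SELECT ... WHERE NOT (user_name === 'radius1.example.com');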
On Thu, Oct 12, 2006 at 02:25:29PM +0100, Simon Riggs wrote:
> The CREATE OPERATOR command already has a RESTRICT = res_proc clause
> which provides the ability to attach selectivity functions onto an
> operator.
>
> So this is already possible if you turn radius_authenticate() into an
> operator. The function parameters are passed to the selectivity
> function, so you can use that to steer the selectivity.
>
> Perhaps this should be allowed on the CREATE FUNCTION command when a
> procedure returns boolean.

Why limit it to booleans? For many functions you can get a reasonable estimate of the resulting data by feeding the keys of the histogram through the function. If you know how the data in "field" is distributed, you can take a good guess at the distribution of upper(field).

> Greg is right, though: there are times when the default selectivity
> won't match what we know to be the case. His example of a function
> which might normally be expected to return 99.9% true being used to
> evaluate a list of suspected attempts, where the return might well be
> 20% true, is a good one.

In the extreme case you could drop the histogram in a transaction, but I can see a use case for declaring a histogram for the current session only, or even having profiles to select from. I don't think annotating the query itself is a particularly good idea. The hard part is storing the histograms and getting the planner to use them; once that happens, the rest is trivial.

Have a nice day,

-- 
Martijn van Oosterhout <kleptog@svana.org> http://svana.org/kleptog/
From each according to his ability. To each according to his ability to litigate.
Mark,

> That is sort of the stumbling block. None of us "knows" what it should
> look like, but leaving the topic as "if you want it, go do the work and
> submit a patch" isn't going to get it done.
>
> First we should decide whether it is, in fact, something that ought to
> happen; then, if so, we should think about what it should be.

Well, that's what the *rest* of us are doing on the two threads ... here, and "Simple Join Optimized Badly" on -performance. You're the only one who seems to want others to do the specification work for him. Start making suggestions, and stop criticizing the process.

And, to give you a starting point, the discussion has morphed into: "What manual ways can we come up with for the DBA to influence the planner and fix planner 'bugs', which won't have the fragility of query-based hints a la Oracle?"

--Josh Berkus
Martijn van Oosterhout <kleptog@svana.org> writes:
> Which could then be used by the planner. Or more directly:
>
>   CREATE HISTOGRAM FOR FUNCTION verify_pk_signature(document)
>   AS ( true = 99, false = 1 );
>
> (Perhaps DECLARE is the better phrase?)

Except that the distribution is a property of the values you're passing
in, not of the function itself. In theory verify_pk_signature() returns
false for 99.999...% of its possible inputs. But of course, unless you
have a black hat or hardware problems, you're never going to pass it any
input that makes it return false.

The query may be a routine session-cookie check where it will virtually
always return true, or it may be a DBA running an ad-hoc query to check
suspicious records for invalid data. It may even be the same query from
the same object method, called from different call sites in the
application.

I'm not saying the above isn't a good idea, though. I rather like it,
actually. But the point of my example was specifically to show that, at
least sometimes, the *only* place the knowledge of the data distribution
lies is in the programmer's head.

--
greg
On Oct 12, 2006, at 4:26 AM, Andrew Sullivan wrote:
> On Thu, Oct 12, 2006 at 08:34:45AM +0200, Florian Weimer wrote:
>>
>> Some statistics are very hard to gather from a sample, e.g. the number
>> of distinct values in a column.
>
> Then how can the DBA know it, either? The problem with this sort of
> argument is always that people are claiming some special knowledge is
> available to the DBA. If it's true that the DBA really _can_ know this
> stuff, then there must be some way to learn it. Which means that you
> can, in principle, figure out ways to communicate that to the optimizer.

Yes, but it may be much more efficient for the human to tell the computer
than for the computer to introspect things. Take, for example, ndistinct
as data grows large. I, the database designer, may know (or simply see)
that a certain foreign key column will have roughly a certain cardinality
regardless of how big the table gets. It's a lot more efficient for me to
tell the system that up front than to have it periodically do a full table
scan of tens of millions of rows to figure it out, or worse (as it is
currently) to come up with an estimate that is multiple orders of
magnitude off, even with the stats target turned all the way up.

I realize that this is a case that is sort of possible to handle manually
now: I can tweak the stats table myself. But it would be nice if you could
do it in such a way that it overrides what ANALYZE comes up with, on a
case-by-case basis. We could have a perfect query planner, but feed it bad
stats and it will still make poor decisions. I'm of the strong opinion
that hinting the data is much better than hinting the queries: there tend
to be far fewer places where you need to do it, and new queries take
advantage of it automatically.

> I like the suggestion, though, that there be ways to codify known
> relationships in the system in such a way that the optimizer can learn
> to use that information. _That_ seems to me to be a big improvement,
> because it can be taken into consideration along with relationships
> that emerge from the statistics, that the DBA may not know about.

I'm all for things the computer can do for me automagically. It's just
good to have the ability to tell the computer about things you know about
the data that it either can't efficiently figure out, or can't figure out
at all.

-Casey
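As it happens, later PostgreSQL releases (9.0 onward) grew exactly this
kind of per-column override, consulted by ANALYZE. A sketch, with
hypothetical table and column names:

    -- Pin ndistinct to a fixed count, for Casey's "cardinality stays
    -- roughly constant no matter how big the table gets" case:
    ALTER TABLE plays ALTER COLUMN station_id SET (n_distinct = 50000);

    -- A negative value instead means "a fixed fraction of the row
    -- count", i.e. ndistinct that scales with the table:
    ALTER TABLE plays ALTER COLUMN station_id SET (n_distinct = -0.02);

    ANALYZE plays;  -- the override takes effect at the next ANALYZE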
Casey Duncan <casey@pandora.com> writes:
> Yes, but it may be much more efficient for the human to tell the
> computer than for the computer to introspect things. Take, for example,
> ndistinct as data grows large.

Yeah, an override estimate for a column's ndistinct seems a perfect
example of the sort of statistical hint that I'd be in favor of having.

We have also talked about solving the multi-column statistics problem
(which, at its core, is "which combinations of columns are worth
accumulating stats for?" --- you can't possibly store stats for every
combination!) by having what would amount to hints from the DBA saying
"keep stats for these combinations".

			regards, tom lane
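Tom's "keep stats for these combinations" hint eventually took concrete
form in PostgreSQL 10 as extended statistics. A sketch, with hypothetical
table and column names:

    -- Ask ANALYZE to accumulate combined stats for a column pair that
    -- the planner would otherwise treat as independent.
    CREATE STATISTICS zip_city_stats (ndistinct, dependencies)
        ON zip, city FROM addresses;

    ANALYZE addresses;  -- populates the extended-statistics entry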
Tom Lane wrote:
> We have also talked about solving the multi-column statistics problem
> (which, at its core, is "which combinations of columns are worth
> accumulating stats for?" --- you can't possibly store stats for every
> combination!) by having what would amount to hints from the DBA saying
> "keep stats for these combinations".

This strikes me intuitively as the most likely candidate so far for
improvement. I'm much more interested in schemes that improve the stats
system than in schemes that provide a way around it.

cheers

andrew
> And, to give you a starting point, the discussion has morphed into:
> "What manual ways can we come up with for the DBA to influence the
> planner and fix planner 'bugs', which won't have the fragility of
> query-based hints a la Oracle?"

I see a few downsides, though:

1. It lacks a way to try out different plans in one session without
   actually influencing other sessions. Maybe the way to do this is:

       begin work;
       add statistics hints
       explain analyze your select
       rollback work;

   This is not enough to try a whole program with a new manual
   correction, though.

2. Trying out sensible plans would only work if there were ways to make
   all node types (on specific tables/WHERE clauses) individually more or
   less expensive.

3. The manual entry may negatively influence other queries that were
   previously fast; thus a typical tuning measure for one bad query plan
   imposes high risk.

4. Sometimes I know as a developer that certain queries only run in case
   of abnormal skew in the data (e.g. I expect to get a lot of rows even
   if usually I would get few).

Andreas
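Andreas's first point can already be approximated today, using the
per-column ndistinct override sketched above: because the catalog change
is transactional, rolling back keeps the experiment invisible to other
sessions. Table and column names here are hypothetical:

    BEGIN;
    ALTER TABLE orders ALTER COLUMN customer_id SET (n_distinct = 500);
    ANALYZE orders;  -- the rewritten stats are visible only inside
                     -- this uncommitted transaction
    EXPLAIN SELECT * FROM orders WHERE customer_id = 42;
    ROLLBACK;        -- stats revert; other sessions never saw the change

As Andreas notes, this is fine for one hand-tuned query but does not scale
to trying a whole application under a new manual correction.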
In article <20061012112647.GD28443@phlogiston.dyndns.org>,
ajs@crankycanuck.ca (Andrew Sullivan) wrote:

> On Thu, Oct 12, 2006 at 08:34:45AM +0200, Florian Weimer wrote:
> >
> > Some statistics are very hard to gather from a sample, e.g. the
> > number of distinct values in a column.
>
> I like the suggestion, though, that there be ways to codify known
> relationships in the system in such a way that the optimizer can learn
> to use that information.

Since there is already a genetic-algorithm-based optimizer (GEQO), is
there any way to use it to gather information that improves statistics?
For example, put the GA optimizer into a mode where it actually tries some
of the plans it comes up with and collects data on how they perform?

-arturo
Tom,

> We have also talked about solving the multi-column statistics problem
> (which, at its core, is "which combinations of columns are worth
> accumulating stats for?" --- you can't possibly store stats for every
> combination!) by having what would amount to hints from the DBA saying
> "keep stats for these combinations".

We could start just by keeping stats for multiple columns which are
indexed together. That doesn't cover everything (thanks to bitmap index
scans), but it would cover many cases.

--
--Josh
Josh Berkus
PostgreSQL @ Sun
San Francisco
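Josh's candidate column sets can be enumerated straight from the catalog;
a sketch against pg_index using only standard catalog columns:

    -- List multi-column indexes; their column sets would be the first
    -- candidates for combined statistics under this proposal.
    SELECT indexrelid::regclass AS index_name,
           indrelid::regclass  AS table_name,
           indnatts            AS n_columns
    FROM pg_index
    WHERE indnatts > 1;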