Hints proposal - Mailing list pgsql-performance
From | Jim C. Nasby |
---|---|
Subject | Hints proposal |
Date | |
Msg-id | 20061012151439.GT28647@nasby.net Whole thread Raw |
Responses |
Re: Hints proposal
Re: Hints proposal Re: Hints proposal Re: Hints proposal |
List | pgsql-performance |
Posting here instead of hackers since this is where the thread got started... The argument has been made that producing a hints system will be as hard as actually fixing the optimizer. There's also been clamoring for an actual proposal, so here's one that (I hope) wouldn't be very difficult to implemen. My goal with this is to keep the coding aspect as simple as possible, so that implementation and maintenance of this isn't a big burden. Towards that end, these hints either tell the planner specifically how to handle some aspect of a query, or they tell it to modify specific cost estimates. My hope is that this information could be added to the internal representation of a query without much pain, and that the planner can then use that information when generating plans. The syntax these hints is something arbitrary. I'm borrowing Oracle's idea of embedding hints in comments, but we can use some other method if desired. Right now I'm more concerned with getting the general idea across. Since this is such a controversial topic, I've left this at a 'rough draft' stage - it's meant more as a framework for discussion than a final proposal for implementation. Forcing a Plan -------------- These hints would outright force the planner to do things a certain way. ... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */ This would force the planner to access table via a seqscan or index_name. For the index case, you can also specify if the access must or must not be via a bitmap scan. If neither is specified, the planner is free to choose either one. Theoretically, we could also allow "ACCESS INDEX" without an index name, which would simply enforce that a seqscan not be used, but I'm not sure how useful that would be. ... FROM a JOIN b /* {HASH|NESTED LOOP|MERGE} JOIN */ ON (...) ... FROM a JOIN b ON (...) /* [HASH|NESTED LOOP|MERGE] JOIN */ Force the specified join mechanism on the join. The first form would not enforce a join order, it would only force table b to be joined to the rest of the relations using the specified join type. The second form would specify that a joins to b in that order, and optionally specify what type of join to use. ... GROUP BY ... /* {HASH|SORT} AGGREGATE */ Specify how aggregation should be handled. Cost Tweaking ------------- It would also be useful to allow tweaking of planner cost estimates. This would take the general form of node operator value where node would be a planner node/hint (ie: ACCESS INDEX), operator would be +, -, *, /, and value would be the amount to change the estimate by. So "ACCESS INDEX my_index / 2" would tell the planner to cut the estimated cost of any index scan on a given table in half. (I realize the syntax will probably need to change to avoid pain in the grammar code.) Unlike the hints above that are ment to force a certain behavior on an operation, you could potentially have multiple cost hints in a single location, ie: FROM a /* HASH JOIN * 1.1 NESTED LOOP JOIN * 2 MERGE JOIN + 5000 */ JOIN b ON (...) /* NESTED LOOP JOIN - 5000 */ The first comment block would apply to any joins against a, while the second one would apply only to joins between a and b. The effects would be cumulative, so this example means that any merge join against a gets an added cost of 5000, unless it's a join with b (because +5000 + -5000 = 0). I think you could end up with odd cases if the second form just over-rode the first, which is why it should be cummulative. -- Jim Nasby jim@nasby.net EnterpriseDB http://enterprisedb.com 512.569.9461 (cell)
pgsql-performance by date: