Hints proposal - Mailing list pgsql-performance

From Jim C. Nasby
Subject Hints proposal
Date
Msg-id 20061012151439.GT28647@nasby.net
Whole thread Raw
Responses Re: Hints proposal
Re: Hints proposal
Re: Hints proposal
Re: Hints proposal
List pgsql-performance
Posting here instead of hackers since this is where the thread got
started...

The argument has been made that producing a hints system will be as hard
as actually fixing the optimizer. There's also been clamoring for an
actual proposal, so here's one that (I hope) wouldn't be very difficult
to implemen.

My goal with this is to keep the coding aspect as simple as possible, so
that implementation and maintenance of this isn't a big burden. Towards
that end, these hints either tell the planner specifically how to handle
some aspect of a query, or they tell it to modify specific cost
estimates. My hope is that this information could be added to the
internal representation of a query without much pain, and that the
planner can then use that information when generating plans.

The syntax these hints is something arbitrary. I'm borrowing Oracle's
idea of embedding hints in comments, but we can use some other method if
desired. Right now I'm more concerned with getting the general idea
across.

Since this is such a controversial topic, I've left this at a 'rough
draft' stage - it's meant more as a framework for discussion than a
final proposal for implementation.

Forcing a Plan
--------------
These hints would outright force the planner to do things a certain way.

... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */

This would force the planner to access table via a seqscan or
index_name. For the index case, you can also specify if the access must
or must not be via a bitmap scan. If neither is specified, the planner
is free to choose either one.

Theoretically, we could also allow "ACCESS INDEX" without an index name,
which would simply enforce that a seqscan not be used, but I'm not sure
how useful that would be.

... FROM a JOIN b /* {HASH|NESTED LOOP|MERGE} JOIN */ ON (...)
... FROM a JOIN b ON (...) /* [HASH|NESTED LOOP|MERGE] JOIN */

Force the specified join mechanism on the join. The first form would not
enforce a join order, it would only force table b to be joined to the
rest of the relations using the specified join type. The second form
would specify that a joins to b in that order, and optionally specify
what type of join to use.

... GROUP BY ... /* {HASH|SORT} AGGREGATE */

Specify how aggregation should be handled.

Cost Tweaking
-------------
It would also be useful to allow tweaking of planner cost estimates.
This would take the general form of

node operator value

where node would be a planner node/hint (ie: ACCESS INDEX), operator
would be +, -, *, /, and value would be the amount to change the
estimate by. So "ACCESS INDEX my_index / 2" would tell the planner to
cut the estimated cost of any index scan on a given table in half.

(I realize the syntax will probably need to change to avoid pain in the
grammar code.)

Unlike the hints above that are ment to force a certain behavior on an
operation, you could potentially have multiple cost hints in a single
location, ie:

FROM a /* HASH JOIN * 1.1 NESTED LOOP JOIN * 2 MERGE JOIN + 5000 */
    JOIN b ON (...) /* NESTED LOOP JOIN - 5000 */

The first comment block would apply to any joins against a, while the
second one would apply only to joins between a and b. The effects would
be cumulative, so this example means that any merge join against a gets
an added cost of 5000, unless it's a join with b (because +5000 + -5000
= 0). I think you could end up with odd cases if the second form just
over-rode the first, which is why it should be cummulative.
--
Jim Nasby                                            jim@nasby.net
EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)

pgsql-performance by date:

Previous
From: Tom Lane
Date:
Subject: Re: FW: Simple join optimized badly?
Next
From: Bruce Momjian
Date:
Subject: Re: Hints proposal