Re: Hints proposal - Mailing list pgsql-performance

From Bruce Momjian
Subject Re: Hints proposal
Date
Msg-id 200610121519.k9CFJA416544@momjian.us
Whole thread Raw
In response to Hints proposal  ("Jim C. Nasby" <jim@nasby.net>)
Responses Re: Hints proposal
List pgsql-performance
Because DB2 doesn't like hints, and the fact that they have gotten to a
point where they feel they do not need them, I feel we too can get to a
point where we don't need them either.  The question is whether we can
get there quickly enough for our userbase.

I perfer attacking the problem at the table definition level, like
something like "volatile", or adding to the existing table statistics.

---------------------------------------------------------------------------

Jim C. Nasby wrote:
> Posting here instead of hackers since this is where the thread got
> started...
>
> The argument has been made that producing a hints system will be as hard
> as actually fixing the optimizer. There's also been clamoring for an
> actual proposal, so here's one that (I hope) wouldn't be very difficult
> to implemen.
>
> My goal with this is to keep the coding aspect as simple as possible, so
> that implementation and maintenance of this isn't a big burden. Towards
> that end, these hints either tell the planner specifically how to handle
> some aspect of a query, or they tell it to modify specific cost
> estimates. My hope is that this information could be added to the
> internal representation of a query without much pain, and that the
> planner can then use that information when generating plans.
>
> The syntax these hints is something arbitrary. I'm borrowing Oracle's
> idea of embedding hints in comments, but we can use some other method if
> desired. Right now I'm more concerned with getting the general idea
> across.
>
> Since this is such a controversial topic, I've left this at a 'rough
> draft' stage - it's meant more as a framework for discussion than a
> final proposal for implementation.
>
> Forcing a Plan
> --------------
> These hints would outright force the planner to do things a certain way.
>
> ... FROM table /* ACCESS {SEQSCAN | [[NO] BITMAP] INDEX index_name} */
>
> This would force the planner to access table via a seqscan or
> index_name. For the index case, you can also specify if the access must
> or must not be via a bitmap scan. If neither is specified, the planner
> is free to choose either one.
>
> Theoretically, we could also allow "ACCESS INDEX" without an index name,
> which would simply enforce that a seqscan not be used, but I'm not sure
> how useful that would be.
>
> ... FROM a JOIN b /* {HASH|NESTED LOOP|MERGE} JOIN */ ON (...)
> ... FROM a JOIN b ON (...) /* [HASH|NESTED LOOP|MERGE] JOIN */
>
> Force the specified join mechanism on the join. The first form would not
> enforce a join order, it would only force table b to be joined to the
> rest of the relations using the specified join type. The second form
> would specify that a joins to b in that order, and optionally specify
> what type of join to use.
>
> ... GROUP BY ... /* {HASH|SORT} AGGREGATE */
>
> Specify how aggregation should be handled.
>
> Cost Tweaking
> -------------
> It would also be useful to allow tweaking of planner cost estimates.
> This would take the general form of
>
> node operator value
>
> where node would be a planner node/hint (ie: ACCESS INDEX), operator
> would be +, -, *, /, and value would be the amount to change the
> estimate by. So "ACCESS INDEX my_index / 2" would tell the planner to
> cut the estimated cost of any index scan on a given table in half.
>
> (I realize the syntax will probably need to change to avoid pain in the
> grammar code.)
>
> Unlike the hints above that are ment to force a certain behavior on an
> operation, you could potentially have multiple cost hints in a single
> location, ie:
>
> FROM a /* HASH JOIN * 1.1 NESTED LOOP JOIN * 2 MERGE JOIN + 5000 */
>     JOIN b ON (...) /* NESTED LOOP JOIN - 5000 */
>
> The first comment block would apply to any joins against a, while the
> second one would apply only to joins between a and b. The effects would
> be cumulative, so this example means that any merge join against a gets
> an added cost of 5000, unless it's a join with b (because +5000 + -5000
> = 0). I think you could end up with odd cases if the second form just
> over-rode the first, which is why it should be cummulative.
> --
> Jim Nasby                                            jim@nasby.net
> EnterpriseDB      http://enterprisedb.com      512.569.9461 (cell)
>
> ---------------------------(end of broadcast)---------------------------
> TIP 4: Have you searched our list archives?
>
>                http://archives.postgresql.org

--
  Bruce Momjian   bruce@momjian.us
  EnterpriseDB    http://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

pgsql-performance by date:

Previous
From: "Jim C. Nasby"
Date:
Subject: Hints proposal
Next
From: "Joshua Marsh"
Date:
Subject: Re: Hints proposal