Thread: Hints (was Poor performance using CTE)

Hints (was Poor performance using CTE)

From
Craig James
Date:
On Tue, Nov 20, 2012 at 3:53 PM, Tom Lane <tgl@sss.pgh.pa.us> wrote:
Craig Ringer <craig@2ndQuadrant.com> writes:
> On 11/21/2012 12:06 AM, Claudio Freire wrote:
>> I meant for postgres to do automatically. Rewriting as a join wouldn't
>> work as an optimization fence the way we're used to, but pushing
>> constraints upwards can only help (especially if highly selective).

> Because people are now used to using CTEs as query hints, it'd probably
> cause performance regressions in working queries. Perhaps more
> importantly, Pg would have to prove that doing so didn't change queries
> that invoked functions with side-effects to avoid changing the results
> of currently valid queries.

We could trivially arrange to keep the current semantics if the CTE
query contains any volatile functions (or of course if it's
INSERT/UPDATE/DELETE).  I think we'd also need to not optimize if
it's invoked from more than one place in the outer query.

I think the more interesting question is what cases wouldn't be covered
by such a rule.  Typically you need to use OFFSET 0 in situations where
the planner has guessed wrong about costs or rowcounts, and I think
people are likely using WITH for that as well.  Should we be telling
people that they ought to insert OFFSET 0 in WITH queries if they want
to be sure there's an optimization fence?

I'm probably beating a dead horse ... but isn't this just a hint?  Except that it's worse than a hint, because it's a hint in disguise and is undocumented.  As far as I can tell, there's no use for "OFFSET 0" except to act as an optimizer fence.

It's clearly an important need, given the nature of the dialog above (and many others that have passed through this mailing list).

Why not make an explicit hint syntax and document it? I've still don't understand why "hint" is a dirty word in Postgres.  There are a half-dozen or so ways in common use to circumvent or correct sub-optimal plans.

Craig James
 

                        regards, tom lane


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance

Re: Hints (was Poor performance using CTE)

From
Craig Ringer
Date:
On 11/21/2012 09:35 AM, Craig James wrote:
> Why not make an explicit hint syntax and document it? I've still don't
> understand why "hint" is a dirty word in Postgres.  There are a
> half-dozen or so ways in common use to circumvent or correct
> sub-optimal plans.
>

The reason usually given is that hints provide easy workarounds for
planner and stats issues, so people don't report problems or fix the
underlying problem.

Of course, if that's all there was to it, `OFFSET 0` would be made into
an error or warning, or ignored and not fenced.

The reality is, as you say, that there's a need, because the planner can
never be perfect - or rather, if it were nearly perfect, it'd take so
long to read the stats and calculate plans that everything would be
glacially slow anyway. The planner has to compromise, and so cases will
always arise where it needs a little help.

I think it's time to admit that and get the syntax in place for CTEs so
there's room to optimize them later, rather than cementing
CTEs-as-fences in forever as a Pg quirk.

--
 Craig Ringer                   http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services



Re: Hints (was Poor performance using CTE)

From
Shaun Thomas
Date:
On 11/20/2012 08:15 PM, Craig Ringer wrote:

> I think it's time to admit that and get the syntax in place for CTEs so
> there's room to optimize them later, rather than cementing
> CTEs-as-fences in forever as a Pg quirk.

I know I'm just some schmo, but I'd vote for this. I'm certainly guilty
of using OFFSET 0. Undocumented hints are still hints. As much as I
think they're a bad idea by cementing a certain plan that may not get
the benefits of future versions, non-intuitive side-effects by using
overloaded syntax are worse.

I've been using CTEs as temp tables because I know that's how they work.
But I'd be more than willing to modify my syntax one way or the other to
adopt non-materialized CTEs, provided there's some way to get the
current behavior.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
312-444-8534
sthomas@optionshouse.com

______________________________________________

See http://www.peak6.com/email_disclaimer/ for terms and conditions related to this email


Re: Hints (was Poor performance using CTE)

From
"Kevin Grittner"
Date:
Craig Ringer wrote:
> On 11/21/2012 09:35 AM, Craig James wrote:
>> Why not make an explicit hint syntax and document it? I've still
>> don't understand why "hint" is a dirty word in Postgres. There are
>> a half-dozen or so ways in common use to circumvent or correct
>> sub-optimal plans.
>
> The reason usually given is that hints provide easy workarounds for
> planner and stats issues, so people don't report problems or fix
> the underlying problem.
>
> Of course, if that's all there was to it, `OFFSET 0` would be made
> into an error or warning, or ignored and not fenced.
>
> The reality is, as you say, that there's a need, because the
> planner can never be perfect - or rather, if it were nearly
> perfect, it'd take so long to read the stats and calculate plans
> that everything would be glacially slow anyway. The planner has to
> compromise, and so cases will always arise where it needs a little
> help.
>
> I think it's time to admit that and get the syntax in place for
> CTEs so there's room to optimize them later, rather than cementing
> CTEs-as-fences in forever as a Pg quirk.

It's a tough problem. Disguising and not documenting the available
optimizer hints leads to more reports on where the optimizer should
be smarter, and has spurred optimizer improvements. And many type of
hints would undoubtedly cause people to force what they *think* would
be the best plan in many cases where they are wrong, or become wrong
as data scales up. But it does seem odd every time I hear people
saying that they don't want to eliminate some optimization fence
because "they find it useful" while simultaneously arguing that we
don't have or want hints.

Having a way to coerce the optimizer from the plan it would take with
straightforward coding *is* a hint, and one down-side of hiding the
hints inside syntax mostly supported for other reasons is that people
who don't know about these clever devices can't do reasonable
refactoring of queries for readability without risking performance
regressions. Another down-side is that perfectly reasonable queries
ported from other databases that use hint syntax for hints run afoul
of the secret hints when trying to run queries on PostgreSQL, and get
performance potentially orders of magnitude worse than they expect.

I'm not sure what the best answer is, but as long as we have hints,
but only through OFFSET 0 or CTE usage, that should be documented.
Better, IMV, would be to identify what sorts of hints people actually
find useful, and use that as the basis for TODO items for optimizer
improvement as well as inventing clear ways to specify the desired
coercion. I liked the suggestion that a CTE which didn't need to be
materialized because of side-effects or multiple references have a
keyword. Personally, I think that AS MATERIALIZED x (SELECT ...)
would be preferable to AS x (SELECT ... OFFSET 0) as the syntax to
specify that.

Regarding the above-mentioned benefits we would stand to lose by
having clear and documented hints, perhaps we could occasionally
solicit input on where people are finding hints useful to get ideas
on where we might want to improve the optimizer. As far as worrying
about people using hints to force a plan which is sub-optimal --
isn't that getting into nanny mode a bit too much?

-Kevin


Re: Hints (was Poor performance using CTE)

From
Heikki Linnakangas
Date:
On 21.11.2012 15:42, Kevin Grittner wrote:
> Better, IMV, would be to identify what sorts of hints people actually
> find useful, and use that as the basis for TODO items for optimizer
> improvement as well as inventing clear ways to specify the desired
> coercion. I liked the suggestion that a CTE which didn't need to be
> materialized because of side-effects or multiple references have a
> keyword. Personally, I think that AS MATERIALIZED x (SELECT ...)
> would be preferable to AS x (SELECT ... OFFSET 0) as the syntax to
> specify that.

Rather than telling the planner what to do or not to do, I'd much rather
have hints that give the planner more information about the tables and
quals involved in the query. A typical source of bad plans is when the
planner gets its cost estimates wrong. So rather than telling the
planner to use a nested loop join for "a INNER JOIN b ON a.id = b.id",
the user could tell the planner that there are only 10 rows that match
the "a.id = b.id" qual. That gives the planner the information it needs
to choose the right plan on its own. That kind of hints would be much
less implementation specific and much more likely to still be useful, or
at least not outright counter-productive, in a future version with a
smarter planner.

You could also attach that kind of hints to tables and columns, which
would be more portable and nicer than decorating all queries.

- Heikki


Re: Hints (was Poor performance using CTE)

From
Craig James
Date:


On Wed, Nov 21, 2012 at 5:42 AM, Kevin Grittner <kgrittn@mail.com> wrote:

It's a tough problem. Disguising and not documenting the available
optimizer hints leads to more reports on where the optimizer should
be smarter, and has spurred optimizer improvements. ...
Regarding the above-mentioned benefits we would stand to lose by
having clear and documented hints, perhaps we could occasionally
solicit input on where people are finding hints useful to get ideas
on where we might want to improve the optimizer. As far as worrying
about people using hints to force a plan which is sub-optimal --
isn't that getting into nanny mode a bit too much?

Toward that end, the hint documentation (which is almost always viewed as HTML) could be prefaced by a strong suggestion to post performance questions in this group first, with links to the "subscribe" page and the "how to report performance problems" FAQ. The hint documentation could even be minimalistic; suggest to developers that they should post their problematic queries here before resorting to hints.  That would give the experts an opportunity to provide the normal advice.  The correct hint syntax would be suggested only when all other avenues failed.

Craig James
 

-Kevin

Re: Hints (was Poor performance using CTE)

From
Joe Conway
Date:
On 11/21/2012 08:05 AM, Heikki Linnakangas wrote:
> Rather than telling the planner what to do or not to do, I'd much rather
> have hints that give the planner more information about the tables and
> quals involved in the query. A typical source of bad plans is when the
> planner gets its cost estimates wrong. So rather than telling the
> planner to use a nested loop join for "a INNER JOIN b ON a.id = b.id",
> the user could tell the planner that there are only 10 rows that match
> the "a.id = b.id" qual. That gives the planner the information it needs
> to choose the right plan on its own. That kind of hints would be much
> less implementation specific and much more likely to still be useful, or
> at least not outright counter-productive, in a future version with a
> smarter planner.
>
> You could also attach that kind of hints to tables and columns, which
> would be more portable and nicer than decorating all queries.

I like this idea, but also think that if we have a syntax to allow
hints, it would be nice to have a simple way to ignore all hints (yes, I
suppose I'm suggesting yet another GUC). That way after sprinkling your
SQL with hints, you could easily periodically (e.g. after a Postgres
upgrade) test what would happen if the hints were removed.

Joe
--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support




Re: Hints (was Poor performance using CTE)

From
Craig James
Date:


On Wed, Nov 21, 2012 at 9:25 AM, Joe Conway <mail@joeconway.com> wrote:
On 11/21/2012 08:05 AM, Heikki Linnakangas wrote:
> Rather than telling the planner what to do or not to do, I'd much rather
> have hints that give the planner more information about the tables and
> quals involved in the query. A typical source of bad plans is when the
> planner gets its cost estimates wrong. So rather than telling the
> planner to use a nested loop join for "a INNER JOIN b ON a.id = b.id",
> the user could tell the planner that there are only 10 rows that match
> the "a.id = b.id" qual. That gives the planner the information it needs
> to choose the right plan on its own. That kind of hints would be much
> less implementation specific and much more likely to still be useful, or
> at least not outright counter-productive, in a future version with a
> smarter planner.
>
> You could also attach that kind of hints to tables and columns, which
> would be more portable and nicer than decorating all queries.

I like this idea, but also think that if we have a syntax to allow
hints, it would be nice to have a simple way to ignore all hints (yes, I
suppose I'm suggesting yet another GUC). That way after sprinkling your
SQL with hints, you could easily periodically (e.g. after a Postgres
upgrade) test what would happen if the hints were removed.

Or a three-way choice: Allow, ignore, or generate an error.  That would allow developers to identify where hints are being used.

Craig
 

Joe
--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support



Re: Hints (was Poor performance using CTE)

From
Joe Conway
Date:
On 11/21/2012 09:28 AM, Craig James wrote:
>
>
> On Wed, Nov 21, 2012 at 9:25 AM, Joe Conway <mail@joeconway.com
> <mailto:mail@joeconway.com>> wrote:
>
>     On 11/21/2012 08:05 AM, Heikki Linnakangas wrote:
>     > Rather than telling the planner what to do or not to do, I'd much
>     rather
>     > have hints that give the planner more information about the tables and
>     > quals involved in the query. A typical source of bad plans is when the
>     > planner gets its cost estimates wrong. So rather than telling the
>     > planner to use a nested loop join for "a INNER JOIN b ON a.id
>     <http://a.id> = b.id <http://b.id>",
>     > the user could tell the planner that there are only 10 rows that match
>     > the "a.id <http://a.id> = b.id <http://b.id>" qual. That gives the
>     planner the information it needs
>     > to choose the right plan on its own. That kind of hints would be much
>     > less implementation specific and much more likely to still be
>     useful, or
>     > at least not outright counter-productive, in a future version with a
>     > smarter planner.
>     >
>     > You could also attach that kind of hints to tables and columns, which
>     > would be more portable and nicer than decorating all queries.
>
>     I like this idea, but also think that if we have a syntax to allow
>     hints, it would be nice to have a simple way to ignore all hints (yes, I
>     suppose I'm suggesting yet another GUC). That way after sprinkling your
>     SQL with hints, you could easily periodically (e.g. after a Postgres
>     upgrade) test what would happen if the hints were removed.
>
>
> Or a three-way choice: Allow, ignore, or generate an error.  That would
> allow developers to identify where hints are being used.

+1

Joe


--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support




Re: Hints (was Poor performance using CTE)

From
Tom Lane
Date:
Craig James <cjames@emolecules.com> writes:
> On Wed, Nov 21, 2012 at 9:25 AM, Joe Conway <mail@joeconway.com> wrote:
>> I like this idea, but also think that if we have a syntax to allow
>> hints, it would be nice to have a simple way to ignore all hints (yes, I
>> suppose I'm suggesting yet another GUC). That way after sprinkling your
>> SQL with hints, you could easily periodically (e.g. after a Postgres
>> upgrade) test what would happen if the hints were removed.

> Or a three-way choice: Allow, ignore, or generate an error.  That would
> allow developers to identify where hints are being used.

Throwing errors would likely prevent you from reaching all parts of your
application, thus preventing complete testing.  Much more sensible to
just log such queries.

            regards, tom lane


Re: Hints (was Poor performance using CTE)

From
Andreas Kretschmer
Date:
Craig James <cjames@emolecules.com> wrote:

>     On 11/21/2012 08:05 AM, Heikki Linnakangas wrote:
>     > Rather than telling the planner what to do or not to do, I'd much rather
>     > have hints that give the planner more information about the tables and
>     > quals involved in the query. A typical source of bad plans is when the
>     > planner gets its cost estimates wrong. So rather than telling the
>     > planner to use a nested loop join for "a INNER JOIN b ON a.id = b.id",
>     > the user could tell the planner that there are only 10 rows that match
>     > the "a.id = b.id" qual. That gives the planner the information it needs
>     > to choose the right plan on its own. That kind of hints would be much
>     > less implementation specific and much more likely to still be useful, or
>     > at least not outright counter-productive, in a future version with a
>     > smarter planner.
>     >
>     > You could also attach that kind of hints to tables and columns, which
>     > would be more portable and nicer than decorating all queries.
>
>     I like this idea, but also think that if we have a syntax to allow
>     hints, it would be nice to have a simple way to ignore all hints (yes, I
>     suppose I'm suggesting yet another GUC). That way after sprinkling your
>     SQL with hints, you could easily periodically (e.g. after a Postgres
>     upgrade) test what would happen if the hints were removed.
>
>
> Or a three-way choice: Allow, ignore, or generate an error.  That would allow
> developers to identify where hints are being used.
>
> Craig

+1

I think, we HAVE a smart planner, but hints in this direction are okay,
and we need a simple way to make such hints obsolete - for/in the future.


Andreas
--
Really, I'm not out to destroy Microsoft. That will just be a completely
unintentional side effect.                              (Linus Torvalds)
"If I was god, I would recompile penguin with --enable-fly."   (unknown)
Kaufbach, Saxony, Germany, Europe.              N 51.05082°, E 13.56889°


Re: Hints - experiences from other rdbms

From
Willem Leenen
Date:


>
> Rather than telling the planner what to do or not to do, I'd much rather
> have hints that give the planner more information about the tables and
> quals involved in the query. A typical source of bad plans is when the
> planner gets its cost estimates wrong. So rather than telling the
> planner to use a nested loop join for "a INNER JOIN b ON a.id = b.id",
> the user could tell the planner that there are only 10 rows that match
> the "a.id = b.id" qual.



Instead of gathering statistics for all possible joins ( and join orders) , in Oracle there is a functionality that can be switched on where the optimizer is given cardinality feedback for the chosen plans, so it can choose another plan if the same statement comes around.

Secondly, there is functionality to insert a hint into an SQL statement. That's very good for COTS apps where the statement can't be altered. Now I know that there's relatively not much COTS for the Postgresql, ( hence arguments like 'we should not implement hints so we're forcing people to solve the underlying problem' ), but as Postgresql will replace oracle in the lower end of the market, this functionality is usefull.


Re: Hints (was Poor performance using CTE)

From
Jeff Janes
Date:
On Wed, Nov 21, 2012 at 8:05 AM, Heikki Linnakangas
<hlinnakangas@vmware.com> wrote:
> On 21.11.2012 15:42, Kevin Grittner wrote:
>>
>> Better, IMV, would be to identify what sorts of hints people actually
>> find useful, and use that as the basis for TODO items for optimizer
>> improvement as well as inventing clear ways to specify the desired
>> coercion. I liked the suggestion that a CTE which didn't need to be
>> materialized because of side-effects or multiple references have a
>> keyword. Personally, I think that AS MATERIALIZED x (SELECT ...)
>> would be preferable to AS x (SELECT ... OFFSET 0) as the syntax to
>> specify that.
>
>
> Rather than telling the planner what to do or not to do, I'd much rather
> have hints that give the planner more information about the tables and quals
> involved in the query. A typical source of bad plans is when the planner
> gets its cost estimates wrong. So rather than telling the planner to use a
> nested loop join for "a INNER JOIN b ON a.id = b.id", the user could tell
> the planner that there are only 10 rows that match the "a.id = b.id" qual.

For each a.id there are 10 b.id, or for each b.id there are 10 a.id?

> That gives the planner the information it needs to choose the right plan on
> its own. That kind of hints would be much less implementation specific and
> much more likely to still be useful, or at least not outright
> counter-productive, in a future version with a smarter planner.

When I run into unexpectedly poor performance, I have an intuitive
enough feel for my own data that I know what plan it ought to be
using.  Figuring out why it is not using it is very hard.  For one
thing, EXPLAIN tells you about the "winning" plan, but there is no
visibility into what ought to be the winning plan but isn't, so no way
to see why it isn't.    So you first have to use our existing non-hint
hints (enable_*, doing weird things with cost_*, CTE stuff) to trick
it into using the plan I want it to use, before I can figure out why
it isn't using it, before I could figure out what hints of the style
you are suggesting to supply to get it to use it.

So I think the type of hints you are suggesting would be about as hard
for the user to use as debugging the planner for the particular case
would be.  While the more traditional type of hint is easy to use,
because the end user understands their data more than they understand
the guts of the planner.


Cheers,

Jeff


Re: Hints (was Poor performance using CTE)

From
Heikki Linnakangas
Date:
On 22.11.2012 02:53, Jeff Janes wrote:
>> That gives the planner the information it needs to choose the right plan on
>> its own. That kind of hints would be much less implementation specific and
>> much more likely to still be useful, or at least not outright
>> counter-productive, in a future version with a smarter planner.
>
> When I run into unexpectedly poor performance, I have an intuitive
> enough feel for my own data that I know what plan it ought to be
> using.  Figuring out why it is not using it is very hard.  For one
> thing, EXPLAIN tells you about the "winning" plan, but there is no
> visibility into what ought to be the winning plan but isn't, so no way
> to see why it isn't.    So you first have to use our existing non-hint
> hints (enable_*, doing weird things with cost_*, CTE stuff) to trick
> it into using the plan I want it to use, before I can figure out why
> it isn't using it, before I could figure out what hints of the style
> you are suggesting to supply to get it to use it.

I'm sure that happens too, but my gut feeling is that more often the
EXPLAIN ANALYZE output reveals a bad estimate somewhere in the plan, and
the planner chooses a bad plan based on the bad estimate. If you hint
the planner by giving a better estimate for where the estimator got it
wrong, the planner will choose the desired plan.

- Heikki


Re: Hints (was Poor performance using CTE)

From
Gavin Flower
Date:
On 22/11/12 06:28, Craig James wrote:


On Wed, Nov 21, 2012 at 9:25 AM, Joe Conway <mail@joeconway.com> wrote:
On 11/21/2012 08:05 AM, Heikki Linnakangas wrote:
> Rather than telling the planner what to do or not to do, I'd much rather
> have hints that give the planner more information about the tables and
> quals involved in the query. A typical source of bad plans is when the
> planner gets its cost estimates wrong. So rather than telling the
> planner to use a nested loop join for "a INNER JOIN b ON a.id = b.id",
> the user could tell the planner that there are only 10 rows that match
> the "a.id = b.id" qual. That gives the planner the information it needs
> to choose the right plan on its own. That kind of hints would be much
> less implementation specific and much more likely to still be useful, or
> at least not outright counter-productive, in a future version with a
> smarter planner.
>
> You could also attach that kind of hints to tables and columns, which
> would be more portable and nicer than decorating all queries.

I like this idea, but also think that if we have a syntax to allow
hints, it would be nice to have a simple way to ignore all hints (yes, I
suppose I'm suggesting yet another GUC). That way after sprinkling your
SQL with hints, you could easily periodically (e.g. after a Postgres
upgrade) test what would happen if the hints were removed.

Or a three-way choice: Allow, ignore, or generate an error.  That would allow developers to identify where hints are being used.

Craig
 

Joe
--
Joe Conway
credativ LLC: http://www.credativ.us
Linux, PostgreSQL, and general Open Source
Training, Service, Consulting, & 24x7 Support



Or perhaps hints should have the pg version attached, so that they are automatically ignored when the pg version changed?  Problem may then become people reluctant to upgrade because their hints relate to a previous version!  Sigh...

Even requiring registration of hints and expiring them after a limited time period would not work - as people would simply automate the process of registration & application...


Cheers,
Gavin

Re: Hints (was Poor performance using CTE)

From
Cédric Villemain
Date:
Le mercredi 21 novembre 2012 17:34:02, Craig James a écrit :
> On Wed, Nov 21, 2012 at 5:42 AM, Kevin Grittner <kgrittn@mail.com> wrote:
> > It's a tough problem. Disguising and not documenting the available
> > optimizer hints leads to more reports on where the optimizer should
> > be smarter, and has spurred optimizer improvements. ...
> > Regarding the above-mentioned benefits we would stand to lose by
> > having clear and documented hints, perhaps we could occasionally
> > solicit input on where people are finding hints useful to get ideas
> > on where we might want to improve the optimizer. As far as worrying
> > about people using hints to force a plan which is sub-optimal --
> > isn't that getting into nanny mode a bit too much?
>
> Toward that end, the hint documentation (which is almost always viewed as
> HTML) could be prefaced by a strong suggestion to post performance
> questions in this group first, with links to the "subscribe" page and the
> "how to report performance problems" FAQ. The hint documentation could even
> be minimalistic; suggest to developers that they should post their
> problematic queries here before resorting to hints.  That would give the
> experts an opportunity to provide the normal advice.  The correct hint
> syntax would be suggested only when all other avenues failed.

We have hooks in PostgreSQL. We already have at least one extension which is
using that to change the planner behavior.

We can have a bit more hooks and try to improve the cost estimate, this part
of the code is known to be built by reports and human estimations, also the
9.2 version got heavy modifications in this area.

Let the 'Hints' be inside an extension thus we are able to track them and fix
the planner/costestimate issues.

I don't see why PostgreSQL needs 'Hints' *in-core*.
--
Cédric Villemain +33 (0)6 20 30 22 52
http://2ndQuadrant.fr/
PostgreSQL: Support 24x7 - Développement, Expertise et Formation

Attachment

Re: Hints (was Poor performance using CTE)

From
Scott Marlowe
Date:
On Fri, Nov 23, 2012 at 3:05 AM, Cédric Villemain
<cedric@2ndquadrant.com> wrote:
> Le mercredi 21 novembre 2012 17:34:02, Craig James a écrit :
>> On Wed, Nov 21, 2012 at 5:42 AM, Kevin Grittner <kgrittn@mail.com> wrote:
>> > It's a tough problem. Disguising and not documenting the available
>> > optimizer hints leads to more reports on where the optimizer should
>> > be smarter, and has spurred optimizer improvements. ...
>> > Regarding the above-mentioned benefits we would stand to lose by
>> > having clear and documented hints, perhaps we could occasionally
>> > solicit input on where people are finding hints useful to get ideas
>> > on where we might want to improve the optimizer. As far as worrying
>> > about people using hints to force a plan which is sub-optimal --
>> > isn't that getting into nanny mode a bit too much?
>>
>> Toward that end, the hint documentation (which is almost always viewed as
>> HTML) could be prefaced by a strong suggestion to post performance
>> questions in this group first, with links to the "subscribe" page and the
>> "how to report performance problems" FAQ. The hint documentation could even
>> be minimalistic; suggest to developers that they should post their
>> problematic queries here before resorting to hints.  That would give the
>> experts an opportunity to provide the normal advice.  The correct hint
>> syntax would be suggested only when all other avenues failed.
>
> We have hooks in PostgreSQL. We already have at least one extension which is
> using that to change the planner behavior.
>
> We can have a bit more hooks and try to improve the cost estimate, this part
> of the code is known to be built by reports and human estimations, also the
> 9.2 version got heavy modifications in this area.
>
> Let the 'Hints' be inside an extension thus we are able to track them and fix
> the planner/costestimate issues.
>
> I don't see why PostgreSQL needs 'Hints' *in-core*.

Here here!  PostgreSQL is well known for its extensibility and this is
the perfect place for hints.  That way they can get worked on without
becoming a crutch for every user and forcing the backend developers to
support what may or may not be a good idea syntax wise.  After a few
different people have banged some code out to make workable hint
syntaxes for their own use maybe then it will be time to revisit
adding hints to core.


Re: Hints (was Poor performance using CTE)

From
Craig Ringer
Date:
On 27/11/2012 3:42 PM, Scott Marlowe wrote:

> Here here!  PostgreSQL is well known for its extensibility and this is
> the perfect place for hints.
I agree with the sentiment and your concerns. However, this doesn't
solve the CTE problem.

Some people are relying on the planner's inability to push conditions
into / pull conditions out of CTEs, and otherwise re-arrange them. If
support for optimising into eligible CTEs (ie CTE terms that contain
only SELECT or VALUES and call no VOLATILE functions) then these
applications will potentially encounter serious performance regressions.

Should this feature never be added to Pg, making it different and
incompatible with other DBs that implement CTE optimisation, just
because some people are using it for a hacky hint like OFFSET 0?

Should these applications just be broken by the update, with people told
to add `OFFSET 0` or load some not-yet-existing hints module after
reporting the performance issue to the list?

I don't think either of those are acceptable. Sooner or later somebody's
going to want to add CTE optimisation, and I don't think that "you
can't" or "great, we'll do it and break everything" are acceptable
responses to any proposed patch someone might come up with to add that.

A GUC might be OK, as apps can always SET it before problem queries or
not-yet-ported code. It'd probably reduce the rate at which people fixed
their code considerably, though, going by past experience with
standard_conforming_strings, etc, but it'd work.

--
Craig Ringer


Re: Hints (was Poor performance using CTE)

From
Scott Marlowe
Date:
On Tue, Nov 27, 2012 at 7:17 PM, Craig Ringer <craig@2ndquadrant.com> wrote:
> On 27/11/2012 3:42 PM, Scott Marlowe wrote:
>
>> Here here!  PostgreSQL is well known for its extensibility and this is
>> the perfect place for hints.
>
> I agree with the sentiment and your concerns. However, this doesn't solve
> the CTE problem.
>
> Some people are relying on the planner's inability to push conditions into /
> pull conditions out of CTEs, and otherwise re-arrange them. If support for
> optimising into eligible CTEs (ie CTE terms that contain only SELECT or
> VALUES and call no VOLATILE functions) then these applications will
> potentially encounter serious performance regressions.
>
> Should this feature never be added to Pg, making it different and
> incompatible with other DBs that implement CTE optimisation, just because
> some people are using it for a hacky hint like OFFSET 0?

I'm strictly talking about any hinting mechanism being added being an
extension.  Fixing the planner so that optimizations can get cross the
CTE boundary seems the domain of back end hackers, not extensions.


Re: Hints (was Poor performance using CTE)

From
Gavin Flower
Date:
On 28/11/12 15:17, Craig Ringer wrote:
On 27/11/2012 3:42 PM, Scott Marlowe wrote:

Here here!  PostgreSQL is well known for its extensibility and this is
the perfect place for hints.
I agree with the sentiment and your concerns. However, this doesn't solve the CTE problem.

Some people are relying on the planner's inability to push conditions into / pull conditions out of CTEs, and otherwise re-arrange them. If support for optimising into eligible CTEs (ie CTE terms that contain only SELECT or VALUES and call no VOLATILE functions) then these applications will potentially encounter serious performance regressions.

Should this feature never be added to Pg, making it different and incompatible with other DBs that implement CTE optimisation, just because some people are using it for a hacky hint like OFFSET 0?

Should these applications just be broken by the update, with people told to add `OFFSET 0` or load some not-yet-existing hints module after reporting the performance issue to the list?

I don't think either of those are acceptable. Sooner or later somebody's going to want to add CTE optimisation, and I don't think that "you can't" or "great, we'll do it and break everything" are acceptable responses to any proposed patch someone might come up with to add that.

A GUC might be OK, as apps can always SET it before problem queries or not-yet-ported code. It'd probably reduce the rate at which people fixed their code considerably, though, going by past experience with standard_conforming_strings, etc, but it'd work.

--
Craig Ringer


I think it would be best to be something in the SQL for SELECT, as:
  1. One is more likely to find it by looking up the documentation for SELECT

  2. It could allow selective application within a SELECT: one could have several queries within the WITH clause: where all except one might benefit for optimisation, and the exception might cause problems
I have suggested a couple possible syntax paterns, but there may well be better alternative syntaxes.


Cheers,
Gavin