Thread: enforcing a plan (in brief)

enforcing a plan (in brief)

From
"Hicham G. Elmongui"
Date:
Is there a way to bypass the optimizer and to specify a plan to be executed?

Something like:



Limit  -> Nested Loop     -> Nested Loop         -> Seq Scan on tab00 t00        -> Index Scan using tab03_pkey on
tab03t03                 Index Cond: ("outer".id = t03.id)    -> Index Scan using tab01_pkey on tab01 t01
IndexCond: ("outer".id = t01.id)
 





Re: enforcing a plan (in brief)

From
Bruce Momjian
Date:
Hicham G. Elmongui wrote:
> Is there a way to bypass the optimizer and to specify a plan to be executed?
> 
> Something like:
> 
> 
> 
> Limit 
>   -> Nested Loop 
>      -> Nested Loop  
>         -> Seq Scan on tab00 t00 
>         -> Index Scan using tab03_pkey on tab03 t03  
>                 Index Cond: ("outer".id = t03.id)
>      -> Index Scan using tab01_pkey on tab01 t01  
>               Index Cond: ("outer".id = t01.id)

No, we feel that is of limited value.  If the optimizer isn't doing
things properly, we will fix it.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: enforcing a plan (in brief)

From
Sailesh Krishnamurthy
Date:
Hicham

For your experiments (VLDB ? :-) your best bet of specifically bolting
on a plan (if you can't convince the optimizer to do the right thing)
is to hack it in the query planner. 

I've done similar hacks in the past, but only in the TelegraphCQ code
and not in PostgreSQL. 

-- 
Pip-pip
Sailesh
http://www.cs.berkeley.edu/~sailesh




Re: enforcing a plan (in brief)

From
Neil Conway
Date:
On Thu, 2005-02-10 at 14:37 -0500, Bruce Momjian wrote:
> No, we feel that is of limited value.  If the optimizer isn't doing
> things properly, we will fix it.

I agree that improving the optimizer is the right answer for normal
usage, so I can't get excited about query-level plan hints, but I can
see the capability to instruct the planner being useful in an academic
context.

-Neil




Re: enforcing a plan (in brief)

From
pgsql@mohawksoft.com
Date:
> On Thu, 2005-02-10 at 14:37 -0500, Bruce Momjian wrote:
>> No, we feel that is of limited value.  If the optimizer isn't doing
>> things properly, we will fix it.
>
> I agree that improving the optimizer is the right answer for normal
> usage, so I can't get excited about query-level plan hints, but I can
> see the capability to instruct the planner being useful in an academic
> context.
>

I think that is sort of arrogant. Look at Oracle, you can give the planner
hints in the form of comments.

The idea that constructing a planner that will always do the best job is
like creating a program that can predict the weather. There are too many
subtle variations in datasets that are impossible to really evalute. I
posted a message last week called "One Big trend vs multiple smaller
trends." and you'll see what I mean.

Yea, on a simple data organization, you could make a great planner, but
someone who has studied the nature of their data can almost always toss
their hands up in frustration because the planner isn't working right.

I have had multiple issues with the inability to guide the planner on its
decisions. I'll give a couple examples:

A music database where the artist name is "Various Artists,"  given any
normal database of music recordings, "Various Artists" will be *THE* most
popular artist, usually close to almost half the data. Most of the time
I've had to turn off sequential scans for these queries. (I filter out
"various artists") Being able to say:

select * from cdtitles where artist = 'foo' /* index scan
ctitles_artist_ndx  */ ;

Would be helpful as disabling sequential scan isn't always the right think
either.

The whole "Query optimizer 8.0.1 (and 8.0)" series of posts show a
different problem.

It all comes down to that the planner *can not* be perfect, and thus will
always be lacking in some respect. This is because you can not anticipate
every physical data storage pattern, therefore, the analyzer will not
correctly characterize them, and the planner will not create an optimal
plan.

Allowing the user to suggest alternate query strategies is a good idea.






Re: enforcing a plan (in brief)

From
Tom Lane
Date:
pgsql@mohawksoft.com writes:
> I think that is sort of arrogant. Look at Oracle, you can give the planner
> hints in the form of comments.

Arrogant or not, that's the general view of the people who work on the
planner.

The real issue is not so much whether the planner will always get things
right --- it won't, and no one imagines that it will ever be perfect.
The real issue is that we have limited manpower, and designing and
implementing a useful hint facility is a nontrivial project.  (Not to
mention that maintaining such a thing in the face of frequent,
fundamental changes to the underlying planner and executor capabilities
would be an outright nightmare.)

The people who are actually doing the work think their time is more
usefully spent on improving the planner's intelligence than on devising
ways to override it.
        regards, tom lane


Re: enforcing a plan (in brief)

From
Bruce Momjian
Date:
Tom Lane wrote:
> pgsql@mohawksoft.com writes:
> > I think that is sort of arrogant. Look at Oracle, you can give the planner
> > hints in the form of comments.
> 
> Arrogant or not, that's the general view of the people who work on the
> planner.
> 
> The real issue is not so much whether the planner will always get things
> right --- it won't, and no one imagines that it will ever be perfect.
> The real issue is that we have limited manpower, and designing and
> implementing a useful hint facility is a nontrivial project.  (Not to
> mention that maintaining such a thing in the face of frequent,
> fundamental changes to the underlying planner and executor capabilities
> would be an outright nightmare.)

And the user maintenance of updating those hints for every release of
PostgreSQL as we improve the database engine.

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


Re: enforcing a plan (in brief)

From
Neil Conway
Date:
On Mon, 2005-02-14 at 22:56 -0500, Bruce Momjian wrote:
> And the user maintenance of updating those hints for every release of
> PostgreSQL as we improve the database engine.

... and maintaining those hints as the data changes over time.

But I think this thread has been hijacked toward a subject that has been
beaten to death in the past, and away from something that I think might
be worth exploring. IMHO, people deploying PostgreSQL for production use
are just one of the groups of users of this project. Another group are
those people using PostgreSQL in an academic environment. I think it
would be really cool to make it absolutely simple to use PostgreSQL as a
starting point for DBMS research. That would mean things like:

- good, thorough documentation of the internals (naturally this would
help attract OSS developers as well)

- APIs that allow people to drive the planner and executor
programmatically (as in the original question)

- plugin APIs that make it relatively easy to replace the implementation
of a subsystem whole-sale (if there's a cost to these APIs in terms of
complexity or performance, it is perhaps not worth doing)

(Of course, I'm partially guessing here -- but if those people who
actually _are_ using PostgreSQL in an academic context have some
additional ideas for how we can make your lives easier, I'd be curious
to hear them.)

-Neil




Re: enforcing a plan (in brief)

From
pgsql@mohawksoft.com
Date:
> pgsql@mohawksoft.com writes:
>> I think that is sort of arrogant. Look at Oracle, you can give the
>> planner
>> hints in the form of comments.
>
> Arrogant or not, that's the general view of the people who work on the
> planner.
>
> The real issue is not so much whether the planner will always get things
> right --- it won't, and no one imagines that it will ever be perfect.
> The real issue is that we have limited manpower, and designing and
> implementing a useful hint facility is a nontrivial project.  (Not to
> mention that maintaining such a thing in the face of frequent,
> fundamental changes to the underlying planner and executor capabilities
> would be an outright nightmare.)
>
> The people who are actually doing the work think their time is more
> usefully spent on improving the planner's intelligence than on devising
> ways to override it.
>

I know I come on strong, and I know I'm probably irritating in many ways,
however, I have been a PostgreSQL user since just after it was named
postgreSQL from Postgres95. I've seen a lot of changes, and almost all of
them have been quite good.

I have over 10 years of using it on various projects. While I have not
been lucky enough to get a gig in which I could contribute more, I do try
to contribute and sometimes it is quite difficult.

The one thing that I think you guys miss is actually using PostgreSQL in
some projects where the company and the deliverables don't give a rat's
ass about whether you use PostgreSQL or Oracle or something else. Over the
years I have beat my head against the walls suggesting features, most of
which eventually have come to PostgreSQL, but every one was a fight.

I have some well founded opinions about PostgreSQL hard earned from real
world situations. Agree or not, I have experience with this database and I
have hit many of its short comings.

One consistent problem is the planner not being able to handle this or
that scenario. At this stage, the *best* way to improve the planner is to
add the ability to place hints in the plan. It *is* good enough for 90% of
the types of queries you would ever want to do. I am dubious that you can
get it demonstrably better in the last 10% or so without making it worse.

Simple hints would go a HUGE way to improving the last 10%. Many of the
"Why doesn't PostgreSQL use my index" questions would go away. Most of the
time Tom spends looking at people's pg_stats info would drop. It would
actually save time.

As a PostgreSQL user, I can tell you with 100% confidence, if I had this
tool, I could do my job easier. I can also tell you that while I have
genuine appreciation for the current quality of the planner, I still would
like to be able to tailor queries specifically to test various approaches
for performance reasons.


Re: enforcing a plan (in brief)

From
Greg Stark
Date:
Neil Conway <neilc@samurai.com> writes:

> - good, thorough documentation of the internals (naturally this would
> help attract OSS developers as well)

I don't know what software you work with but the Postgres source is far and
away the best documented source I've had the pleasure to read. I think it's
challenging to jump into because it's a legitimately complex piece of
software, not because of any deficiency in the documentation.

> - plugin APIs that make it relatively easy to replace the implementation
> of a subsystem whole-sale (if there's a cost to these APIs in terms of
> complexity or performance, it is perhaps not worth doing)

And Postgres's extensibility features like plugin languages and indexing
methods are one of its strengths.

> - APIs that allow people to drive the planner and executor
> programmatically (as in the original question)

Actually, I think that would be a neat experiment. I've often wondered about
an environment where SQL is the source language and it's compiled statically
into data structures representing plans.

But you have to be careful, it would be easy to come up with nonsensical
plans, or even plans that would be infinite loops or cause crashes.

-- 
greg



Re: enforcing a plan (in brief)

From
Mark Kirkwood
Date:
Although this is all true, consider that adding hints will mean that the
Pg developers *never* get bug reports to drive the optimizer improvement
process. This will have the effect of stagnating its development. I
think this would be a bad thing :-)

As an aside note that DB2 UDB does not let you hint its optimizer
either...I have heard it argued (by a IBM acquaintance of mine) that
their optimizer is better than that other database's whose name begins
with O, precisely because of this (He is biased of coarse, but it is an
interesting point).

regards

Mark



pgsql@mohawksoft.com wrote:
> One consistent problem is the planner not being able to handle this or
> that scenario. At this stage, the *best* way to improve the planner is to
> add the ability to place hints in the plan. It *is* good enough for 90% of
> the types of queries you would ever want to do. I am dubious that you can
> get it demonstrably better in the last 10% or so without making it worse.
> 
> Simple hints would go a HUGE way to improving the last 10%. Many of the
> "Why doesn't PostgreSQL use my index" questions would go away. Most of the
> time Tom spends looking at people's pg_stats info would drop. It would
> actually save time.
> 
> As a PostgreSQL user, I can tell you with 100% confidence, if I had this
> tool, I could do my job easier. I can also tell you that while I have
> genuine appreciation for the current quality of the planner, I still would
> like to be able to tailor queries specifically to test various approaches
> for performance reasons.




Re: enforcing a plan (in brief)

From
"Zeugswetter Andreas DAZ SD"
Date:
> And the user maintenance of updating those hints for every release of
> PostgreSQL as we improve the database engine.

I don't think so. Basically an optimizer hint simply raises or lowers the cost
of an index, mandates a certain join order, allows or disallows a seq scan ...
Imho it is not so far from the things people currently do with the set seq_scan=
type of commands.
(I don't think actually giving a certain plan is a good idea)

A good optimizer hint system would imho not circumvent the optimizer, but only
give it hints. The hints should be very specifically aimed, like "an index on column x"
is going to be more expensive than you (the optimizer) think, if used with this query.
like: select /*+ avoid_index(atab atab_x0) */ * from atab ...

> > The people who are actually doing the work think their time is more
> > usefully spent on improving the planner's intelligence than on devising
> > ways to override it.

The subject of this mail and "override it" imho goes too far, I would like to
be able to give advice in the form of hints to the optimizer.

> One consistent problem is the planner not being able to handle this or
> that scenario. At this stage, the *best* way to improve the planner is to
> add the ability to place hints in the plan.                                       ^^^^ sql statement

I agree.

Andreas


Re: enforcing a plan (in brief)

From
Neil Conway
Date:
On Tue, 2005-02-15 at 02:38 -0500, Greg Stark wrote:
> I don't know what software you work with but the Postgres source is far and
> away the best documented source I've had the pleasure to read.

I agree the PostgreSQL source is very nice (for the most part), but I
think there could be more higher-level documentation of the internals.
For example, until a few days ago the access method API was completely
undocumented (in SGML, at least). Tom has now written some good docs for
it -- that's an example of the kind of improvement I'm talking about.
Having documents describing "how to add a new index type", "how to add a
new planner node", "how to add a new DML/DDL command", and so forth
would be cool.

-Neil