Thread: Optimizer picks an ineffient plan

Optimizer picks an ineffient plan

From
"Bupp Phillips"
Date:
I have a customer table that has the field CUSTOMER_ID as the primary key
(cust_pkkey), the table has 102,834 records in it.


The following select statement works fine:

select * from customer order by customer_id;
QUERY PLAN:
Index Scan using cust_pkkey on customer  (cost=0.00..5175.17 rows=102834
width=724)
Total runtime: 5999.47 msec

but...

select * from customer order by customer_id, first_name;
QUERY PLAN:
Sort(cost=142028.25..142028.25 rows=102834 width=724)
 -> Seq Scan on customer (cost=0.00..4617.34 rows=102834 width=724)
Total runtime: 19999.81 msec


It seems that the optimizer should be able to detect (in this instance at
least) that the first order by field is a primary key and should not
consider the other fields because it's pointless... the resultset will be in
<primary key> order.

NOTE:  I'm testing this on Postgresql 7.2 for Windows, so this my have
already been dealt with.


Thanks and keep up the great work!!



Re: Optimizer picks an ineffient plan

From
Greg Stark
Date:
"Bupp Phillips" <hello@noname.com> writes:

> but...
>
> select * from customer order by customer_id, first_name;
> QUERY PLAN:
> Sort(cost=142028.25..142028.25 rows=102834 width=724)
>  -> Seq Scan on customer (cost=0.00..4617.34 rows=102834 width=724)
> Total runtime: 19999.81 msec

Actually in this case the optimizer would likely still use a sequential scan
even if it had an index it thought it could use. If you're going to be reading
the whole table anyways it'll be faster to read it in order than to jump all
around even if you have to sort it.

However you do have a point. In this case I don't think postgres even
considers using the index. Even if it would decide not to use it in this case
there could conceivably be cases where it would want to use it.

However I'm not sure I see a lot of cases where this would come up. Even in
automatically generated code, which is the usual cause of redundant things
like this, I don't think I've seen this particular combination ever come up.


--
greg

Re: Optimizer picks an ineffient plan

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> "Bupp Phillips" <hello@noname.com> writes:
>> select * from customer order by customer_id, first_name;
>> [ where customer_id is the primary key ]

> However you do have a point. In this case I don't think postgres even
> considers using the index.

It will not, since the index does not appear to provide the correct sort
order.

> However I'm not sure I see a lot of cases where this would come up.

Yes, that's the real crux of the matter.  Should the optimizer spend
cycles on *every* query to detect cases where the user has written
useless sort keys?  I've got grave doubts that it's a win.  ISTM such
an optimization penalizes the folk who write their queries well to
benefit those who are careless.

            regards, tom lane

Re: Optimizer picks an ineffient plan

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> > However I'm not sure I see a lot of cases where this would come up.
>
> Yes, that's the real crux of the matter.  Should the optimizer spend
> cycles on *every* query to detect cases where the user has written
> useless sort keys?  I've got grave doubts that it's a win.  ISTM such
> an optimization penalizes the folk who write their queries well to
> benefit those who are careless.

Well I'm sure the same arguments were made 30 years ago about optimizing
compilers. But thankfully the optimizer-geeks won the argument. As a result
these days we can more or less write our code in whatever form is most
readable and flexible. We can spend our time worrying about algorithmic
improvements and design abstraction, and not worry that the extra layer of
abstraction will introduce redundant code or inefficient expressions.

Already today we see database-independent toolkits that write SQL queries for
you. They introduce needless subqueries and other inefficiencies willy-nilly.

I argue that if the performance of the database is important down to a
sub-second constant term per query, then you're talking about an OLTP system
where all the queries ought to be prepared and the plans cached. If you're
talking about a system where queries are constructed ad-hoc for every
execution then you should be talking about a DSS system running batch jobs
where an extra few seconds spent optimizing could save you hours.

All that said I'm not sure the case at hand is a great example. I don't think
it would be a big performance loss, but the added code complexity for nothing
might be annoying. I don't see how even automatically generated code is likely
to generate this situation.

--
greg

Re: Optimizer picks an ineffient plan

From
Tom Lane
Date:
Greg Stark <gsstark@mit.edu> writes:
> Tom Lane <tgl@sss.pgh.pa.us> writes:
>> Yes, that's the real crux of the matter.  Should the optimizer spend
>> cycles on *every* query to detect cases where the user has written
>> useless sort keys?  I've got grave doubts that it's a win.

> Well I'm sure the same arguments were made 30 years ago about optimizing
> compilers. But thankfully the optimizer-geeks won the argument.

Um ... I *am* an optimizer-geek.  You can find my name in the credits
for Bliss/11, which was the best optimizing compiler on the planet about
thirty years ago.  I stand by my comment that there's a tradeoff between
the potential gain from an optimization and the time spent to find it.

PG is at a disadvantage compared to typical compilation scenarios, in
that a compiler assumes its output will be executed many times, while
SQL queries often are planned and then executed but once.  There's been
some talk of working harder when planning a "prepared statement", but
so far I've not seen very many places where I'd really want to alter
the planner's behavior on that basis.

            regards, tom lane

Re: Optimizer picks an ineffient plan

From
Peter Childs
Date:
On Thu, 4 Sep 2003, Tom Lane wrote:

> Greg Stark <gsstark@mit.edu> writes:
> > Tom Lane <tgl@sss.pgh.pa.us> writes:
> >> Yes, that's the real crux of the matter.  Should the optimizer spend
> >> cycles on *every* query to detect cases where the user has written
> >> useless sort keys?  I've got grave doubts that it's a win.
>
> > Well I'm sure the same arguments were made 30 years ago about optimizing
> > compilers. But thankfully the optimizer-geeks won the argument.
>
> Um ... I *am* an optimizer-geek.  You can find my name in the credits
> for Bliss/11, which was the best optimizing compiler on the planet about
> thirty years ago.  I stand by my comment that there's a tradeoff between
> the potential gain from an optimization and the time spent to find it.
>
> PG is at a disadvantage compared to typical compilation scenarios, in
> that a compiler assumes its output will be executed many times, while
> SQL queries often are planned and then executed but once.  There's been
> some talk of working harder when planning a "prepared statement", but
> so far I've not seen very many places where I'd really want to alter
> the planner's behavior on that basis.
>

    An intresting point. Perhaps storing some stats on Views would
help. Maybe adding a cache facility for views would speed some things up.
I don't really see anything against storing stats on "Prepared
Statements" and Views like we do on Tables.
    Maybe indexs on View would be useful but keeping them uptodate
would be a hazard.

Peter Childs


Re: Optimizer picks an ineffient plan

From
Greg Stark
Date:
Tom Lane <tgl@sss.pgh.pa.us> writes:

> Um ... I *am* an optimizer-geek.

That explains so much :)

> I stand by my comment that there's a tradeoff between the potential gain
> from an optimization and the time spent to find it.

Well there are always tradoffs in engineering. I'm just trying to push a
little bit in one direction and make you rethink your assumptions.

In the past postgres users were largely OLTP systems (largely web sites)
running ad-hoc queries that are parsed and executed every time and interpolate
parameters into the query string. That's crap. With the new FE binary
protocol, and a bit of a push to the driver writers a good high performance
(and secure) system ought to be able to run entirely prepared queries that are
prepared once per backend process and executed thousands of times.

> PG is at a disadvantage compared to typical compilation scenarios, in
> that a compiler assumes its output will be executed many times, while
> SQL queries often are planned and then executed but once.  There's been
> some talk of working harder when planning a "prepared statement", but
> so far I've not seen very many places where I'd really want to alter
> the planner's behavior on that basis.

I think that's backwards actually. The queries where you would want to work
super extra hard, spending a few seconds or even minutes checking possible
plans are the ones that will run for hours. Those are more likely to be DSS
queries that are unprepared ad-hoc queries that will be executed only once.

For OLTP queries I think postgres can afford to not worry about small
(subsecond) constant-time optimizations even if they're unlikely to return big
benefits because OLTP systems should be running entirely prepared queries.

--
greg

Re: Optimizer picks an ineffient plan

From
"Bupp Phillips"
Date:
Well, it's unfortunate that you feel that way, because SQL Server handles it
correctly.


"Tom Lane" <tgl@sss.pgh.pa.us> wrote in message
news:4375.1062643465@sss.pgh.pa.us...
> Greg Stark <gsstark@mit.edu> writes:
> > "Bupp Phillips" <hello@noname.com> writes:
> >> select * from customer order by customer_id, first_name;
> >> [ where customer_id is the primary key ]
>
> > However you do have a point. In this case I don't think postgres even
> > considers using the index.
>
> It will not, since the index does not appear to provide the correct sort
> order.
>
> > However I'm not sure I see a lot of cases where this would come up.
>
> Yes, that's the real crux of the matter.  Should the optimizer spend
> cycles on *every* query to detect cases where the user has written
> useless sort keys?  I've got grave doubts that it's a win.  ISTM such
> an optimization penalizes the folk who write their queries well to
> benefit those who are careless.
>
> regards, tom lane
>
> ---------------------------(end of broadcast)---------------------------
> TIP 8: explain analyze is your friend
>



Re: Optimizer picks an ineffient plan

From
Stephan Szabo
Date:
On Wed, 3 Sep 2003, Bupp Phillips wrote:

> Well, it's unfortunate that you feel that way, because SQL Server handles it
> correctly.

For some definition of correctly.  If you're in a system which gets
penalized .001 seconds for each query planning that uses a multi-column
order by and you do 100 million of them that this doesn't apply to, and
one that it does which save you 30 seconds, is that correct?


> "Tom Lane" <tgl@sss.pgh.pa.us> wrote in message
> news:4375.1062643465@sss.pgh.pa.us...
> > Greg Stark <gsstark@mit.edu> writes:
> > > "Bupp Phillips" <hello@noname.com> writes:
> > >> select * from customer order by customer_id, first_name;
> > >> [ where customer_id is the primary key ]
> >
> > > However you do have a point. In this case I don't think postgres even
> > > considers using the index.
> >
> > It will not, since the index does not appear to provide the correct sort
> > order.
> >
> > > However I'm not sure I see a lot of cases where this would come up.
> >
> > Yes, that's the real crux of the matter.  Should the optimizer spend
> > cycles on *every* query to detect cases where the user has written
> > useless sort keys?  I've got grave doubts that it's a win.  ISTM such
> > an optimization penalizes the folk who write their queries well to
> > benefit those who are careless.
> >
> > regards, tom lane
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 8: explain analyze is your friend
> >
>
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 5: Have you checked our extensive FAQ?
>
>                http://www.postgresql.org/docs/faqs/FAQ.html
>


Re: Optimizer picks an ineffient plan

From
Greg Stark
Date:
Stephan Szabo <sszabo@megazone.bigpanda.com> writes:

> On Wed, 3 Sep 2003, Bupp Phillips wrote:
>
> > Well, it's unfortunate that you feel that way, because SQL Server handles it
> > correctly.
>
> For some definition of correctly.  If you're in a system which gets
> penalized .001 seconds for each query planning that uses a multi-column
> order by and you do 100 million of them that this doesn't apply to, and
> one that it does which save you 30 seconds, is that correct?

Uhm. Yes. Absolutely.

For OLTP systems like a web site if there's a single query anywhere in the
system that takes 30s the system is broken. Every time that query happens to
be hit a few times in a row the OLTP system will simply break. This isn't a
performance issue, it's outright broken.

Whereas a 1ms performance hit on every query plan will make virtually no
difference at all in performance and most importantly, it will work. At worst
it means provisioning a 1% faster machine. Even then only if my system isn't
preparing all these queries in advance, in which case I have bigger
performance and security issues than a 1ms per query hit.

For DSS systems handling queries that take minutes or hours to run the case is
even clearer. The 1ms hit is even less of an issue, and the 30s gain will
balloon and turn into minutes or hours of speedup.

I'm pretty sure this particular case was not one of the cases where people
said it just wasn't worth doing. This was just too hard. Solving it in a way
that integrates cleanly with postgres's aggregates will be very hard and
require someone who understands lots of different parts of postgres to spend
an awful lot of time thinking about the problem.

[This is one of the strength's of free software. There's no marketing
department providing checklists that have to be met even if there's no good
solution today. So instead of a bad solution that sticks around for a long
time, we'll one day (hopefully) have a good solution when someone figures out
how to do it.]

--
greg

Re: Optimizer picks an ineffient plan

From
Stephan Szabo
Date:
On 5 Sep 2003, Greg Stark wrote:

>
> Stephan Szabo <sszabo@megazone.bigpanda.com> writes:
>
> > On Wed, 3 Sep 2003, Bupp Phillips wrote:
> >
> > > Well, it's unfortunate that you feel that way, because SQL Server handles it
> > > correctly.
> >
> > For some definition of correctly.  If you're in a system which gets
> > penalized .001 seconds for each query planning that uses a multi-column
> > order by and you do 100 million of them that this doesn't apply to, and
> > one that it does which save you 30 seconds, is that correct?
>
> Uhm. Yes. Absolutely.

Even if all the people that don't write queries with redundant sort
columns pay the cost for those that do? In some ways it'd be nice if we
could control the planner more so that individual systems could make
determinations of whether this was useful or not.

From the below, I'm not sure we're talking about the same case any longer.
:)

> I'm pretty sure this particular case was not one of the cases where people
> said it just wasn't worth doing. This was just too hard. Solving it in a way
> that integrates cleanly with postgres's aggregates will be very hard and
> require someone who understands lots of different parts of postgres to spend
> an awful lot of time thinking about the problem.

I'm not sure how finding redundant sort columns due to unique constraints
really integrates with aggregates at all. Did we maybe cross conversation
with one of the aggregate discussions on min/max/count?


Re: Optimizer picks an ineffient plan

From
"Relaxin"
Date:
What the heck are you talking about?

SQLServer returns the query immediately (milliseconds) where as PGSQL
returns 6 seconds!!
So, yes I would say that SQLServer did it correctly in this situation.  The
optimizer did what it was suppose to do...find the quickest way to get me
the data.
Now if you guys here a PGSQL don't believe it's necessary to change the
optimizer for this purpose, that's fine, but don't start putting other DBMS
down because they DO handle this situation.


"Stephan Szabo" <sszabo@megazone.bigpanda.com> wrote in message
news:20030904132334.E44205-100000@megazone.bigpanda.com...
> On Wed, 3 Sep 2003, Bupp Phillips wrote:
>
> > Well, it's unfortunate that you feel that way, because SQL Server
handles it
> > correctly.
>
> For some definition of correctly.  If you're in a system which gets
> penalized .001 seconds for each query planning that uses a multi-column
> order by and you do 100 million of them that this doesn't apply to, and
> one that it does which save you 30 seconds, is that correct?
>
>
> > "Tom Lane" <tgl@sss.pgh.pa.us> wrote in message
> > news:4375.1062643465@sss.pgh.pa.us...
> > > Greg Stark <gsstark@mit.edu> writes:
> > > > "Bupp Phillips" <hello@noname.com> writes:
> > > >> select * from customer order by customer_id, first_name;
> > > >> [ where customer_id is the primary key ]
> > >
> > > > However you do have a point. In this case I don't think postgres
even
> > > > considers using the index.
> > >
> > > It will not, since the index does not appear to provide the correct
sort
> > > order.
> > >
> > > > However I'm not sure I see a lot of cases where this would come up.
> > >
> > > Yes, that's the real crux of the matter.  Should the optimizer spend
> > > cycles on *every* query to detect cases where the user has written
> > > useless sort keys?  I've got grave doubts that it's a win.  ISTM such
> > > an optimization penalizes the folk who write their queries well to
> > > benefit those who are careless.
> > >
> > > regards, tom lane
> > >
> > > ---------------------------(end of
broadcast)---------------------------
> > > TIP 8: explain analyze is your friend
> > >
> >
> >
> >
> > ---------------------------(end of broadcast)---------------------------
> > TIP 5: Have you checked our extensive FAQ?
> >
> >                http://www.postgresql.org/docs/faqs/FAQ.html
> >
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: the planner will ignore your desire to choose an index scan if your
>       joining column's datatypes do not match
>



Re: Optimizer picks an ineffient plan

From
Stephan Szabo
Date:
On Thu, 4 Sep 2003, Relaxin wrote:

> What the heck are you talking about?

Whether or not trying every optimization is worthwhile in every potential
situation where it might apply, in theory.

> SQLServer returns the query immediately (milliseconds) where as PGSQL
> returns 6 seconds!!
> So, yes I would say that SQLServer did it correctly in this situation.  The
> optimizer did what it was suppose to do...find the quickest way to get me
> the data.

*If* you had other queries that took 6 seconds rather than milliseconds
because it was trying to work out if the optimization applied, would you
still think it was correct to be trying the optimization on those queries?
What about if it added 50 ms to every query you did with an order by, and
you never used redundant sort columns?

> Now if you guys here a PGSQL don't believe it's necessary to change the
> optimizer for this purpose, that's fine, but don't start putting other DBMS
> down because they DO handle this situation.

Where did I put down another DBMS? (I've left my text below).  I said (not
in so many words), some optimizations take time, always doing every
available optimization may not be "correct" because it may affect the
running time of other queries adversely.

I don't really know if this particular case warrents doing, I don't know
if the time involved in checking it is entirely negligible or not for
complicated queries.  I do know that there are people who use order by
however and do not use redundant sort columns and that if there is a
non-negligible cost, they're the ones that are really going to be paying
for it.  Tom believed that the cost was non-negligible for its benefit, if
someone else wants to do it and get numbers and the costs are negligible,
it's likely to get put in.

> "Stephan Szabo" <sszabo@megazone.bigpanda.com> wrote in message
> news:20030904132334.E44205-100000@megazone.bigpanda.com...
> > On Wed, 3 Sep 2003, Bupp Phillips wrote:
> >
> > > Well, it's unfortunate that you feel that way, because SQL Server
> handles it
> > > correctly.
> >
> > For some definition of correctly.  If you're in a system which gets
> > penalized .001 seconds for each query planning that uses a multi-column
> > order by and you do 100 million of them that this doesn't apply to, and
> > one that it does which save you 30 seconds, is that correct?


Re: Optimizer picks an ineffient plan

From
"Relaxin"
Date:
Regardless of your argument, all I know is that SQL Server handles it
correctly (giving me the data in the fastest possible way) and postgresql
doesn't.
And your argument about how it will increase other queries is pointless to
me. Will you still stand by this argument when the PG folks find a situation
where the optimizer creates a terrible plan and the only way to fix is to
add additional logic, based on what you are saying, it's not worth it
because it could add .001 seconds of additional processing.
 Again, like I said, if the PG folks don't see a need for this type of
optimization, then that's fine. What I'm stating is that SQLServer is a very
fast and robust database that also handles this, as you may call it, odd
situation.

Thanks


"Stephan Szabo" <sszabo@megazone.bigpanda.com> wrote in message
news:20030905071003.P73820-100000@megazone.bigpanda.com...
>
> On Thu, 4 Sep 2003, Relaxin wrote:
>
> > What the heck are you talking about?
>
> Whether or not trying every optimization is worthwhile in every potential
> situation where it might apply, in theory.
>
> > SQLServer returns the query immediately (milliseconds) where as PGSQL
> > returns 6 seconds!!
> > So, yes I would say that SQLServer did it correctly in this situation.
The
> > optimizer did what it was suppose to do...find the quickest way to get
me
> > the data.
>
> *If* you had other queries that took 6 seconds rather than milliseconds
> because it was trying to work out if the optimization applied, would you
> still think it was correct to be trying the optimization on those queries?
> What about if it added 50 ms to every query you did with an order by, and
> you never used redundant sort columns?
>
> > Now if you guys here a PGSQL don't believe it's necessary to change the
> > optimizer for this purpose, that's fine, but don't start putting other
DBMS
> > down because they DO handle this situation.
>
> Where did I put down another DBMS? (I've left my text below).  I said (not
> in so many words), some optimizations take time, always doing every
> available optimization may not be "correct" because it may affect the
> running time of other queries adversely.
>
> I don't really know if this particular case warrents doing, I don't know
> if the time involved in checking it is entirely negligible or not for
> complicated queries.  I do know that there are people who use order by
> however and do not use redundant sort columns and that if there is a
> non-negligible cost, they're the ones that are really going to be paying
> for it.  Tom believed that the cost was non-negligible for its benefit, if
> someone else wants to do it and get numbers and the costs are negligible,
> it's likely to get put in.
>
> > "Stephan Szabo" <sszabo@megazone.bigpanda.com> wrote in message
> > news:20030904132334.E44205-100000@megazone.bigpanda.com...
> > > On Wed, 3 Sep 2003, Bupp Phillips wrote:
> > >
> > > > Well, it's unfortunate that you feel that way, because SQL Server
> > handles it
> > > > correctly.
> > >
> > > For some definition of correctly.  If you're in a system which gets
> > > penalized .001 seconds for each query planning that uses a
multi-column
> > > order by and you do 100 million of them that this doesn't apply to,
and
> > > one that it does which save you 30 seconds, is that correct?
>
>
> ---------------------------(end of broadcast)---------------------------
> TIP 7: don't forget to increase your free space map settings
>



Re: Optimizer picks an ineffient plan

From
Stephan Szabo
Date:
On Fri, 5 Sep 2003, Relaxin wrote:

> And your argument about how it will increase other queries is pointless to
> me. Will you still stand by this argument when the PG folks find a situation
> where the optimizer creates a terrible plan and the only way to fix is to
> add additional logic, based on what you are saying, it's not worth it
> because it could add .001 seconds of additional processing.

I've tried to say that you need to balance the cost of the optimization
against its potential gain in the cases it helps, the frequency of those
cases, the ability to solve the problems other ways and against the
development time that is used in it.  There isn't a push button with two
states, best and not best.

When you want to argue about an optimization you should think about:
 a) What does the optimization actually mean (actual specification, not
vague words)
 b) What cases is the optimization legal for (doesn't change results,
doesn't violate spec, etc)
 c) What cases does the optimization help (as separate from b in that some
cases may not actually be faster but the optimization is legal)
 d) What is the gain over the cases that the optimization helps
 e) What is the penalty over the cases that the optimization does not help

If someone feels strongly about it or feels that they have the time,
they can (and should be generally promoted to) attempt the optimization.
If the optimization is not expensive or has better than expected gains or
good side effects, it's likely to get accepted.