Thread: Good Optimization

Good Optimization

From
secret
Date:
   There is a simple way to optimize SQL queries involving joins to
PostgreSQL that I think should be handled by Postgre?  If one is joining
a tables a,b on attribute "x" and if one has something like x=3 then it
helps A LOT to say: a.x=3 and b.x=3 in addition to saying a.x=b.x ...
The example below shoulds the radical speed gain of doing this, and I
think it isn't something real obvious to most people...
   Of course it could just be a common thing to do in SQL, anyway, just
thought I'd let you all know what I discovered.



Here is an example:

ftc=> explain select * from po,tickets where po_id=material_po and
po_id=8888 ;
NOTICE:  QUERY PLAN:

Nested Loop  (cost=401.34 size=6146 width=158) ->  Index Scan using ipo_po_id_units on po  (cost=2.05 size=2
width=94) ->  Index Scan using itickets_mpou on tickets  (cost=199.64 size=70650
width=6
4)

EXPLAIN
ftc=>

ftc=> explain select * from po,tickets where po_id=material_po and
po_id=8888 an
d material_po=8888;
NOTICE:  QUERY PLAN:

Nested Loop  (cost=21.42 size=268 width=158) ->  Index Scan using ipo_po_id_units on po  (cost=2.05 size=2
width=94) ->  Index Scan using itickets_material_po on tickets  (cost=9.68
size=3073 wid
th=64)

EXPLAIN
ftc=>



Re: [SQL] Good Optimization

From
"Roderick A. Anderson"
Date:
On Wed, 7 Jul 1999, secret wrote:

> 
>     There is a simple way to optimize SQL queries involving joins to
> PostgreSQL that I think should be handled by Postgre?  If one is joining
> a tables a,b on attribute "x" and if one has something like x=3 then it
> helps A LOT to say: a.x=3 and b.x=3 in addition to saying a.x=b.x ...
> The example below shoulds the radical speed gain of doing this, and I
> think it isn't something real obvious to most people...
> 

Also if the optimizer works similar to Oracle then the order of
where statements would also help in actual performance.  I'm mostly a
lurker on this list but have never seen anything about this issue.  When using Oracle, and won't it be nice when people
mentionPostgreSQL
 
functionality when comparing other databases, the statements at the
bottom should be the most restrictive and joins should be near the top.
select *   from po,tickets where po_id=material_po   and po_id=8888 ;

Would be the best.  Whereas
select *  from po,tickets where po_id=8888   and po_id=material_po;

would do the join then select those tuples that have a po_id of 8888.

Of course this is probably what PostgreSQL is doing already without a
requirement for the positioning.

Rod
--
Roderick A. Anderson
raanders@altoplanos.net               Altoplanos Information Systems, Inc.
Voice: 208.765.6149                            212 S. 11th Street, Suite 5
FAX: 208.664.5299                                  Coeur d'Alene, ID 83814



Re: [SQL] Good Optimization

From
Tom Lane
Date:
secret <secret@kearneydev.com> writes:
>     There is a simple way to optimize SQL queries involving joins to
> PostgreSQL that I think should be handled by Postgre?  If one is joining
> a tables a,b on attribute "x" and if one has something like x=3 then it
> helps A LOT to say: a.x=3 and b.x=3 in addition to saying a.x=b.x ...
> The example below shoulds the radical speed gain of doing this, and I
> think it isn't something real obvious to most people...

How much *actual* speedup is there?  I don't trust the optimizer's
numbers as anything more than relative measures ;-)

I'm a bit surprised that you are getting a nested-loop plan and not
a merge or hash join.  With a merge join, at least, there ought not be
a large difference from providing the additional qual clause (I think).
What Postgres version are you using?
        regards, tom lane


Re: [SQL] Good Optimization

From
Tom Lane
Date:
"Roderick A. Anderson" <raanders@altoplanos.net> writes:
> Also if the optimizer works similar to Oracle then the order of
> where statements would also help in actual performance.  I'm mostly a
> lurker on this list but have never seen anything about this issue.
>    When using Oracle, and won't it be nice when people mention PostgreSQL
> functionality when comparing other databases, the statements at the
> bottom should be the most restrictive and joins should be near the top.

The Postgres optimizer doesn't care about the order of where clauses;
it'll pick out the clauses it thinks are most effective regardless
of the order you write them in.
        regards, tom lane


Re: [SQL] Good Optimization

From
"Roderick A. Anderson"
Date:
On Wed, 7 Jul 1999, Tom Lane wrote:

> The Postgres optimizer doesn't care about the order of where clauses;
> it'll pick out the clauses it thinks are most effective regardless
> of the order you write them in.
> 
>             regards, tom lane

As I figured it would be as soon as I opened my big mouth.

Rod
--
Roderick A. Anderson
raanders@altoplanos.net               Altoplanos Information Systems, Inc.
Voice: 208.765.6149                            212 S. 11th Street, Suite 5
FAX: 208.664.5299                                  Coeur d'Alene, ID 83814



Re: [SQL] Good Optimization

From
Bruce Momjian
Date:
> Also if the optimizer works similar to Oracle then the order of
> where statements would also help in actual performance.  I'm mostly a
> lurker on this list but have never seen anything about this issue.
>    When using Oracle, and won't it be nice when people mention PostgreSQL
> functionality when comparing other databases, the statements at the
> bottom should be the most restrictive and joins should be near the top.
> 
>     select * 
>       from po,tickets
>      where po_id=material_po
>        and po_id=8888 ;
> 
> Would be the best.  Whereas
> 
>     select *
>       from po,tickets
>      where po_id=8888
>        and po_id=material_po;
> 
> would do the join then select those tuples that have a po_id of 8888.
> 
> Of course this is probably what PostgreSQL is doing already without a
> requirement for the positioning.

This is not your mother's database.  We don't care about statement
ordering.  :-)


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [SQL] Good Optimization

From
wieck@debis.com (Jan Wieck)
Date:
>     There is a simple way to optimize SQL queries involving joins to
> PostgreSQL that I think should be handled by Postgre?  If one is joining
> a tables a,b on attribute "x" and if one has something like x=3 then it
> helps A LOT to say: a.x=3 and b.x=3 in addition to saying a.x=b.x ...
> The example below shoulds the radical speed gain of doing this, and I
> think it isn't something real obvious to most people...
>
>     Of course it could just be a common thing to do in SQL, anyway, just
> thought I'd let you all know what I discovered.
>
>
>
> Here is an example:
> [...]

Hmmmmm,

    wouldn't   it   be  nice  to  do  such  things  automagically
    (rewriter? planner?).

    Even if I don't think that it is that  trivial  as  it  might
    look  like.  The  entire  queries  qualification  is  ONE big
    expression of cascaded  AND  and  OR  nodes.  Deciding  which
    constant  expressions  could  be added again with substituted
    Var nodes because there's  some  "VarX  ==  VarY"  and  don't
    falling into recursion traps isn't easy.

    But  no expression comparing two Var's of different RTE's can
    be put down into an  execution  subtree.  And  that  is  what
    reduces  the  amount  of  data to get joined. I think this is
    worth the efford.

    I think the best place for it will  be  at  the  top  of  the
    optimizer.    It's   more   an  optimization  issue  even  if
    implemented in rewriting technique.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [SQL] Good Optimization

From
Bruce Momjian
Date:
> >     There is a simple way to optimize SQL queries involving joins to
> > PostgreSQL that I think should be handled by Postgre?  If one is joining
> > a tables a,b on attribute "x" and if one has something like x=3 then it
> > helps A LOT to say: a.x=3 and b.x=3 in addition to saying a.x=b.x ...
> > The example below shoulds the radical speed gain of doing this, and I
> > think it isn't something real obvious to most people...
> >
> >     Of course it could just be a common thing to do in SQL, anyway, just
> > thought I'd let you all know what I discovered.
> >
> >
> >
> > Here is an example:
> > [...]
> 
> Hmmmmm,
> 
>     wouldn't   it   be  nice  to  do  such  things  automagically
>     (rewriter? planner?).

Added to TODO:
* In WHERE x=3 AND x=y, add y=3              


--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [SQL] Good Optimization

From
Tom Lane
Date:
wieck@debis.com (Jan Wieck) writes:
>     I think the best place for it will  be  at  the  top  of  the
>     optimizer.    It's   more   an  optimization  issue  even  if
>     implemented in rewriting technique.

I have thought for some time that we need an additional phase in between
the rewriter and the planner/optimizer proper.  This phase would be the
right place to do constant-subexpression folding, cnfify(), and any
other rearrangements of the parse tree that would be likely to help the
planning process.  So far, the only specific example I had in mind was
the index restriction clauses that are added for LIKE and ~ matches with
a constant initial pattern --- those are currently done by gram.y but
I think they need to happen much later.  If this idea of propagating
restriction clauses across a join is really worthwhile, then this new
optimization phase would be the right place for that too.

The reason I'd prefer to keep it separate from the rewriter is that the
rewriter is mostly concerned with functionality (implementing views etc)
whereas this new module would be concerned with optimization.  Obviously
you do not want to start worrying about optimization until you've got
the fully expanded query available.
        regards, tom lane


Re: [SQL] Good Optimization

From
Bruce Momjian
Date:
> The reason I'd prefer to keep it separate from the rewriter is that the
> rewriter is mostly concerned with functionality (implementing views etc)
> whereas this new module would be concerned with optimization.  Obviously
> you do not want to start worrying about optimization until you've got
> the fully expanded query available.

Sounds good.

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [SQL] Good Optimization

From
Patrik Kudo
Date:
Bruce Momjian wrote:
> 
> Added to TODO:
> 
>         * In WHERE x=3 AND x=y, add y=3

I don't know if I'm way off, but wouldn't removing "x=y" improve
performance further?

-- 
ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol
Känns det oklart? Fråga på!



Re: [SQL] Good Optimization

From
wieck@debis.com (Jan Wieck)
Date:
>
> Bruce Momjian wrote:
> >=20
> > Added to TODO:
> >=20
> >         * In WHERE x=3D3 AND x=3Dy, add y=3D3
>
> I don't know if I'm way off, but wouldn't removing "x=3Dy" improve
> performance further?

    Maybe  in  the simple case above. But it will probably change
    the result set if someone issues

        SELECT a.x, b.y FROM a, b WHERE a.x >= 10
                                    AND a.x <= 19
                                    AND a.x = b.x;

    which should then get rewritten into

        SELECT a.x, b.y FROM a, b WHERE a.x >= 10
                                    AND a.x <= 19
                                    AND a.x = b.x
                                    AND b.x >= 10
                                    AND b.x <= 19;

    This time the "a.x = b.x" is important!

    IMHO we're improving optimization more and more on  the  cost
    of  query  parse/rewrite/optimize/plan time. Thus performance
    of statements that EXECUTE fast slows  down  more  and  more.
    Isn't   it   time   to   think   about  some  (maybe  shared)
    "parsetree->plan" cache that provides ready to use  plans  if
    only Const values have changed?


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [SQL] Good Optimization

From
Patrik Kudo
Date:
Jan Wieck wrote:
> 
> >
> > Bruce Momjian wrote:
> > >=20
> > > Added to TODO:
> > >=20
> > >         * In WHERE x=3D3 AND x=3Dy, add y=3D3
> >
> > I don't know if I'm way off, but wouldn't removing "x=3Dy" improve
> > performance further?
> 
>     Maybe  in  the simple case above. But it will probably change
>     the result set if someone issues
> 
>         SELECT a.x, b.y FROM a, b WHERE a.x >= 10
>                                     AND a.x <= 19
>                                     AND a.x = b.x;
> 
>     which should then get rewritten into
> 
>         SELECT a.x, b.y FROM a, b WHERE a.x >= 10
>                                     AND a.x <= 19
>                                     AND a.x = b.x
>                                     AND b.x >= 10
>                                     AND b.x <= 19;
> 
>     This time the "a.x = b.x" is important!

Ouch... didn't think that far. However, if an cache is implemented
it might be worth the extra overhead to optimize special cases like
the one I commented on. Since most systems use a fairly limited set
of queries (I suppose...) most of the queries will be cached quite
fast.

BTW, I just love the work all of you are doing! Postgres is awsome!

/Kudo


Re: [SQL] Good Optimization

From
wieck@debis.com (Jan Wieck)
Date:
Patrik Kudo wrote:

> Jan Wieck wrote:
> >
> >     This time the "a.x = b.x" is important!
>
> Ouch... didn't think that far. However, if an cache is implemented
> it might be worth the extra overhead to optimize special cases like
> the one I commented on. Since most systems use a fairly limited set
> of queries (I suppose...) most of the queries will be cached quite
> fast.

    Exactly  the  fact  that  most  system  use  a limited set of
    queries made me think that a shared query->plan  cache,  that
    in  the best of all worlds isn't lost on backend termination,
    is worth the efford. And it would lower the  need  of  client
    side prepared execution plans as discussed sometimes.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [SQL] Good Optimization

From
Tom Lane
Date:
wieck@debis.com (Jan Wieck) writes:
>     IMHO we're improving optimization more and more on  the  cost
>     of  query  parse/rewrite/optimize/plan time. Thus performance
>     of statements that EXECUTE fast slows  down  more  and  more.
>     Isn't   it   time   to   think   about  some  (maybe  shared)
>     "parsetree->plan" cache that provides ready to use  plans  if
>     only Const values have changed?

Easier said than done, because the plan chosen by the optimizer may
well depend on the values of the constants.  For example, in
SELECT * FROM t1, t2 WHERE t1.a = t2.a AND t1.b < 12;

the optimizer will attempt to estimate the fraction of t1's rows that
will pass the restriction "t1.b < 12", and then it will choose the
type of join depending on how many rows it thinks there will be.
If we require plans to be chosen without dependence on the values of
constants, we will have to give up a great deal of optimization.

I do not object to letting the user specifically say PREPARE xyz...
and then using that prepared plan; there are plenty of times when
trading off planning time against getting a narrowly-tailored plan
is a useful thing to do.  But we mustn't pre-empt the user's choice.
Note also that in a PREPARE context, it is known which items are
substitutable parameters and which are plain constants, so some
amount of optimization can still go on.
        regards, tom lane


Re: [SQL] Good Optimization

From
Bruce Momjian
Date:
[Charset iso-8859-1 unsupported, filtering to ASCII...]
> Bruce Momjian wrote:
> > 
> > Added to TODO:
> > 
> >         * In WHERE x=3 AND x=y, add y=3
> 
> I don't know if I'm way off, but wouldn't removing "x=y" improve
> performance further?

Let me rewrite it.  TODO list updated:
* In WHERE tab1.x=3 AND tab1.x=tab2.y, add tab2.y=3

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [SQL] Good Optimization

From
Bruce Momjian
Date:
>     IMHO we're improving optimization more and more on  the  cost
>     of  query  parse/rewrite/optimize/plan time. Thus performance
>     of statements that EXECUTE fast slows  down  more  and  more.
>     Isn't   it   time   to   think   about  some  (maybe  shared)
>     "parsetree->plan" cache that provides ready to use  plans  if
>     only Const values have changed?

TODO list has:
* Cache most recent query plan(s?)

--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [SQL] Good Optimization

From
secret
Date:
Tom Lane wrote:

> secret <secret@kearneydev.com> writes:
> >     There is a simple way to optimize SQL queries involving joins to
> > PostgreSQL that I think should be handled by Postgre?  If one is joining
> > a tables a,b on attribute "x" and if one has something like x=3 then it
> > helps A LOT to say: a.x=3 and b.x=3 in addition to saying a.x=b.x ...
> > The example below shoulds the radical speed gain of doing this, and I
> > think it isn't something real obvious to most people...
>
> How much *actual* speedup is there?  I don't trust the optimizer's
> numbers as anything more than relative measures ;-)
>
> I'm a bit surprised that you are getting a nested-loop plan and not
> a merge or hash join.  With a merge join, at least, there ought not be
> a large difference from providing the additional qual clause (I think).
> What Postgres version are you using?
>
>                         regards, tom lane
   The actual performance difference is HUGE.  Hours vs minutes or Minutes vs
Seconds...

David Secret
MIS Director
Kearney Development Co., Inc.




Re: [SQL] Good Optimization

From
secret
Date:
Bruce Momjian wrote:

> >     IMHO we're improving optimization more and more on  the  cost
> >     of  query  parse/rewrite/optimize/plan time. Thus performance
> >     of statements that EXECUTE fast slows  down  more  and  more.
> >     Isn't   it   time   to   think   about  some  (maybe  shared)
> >     "parsetree->plan" cache that provides ready to use  plans  if
> >     only Const values have changed?
>
> TODO list has:
>
>         * Cache most recent query plan(s?)
>
> --
>   Bruce Momjian                        |  http://www.op.net/~candle
>   maillist@candle.pha.pa.us            |  (610) 853-3000
>   +  If your life is a hard drive,     |  830 Blythe Avenue
>   +  Christ can be your backup.        |  Drexel Hill, Pennsylvania 19026
   Why not let the user select how much time he wants the DB to spend on
optimization?  maybe an explicit "FAST SELECT ..." will skip the costly steps
for subsecond queries, or "OPTIMIZE SELECT" for the long(hour) queries that
really need a lot of help :)... Or even a SET OPTION= for turning it on/off?

--David




Re: [SQL] Good Optimization

From
Bruce Momjian
Date:
>     Why not let the user select how much time he wants the DB to spend on
> optimization?  maybe an explicit "FAST SELECT ..." will skip the costly steps
> for subsecond queries, or "OPTIMIZE SELECT" for the long(hour) queries that
> really need a lot of help :)... Or even a SET OPTION= for turning it on/off?

In most cases, the optimizer knows much better how long to optimize, and
the time it spends is usually very small.  Most time is spent in
executor.  Pre-6.5 optimizer had problems, so this was not true then.
--  Bruce Momjian                        |  http://www.op.net/~candle maillist@candle.pha.pa.us            |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: [SQL] Good Optimization

From
Tom Lane
Date:
secret <secret@kearneydev.com> writes:
> Tom Lane wrote:
>> How much *actual* speedup is there?  I don't trust the optimizer's
>> numbers as anything more than relative measures ;-)
>> 
>> I'm a bit surprised that you are getting a nested-loop plan and not
>> a merge or hash join.  With a merge join, at least, there ought not be
>> a large difference from providing the additional qual clause (I think).
>> What Postgres version are you using?

> The actual performance difference is HUGE.  Hours vs minutes or Minutes vs
> Seconds...

Well, yeah, it could be huge in a nested-loop scenario; in a mergejoin
I think it would make little difference.

Actually, if the inner path is indexed then this shouldn't make any
difference for a nestloop either; each probe into the inner path *ought*
to be using the value of the current outer tuple's join variable as an
indexqual constraint, which would have the same limiting effect as the
explicit restriction you propose adding.  There is code in the optimizer
that claims to be making that happen.  Sounds like it is broken :-(

What are the data types of po_id and material_po in your example?
And, again, which Postgres version exactly?
        regards, tom lane


Re: [SQL] Good Optimization

From
wieck@debis.com (Jan Wieck)
Date:
Patrik Kudo wrote:

> Jan Wieck wrote:
> >
> >     This time the "a.x = b.x" is important!
>
> Ouch... didn't think that far. However, if an cache is implemented
> it might be worth the extra overhead to optimize special cases like
> the one I commented on. Since most systems use a fairly limited set
> of queries (I suppose...) most of the queries will be cached quite
> fast.

    Exactly  the  fact  that  most  system  use  a limited set of
    queries made me think that a shared query->plan  cache,  that
    in  the best of all worlds isn't lost on backend termination,
    is worth the efford. And it would lower the  need  of  client
    side prepared execution plans as discussed sometimes.


Jan

--

#======================================================================#
# It's easier to get forgiveness for being wrong than for being right. #
# Let's break this rule - forgive me.                                  #
#========================================= wieck@debis.com (Jan Wieck) #

Re: [SQL] Good Optimization

From
Patrik Kudo
Date:
Bruce Momjian wrote:
> 
> Added to TODO:
> 
>         * In WHERE x=3 AND x=y, add y=3

I don't know if I'm way off, but wouldn't removing "x=y" improve
performance further?

-- 
ech`echo xiun|tr nu oc|sed 'sx\([sx]\)\([xoi]\)xo un\2\1 is xg'`ol
Känns det oklart? Fråga på!