Thread: Any optimizations to the join code in 7.1?

Any optimizations to the join code in 7.1?

From
Mike Mascari
Date:
Hello.

I have a particular query which performs a 15-way join; I believe in 
normalization ;-). Under 7.0.3, using the defaults where GEQO is 
enabled after 11, the query (which returns 1 row) takes 10 seconds. 
With GEQO turned off, it takes 18 seconds. Naturally I intend to 
upgrade as soon as possible, but I looked through the change log and 
didn't see anything specific WRT large joins. I was wondering if any 
work had been done in that area for 7.1. I realize you can only 
squeeze so much blood from stone, but....

Thanks for any info,

Mike Mascari
mascarm@mascari.com



Re: Any optimizations to the join code in 7.1?

From
Tom Lane
Date:
Mike Mascari <mascarm@mascari.com> writes:
> I have a particular query which performs a 15-way join;

You should read 
http://www.postgresql.org/devel-corner/docs/postgres/explicit-joins.html
        regards, tom lane


Re: Any optimizations to the join code in 7.1?

From
Joel Burton
Date:
On Wed, 25 Apr 2001, Tom Lane wrote:

> Mike Mascari <mascarm@mascari.com> writes:
> > I have a particular query which performs a 15-way join;
> 
> You should read 
> http://www.postgresql.org/devel-corner/docs/postgres/explicit-joins.html

I was recently poring over this page myself, as I've been working w/some
larger-than-usual queries.

Two questions:

1) it appears (from my tests) that SELECT * FROM
  CREATE VIEW joined as  SELECT p.id,         p.pname,         c.cname  FROM   p  LEFT OUTER JOIN c using (id)
  gives the same answer as SELECT * FROM
  CREATE VIEW nested  SELECT p.id,         p.pname,         (select c.cname from c where c.id = p.id)  FROM   p
  However, I often am writing VIEWs that will be used by developers  in  a front-end system. Usually, this view might
have30 items in the  select clause, but the developer using it is likely to only as for  four or five items. In this
case,I often prefer the  subquery form because it appears that
 
  SELECT id, pname FROM joined
  is more complicated than
  SELECT id, pname FROM nested
  as the first has to perform the join, and the second doesn't.
  Is this actually correct?

2) The explicit-joins help suggests that manual structuring and  experimentation might help -- has anyone written (or
could anyone write) anthing about where to start in guessing what  join order might be optimal?
 


-- 
Joel Burton   <jburton@scw.org>
Director of Information Systems, Support Center of Washington



Re: Any optimizations to the join code in 7.1?

From
Tom Lane
Date:
Joel Burton <jburton@scw.org> writes:
> 1) it appears (from my tests) that SELECT * FROM

>    CREATE VIEW joined as
>    SELECT p.id,
>           p.pname,
>           c.cname
>    FROM   p
>    LEFT OUTER JOIN c using (id)

>    gives the same answer as SELECT * FROM

>    CREATE VIEW nested
>    SELECT p.id,
>           p.pname,
>           (select c.cname from c where c.id = p.id)
>    FROM   p

Only if c.id is a unique column (ie, there are always 0 or 1 matches in
c for any given p.id).  Otherwise the subselect form will fail.

>    However, I often am writing VIEWs that will be used by developers
>    in  a front-end system. Usually, this view might have 30 items in the
>    select clause, but the developer using it is likely to only as for
>    four or five items. In this case, I often prefer the
>    subquery form because it appears that
>    SELECT id, pname FROM joined
>    is more complicated than
>    SELECT id, pname FROM nested
>    as the first has to perform the join, and the second doesn't.

>    Is this actually correct?

This approach is probably reasonable if the cname field of the view
result is seldom wanted at all, and never used as a WHERE constraint.
You'd get a very nonoptimal plan if someone did
select * from nested where cname like 'foo%'

since the planner has no way to use the LIKE constraint to limit the
rows fetched from p.  In the JOIN format, on the other hand, I think
the constraint could be exploited.

Also bear in mind that the subselect form is essentially forcing the
join to be done via a nested loop.  If you have an index on c.id then
this may not be too bad, but without one the performance will be
horrid.  Even with an index, nested loop with inner indexscan is not
the join method of choice if you are retrieving a lot of rows.

> 2) The explicit-joins help suggests that manual structuring and
>    experimentation might help -- has anyone written (or could
>    anyone write) anthing about where to start in guessing what
>    join order might be optimal?

The obvious starting point is the plan produced by the planner from an
unconstrained query.  Even if you don't feel like trying to improve it,
you could cut the time to reproduce the plan quite a bit --- just CROSS
JOIN a few of the relation pairs that are joined first in the
unconstrained plan.
        regards, tom lane


Re: Any optimizations to the join code in 7.1?

From
Joel Burton
Date:
On Wed, 25 Apr 2001, Tom Lane wrote:

> > 2) The explicit-joins help suggests that manual structuring and
> >    experimentation might help -- has anyone written (or could
> >    anyone write) anthing about where to start in guessing what
> >    join order might be optimal?
> 
> The obvious starting point is the plan produced by the planner from an
> unconstrained query.  Even if you don't feel like trying to improve it,
> you could cut the time to reproduce the plan quite a bit --- just CROSS
> JOIN a few of the relation pairs that are joined first in the
> unconstrained plan.

In other words, let it do the work, and steal the credit for
ourselves. :-)

Thanks, Tom. I appreciate your answers to my questions.



In other DB systems I've used, some find that for this original query:
 SELECT * FROM a, b WHERE a.id=b.id AND b.name = 'foo';

that this version 
 SELECT * FROM a JOIN b USING (id) WHERE b.name = 'foo';

has slower performance than SELECT * FROM b JOIN a USING (id) WHERE b.name = 'foo';

because it can reduce b before any join. 

Is it safe to assume that this is a valid optimization in PostgreSQL?


If this whole thing were a view, except w/o the WHERE clause, and we were
querying the view w/the b.name WHERE clause, would we still see a
performance boost from the right arrangement? (ie, does our criteria get
pushed down early enough in the joining process?)


TIA,
-- 
Joel Burton   <jburton@scw.org>
Director of Information Systems, Support Center of Washington



Re: Any optimizations to the join code in 7.1?

From
Tom Lane
Date:
Joel Burton <jburton@scw.org> writes:
> In other DB systems I've used, some find that for this original query:
>   SELECT * FROM a, b WHERE a.id=b.id AND b.name = 'foo';
> that this version 
>   SELECT * FROM a JOIN b USING (id) WHERE b.name = 'foo';
> has slower performance than
>   SELECT * FROM b JOIN a USING (id) WHERE b.name = 'foo';
> because it can reduce b before any join. 

> Is it safe to assume that this is a valid optimization in PostgreSQL?

In general, that'd be a waste of time --- our planner considers the same
set of plans in either case.

However, it could make a difference if the planner thinks that the two
choices (a outer or b outer) have exactly the same cost.  In that case
the order you wrote them in will influence which plan actually gets
picked; and if the planner's estimate is wrong --- ie, there really is a
considerable difference in the costs --- then you could see a change in
performance depending on which way you wrote it.  That's a pretty
unusual circumstance, maybe, but it just happens that I'm in the middle
of looking at a planning bug wherein exactly this behavior occurs...

> If this whole thing were a view, except w/o the WHERE clause, and we were
> querying the view w/the b.name WHERE clause, would we still see a
> performance boost from the right arrangement? (ie, does our criteria get
> pushed down early enough in the joining process?)

Shouldn't make a difference; AFAIK the WHERE clause will get pushed down
as far as possible, independently of whether a view is involved or you
wrote it out the hard way.
        regards, tom lane


RE: Any optimizations to the join code in 7.1?

From
Mike Mascari
Date:
Sorry for the delay in the response. It took be a day to get 
everything upgraded to 7.1. To restate the problem -  in 7.0 with 
GEQO enabled, a 15-way join took 10 seconds. With GEQO disabled it 
took 18 seconds. 7.1 out of the box took only 2 seconds! I was amazed 
and shocked at this damned impressive improvement in planning 
speed....until I actually used the explicit JOIN syntax described in 
11.2. Instanteous results! Instantaneous.....

Thanks a bunch,
(still in shock)

Mike Mascari
mascarm@mascari.com

-----Original Message-----
From:    Tom Lane [SMTP:tgl@sss.pgh.pa.us]
Sent:    Wednesday, April 25, 2001 12:42 PM
To:    mascarm@mascari.com
Cc:    'pgsql-hackers@postgresql.org'
Subject:    Re: [HACKERS] Any optimizations to the join code in 7.1?

Mike Mascari <mascarm@mascari.com> writes:
> I have a particular query which performs a 15-way join;

You should read
http://www.postgresql.org/devel-corner/docs/postgres/explicit-join  
s.html
        regards, tom lane




Re: Any optimizations to the join code in 7.1?

From
Bruce Momjian
Date:
You can thank Tom Lane for most/all of our optimization improvements.

> Sorry for the delay in the response. It took be a day to get 
> everything upgraded to 7.1. To restate the problem -  in 7.0 with 
> GEQO enabled, a 15-way join took 10 seconds. With GEQO disabled it 
> took 18 seconds. 7.1 out of the box took only 2 seconds! I was amazed 
> and shocked at this damned impressive improvement in planning 
> speed....until I actually used the explicit JOIN syntax described in 
> 11.2. Instanteous results! Instantaneous.....
> 
> Thanks a bunch,
> (still in shock)
> 
> Mike Mascari
> mascarm@mascari.com
> 
> -----Original Message-----
> From:    Tom Lane [SMTP:tgl@sss.pgh.pa.us]
> Sent:    Wednesday, April 25, 2001 12:42 PM
> To:    mascarm@mascari.com
> Cc:    'pgsql-hackers@postgresql.org'
> Subject:    Re: [HACKERS] Any optimizations to the join code in 7.1?
> 
> Mike Mascari <mascarm@mascari.com> writes:
> > I have a particular query which performs a 15-way join;
> 
> You should read
> http://www.postgresql.org/devel-corner/docs/postgres/explicit-join  
> s.html
> 
>             regards, tom lane
> 
> 
> 
> ---------------------------(end of broadcast)---------------------------
> TIP 2: you can get off all lists at once with the unregister command
>     (send "unregister YourEmailAddressHere" to majordomo@postgresql.org)
> 

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
853-3000+  If your life is a hard drive,     |  830 Blythe Avenue +  Christ can be your backup.        |  Drexel Hill,
Pennsylvania19026
 


Re: Any optimizations to the join code in 7.1?

From
Thomas Lockhart
Date:
> ... 7.1 out of the box took only 2 seconds! I was amazed
> and shocked at this damned impressive improvement in planning
> speed....until I actually used the explicit JOIN syntax described in
> 11.2. Instanteous results! Instantaneous.....

But it is possible, under many circumstances, for query optimization to
be a benefit for a many-table query. The docs indicate that explicit
join syntax bypasses that, even for inner joins, so you may find that
this syntax is a net loss in performance depending on the query and your
choice of table order.

Presumably we will be interested in making these two forms of inner join
equivalent in behavior in a future release. Tom, what are the
impediments we might encounter in doing this?
                     - Thomas


RE: Re: Any optimizations to the join code in 7.1?

From
Mike Mascari
Date:
What would be nice, and I don't know how it would be done or what the 
syntax would be, would be a feature that allows PostgreSQL to skip 
not only the parsing stage, but the planning stage as well. Then, 
when the data has changed dramatically enough to warrant it, as you 
point out, a command can be issued to 'refresh' the query plan. My 
15-way join has expanded to a 19-way join and is still instantaneous, 
albeit on a very small set of data. Before 7.1, the query would 
simply have taken far too long, and I would have had to denormalize 
the database for performance purposes. With the explicit join syntax, 
it allows me to design the database 'the right way'. I basically used 
EXPLAIN SELECT... to determine the explicit join order, so as the 
data changes, its something I'll have to do on occassion to ensure 
good performance, but at least its now possible. :-)

Mike Mascari
mascarm@mascari.com

-----Original Message-----
From:    Thomas Lockhart [SMTP:lockhart@alumni.caltech.edu]
Sent:    Friday, April 27, 2001 9:49 PM
To:    mascarm@mascari.com; 'Tom Lane'
Cc:    'pgsql-hackers@postgresql.org'
Subject:    [HACKERS] Re: Any optimizations to the join code in 7.1?

> ... 7.1 out of the box took only 2 seconds! I was amazed
> and shocked at this damned impressive improvement in planning
> speed....until I actually used the explicit JOIN syntax described 
in
> 11.2. Instanteous results! Instantaneous.....

But it is possible, under many circumstances, for query optimization 
to
be a benefit for a many-table query. The docs indicate that explicit
join syntax bypasses that, even for inner joins, so you may find that
this syntax is a net loss in performance depending on the query and 
your
choice of table order.

Presumably we will be interested in making these two forms of inner 
join
equivalent in behavior in a future release. Tom, what are the
impediments we might encounter in doing this?
                     - Thomas

---------------------------(end of broadcast)---------------------  
------
TIP 4: Don't 'kill -9' the postmaster


Re: Re: Any optimizations to the join code in 7.1?

From
Tom Lane
Date:
Thomas Lockhart <lockhart@alumni.caltech.edu> writes:
> But it is possible, under many circumstances, for query optimization to
> be a benefit for a many-table query. The docs indicate that explicit
> join syntax bypasses that, even for inner joins, so you may find that
> this syntax is a net loss in performance depending on the query and your
> choice of table order.

> Presumably we will be interested in making these two forms of inner join
> equivalent in behavior in a future release. Tom, what are the
> impediments we might encounter in doing this?

I don't think there are any real technical problems in the way; it's
simply an implementation choice not to treat INNER JOIN the same as an
implicit join list.  I did it that way in 7.1 mainly as a flyer, to see
how many people would think it's a feature vs. how many think it's a
bug.  The votes aren't all in yet, but here we have Mike apparently
pretty pleased with it, while I recall at least one other person who
was not happy with the 7.1 behavior.
        regards, tom lane