Thread: Optimization via explicit JOINs

Optimization via explicit JOINs

From
David Olbersen
Date:
Greetings, I've been toying aroudn with postgres 7.1beta5's ability to control the planner via explicitely JOINing
tables.I then (just for giggles) compare the difference in the EXPLAIN results.
 
 I'm no super-mondo-DBA or anything, but in my two attempts so far, the numbers I get out of EXPLAIN have been about
1/2as small.
 
 Below are two EXPLAIN results, am I correct in reading that one is indeed "twice as fast" as the other? I say twice as
fastbecause the top-most cost in the first query is 58.62, but in the second one it's only 32.09. Am I reading this
correctly?

-- First EXPLAIN --
Sort  (cost=58.62..58.62 rows=14 width=60) ->  Nested Loop  (cost=0.00..58.35 rows=14)   ->  Nested Loop
(cost=0.00..29.99rows=14)     ->  Seq Scan on playlist p  (cost=0.00..1.61 rows=14)     ->  Index Scan using songs_pkey
onsongs s (cost=0.00..2.01 rows=1)   ->  Index Scan using artists_pkey on artists a (cost=0.00..2.01 rows=1)
 


-- Second EXPLAIN --
Sort  (cost=32.09..32.09 rows=1) ->  Nested Loop  (cost=0.00..32.08 rows=1)   ->  Nested Loop  (cost=0.00..30.06
rows=1)    ->  Seq Scan on playlist p  (cost=0.00..1.61 rows=14)     ->  Index Scan using songs_pkey on songs s
(cost=0.00..2.02rows=1)   ->  Index Scan using artists_pkey on artists a  (cost=0.00..2.01 rows=1)
 

-- Dave



Re: Optimization via explicit JOINs

From
Stephan Szabo
Date:
On Fri, 9 Mar 2001, David Olbersen wrote:

> Greetings,
>   I've been toying aroudn with postgres 7.1beta5's ability to control the
>   planner via explicitely JOINing tables. I then (just for giggles) compare the
>   difference in the EXPLAIN results.
> 
>   I'm no super-mondo-DBA or anything, but in my two attempts so far, the numbers
>   I get out of EXPLAIN have been about 1/2 as small.
> 
>   Below are two EXPLAIN results, am I correct in reading that one is indeed
>   "twice as fast" as the other? I say twice as fast because the top-most cost in
>   the first query is 58.62, but in the second one it's only 32.09. Am I reading
>   this correctly?

Not entirely.  Those are only estimates, so they don't entirely line up
with reality.  Also, I notice the first estimates 14 rows and the second
1,  which is probably why the estimate is higher.  In practice it probably
won't be significantly different.



Re: Optimization via explicit JOINs

From
David Olbersen
Date:
On Fri, 9 Mar 2001, Stephan Szabo wrote:

->Not entirely.  Those are only estimates, so they don't entirely line up
->with reality.  Also, I notice the first estimates 14 rows and the second
->1,  which is probably why the estimate is higher.  In practice it probably
->won't be significantly different.

So really I'm just getting back estimations of cost and rows returned?
Incidentally, both queries returned the same data set, that's a Good Thing (tm).

-- Dave



Re: Optimization via explicit JOINs

From
Stephan Szabo
Date:
> On Fri, 9 Mar 2001, Stephan Szabo wrote:
> 
> ->Not entirely.  Those are only estimates, so they don't entirely line up
> ->with reality.  Also, I notice the first estimates 14 rows and the second
> ->1,  which is probably why the estimate is higher.  In practice it probably
> ->won't be significantly different.
> 
> So really I'm just getting back estimations of cost and rows returned?
> Incidentally, both queries returned the same data set, that's a Good Thing (tm).

Yeah, explain is mostly ofr showing what it's going to do and a little bit
of why it thinks it's a good idea.  Hmm, what were the two queries anyway?




Re: Optimization via explicit JOINs

From
David Olbersen
Date:
On Fri, 9 Mar 2001, Stephan Szabo wrote:

-> Hmm, what were the two queries anyway?

The "slower" query
----------------------------
SELECT to_char( p.insertion_time, 'HH:MI AM MM/DD' ) as time_in, s.name                as title, a.name
asartist, s.length              as length
 
FROM playlist p, songs    s, artists  a
WHERE p.waiting   = TRUE          AND p.song_id   = s.song_id     AND s.artist_id = a.artist_id
ORDER BY p.item_id

The "faster" query
----------------------------
SELECT to_char( p.insertion_time, 'HH:MI AM MM/DD' ) as time_in, s.name                as title, s.length
aslength, a.name                as artist
 
FROM playlist p JOIN songs s USING (song_id), artists  a
WHERE p.waiting   = TRUE          AND p.song_id   = s.song_id     AND s.artist_id = a.artist_id
ORDER BY p.item_id;

Notice how the only difference is in the FROM clause?
-- Dave



Re: Optimization via explicit JOINs

From
Stephan Szabo
Date:
On Fri, 9 Mar 2001, David Olbersen wrote:

> On Fri, 9 Mar 2001, Stephan Szabo wrote:
> 
> -> Hmm, what were the two queries anyway?
> 
> The "slower" query
> ----------------------------
> SELECT
>   to_char( p.insertion_time, 'HH:MI AM MM/DD' ) as time_in,
>   s.name                as title,
>   a.name                as artist,
>   s.length              as length
> FROM
>   playlist p,
>   songs    s,
>   artists  a
> WHERE
>   p.waiting   = TRUE          AND
>   p.song_id   = s.song_id     AND
>   s.artist_id = a.artist_id
> ORDER BY p.item_id
> 
> The "faster" query
> ----------------------------
> SELECT
>   to_char( p.insertion_time, 'HH:MI AM MM/DD' ) as time_in,
>   s.name                as title,
>   s.length              as length,
>   a.name                as artist
> FROM
>   playlist p JOIN songs s USING (song_id),
>   artists  a
> WHERE
>   p.waiting   = TRUE          AND
>   p.song_id   = s.song_id     AND
>   s.artist_id = a.artist_id
> ORDER BY p.item_id;
> 
> Notice how the only difference is in the FROM clause?

Yeah.  It's getting the same plan, just a slightly different number
of estimated rows (14 and 1) from the join of p to s.
As a question, how many rows does
select * from playlist p join songs s using (song_id) where
p.waiting=TRUE;
actually result in?



Re: Optimization via explicit JOINs

From
David Olbersen
Date:
On Fri, 9 Mar 2001, Stephan Szabo wrote:

->As a question, how many rows does
->select * from playlist p join songs s using (song_id) where
->p.waiting=TRUE;
->actually result in?

Well it depends. Most of the time that playlist table is "empty" (no rows where
waiting = TRUE), however users can (in a round about way) insert into that
table, so that there could be anywhere from 10, to 2,342, to more.

Why do you ask?

(The reason those plans chose 14 was because, at the time, there were 14 rows in
playlist)

-- Dave



Re: Optimization via explicit JOINs

From
Stephan Szabo
Date:
Darn.  Well, one of the queries picked that 1 row was going to survive
the nested loop step and the other said 14.  I was wondering which one
was closer to being correct at that time.

On Fri, 9 Mar 2001, David Olbersen wrote:

> On Fri, 9 Mar 2001, Stephan Szabo wrote:
> 
> ->As a question, how many rows does
> ->select * from playlist p join songs s using (song_id) where
> ->p.waiting=TRUE;
> ->actually result in?
> 
> Well it depends. Most of the time that playlist table is "empty" (no rows where
> waiting = TRUE), however users can (in a round about way) insert into that
> table, so that there could be anywhere from 10, to 2,342, to more.
> 
> Why do you ask?
> 
> (The reason those plans chose 14 was because, at the time, there were 14 rows in
> playlist)



Re: Optimization via explicit JOINs

From
Stephan Szabo
Date:
Actually, just thought of something else.  If you remove
the probably redundant p.song_id=s.song_id from the second 
query (since the join ... using should do that) does it
change the explain output?

On Fri, 9 Mar 2001, David Olbersen wrote:

> On Fri, 9 Mar 2001, Stephan Szabo wrote:
> 
> ->As a question, how many rows does
> ->select * from playlist p join songs s using (song_id) where
> ->p.waiting=TRUE;
> ->actually result in?
> 
> Well it depends. Most of the time that playlist table is "empty" (no rows where
> waiting = TRUE), however users can (in a round about way) insert into that
> table, so that there could be anywhere from 10, to 2,342, to more.
> 
> Why do you ask?
> 
> (The reason those plans chose 14 was because, at the time, there were 14 rows in
> playlist)



Re: Optimization via explicit JOINs

From
Tom Lane
Date:
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes:
> Actually, just thought of something else.  If you remove
> the probably redundant p.song_id=s.song_id from the second 
> query (since the join ... using should do that) does it
> change the explain output?

I was just about to point that out.  The WHERE clause *is* redundant
with the JOIN ... USING clause, but the planner will not recognize that,
and accordingly will multiply the estimated selectivity of the two
clauses together.  So the output row count for the JOIN form is
misleadingly small.  If you remove the redundant WHERE clause then you
should get identical planning estimates for both forms of the query.

The planner does actually recognize and discard duplicate qualifiers
that appear in the same place (eg, WHERE p.song_id=s.song_id AND
p.song_id=s.song_id) but it's not so smart about qualifiers that are
in different parts of the query...
        regards, tom lane