Thread: Optimization via explicit JOINs
Greetings, I've been toying aroudn with postgres 7.1beta5's ability to control the planner via explicitely JOINing tables.I then (just for giggles) compare the difference in the EXPLAIN results. I'm no super-mondo-DBA or anything, but in my two attempts so far, the numbers I get out of EXPLAIN have been about 1/2as small. Below are two EXPLAIN results, am I correct in reading that one is indeed "twice as fast" as the other? I say twice as fastbecause the top-most cost in the first query is 58.62, but in the second one it's only 32.09. Am I reading this correctly? -- First EXPLAIN -- Sort (cost=58.62..58.62 rows=14 width=60) -> Nested Loop (cost=0.00..58.35 rows=14) -> Nested Loop (cost=0.00..29.99rows=14) -> Seq Scan on playlist p (cost=0.00..1.61 rows=14) -> Index Scan using songs_pkey onsongs s (cost=0.00..2.01 rows=1) -> Index Scan using artists_pkey on artists a (cost=0.00..2.01 rows=1) -- Second EXPLAIN -- Sort (cost=32.09..32.09 rows=1) -> Nested Loop (cost=0.00..32.08 rows=1) -> Nested Loop (cost=0.00..30.06 rows=1) -> Seq Scan on playlist p (cost=0.00..1.61 rows=14) -> Index Scan using songs_pkey on songs s (cost=0.00..2.02rows=1) -> Index Scan using artists_pkey on artists a (cost=0.00..2.01 rows=1) -- Dave
On Fri, 9 Mar 2001, David Olbersen wrote: > Greetings, > I've been toying aroudn with postgres 7.1beta5's ability to control the > planner via explicitely JOINing tables. I then (just for giggles) compare the > difference in the EXPLAIN results. > > I'm no super-mondo-DBA or anything, but in my two attempts so far, the numbers > I get out of EXPLAIN have been about 1/2 as small. > > Below are two EXPLAIN results, am I correct in reading that one is indeed > "twice as fast" as the other? I say twice as fast because the top-most cost in > the first query is 58.62, but in the second one it's only 32.09. Am I reading > this correctly? Not entirely. Those are only estimates, so they don't entirely line up with reality. Also, I notice the first estimates 14 rows and the second 1, which is probably why the estimate is higher. In practice it probably won't be significantly different.
On Fri, 9 Mar 2001, Stephan Szabo wrote: ->Not entirely. Those are only estimates, so they don't entirely line up ->with reality. Also, I notice the first estimates 14 rows and the second ->1, which is probably why the estimate is higher. In practice it probably ->won't be significantly different. So really I'm just getting back estimations of cost and rows returned? Incidentally, both queries returned the same data set, that's a Good Thing (tm). -- Dave
> On Fri, 9 Mar 2001, Stephan Szabo wrote: > > ->Not entirely. Those are only estimates, so they don't entirely line up > ->with reality. Also, I notice the first estimates 14 rows and the second > ->1, which is probably why the estimate is higher. In practice it probably > ->won't be significantly different. > > So really I'm just getting back estimations of cost and rows returned? > Incidentally, both queries returned the same data set, that's a Good Thing (tm). Yeah, explain is mostly ofr showing what it's going to do and a little bit of why it thinks it's a good idea. Hmm, what were the two queries anyway?
On Fri, 9 Mar 2001, Stephan Szabo wrote: -> Hmm, what were the two queries anyway? The "slower" query ---------------------------- SELECT to_char( p.insertion_time, 'HH:MI AM MM/DD' ) as time_in, s.name as title, a.name asartist, s.length as length FROM playlist p, songs s, artists a WHERE p.waiting = TRUE AND p.song_id = s.song_id AND s.artist_id = a.artist_id ORDER BY p.item_id The "faster" query ---------------------------- SELECT to_char( p.insertion_time, 'HH:MI AM MM/DD' ) as time_in, s.name as title, s.length aslength, a.name as artist FROM playlist p JOIN songs s USING (song_id), artists a WHERE p.waiting = TRUE AND p.song_id = s.song_id AND s.artist_id = a.artist_id ORDER BY p.item_id; Notice how the only difference is in the FROM clause? -- Dave
On Fri, 9 Mar 2001, David Olbersen wrote: > On Fri, 9 Mar 2001, Stephan Szabo wrote: > > -> Hmm, what were the two queries anyway? > > The "slower" query > ---------------------------- > SELECT > to_char( p.insertion_time, 'HH:MI AM MM/DD' ) as time_in, > s.name as title, > a.name as artist, > s.length as length > FROM > playlist p, > songs s, > artists a > WHERE > p.waiting = TRUE AND > p.song_id = s.song_id AND > s.artist_id = a.artist_id > ORDER BY p.item_id > > The "faster" query > ---------------------------- > SELECT > to_char( p.insertion_time, 'HH:MI AM MM/DD' ) as time_in, > s.name as title, > s.length as length, > a.name as artist > FROM > playlist p JOIN songs s USING (song_id), > artists a > WHERE > p.waiting = TRUE AND > p.song_id = s.song_id AND > s.artist_id = a.artist_id > ORDER BY p.item_id; > > Notice how the only difference is in the FROM clause? Yeah. It's getting the same plan, just a slightly different number of estimated rows (14 and 1) from the join of p to s. As a question, how many rows does select * from playlist p join songs s using (song_id) where p.waiting=TRUE; actually result in?
On Fri, 9 Mar 2001, Stephan Szabo wrote: ->As a question, how many rows does ->select * from playlist p join songs s using (song_id) where ->p.waiting=TRUE; ->actually result in? Well it depends. Most of the time that playlist table is "empty" (no rows where waiting = TRUE), however users can (in a round about way) insert into that table, so that there could be anywhere from 10, to 2,342, to more. Why do you ask? (The reason those plans chose 14 was because, at the time, there were 14 rows in playlist) -- Dave
Darn. Well, one of the queries picked that 1 row was going to survive the nested loop step and the other said 14. I was wondering which one was closer to being correct at that time. On Fri, 9 Mar 2001, David Olbersen wrote: > On Fri, 9 Mar 2001, Stephan Szabo wrote: > > ->As a question, how many rows does > ->select * from playlist p join songs s using (song_id) where > ->p.waiting=TRUE; > ->actually result in? > > Well it depends. Most of the time that playlist table is "empty" (no rows where > waiting = TRUE), however users can (in a round about way) insert into that > table, so that there could be anywhere from 10, to 2,342, to more. > > Why do you ask? > > (The reason those plans chose 14 was because, at the time, there were 14 rows in > playlist)
Actually, just thought of something else. If you remove the probably redundant p.song_id=s.song_id from the second query (since the join ... using should do that) does it change the explain output? On Fri, 9 Mar 2001, David Olbersen wrote: > On Fri, 9 Mar 2001, Stephan Szabo wrote: > > ->As a question, how many rows does > ->select * from playlist p join songs s using (song_id) where > ->p.waiting=TRUE; > ->actually result in? > > Well it depends. Most of the time that playlist table is "empty" (no rows where > waiting = TRUE), however users can (in a round about way) insert into that > table, so that there could be anywhere from 10, to 2,342, to more. > > Why do you ask? > > (The reason those plans chose 14 was because, at the time, there were 14 rows in > playlist)
Stephan Szabo <sszabo@megazone23.bigpanda.com> writes: > Actually, just thought of something else. If you remove > the probably redundant p.song_id=s.song_id from the second > query (since the join ... using should do that) does it > change the explain output? I was just about to point that out. The WHERE clause *is* redundant with the JOIN ... USING clause, but the planner will not recognize that, and accordingly will multiply the estimated selectivity of the two clauses together. So the output row count for the JOIN form is misleadingly small. If you remove the redundant WHERE clause then you should get identical planning estimates for both forms of the query. The planner does actually recognize and discard duplicate qualifiers that appear in the same place (eg, WHERE p.song_id=s.song_id AND p.song_id=s.song_id) but it's not so smart about qualifiers that are in different parts of the query... regards, tom lane