Thread: Can someone explain the problem with this select

Can someone explain the problem with this select

From
Richard Ray
Date:
Allow me to demonstrate my pitiful SQL knowledge
I have tables documents and comments
If I run join and list doc_nums the query is quite fast
If I run join and use subselect the query is extremely slow
Can someone offer analysis

Thanks
Richard

dcc=# EXPLAIN ANALYZE select doc_num from documents limit 10;
QUERYPLAN
 
----------------------------------------------------------------------------------------------------------------------
Limit (cost=0.00..1.01 rows=10 width=13) (actual time=0.013..0.061 
 
rows=10 loops=1)   ->  Seq Scan on documents  (cost=0.00..909333.85 rows=9014885 width=13) 
(actual time=0.008..0.027 rows=10 loops=1) Total runtime: 0.125 ms
(3 rows)

dcc=#



dcc=#  EXPLAIN ANALYZE select * from documents left outer join comments 
on (documents.doc_num = comments.doc_num) where documents.doc_num in 

('105364107','105513059','105513095','105513112','105585627','102933195','014650340','014650361','014650362','105419865');

QUERY PLAN

------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
HashLeft Join  (cost=21.23..61.54 rows=10 width=444) (actual 
 
time=0.507..0.574 rows=10 loops=1)   Hash Cond: ("outer".doc_num = "inner".doc_num)   ->  Bitmap Heap Scan on documents
(cost=20.03..60.28 rows=10 
 
width=361) (actual time=0.397..0.432 rows=10 loops=1)         Recheck Cond: ((doc_num = '105364107'::bpchar) OR
(doc_num= 
 
'105513059'::bpchar) OR (doc_num = '105513095'::bpchar) OR (doc_num = 
'105513112'::bpchar) OR (doc_num = '105585627'::bpchar) OR (doc_num = 
'102933195'::bpchar) OR (doc_num = '014650340'::bpchar) OR (doc_num = 
'014650361'::bpchar) OR (doc_num = '014650362'::bpchar) OR (doc_num = 
'105419865'::bpchar))         ->  BitmapOr  (cost=20.03..20.03 rows=10 width=0) (actual 
time=0.383..0.383 rows=0 loops=1)               ->  Bitmap Index Scan on documents_pkey  (cost=0.00..2.00 
rows=1 width=0) (actual time=0.059..0.059 rows=1 loops=1)                     Index Cond: (doc_num =
'105364107'::bpchar)              ->  Bitmap Index Scan on documents_pkey  (cost=0.00..2.00 
 
rows=1 width=0) (actual time=0.039..0.039 rows=1 loops=1)                     Index Cond: (doc_num =
'105513059'::bpchar)              ->  Bitmap Index Scan on documents_pkey  (cost=0.00..2.00 
 
rows=1 width=0) (actual time=0.034..0.034 rows=1 loops=1)                     Index Cond: (doc_num =
'105513095'::bpchar)              ->  Bitmap Index Scan on documents_pkey  (cost=0.00..2.00 
 
rows=1 width=0) (actual time=0.031..0.031 rows=1 loops=1)                     Index Cond: (doc_num =
'105513112'::bpchar)              ->  Bitmap Index Scan on documents_pkey  (cost=0.00..2.00 
 
rows=1 width=0) (actual time=0.034..0.034 rows=1 loops=1)                     Index Cond: (doc_num =
'105585627'::bpchar)              ->  Bitmap Index Scan on documents_pkey  (cost=0.00..2.00 
 
rows=1 width=0) (actual time=0.036..0.036 rows=1 loops=1)                     Index Cond: (doc_num =
'102933195'::bpchar)              ->  Bitmap Index Scan on documents_pkey  (cost=0.00..2.00 
 
rows=1 width=0) (actual time=0.036..0.036 rows=1 loops=1)                     Index Cond: (doc_num =
'014650340'::bpchar)              ->  Bitmap Index Scan on documents_pkey  (cost=0.00..2.00 
 
rows=1 width=0) (actual time=0.031..0.031 rows=1 loops=1)                     Index Cond: (doc_num =
'014650361'::bpchar)              ->  Bitmap Index Scan on documents_pkey  (cost=0.00..2.00 
 
rows=1 width=0) (actual time=0.031..0.031 rows=1 loops=1)                     Index Cond: (doc_num =
'014650362'::bpchar)              ->  Bitmap Index Scan on documents_pkey  (cost=0.00..2.00 
 
rows=1 width=0) (actual time=0.035..0.035 rows=1 loops=1)                     Index Cond: (doc_num =
'105419865'::bpchar)  ->  Hash  (cost=1.16..1.16 rows=16 width=83) (actual time=0.080..0.080 
 
rows=16 loops=1)         ->  Seq Scan on comments  (cost=0.00..1.16 rows=16 width=83) 
(actual time=0.005..0.037 rows=16 loops=1) Total runtime: 0.775 ms
(28 rows)

dcc=#



dcc=# EXPLAIN ANALYZE select * from documents left outer join comments on 
(documents.doc_num = comments.doc_num) where documents.doc_num in (select 
doc_num from documents limit 10);

QUERY PLAN

--------------------------------------------------------------------------------
--------------------------------------------------------------------------------
-- Merge IN Join  (cost=100000002.19..136154797.93 rows=10 width=654) 
(actual time
=23.534..2216180.550 rows=10 loops=1)   Merge Cond: ("outer".doc_num = "inner".doc_num)   ->  Merge Left Join
(cost=0.00..36129585.92rows=10083868 width=654) 
 
(actual time=23.239..2188733.430 rows=6696218 loops=1)         Merge Cond: ("outer".doc_num = "inner".doc_num)
-> Index Scan using documents_pkey on documents 
 
(cost=0.00..35723277.
60 rows=10083868 width=569) (actual time=6.845..2107300.767 rows=6695853 
loops=1
)         ->  Index Scan using doc_num_idx on comments 
(cost=0.00..377203.50 row
s=311612 width=85) (actual time=16.368..6984.365 rows=243797 loops=1)   ->  Sort  (cost=100000002.19..100000002.22
rows=10width=13) (actual 
 
time=0.2
52..0.293 rows=10 loops=1)         Sort Key: "IN_subquery".doc_num         ->  Limit  (cost=100000000.00..100000001.92
rows=10width=13) 
 
(actual t
ime=0.019..0.128 rows=10 loops=1)               ->  Seq Scan on documents  (cost=100000000.00..101940460.68 
rows=
10083868 width=13) (actual time=0.010..0.055 rows=10 loops=1) Total runtime: 2216180.973 ms
(11 rows)

dcc=#


Re: Can someone explain the problem with this select

From
Tom Lane
Date:
Richard Ray <rray@mstc.state.ms.us> writes:
> dcc=# EXPLAIN ANALYZE select * from documents left outer join comments on 
> (documents.doc_num = comments.doc_num) where documents.doc_num in (select 
> doc_num from documents limit 10);
> [ is slow ]

This isn't your fault, it's an optimizer limitation: PG < 8.2 can't
reorder outer joins relative to regular joins, and the IN with a
sub-select is a kind of regular join.  So it's forming the whole
outer-join result and then joining to the sub-select :-(

This is fixed in 8.2, released today, so perhaps upgrading is the
thing for you to do.  Alternatively, you can contort the query to
get the IN restriction inside the outer join:

select * from (select * from documents  where documents.doc_num in (select doc_num from documents limit 10)) ss left
outerjoin comments on (ss.doc_num = comments.doc_num);
 
        regards, tom lane


Re: Can someone explain the problem with this select

From
Richard Broersma Jr
Date:
> dcc=# EXPLAIN ANALYZE select * from documents left outer join comments on 
> (documents.doc_num = comments.doc_num) where documents.doc_num in (select 
> doc_num from documents limit 10);

This query is preforming the join on all records of your two tables.  After all of the that
exhaustive work is done, it this filter out the records you want.  you should preform a filtered
select first and then use those results in you left join.  I guess the lesson you can learn from
this example is that you should try to filter your data set to get it as small as possible before
you do anything else with it.

select       *

from       (        select doc_num from documents limit 10       ) as D1
left outer join       comments
on       (D1.doc_num = comments.doc_num)
;

Regards,

Richard Broersma Jr.


Re: Can someone explain the problem with this select

From
Richard Ray
Date:
I've been foolin with this for a couple of days
Sometimes you just have to ask
Thanks gentlemen

On Tue, 5 Dec 2006, Tom Lane wrote:

> Richard Ray <rray@mstc.state.ms.us> writes:
>> dcc=# EXPLAIN ANALYZE select * from documents left outer join comments on
>> (documents.doc_num = comments.doc_num) where documents.doc_num in (select
>> doc_num from documents limit 10);
>> [ is slow ]
>
> This isn't your fault, it's an optimizer limitation: PG < 8.2 can't
> reorder outer joins relative to regular joins, and the IN with a
> sub-select is a kind of regular join.  So it's forming the whole
> outer-join result and then joining to the sub-select :-(
>
> This is fixed in 8.2, released today, so perhaps upgrading is the
> thing for you to do.  Alternatively, you can contort the query to
> get the IN restriction inside the outer join:
>
> select * from
>  (select * from documents
>   where documents.doc_num in (select doc_num from documents limit 10)) ss
>  left outer join comments on (ss.doc_num = comments.doc_num);
>
>             regards, tom lane
>


Re: Can someone explain the problem with this select

From
Ted Allen
Date:
Hey Ray,

I'm by no means a guru but here is my simple analysis.  In the first 
query, the 10 "documents" specified 'IN' the in are first selected from 
the "documents" table.  Then, those 10 rows are joined with the 
"comments" table. 

In the second query, every row in the "documents" table is joined with 
the "comments" table, which took forever according to the Explain 
Analyse.  Then, the results of the complete join of those two tables 
from are compared against the sub-select. 

I'm guessing (emphasis on guessing) that the query planner chose this 
approach because it does not know how many rows the sub-select will 
return so it does the join and then checks those results against that 
sub-select.

Another approach may be to do this.

Create Temp table limit_documents as select doc_num from documents limit 10;
select * from  limit_documents left outer join comments on 
(limit_documents.doc_num = comments.doc_num);


Hope that helps,
Ted

Richard Ray wrote:
> Allow me to demonstrate my pitiful SQL knowledge
> I have tables documents and comments
> If I run join and list doc_nums the query is quite fast
> If I run join and use subselect the query is extremely slow
> Can someone offer analysis
>
> Thanks
> Richard
>
> dcc=# EXPLAIN ANALYZE select doc_num from documents limit 10;
>                                                       QUERY PLAN
>
----------------------------------------------------------------------------------------------------------------------
 
>
>  Limit  (cost=0.00..1.01 rows=10 width=13) (actual time=0.013..0.061 
> rows=10 loops=1)
>    ->  Seq Scan on documents  (cost=0.00..909333.85 rows=9014885 
> width=13) (actual time=0.008..0.027 rows=10 loops=1)
>  Total runtime: 0.125 ms
> (3 rows)
>
> dcc=#
>
>
>
> dcc=#  EXPLAIN ANALYZE select * from documents left outer join 
> comments on (documents.doc_num = comments.doc_num) where 
> documents.doc_num in 
>
('105364107','105513059','105513095','105513112','105585627','102933195','014650340','014650361','014650362','105419865');

>
>
> QUERY PLAN
>
------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------

>
>  Hash Left Join  (cost=21.23..61.54 rows=10 width=444) (actual 
> time=0.507..0.574 rows=10 loops=1)
>    Hash Cond: ("outer".doc_num = "inner".doc_num)
>    ->  Bitmap Heap Scan on documents  (cost=20.03..60.28 rows=10 
> width=361) (actual time=0.397..0.432 rows=10 loops=1)
>          Recheck Cond: ((doc_num = '105364107'::bpchar) OR (doc_num = 
> '105513059'::bpchar) OR (doc_num = '105513095'::bpchar) OR (doc_num = 
> '105513112'::bpchar) OR (doc_num = '105585627'::bpchar) OR (doc_num = 
> '102933195'::bpchar) OR (doc_num = '014650340'::bpchar) OR (doc_num = 
> '014650361'::bpchar) OR (doc_num = '014650362'::bpchar) OR (doc_num = 
> '105419865'::bpchar))
>          ->  BitmapOr  (cost=20.03..20.03 rows=10 width=0) (actual 
> time=0.383..0.383 rows=0 loops=1)
>                ->  Bitmap Index Scan on documents_pkey  
> (cost=0.00..2.00 rows=1 width=0) (actual time=0.059..0.059 rows=1 
> loops=1)
>                      Index Cond: (doc_num = '105364107'::bpchar)
>                ->  Bitmap Index Scan on documents_pkey  
> (cost=0.00..2.00 rows=1 width=0) (actual time=0.039..0.039 rows=1 
> loops=1)
>                      Index Cond: (doc_num = '105513059'::bpchar)
>                ->  Bitmap Index Scan on documents_pkey  
> (cost=0.00..2.00 rows=1 width=0) (actual time=0.034..0.034 rows=1 
> loops=1)
>                      Index Cond: (doc_num = '105513095'::bpchar)
>                ->  Bitmap Index Scan on documents_pkey  
> (cost=0.00..2.00 rows=1 width=0) (actual time=0.031..0.031 rows=1 
> loops=1)
>                      Index Cond: (doc_num = '105513112'::bpchar)
>                ->  Bitmap Index Scan on documents_pkey  
> (cost=0.00..2.00 rows=1 width=0) (actual time=0.034..0.034 rows=1 
> loops=1)
>                      Index Cond: (doc_num = '105585627'::bpchar)
>                ->  Bitmap Index Scan on documents_pkey  
> (cost=0.00..2.00 rows=1 width=0) (actual time=0.036..0.036 rows=1 
> loops=1)
>                      Index Cond: (doc_num = '102933195'::bpchar)
>                ->  Bitmap Index Scan on documents_pkey  
> (cost=0.00..2.00 rows=1 width=0) (actual time=0.036..0.036 rows=1 
> loops=1)
>                      Index Cond: (doc_num = '014650340'::bpchar)
>                ->  Bitmap Index Scan on documents_pkey  
> (cost=0.00..2.00 rows=1 width=0) (actual time=0.031..0.031 rows=1 
> loops=1)
>                      Index Cond: (doc_num = '014650361'::bpchar)
>                ->  Bitmap Index Scan on documents_pkey  
> (cost=0.00..2.00 rows=1 width=0) (actual time=0.031..0.031 rows=1 
> loops=1)
>                      Index Cond: (doc_num = '014650362'::bpchar)
>                ->  Bitmap Index Scan on documents_pkey  
> (cost=0.00..2.00 rows=1 width=0) (actual time=0.035..0.035 rows=1 
> loops=1)
>                      Index Cond: (doc_num = '105419865'::bpchar)
>    ->  Hash  (cost=1.16..1.16 rows=16 width=83) (actual 
> time=0.080..0.080 rows=16 loops=1)
>          ->  Seq Scan on comments  (cost=0.00..1.16 rows=16 width=83) 
> (actual time=0.005..0.037 rows=16 loops=1)
>  Total runtime: 0.775 ms
> (28 rows)
>
> dcc=#
>
>
>
> dcc=# EXPLAIN ANALYZE select * from documents left outer join comments 
> on (documents.doc_num = comments.doc_num) where documents.doc_num in 
> (select doc_num from documents limit 10);
>
> QUERY PLAN
>
> -------------------------------------------------------------------------------- 
>
> -------------------------------------------------------------------------------- 
>
> -- 
>  Merge IN Join  (cost=100000002.19..136154797.93 rows=10 width=654) 
> (actual time
> =23.534..2216180.550 rows=10 loops=1)
>    Merge Cond: ("outer".doc_num = "inner".doc_num)
>    ->  Merge Left Join  (cost=0.00..36129585.92 rows=10083868 
> width=654) (actual
>  time=23.239..2188733.430 rows=6696218 loops=1)
>          Merge Cond: ("outer".doc_num = "inner".doc_num)
>          ->  Index Scan using documents_pkey on documents 
> (cost=0.00..35723277.
> 60 rows=10083868 width=569) (actual time=6.845..2107300.767 
> rows=6695853 loops=1
> )
>          ->  Index Scan using doc_num_idx on comments 
> (cost=0.00..377203.50 row
> s=311612 width=85) (actual time=16.368..6984.365 rows=243797 loops=1)
>    ->  Sort  (cost=100000002.19..100000002.22 rows=10 width=13) 
> (actual time=0.2
> 52..0.293 rows=10 loops=1)
>          Sort Key: "IN_subquery".doc_num
>          ->  Limit  (cost=100000000.00..100000001.92 rows=10 width=13) 
> (actual t
> ime=0.019..0.128 rows=10 loops=1)
>                ->  Seq Scan on documents  
> (cost=100000000.00..101940460.68 rows=
> 10083868 width=13) (actual time=0.010..0.055 rows=10 loops=1)
>  Total runtime: 2216180.973 ms
> (11 rows)
>
> dcc=#
>
> ---------------------------(end of broadcast)---------------------------
> TIP 9: In versions below 8.0, the planner will ignore your desire to
>       choose an index scan if your joining column's datatypes do not
>       match
>


-- 

*Edward Allen*
Software Engineer
Black Duck Software, Inc.

tallen@blackducksoftware.com <mailto:tallen@blackducksoftware.com>
T +1.781.891.5100 x133
F +1.781.891.5145
http://www.blackducksoftware.com