Thread: Can someone explain the problem with this select
Allow me to demonstrate my pitiful SQL knowledge I have tables documents and comments If I run join and list doc_nums the query is quite fast If I run join and use subselect the query is extremely slow Can someone offer analysis Thanks Richard dcc=# EXPLAIN ANALYZE select doc_num from documents limit 10; QUERYPLAN ---------------------------------------------------------------------------------------------------------------------- Limit (cost=0.00..1.01 rows=10 width=13) (actual time=0.013..0.061 rows=10 loops=1) -> Seq Scan on documents (cost=0.00..909333.85 rows=9014885 width=13) (actual time=0.008..0.027 rows=10 loops=1) Total runtime: 0.125 ms (3 rows) dcc=# dcc=# EXPLAIN ANALYZE select * from documents left outer join comments on (documents.doc_num = comments.doc_num) where documents.doc_num in ('105364107','105513059','105513095','105513112','105585627','102933195','014650340','014650361','014650362','105419865'); QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ HashLeft Join (cost=21.23..61.54 rows=10 width=444) (actual time=0.507..0.574 rows=10 loops=1) Hash Cond: ("outer".doc_num = "inner".doc_num) -> Bitmap Heap Scan on documents (cost=20.03..60.28 rows=10 width=361) (actual time=0.397..0.432 rows=10 loops=1) Recheck Cond: ((doc_num = '105364107'::bpchar) OR (doc_num= '105513059'::bpchar) OR (doc_num = '105513095'::bpchar) OR (doc_num = '105513112'::bpchar) OR (doc_num = '105585627'::bpchar) OR (doc_num = '102933195'::bpchar) OR (doc_num = '014650340'::bpchar) OR (doc_num = '014650361'::bpchar) OR (doc_num = '014650362'::bpchar) OR (doc_num = '105419865'::bpchar)) -> BitmapOr (cost=20.03..20.03 rows=10 width=0) (actual time=0.383..0.383 rows=0 loops=1) -> Bitmap Index Scan on documents_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.059..0.059 rows=1 loops=1) Index Cond: (doc_num = '105364107'::bpchar) -> Bitmap Index Scan on documents_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.039..0.039 rows=1 loops=1) Index Cond: (doc_num = '105513059'::bpchar) -> Bitmap Index Scan on documents_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.034..0.034 rows=1 loops=1) Index Cond: (doc_num = '105513095'::bpchar) -> Bitmap Index Scan on documents_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.031..0.031 rows=1 loops=1) Index Cond: (doc_num = '105513112'::bpchar) -> Bitmap Index Scan on documents_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.034..0.034 rows=1 loops=1) Index Cond: (doc_num = '105585627'::bpchar) -> Bitmap Index Scan on documents_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.036..0.036 rows=1 loops=1) Index Cond: (doc_num = '102933195'::bpchar) -> Bitmap Index Scan on documents_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.036..0.036 rows=1 loops=1) Index Cond: (doc_num = '014650340'::bpchar) -> Bitmap Index Scan on documents_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.031..0.031 rows=1 loops=1) Index Cond: (doc_num = '014650361'::bpchar) -> Bitmap Index Scan on documents_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.031..0.031 rows=1 loops=1) Index Cond: (doc_num = '014650362'::bpchar) -> Bitmap Index Scan on documents_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.035..0.035 rows=1 loops=1) Index Cond: (doc_num = '105419865'::bpchar) -> Hash (cost=1.16..1.16 rows=16 width=83) (actual time=0.080..0.080 rows=16 loops=1) -> Seq Scan on comments (cost=0.00..1.16 rows=16 width=83) (actual time=0.005..0.037 rows=16 loops=1) Total runtime: 0.775 ms (28 rows) dcc=# dcc=# EXPLAIN ANALYZE select * from documents left outer join comments on (documents.doc_num = comments.doc_num) where documents.doc_num in (select doc_num from documents limit 10); QUERY PLAN -------------------------------------------------------------------------------- -------------------------------------------------------------------------------- -- Merge IN Join (cost=100000002.19..136154797.93 rows=10 width=654) (actual time =23.534..2216180.550 rows=10 loops=1) Merge Cond: ("outer".doc_num = "inner".doc_num) -> Merge Left Join (cost=0.00..36129585.92rows=10083868 width=654) (actual time=23.239..2188733.430 rows=6696218 loops=1) Merge Cond: ("outer".doc_num = "inner".doc_num) -> Index Scan using documents_pkey on documents (cost=0.00..35723277. 60 rows=10083868 width=569) (actual time=6.845..2107300.767 rows=6695853 loops=1 ) -> Index Scan using doc_num_idx on comments (cost=0.00..377203.50 row s=311612 width=85) (actual time=16.368..6984.365 rows=243797 loops=1) -> Sort (cost=100000002.19..100000002.22 rows=10width=13) (actual time=0.2 52..0.293 rows=10 loops=1) Sort Key: "IN_subquery".doc_num -> Limit (cost=100000000.00..100000001.92 rows=10width=13) (actual t ime=0.019..0.128 rows=10 loops=1) -> Seq Scan on documents (cost=100000000.00..101940460.68 rows= 10083868 width=13) (actual time=0.010..0.055 rows=10 loops=1) Total runtime: 2216180.973 ms (11 rows) dcc=#
Richard Ray <rray@mstc.state.ms.us> writes: > dcc=# EXPLAIN ANALYZE select * from documents left outer join comments on > (documents.doc_num = comments.doc_num) where documents.doc_num in (select > doc_num from documents limit 10); > [ is slow ] This isn't your fault, it's an optimizer limitation: PG < 8.2 can't reorder outer joins relative to regular joins, and the IN with a sub-select is a kind of regular join. So it's forming the whole outer-join result and then joining to the sub-select :-( This is fixed in 8.2, released today, so perhaps upgrading is the thing for you to do. Alternatively, you can contort the query to get the IN restriction inside the outer join: select * from (select * from documents where documents.doc_num in (select doc_num from documents limit 10)) ss left outerjoin comments on (ss.doc_num = comments.doc_num); regards, tom lane
> dcc=# EXPLAIN ANALYZE select * from documents left outer join comments on > (documents.doc_num = comments.doc_num) where documents.doc_num in (select > doc_num from documents limit 10); This query is preforming the join on all records of your two tables. After all of the that exhaustive work is done, it this filter out the records you want. you should preform a filtered select first and then use those results in you left join. I guess the lesson you can learn from this example is that you should try to filter your data set to get it as small as possible before you do anything else with it. select * from ( select doc_num from documents limit 10 ) as D1 left outer join comments on (D1.doc_num = comments.doc_num) ; Regards, Richard Broersma Jr.
I've been foolin with this for a couple of days Sometimes you just have to ask Thanks gentlemen On Tue, 5 Dec 2006, Tom Lane wrote: > Richard Ray <rray@mstc.state.ms.us> writes: >> dcc=# EXPLAIN ANALYZE select * from documents left outer join comments on >> (documents.doc_num = comments.doc_num) where documents.doc_num in (select >> doc_num from documents limit 10); >> [ is slow ] > > This isn't your fault, it's an optimizer limitation: PG < 8.2 can't > reorder outer joins relative to regular joins, and the IN with a > sub-select is a kind of regular join. So it's forming the whole > outer-join result and then joining to the sub-select :-( > > This is fixed in 8.2, released today, so perhaps upgrading is the > thing for you to do. Alternatively, you can contort the query to > get the IN restriction inside the outer join: > > select * from > (select * from documents > where documents.doc_num in (select doc_num from documents limit 10)) ss > left outer join comments on (ss.doc_num = comments.doc_num); > > regards, tom lane >
Hey Ray, I'm by no means a guru but here is my simple analysis. In the first query, the 10 "documents" specified 'IN' the in are first selected from the "documents" table. Then, those 10 rows are joined with the "comments" table. In the second query, every row in the "documents" table is joined with the "comments" table, which took forever according to the Explain Analyse. Then, the results of the complete join of those two tables from are compared against the sub-select. I'm guessing (emphasis on guessing) that the query planner chose this approach because it does not know how many rows the sub-select will return so it does the join and then checks those results against that sub-select. Another approach may be to do this. Create Temp table limit_documents as select doc_num from documents limit 10; select * from limit_documents left outer join comments on (limit_documents.doc_num = comments.doc_num); Hope that helps, Ted Richard Ray wrote: > Allow me to demonstrate my pitiful SQL knowledge > I have tables documents and comments > If I run join and list doc_nums the query is quite fast > If I run join and use subselect the query is extremely slow > Can someone offer analysis > > Thanks > Richard > > dcc=# EXPLAIN ANALYZE select doc_num from documents limit 10; > QUERY PLAN > ---------------------------------------------------------------------------------------------------------------------- > > Limit (cost=0.00..1.01 rows=10 width=13) (actual time=0.013..0.061 > rows=10 loops=1) > -> Seq Scan on documents (cost=0.00..909333.85 rows=9014885 > width=13) (actual time=0.008..0.027 rows=10 loops=1) > Total runtime: 0.125 ms > (3 rows) > > dcc=# > > > > dcc=# EXPLAIN ANALYZE select * from documents left outer join > comments on (documents.doc_num = comments.doc_num) where > documents.doc_num in > ('105364107','105513059','105513095','105513112','105585627','102933195','014650340','014650361','014650362','105419865'); > > > QUERY PLAN > ------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------ > > Hash Left Join (cost=21.23..61.54 rows=10 width=444) (actual > time=0.507..0.574 rows=10 loops=1) > Hash Cond: ("outer".doc_num = "inner".doc_num) > -> Bitmap Heap Scan on documents (cost=20.03..60.28 rows=10 > width=361) (actual time=0.397..0.432 rows=10 loops=1) > Recheck Cond: ((doc_num = '105364107'::bpchar) OR (doc_num = > '105513059'::bpchar) OR (doc_num = '105513095'::bpchar) OR (doc_num = > '105513112'::bpchar) OR (doc_num = '105585627'::bpchar) OR (doc_num = > '102933195'::bpchar) OR (doc_num = '014650340'::bpchar) OR (doc_num = > '014650361'::bpchar) OR (doc_num = '014650362'::bpchar) OR (doc_num = > '105419865'::bpchar)) > -> BitmapOr (cost=20.03..20.03 rows=10 width=0) (actual > time=0.383..0.383 rows=0 loops=1) > -> Bitmap Index Scan on documents_pkey > (cost=0.00..2.00 rows=1 width=0) (actual time=0.059..0.059 rows=1 > loops=1) > Index Cond: (doc_num = '105364107'::bpchar) > -> Bitmap Index Scan on documents_pkey > (cost=0.00..2.00 rows=1 width=0) (actual time=0.039..0.039 rows=1 > loops=1) > Index Cond: (doc_num = '105513059'::bpchar) > -> Bitmap Index Scan on documents_pkey > (cost=0.00..2.00 rows=1 width=0) (actual time=0.034..0.034 rows=1 > loops=1) > Index Cond: (doc_num = '105513095'::bpchar) > -> Bitmap Index Scan on documents_pkey > (cost=0.00..2.00 rows=1 width=0) (actual time=0.031..0.031 rows=1 > loops=1) > Index Cond: (doc_num = '105513112'::bpchar) > -> Bitmap Index Scan on documents_pkey > (cost=0.00..2.00 rows=1 width=0) (actual time=0.034..0.034 rows=1 > loops=1) > Index Cond: (doc_num = '105585627'::bpchar) > -> Bitmap Index Scan on documents_pkey > (cost=0.00..2.00 rows=1 width=0) (actual time=0.036..0.036 rows=1 > loops=1) > Index Cond: (doc_num = '102933195'::bpchar) > -> Bitmap Index Scan on documents_pkey > (cost=0.00..2.00 rows=1 width=0) (actual time=0.036..0.036 rows=1 > loops=1) > Index Cond: (doc_num = '014650340'::bpchar) > -> Bitmap Index Scan on documents_pkey > (cost=0.00..2.00 rows=1 width=0) (actual time=0.031..0.031 rows=1 > loops=1) > Index Cond: (doc_num = '014650361'::bpchar) > -> Bitmap Index Scan on documents_pkey > (cost=0.00..2.00 rows=1 width=0) (actual time=0.031..0.031 rows=1 > loops=1) > Index Cond: (doc_num = '014650362'::bpchar) > -> Bitmap Index Scan on documents_pkey > (cost=0.00..2.00 rows=1 width=0) (actual time=0.035..0.035 rows=1 > loops=1) > Index Cond: (doc_num = '105419865'::bpchar) > -> Hash (cost=1.16..1.16 rows=16 width=83) (actual > time=0.080..0.080 rows=16 loops=1) > -> Seq Scan on comments (cost=0.00..1.16 rows=16 width=83) > (actual time=0.005..0.037 rows=16 loops=1) > Total runtime: 0.775 ms > (28 rows) > > dcc=# > > > > dcc=# EXPLAIN ANALYZE select * from documents left outer join comments > on (documents.doc_num = comments.doc_num) where documents.doc_num in > (select doc_num from documents limit 10); > > QUERY PLAN > > -------------------------------------------------------------------------------- > > -------------------------------------------------------------------------------- > > -- > Merge IN Join (cost=100000002.19..136154797.93 rows=10 width=654) > (actual time > =23.534..2216180.550 rows=10 loops=1) > Merge Cond: ("outer".doc_num = "inner".doc_num) > -> Merge Left Join (cost=0.00..36129585.92 rows=10083868 > width=654) (actual > time=23.239..2188733.430 rows=6696218 loops=1) > Merge Cond: ("outer".doc_num = "inner".doc_num) > -> Index Scan using documents_pkey on documents > (cost=0.00..35723277. > 60 rows=10083868 width=569) (actual time=6.845..2107300.767 > rows=6695853 loops=1 > ) > -> Index Scan using doc_num_idx on comments > (cost=0.00..377203.50 row > s=311612 width=85) (actual time=16.368..6984.365 rows=243797 loops=1) > -> Sort (cost=100000002.19..100000002.22 rows=10 width=13) > (actual time=0.2 > 52..0.293 rows=10 loops=1) > Sort Key: "IN_subquery".doc_num > -> Limit (cost=100000000.00..100000001.92 rows=10 width=13) > (actual t > ime=0.019..0.128 rows=10 loops=1) > -> Seq Scan on documents > (cost=100000000.00..101940460.68 rows= > 10083868 width=13) (actual time=0.010..0.055 rows=10 loops=1) > Total runtime: 2216180.973 ms > (11 rows) > > dcc=# > > ---------------------------(end of broadcast)--------------------------- > TIP 9: In versions below 8.0, the planner will ignore your desire to > choose an index scan if your joining column's datatypes do not > match > -- *Edward Allen* Software Engineer Black Duck Software, Inc. tallen@blackducksoftware.com <mailto:tallen@blackducksoftware.com> T +1.781.891.5100 x133 F +1.781.891.5145 http://www.blackducksoftware.com