Thread: Query Plan - Bitmap Index Scan and Views
Hi, Is there any inherent benefit of using a the IN operator versus joining a temporary table? Should they offer near equal performance? It appears bitmap scan's aren't done when matching across a small temporary table. I have a temporary table with 5 integers in it that I'm matching against mildly complex view that has 5 joins. I've analyzed the database after the temporary table was created. Matching against the temporary table takes: 36492.836 ms. Matching using the IN operator with the same content takes: 2.732 ms. These measurements are after the query has been run a few times, so the data should be in cache. It would appear that the temporary table's join isn't evaluated deep enough in the query plan to prevent the more expensive joins from running, is there a way for force it? Could some setting be wrong that telling the planner to make this decision? The same thing happens when I perform the join without the view. select * from foo; oid -------- 161007 161008 161000 161009 161002 (5 rows) Plan for IN match: =# explain analyze select * from crawled_url_full_view where crawled_url_full_view.oid in (161007, 161008, 161000, 161009, 161002); QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------------------------------ ------------------- Hash Left Join (cost=14.50..94.11 rows=5 width=538) (actual time=1.025..1.522 rows=5 loops=1) Hash Cond: ("outer".classification_set_id = "inner".id) Join Filter: ("outer".classification_set_id IS NOT NULL) -> Hash Left Join (cost=13.30..92.86 rows=5 width=526) (actual time=0.794..1.251 rows=5 loops=1) Hash Cond: ("outer".charset_id = "inner".id) Join Filter: ("outer".charset_id IS NOT NULL) -> Hash Left Join (cost=12.21..91.70 rows=5 width=515) (actual time=0.631..1.048 rows=5 loops=1) Hash Cond: ("outer".http_error_description_id = "inner".id) Join Filter: ("outer".http_error_description_id IS NOT NULL) -> Hash Left Join (cost=11.13..90.59 rows=5 width=472) (actual time=0.488..0.868 rows=5 loops=1) Hash Cond: ("outer".content_type_id = "inner".id) Join Filter: ("outer".content_type_id IS NOT NULL) -> Nested Loop Left Join (cost=10.02..89.41 rows=5 width=443) (actual time=0.244..0.578 rows=5 loops=1) Join Filter: ("outer".redirect_url_id IS NOT NULL) -> Nested Loop Left Join (cost=10.02..59.56 rows=5 width=339) (actual time=0.225..0.488 rows=5 loops=1) -> Bitmap Heap Scan on crawled_url (cost=10.02..29.71 rows=5 width=235) (actual time=0.170..0.217 rows=5 loops=1) Recheck Cond: ((oid = 161007) OR (oid = 161008) OR (oid = 161000) OR (oid = 161009) OR (oid = 161002)) -> BitmapOr (cost=10.02..10.02 rows=5 width=0) (actual time=0.137..0.137 rows=0 loops=1) -> Bitmap Index Scan on crawled_url_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.061..0.061 rows=1 loops=1) Index Cond: (oid = 161007) -> Bitmap Index Scan on crawled_url_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.013..0.013 rows=1 loops=1) Index Cond: (oid = 161008) -> Bitmap Index Scan on crawled_url_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.013..0.013 rows=1 loops=1) Index Cond: (oid = 161000) -> Bitmap Index Scan on crawled_url_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.014..0.014 rows=1 loops=1) Index Cond: (oid = 161009) -> Bitmap Index Scan on crawled_url_pkey (cost=0.00..2.00 rows=1 width=0) (actual time=0.012..0.012 rows=1 loops=1) Index Cond: (oid = 161002) -> Index Scan using url_pkey on url (cost=0.00..5.96 rows=1 width=108) (actual time=0.031..0.036 rows=1 loops=5) Index Cond: (url.url_id = "outer".url_id) -> Index Scan using url_pkey on url r1 (cost=0.00..5.96 rows=1 width=108) (actual time=0.004..0.004 rows=0 loops=5) Index Cond: (r1.url_id = "outer".redirect_url_id) -> Hash (cost=1.09..1.09 rows=9 width=33) (actual time=0.130..0.130 rows=9 loops=1) -> Seq Scan on content_types (cost=0.00..1.09 rows=9 width=33) (actual time=0.017..0.062 rows=9 loops=1) -> Hash (cost=1.06..1.06 rows=6 width=47) (actual time=0.088..0.088 rows=6 loops=1) -> Seq Scan on http_error_descriptions (cost=0.00..1.06 rows=6 width=47) (actual time=0.010..0.040 rows=6 loops=1) -> Hash (cost=1.08..1.08 rows=8 width=15) (actual time=0.103..0.103 rows=8 loops=1) -> Seq Scan on charsets (cost=0.00..1.08 rows=8 width=15) (actual time=0.011..0.048 rows=8 loops=1) -> Hash (cost=1.16..1.16 rows=16 width=16) (actual time=0.175..0.175 rows=16 loops=1) -> Seq Scan on classification_sets (cost=0.00..1.16 rows=16 width=16) (actual time=0.012..0.088 rows=16 loops=1) Total runtime: 2.743 ms (41 rows) Plan for temp table match: =# explain analyze select * from foo, crawled_url_full_view where crawled_url_full_view.oid = foo.oid; QUERY PLAN ------------------------------------------------------------------------ ------------------------------------------------------------------------ ------------------ Hash IN Join (cost=35667.15..145600.71 rows=5 width=538) (actual time=22371.445..36482.823 rows=5 loops=1) Hash Cond: ("outer".oid = "inner".oid) -> Hash Left Join (cost=35666.09..143698.61 rows=380198 width=538) (actual time=9901.782..35218.758 rows=360531 loops=1) Hash Cond: ("outer".classification_set_id = "inner".id) Join Filter: ("outer".classification_set_id IS NOT NULL) -> Hash Left Join (cost=35664.89..140493.61 rows=380198 width=526) (actual time=9901.456..32363.212 rows=360531 loops=1) Hash Cond: ("outer".charset_id = "inner".id) Join Filter: ("outer".charset_id IS NOT NULL) -> Hash Left Join (cost=35663.79..135684.27 rows=380198 width=515) (actual time=9901.257..29400.189 rows=360531 loops=1) Hash Cond: ("outer".http_error_description_id = "inner".id) Join Filter: ("outer".http_error_description_id IS NOT NULL) -> Hash Left Join (cost=35662.71..133782.19 rows=380198 width=472) (actual time=9901.080..26691.473 rows=360531 loops=1) Hash Cond: ("outer".content_type_id = "inner".id) Join Filter: ("outer".content_type_id IS NOT NULL) -> Hash Left Join (cost=35661.60..128972.84 rows=380198 width=443) (actual time=9900.802..23743.323 rows=360531 loops=1) Hash Cond: ("outer".redirect_url_id = "inner".url_id) Join Filter: ("outer".redirect_url_id IS NOT NULL) -> Hash Left Join (cost=17830.80..66680.80 rows=380198 width=339) (actual time=4592.701..14466.994 rows=360531 loops=1) Hash Cond: ("outer".url_id = "inner".url_id) -> Seq Scan on crawled_url (cost=0.00..10509.98 rows=380198 width=235) (actual time=0.026..2976.911 rows=360531 loops=1) -> Hash (cost=10627.04..10627.04 rows=377104 width=108) (actual time=4591.703..4591.703 rows=382149 loops=1) -> Seq Scan on url (cost=0.00..10627.04 rows=377104 width=108) (actual time=0.041..2142.702 rows=382149 loops=1) -> Hash (cost=10627.04..10627.04 rows=377104 width=108) (actual time=5307.540..5307.540 rows=382149 loops=1) -> Seq Scan on url r1 (cost=0.00..10627.04 rows=377104 width=108) (actual time=0.138..2503.577 rows=382149 loops=1) -> Hash (cost=1.09..1.09 rows=9 width=33) (actual time=0.144..0.144 rows=9 loops=1) -> Seq Scan on content_types (cost=0.00..1.09 rows=9 width=33) (actual time=0.020..0.068 rows=9 loops=1) -> Hash (cost=1.06..1.06 rows=6 width=47) (actual time=0.108..0.108 rows=6 loops=1) -> Seq Scan on http_error_descriptions (cost=0.00..1.06 rows=6 width=47) (actual time=0.015..0.049 rows=6 loops=1) -> Hash (cost=1.08..1.08 rows=8 width=15) (actual time=0.129..0.129 rows=8 loops=1) -> Seq Scan on charsets (cost=0.00..1.08 rows=8 width=15) (actual time=0.014..0.058 rows=8 loops=1) -> Hash (cost=1.16..1.16 rows=16 width=16) (actual time=0.234..0.234 rows=16 loops=1) -> Seq Scan on classification_sets (cost=0.00..1.16 rows=16 width=16) (actual time=0.014..0.107 rows=16 loops=1) -> Hash (cost=1.05..1.05 rows=5 width=4) (actual time=0.092..0.092 rows=5 loops=1) -> Seq Scan on foo (cost=0.00..1.05 rows=5 width=4) (actual time=0.022..0.044 rows=5 loops=1) Total runtime: 36492.836 ms (35 rows) Definition of the view: create view crawled_url_full_view as select crawled_url.*, url.url, r1.url as redirect_url, content_types.type as content_type, http_error_descriptions.error as http_error_description, charsets.name as charset, classification_sets.name as classification_set from crawled_url left join url on url.url_id = crawled_url.url_id left join url as r1 on (r1.url_id = crawled_url.redirect_url_id and crawled_url.redirect_url_id is not null) left join content_types on (content_types.id = crawled_url.content_type_id and crawled_url.content_type_id is not null) left join http_error_descriptions on (http_error_descriptions.id = crawled_url.http_error_description_id and crawled_url.http_error_description_id is not null) left join charsets on (charsets.id = crawled_url.charset_id and crawled_url.charset_id is not null) left join classification_sets on (classification_sets.id = crawled_url.classification_set_id and crawled_url.classification_set_id is not null); Version is: PostgreSQL 8.1.4 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 4.0.1 20050727 (Red Hat 4.0.1-5) work_mem=30000 shared_buffers=5000 effective_cache_size=15000 Thanks for any help, Rusty -- Rusty Conover InfoGears Inc. Web: http://www.infogears.com
Rusty Conover <rconover@infogears.com> writes: > Is there any inherent benefit of using a the IN operator versus > joining a temporary table? Should they offer near equal performance? > It appears bitmap scan's aren't done when matching across a small > temporary table. I believe the problem you're facing is that existing PG releases don't know how to rearrange join order in the face of outer joins, and your view is full of outer joins. So the join against the temp table happens after forming the full output of the view, whereas you desperately need it to happen at the bottom of the join stack. CVS tip (8.2-to-be) has some ability to rearrange outer joins, and I'm interested to know whether it's smart enough to fix your problem. But you have not provided enough info to let someone else duplicate your test case. Would you be willing to download CVS or a recent nightly snapshot and see what it does with your problem? regards, tom lane
On Aug 4, 2006, at 8:15 PM, Tom Lane wrote:
Rusty Conover <rconover@infogears.com> writes:Is there any inherent benefit of using a the IN operator versusjoining a temporary table? Should they offer near equal performance?It appears bitmap scan's aren't done when matching across a smalltemporary table.I believe the problem you're facing is that existing PG releasesdon't know how to rearrange join order in the face of outer joins,and your view is full of outer joins. So the join against the temptable happens after forming the full output of the view, whereas youdesperately need it to happen at the bottom of the join stack.CVS tip (8.2-to-be) has some ability to rearrange outer joins, andI'm interested to know whether it's smart enough to fix your problem.But you have not provided enough info to let someone else duplicateyour test case. Would you be willing to download CVS or a recentnightly snapshot and see what it does with your problem?regards, tom lane
Absolutely, I'll attempt to run the test against the current CVS HEAD.
Do I need to pg_dump and restore from 8.1.4?
What other information would be helpful in the meantime?
Thanks,
Rusty
--
Rusty Conover
InfoGears Inc.
Rusty Conover <rconover@infogears.com> writes: > Absolutely, I'll attempt to run the test against the current CVS HEAD. > Do I need to pg_dump and restore from 8.1.4? Yup, fraid so. regards, tom lane