Using bitmap index scans-more efficient - Mailing list pgsql-sql
From | Kyle Bateman |
---|---|
Subject | Using bitmap index scans-more efficient |
Date | |
Msg-id | 44DF6582.4000806@actarg.com Whole thread Raw |
Responses |
Re: Using bitmap index scans-more efficient
|
List | pgsql-sql |
How can I use bitmap index scans more effectively? (version 8.1.0) I have a financial ledger (actually a view, grouping several other tables) containing about a million records. Each record contains an account code and a project code. I can query for all the transactions belonging to any single project code and it is very fast and efficient (milliseconds/project). But projects are organized in a hierarchical structure, so I also need to query the ledger for transactions belonging to a particular project and/or all its progeny. Depending on the method, this is taking several seconds to several minutes per project. For testing purposes, I'll present results using a smaller version of the ledger with the following query times: It is most efficient to enumerate the group of projects using "in" (0.144 seconds) select * from ledger where proj in (4737,4789,4892,4893,4894,4895,4933,4934,4935); ---------------------------------------------------------------------------Nested Loop Left Join (cost=19.73..4164.10 rows=7width=85) -> Nested Loop (cost=19.73..4139.08 rows=7 width=81) -> Nested Loop (cost=19.73..4100.07 rows=7width=63) -> Bitmap Heap Scan on apinv_items i (cost=19.73..1185.71 rows=487 width=55) Recheck Cond: ((proj = 4737) OR (proj = 4789) OR (proj = 4892) OR (proj = 4893) OR (proj = 4894) OR (proj = 4895) OR(proj = 4933) OR (proj = 4934 ) OR (proj = 4935)) Filter: ((status = 'en'::bpchar) OR (status = 'cl'::bpchar) OR (status = 'pd'::bpchar)) -> BitmapOr (cost=19.73..19.73 rows=495 width=0) -> Bitmap IndexScan on i_apinv_items_proj (cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4737) -> Bitmap Index Scan on i_apinv_items_proj (cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4789) -> Bitmap Index Scan on i_apinv_items_proj (cost=0.00..2.19rows=55 width=0) Index Cond: (proj = 4892) -> BitmapIndex Scan on i_apinv_items_proj (cost=0.00..2.19 rows=55 width=0) Index Cond: (proj= 4893) -> Bitmap Index Scan on i_apinv_items_proj (cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4894) -> Bitmap Index Scan on i_apinv_items_proj (cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4895) -> Bitmap Index Scan on i_apinv_items_proj (cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4933) -> Bitmap Index Scan on i_apinv_items_proj (cost=0.00..2.19 rows=55width=0) Index Cond: (proj = 4934) -> Bitmap Index Scan oni_apinv_items_proj (cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4935) -> Index Scan using apinv_hdr_pkey on apinv_hdr h (cost=0.00..5.97 rows=1 width=21) Index Cond:(("outer".vendid = h.vendid) AND (("outer".invnum)::text = (h.invnum)::text)) -> Index Scan using vend_org_pkeyon vend_org v (cost=0.00..5.56 rows=1 width=26) Index Cond: (v.org_id = "outer".vendid) -> SeqScan on acct a (cost=0.00..3.54 rows=1 width=4) Filter: ((code)::text = 'ap'::text) --------------------------------------------------------------------------- Problem is, the project list has to be hard-coded into the SQL statement. What I really need is transactions belonging to "project 4737 and all its progeny." So I've tried using a many-to-many table proj_prog that describes which projects are progeny of which other projects. Unfortunately, the query time then goes up by a factor of 6 (to 0.85 seconds). Examples: select * from ledger where proj = any (array(select prog_id from proj_prog where proj_id = 4737)); select * fromledger where proj = any (array[4737,4789,4892,4893,4894,4895,4933,4934,4935]);" ---------------------------------------------------------------------------Nested Loop Left Join (cost=13584.99..17647.39rows=850 width=85) InitPlan -> Index Scan using proj_prog_pkey on proj_prog (cost=0.00..38.04rows=21 width=4) Index Cond: (proj_id = 4737) -> Merge Join (cost=13543.42..17565.44 rows=850width=81) Merge Cond: ("outer".vendid = "inner".org_id) -> Merge Join (cost=13543.42..17405.05 rows=850width=63) Merge Cond: (("outer".vendid = "inner".vendid) AND (("outer".invnum)::text = "inner"."?column10?")) -> Index Scan using apinv_hdr_pkey on apinv_hdr h (cost=0.00..3148.16 rows=51016 width=21) -> Sort (cost=13543.42..13693.47 rows=60020 width=55) Sort Key: i.vendid, (i.invnum)::text -> Seq Scan on apinv_items i (cost=0.00..7197.27 rows=60020 width=55) Filter: (((status = 'en'::bpchar) OR (status = 'cl'::bpchar) OR (status = 'pd'::bpchar)) AND (proj = ANY ($0))) -> Index Scan using vend_org_pkey on vend_org v (cost=0.00..145.52 rows=1799 width=26) -> Materialize (cost=3.54..3.55rows=1 width=4) -> Seq Scan on acct a (cost=0.00..3.54 rows=1 width=4) Filter: ((code)::text= 'ap'::text) --------------------------------------------------------------------------- The worst case is the following types of queries (about 5 seconds): select * from ledger where proj in (select prog_id from proj_prog where proj_id = 4737); select l.* from ledger l, proj_progp where l.proj = p.prog_id and p.proj_id = 4737; ---------------------------------------------------------------------------Hash Join (cost=19032.47..23510.23 rows=6 width=85) Hash Cond: ("outer".proj = "inner".prog_id) -> Nested Loop Left Join (cost=18994.38..23378.41 rows=1700 width=85) -> Hash Join (cost=18990.84..23340.87 rows=1700 width=81) Hash Cond: ("outer".vendid = "inner".org_id) -> Merge Join (cost=18935.35..23255.64 rows=1700 width=63) Merge Cond: (("outer".vendid= "inner".vendid) AND (("outer".invnum)::text = "inner"."?column10?")) -> Index Scanusing apinv_hdr_pkey on apinv_hdr h (cost=0.00..3148.16 rows=51016 width=21) -> Sort (cost=18935.35..19235.45rows=120041 width=55) Sort Key: i.vendid, (i.invnum)::text -> Seq Scan on apinv_items i (cost=0.00..4152.99 rows=120041 width=55) Filter:((status = 'en'::bpchar) OR (status = 'cl'::bpchar) OR (status = 'pd'::bpchar)) -> Hash (cost=50.99..50.99rows=1799 width=26) -> Seq Scan on vend_org v (cost=0.00..50.99 rows=1799 width=26) -> Materialize (cost=3.54..3.55 rows=1 width=4) -> Seq Scan on acct a (cost=0.00..3.54 rows=1width=4) Filter: ((code)::text = 'ap'::text) -> Hash (cost=38.04..38.04 rows=21 width=4) -> Index Scan using proj_prog_pkey on proj_prog p (cost=0.00..38.04 rows=21 width=4) Index Cond: (proj_id= 4737) --------------------------------------------------------------------------- I would like to be able to get the best performance like in the first query but without having to enumerate the projects (i.e. using a single query). The secret seems to be the bitmap index scans. Any ideas about whether/how this can be done? Thanks! Kyle Bateman --------------------------------------------------------------------------- BTW, The ledger view is built roughly as follows: create view rp_v_api as select h.adate as adate, (i.price * i.quant)::numeric(14,2) as amount, substring(v.org_name from 1 for 40) as descr, i.proj as proj, i.acct as acct, 1 as cr_proj, a.acct_id as cr_acct from ( apinv_hdr h join apinv_items i on i.vendid = h.vendid and i.invnum =h.invnum join vend_org v on v.org_id = h.vendid left join acct a on a.code = 'ap' ) where i.status in ('en','cl','pd');