Re: Using bitmap index scans-more efficient - Mailing list pgsql-sql
From | Kyle Bateman |
---|---|
Subject | Re: Using bitmap index scans-more efficient |
Date | |
Msg-id | 44E2438F.8000105@actarg.com Whole thread Raw |
In response to | Re: Using bitmap index scans-more efficient (Tom Lane <tgl@sss.pgh.pa.us>) |
List | pgsql-sql |
Tom Lane wrote: >Kyle Bateman <kyle@actarg.com> writes: > > >>But I'm assuming that using an interval-encoded project tree, I would >>have to do something like the following to get a progency group: >> >> > > > >>select * from ledger l, proj p where p.proj_id = l.proj and p.left > >>1234 and p.right < 2345; >> >> > >btree has no idea about the constraint (that I imagine exists) that left ><= right. If you're just doing a simple index on (left, right) then the >above query requires scanning all index entries with left > 1234. It >would probably help to say > >select * from ledger l, proj p where p.proj_id = l.proj and > p.left > 1234 and p.left < 2345 and p.right < 2345; > >so that you can constrain the range of "left" values scanned. > > Thanks for the replies, Tom and Florian. My problem is not that it is difficult (or costly) to determine the progeny of a given project. I can determine this in about 90 msec regardless of whether I use an adjacency model, interval-encoding, or materialized path (current implementation). The problem is, when I try to extract the ledger entries belonging to that progeny from a set of a million records, it seems to need to process all million records rather than being able to index right into the ones I want. I'm not very good at reading explain output, but I tried to set up the query Tom suggests by creating an interval-encoded project table (proj_int) and then joining it to my ledger like so: select l.* from ledger l, proj_int i where l.proj = i.proj_id and i.lft >= 5283 and i.lft < 5300 and i.rgt <= 5300; On my mini-test-ledger of 100,000 entries, this takes the longest time (5 seconds) with the following explain output: Hash Join (cost=19018.46..23411.52 rows=14 width=85) Hash Cond: ("outer".proj = "inner".proj_id) -> Nested Loop LeftJoin (cost=18994.38..23378.41 rows=1700 width=85) -> Hash Join (cost=18990.84..23340.87 rows=1700 width=81) Hash Cond: ("outer".vendid = "inner".org_id) -> Merge Join (cost=18935.35..23255.64rows=1700 width=63) Merge Cond: (("outer".vendid = "inner".vendid) AND (("outer".invnum)::text = "inner"."?column10?")) -> Index Scan using apinv_hdr_pkey on apinv_hdr h (cost=0.00..3148.16 rows=51016 width=21) -> Sort (cost=18935.35..19235.45 rows=120041 width=55) Sort Key: i.vendid, (i.invnum)::text -> Seq Scan on apinv_itemsi (cost=0.00..4152.99 rows=120041 width=55) Filter: ((status = 'en'::bpchar) OR (status = 'cl'::bpchar) OR (status = 'pd'::bpchar)) -> Hash (cost=50.99..50.99 rows=1799 width=26) -> Seq Scan on vend_org v (cost=0.00..50.99 rows=1799 width=26) -> Materialize (cost=3.54..3.55 rows=1 width=4) -> Seq Scan on acct a (cost=0.00..3.54rows=1 width=4) Filter: ((code)::text = 'ap'::text) -> Hash (cost=24.06..24.06 rows=10width=4) -> Bitmap Heap Scan on proj_int i (cost=2.26..24.06 rows=10 width=4) Recheck Cond: ((lft >= 5283) AND (lft < 5300) AND (rgt <= 5300)) -> Bitmap Index Scan on i_proj_int_lft_rgt (cost=0.00..2.26 rows=10 width=0) Index Cond: ((lft >= 5283) AND (lft < 5300) AND (rgt <= 5300)) That is roughly equivalent to my materialized path method: select l.* from ledger l where l.proj in (select proj_id from proj_v where 4737 = any(ppath)); And is quite slow compared to 150 msec when enumerating the progeny projects like so: select l.* from ledger l where l.proj in (4737,4789,4892,4893,4894,4895,4933,4934,4935); Nested Loop Left Join (cost=19.73..4164.10 rows=7 width=85) -> Nested Loop (cost=19.73..4139.08 rows=7 width=81) -> Nested Loop (cost=19.73..4100.07 rows=7 width=63) -> Bitmap Heap Scan on apinv_items i (cost=19.73..1185.71 rows=487 width=55) Recheck Cond: ((proj = 4737) OR (proj = 4789) OR (proj = 4892) OR (proj = 4893) OR (proj = 4894) OR (proj = 4895) OR (proj = 4933) OR (proj = 4934) OR (proj = 4935)) Filter: ((status = 'en'::bpchar) OR (status = 'cl'::bpchar) OR (status = 'pd'::bpchar)) -> BitmapOr (cost=19.73..19.73 rows=495 width=0) -> Bitmap Index Scan on i_apinv_items_proj (cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4737) -> BitmapIndex Scan on i_apinv_items_proj (cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4789) -> BitmapIndex Scan on i_apinv_items_proj (cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4892) -> BitmapIndex Scan on i_apinv_items_proj (cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4893) -> BitmapIndex Scan on i_apinv_items_proj (cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4894) -> BitmapIndex Scan on i_apinv_items_proj (cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4895) -> BitmapIndex Scan on i_apinv_items_proj (cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4933) -> BitmapIndex Scan on i_apinv_items_proj (cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4934) -> BitmapIndex Scan on i_apinv_items_proj (cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4935) -> Index Scan usingapinv_hdr_pkey on apinv_hdr h (cost=0.00..5.97 rows=1 width=21) Index Cond: (("outer".vendid = h.vendid) AND (("outer".invnum)::text = (h.invnum)::text)) -> Index Scan using vend_org_pkey on vend_org v (cost=0.00..5.56 rows=1 width=26) Index Cond: (v.org_id = "outer".vendid) Nested Loop Left Join (cost=19.73..4164.10 rows=7 width=85) -> Nested Loop (cost=19.73..4139.08 rows=7 width=81) -> Nested Loop (cost=19.73..4100.07 rows=7 width=63) -> Bitmap Heap Scan on apinv_items i (cost=19.73..1185.71 rows=487 width=55) Recheck Cond: ((proj = 4737) OR (proj = 4789) OR (proj = 4892) OR (proj = 4893) OR (proj = 4894) OR (proj = 4895) OR (proj = 4933) OR (proj = 4934) OR (proj = 4935)) Filter: ((status = 'en'::bpchar) OR (status = 'cl'::bpchar) OR (status = 'pd'::bpchar)) -> BitmapOr (cost=19.73..19.73 rows=495 width=0) -> Bitmap Index Scan on i_apinv_items_proj (cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4737) -> BitmapIndex Scan on i_apinv_items_proj (cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4789) -> BitmapIndex Scan on i_apinv_items_proj (cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4892) -> BitmapIndex Scan on i_apinv_items_proj (cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4893) -> BitmapIndex Scan on i_apinv_items_proj (cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4894) -> BitmapIndex Scan on i_apinv_items_proj (cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4895) -> BitmapIndex Scan on i_apinv_items_proj (cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4933) -> BitmapIndex Scan on i_apinv_items_proj (cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4934) -> BitmapIndex Scan on i_apinv_items_proj (cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4935) -> Index Scan usingapinv_hdr_pkey on apinv_hdr h (cost=0.00..5.97 rows=1 width=21) Index Cond: (("outer".vendid = h.vendid) AND (("outer".invnum)::text = (h.invnum)::text)) -> Index Scan using vend_org_pkey on vend_org v (cost=0.00..5.56 rows=1 width=26) Index Cond: (v.org_id = "outer".vendid) -> Seq Scan on acct a (cost=0.00..3.54rows=1 width=4) Filter: ((code)::text = 'ap'::text) -> Seq Scan on acct a (cost=0.00..3.54 rows=1width=4) Filter: ((code)::text = 'ap'::text) I'm still stumped...