Re: Using bitmap index scans-more efficient - Mailing list pgsql-sql
| From | Kyle Bateman |
|---|---|
| Subject | Re: Using bitmap index scans-more efficient |
| Date | |
| Msg-id | 44E2438F.8000105@actarg.com Whole thread Raw |
| In response to | Re: Using bitmap index scans-more efficient (Tom Lane <tgl@sss.pgh.pa.us>) |
| List | pgsql-sql |
Tom Lane wrote:
>Kyle Bateman <kyle@actarg.com> writes:
>
>
>>But I'm assuming that using an interval-encoded project tree, I would
>>have to do something like the following to get a progency group:
>>
>>
>
>
>
>>select * from ledger l, proj p where p.proj_id = l.proj and p.left >
>>1234 and p.right < 2345;
>>
>>
>
>btree has no idea about the constraint (that I imagine exists) that left
><= right. If you're just doing a simple index on (left, right) then the
>above query requires scanning all index entries with left > 1234. It
>would probably help to say
>
>select * from ledger l, proj p where p.proj_id = l.proj and
> p.left > 1234 and p.left < 2345 and p.right < 2345;
>
>so that you can constrain the range of "left" values scanned.
>
>
Thanks for the replies, Tom and Florian.
My problem is not that it is difficult (or costly) to determine the
progeny of a given project. I can determine this in about 90 msec
regardless of whether I use an adjacency model, interval-encoding, or
materialized path (current implementation). The problem is, when I try
to extract the ledger entries belonging to that progeny from a set of a
million records, it seems to need to process all million records rather
than being able to index right into the ones I want.
I'm not very good at reading explain output, but I tried to set up the
query Tom suggests by creating an interval-encoded project table
(proj_int) and then joining it to my ledger like so:
select l.* from ledger l, proj_int i where l.proj = i.proj_id and i.lft >= 5283 and i.lft < 5300 and i.rgt <= 5300;
On my mini-test-ledger of 100,000 entries, this takes the longest time
(5 seconds) with the following explain output:
Hash Join (cost=19018.46..23411.52 rows=14 width=85) Hash Cond: ("outer".proj = "inner".proj_id) -> Nested Loop
LeftJoin (cost=18994.38..23378.41 rows=1700 width=85) -> Hash Join (cost=18990.84..23340.87 rows=1700
width=81) Hash Cond: ("outer".vendid = "inner".org_id) -> Merge Join
(cost=18935.35..23255.64rows=1700 width=63) Merge Cond: (("outer".vendid = "inner".vendid) AND
(("outer".invnum)::text = "inner"."?column10?")) -> Index Scan using apinv_hdr_pkey on apinv_hdr h
(cost=0.00..3148.16 rows=51016 width=21) -> Sort (cost=18935.35..19235.45 rows=120041
width=55) Sort Key: i.vendid, (i.invnum)::text -> Seq Scan on
apinv_itemsi
(cost=0.00..4152.99 rows=120041 width=55) Filter: ((status = 'en'::bpchar) OR
(status = 'cl'::bpchar) OR (status = 'pd'::bpchar)) -> Hash (cost=50.99..50.99 rows=1799 width=26)
-> Seq Scan on vend_org v (cost=0.00..50.99
rows=1799 width=26) -> Materialize (cost=3.54..3.55 rows=1 width=4) -> Seq Scan on acct a
(cost=0.00..3.54rows=1 width=4) Filter: ((code)::text = 'ap'::text) -> Hash (cost=24.06..24.06
rows=10width=4) -> Bitmap Heap Scan on proj_int i (cost=2.26..24.06 rows=10
width=4) Recheck Cond: ((lft >= 5283) AND (lft < 5300) AND (rgt <=
5300)) -> Bitmap Index Scan on i_proj_int_lft_rgt
(cost=0.00..2.26 rows=10 width=0) Index Cond: ((lft >= 5283) AND (lft < 5300) AND
(rgt <= 5300))
That is roughly equivalent to my materialized path method:
select l.* from ledger l where l.proj in (select proj_id from proj_v
where 4737 = any(ppath));
And is quite slow compared to 150 msec when enumerating the progeny
projects like so:
select l.* from ledger l where l.proj in
(4737,4789,4892,4893,4894,4895,4933,4934,4935);
Nested Loop Left Join (cost=19.73..4164.10 rows=7 width=85) -> Nested Loop (cost=19.73..4139.08 rows=7 width=81)
-> Nested Loop (cost=19.73..4100.07 rows=7 width=63) -> Bitmap Heap Scan on apinv_items i
(cost=19.73..1185.71 rows=487 width=55) Recheck Cond: ((proj = 4737) OR (proj = 4789) OR
(proj = 4892) OR (proj = 4893) OR (proj = 4894) OR (proj = 4895) OR
(proj = 4933) OR (proj = 4934) OR (proj = 4935)) Filter: ((status = 'en'::bpchar) OR (status =
'cl'::bpchar) OR (status = 'pd'::bpchar)) -> BitmapOr (cost=19.73..19.73 rows=495 width=0)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4737) ->
BitmapIndex Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4789) ->
BitmapIndex Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4892) ->
BitmapIndex Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4893) ->
BitmapIndex Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4894) ->
BitmapIndex Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4895) ->
BitmapIndex Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4933) ->
BitmapIndex Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4934) ->
BitmapIndex Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4935) -> Index Scan
usingapinv_hdr_pkey on apinv_hdr h
(cost=0.00..5.97 rows=1 width=21) Index Cond: (("outer".vendid = h.vendid) AND
(("outer".invnum)::text = (h.invnum)::text)) -> Index Scan using vend_org_pkey on vend_org v
(cost=0.00..5.56 rows=1 width=26) Index Cond: (v.org_id = "outer".vendid) Nested Loop Left
Join (cost=19.73..4164.10 rows=7 width=85) -> Nested Loop (cost=19.73..4139.08 rows=7 width=81) -> Nested
Loop (cost=19.73..4100.07 rows=7 width=63) -> Bitmap Heap Scan on apinv_items i
(cost=19.73..1185.71 rows=487 width=55) Recheck Cond: ((proj = 4737) OR (proj = 4789) OR
(proj = 4892) OR (proj = 4893) OR (proj = 4894) OR (proj = 4895) OR
(proj = 4933) OR (proj = 4934) OR (proj = 4935)) Filter: ((status = 'en'::bpchar) OR (status =
'cl'::bpchar) OR (status = 'pd'::bpchar)) -> BitmapOr (cost=19.73..19.73 rows=495 width=0)
-> Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4737) ->
BitmapIndex Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4789) ->
BitmapIndex Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4892) ->
BitmapIndex Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4893) ->
BitmapIndex Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4894) ->
BitmapIndex Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4895) ->
BitmapIndex Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4933) ->
BitmapIndex Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4934) ->
BitmapIndex Scan on i_apinv_items_proj
(cost=0.00..2.19 rows=55 width=0) Index Cond: (proj = 4935) -> Index Scan
usingapinv_hdr_pkey on apinv_hdr h
(cost=0.00..5.97 rows=1 width=21) Index Cond: (("outer".vendid = h.vendid) AND
(("outer".invnum)::text = (h.invnum)::text)) -> Index Scan using vend_org_pkey on vend_org v
(cost=0.00..5.56 rows=1 width=26) Index Cond: (v.org_id = "outer".vendid) -> Seq Scan on acct a
(cost=0.00..3.54rows=1 width=4) Filter: ((code)::text = 'ap'::text) -> Seq Scan on acct a (cost=0.00..3.54
rows=1width=4) Filter: ((code)::text = 'ap'::text)
I'm still stumped...