Re: Using bitmap index scans-more efficient - Mailing list pgsql-sql

From Kyle Bateman
Subject Re: Using bitmap index scans-more efficient
Date
Msg-id 44E2438F.8000105@actarg.com
Whole thread Raw
In response to Re: Using bitmap index scans-more efficient  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-sql
Tom Lane wrote:

>Kyle Bateman <kyle@actarg.com> writes:
>  
>
>>But I'm assuming that using an interval-encoded project tree, I would 
>>have to do something like the following to get a progency group:
>>    
>>
>
>  
>
>>select * from ledger l, proj p where p.proj_id = l.proj and p.left > 
>>1234 and p.right < 2345;
>>    
>>
>
>btree has no idea about the constraint (that I imagine exists) that left
><= right.  If you're just doing a simple index on (left, right) then the
>above query requires scanning all index entries with left > 1234.  It
>would probably help to say
>
>select * from ledger l, proj p where p.proj_id = l.proj and
>  p.left > 1234 and p.left < 2345 and p.right < 2345;
>
>so that you can constrain the range of "left" values scanned.
>  
>
Thanks for the replies, Tom and Florian.

My problem is not that it is difficult (or costly) to determine the 
progeny of a given project.  I can determine this in about 90 msec 
regardless of whether I use an adjacency model, interval-encoding, or 
materialized path (current implementation).  The problem is, when I try 
to extract the ledger entries belonging to that progeny from a set of a 
million records, it seems to need to process all million records rather 
than being able to index right into the ones I want.

I'm not very good at reading explain output, but I tried to set up the 
query Tom suggests by creating an interval-encoded project table 
(proj_int) and then joining it to my ledger like so:

select l.* from ledger l, proj_int i where l.proj = i.proj_id and i.lft >= 5283 and i.lft < 5300 and i.rgt <= 5300;

On my mini-test-ledger of 100,000 entries, this takes the longest time 
(5 seconds) with the following explain output:
Hash Join  (cost=19018.46..23411.52 rows=14 width=85)  Hash Cond: ("outer".proj = "inner".proj_id)  ->  Nested Loop
LeftJoin  (cost=18994.38..23378.41 rows=1700 width=85)        ->  Hash Join  (cost=18990.84..23340.87 rows=1700
width=81)             Hash Cond: ("outer".vendid = "inner".org_id)              ->  Merge Join
(cost=18935.35..23255.64rows=1700 width=63)                    Merge Cond: (("outer".vendid = "inner".vendid) AND 
 
(("outer".invnum)::text = "inner"."?column10?"))                    ->  Index Scan using apinv_hdr_pkey on apinv_hdr h

(cost=0.00..3148.16 rows=51016 width=21)                    ->  Sort  (cost=18935.35..19235.45 rows=120041 
width=55)                          Sort Key: i.vendid, (i.invnum)::text                          ->  Seq Scan on
apinv_itemsi  
 
(cost=0.00..4152.99 rows=120041 width=55)                                Filter: ((status = 'en'::bpchar) OR 
(status = 'cl'::bpchar) OR (status = 'pd'::bpchar))              ->  Hash  (cost=50.99..50.99 rows=1799 width=26)
            ->  Seq Scan on vend_org v  (cost=0.00..50.99 
 
rows=1799 width=26)        ->  Materialize  (cost=3.54..3.55 rows=1 width=4)              ->  Seq Scan on acct a
(cost=0.00..3.54rows=1 width=4)                    Filter: ((code)::text = 'ap'::text)  ->  Hash  (cost=24.06..24.06
rows=10width=4)        ->  Bitmap Heap Scan on proj_int i  (cost=2.26..24.06 rows=10 
 
width=4)              Recheck Cond: ((lft >= 5283) AND (lft < 5300) AND (rgt <= 
5300))              ->  Bitmap Index Scan on i_proj_int_lft_rgt  
(cost=0.00..2.26 rows=10 width=0)                    Index Cond: ((lft >= 5283) AND (lft < 5300) AND 
(rgt <= 5300))

That is roughly equivalent to my materialized path method:

select l.* from ledger l where l.proj    in (select proj_id from proj_v 
where 4737 = any(ppath));


And is quite slow compared to 150 msec when enumerating the progeny 
projects like so:

select l.* from ledger l where l.proj in 
(4737,4789,4892,4893,4894,4895,4933,4934,4935);
Nested Loop Left Join  (cost=19.73..4164.10 rows=7 width=85)  ->  Nested Loop  (cost=19.73..4139.08 rows=7 width=81)
   ->  Nested Loop  (cost=19.73..4100.07 rows=7 width=63)              ->  Bitmap Heap Scan on apinv_items i  
 
(cost=19.73..1185.71 rows=487 width=55)                    Recheck Cond: ((proj = 4737) OR (proj = 4789) OR 
(proj = 4892) OR (proj = 4893) OR (proj = 4894) OR (proj = 4895) OR 
(proj = 4933) OR (proj = 4934) OR (proj = 4935))                    Filter: ((status = 'en'::bpchar) OR (status = 
'cl'::bpchar) OR (status = 'pd'::bpchar))                    ->  BitmapOr  (cost=19.73..19.73 rows=495 width=0)
                ->  Bitmap Index Scan on i_apinv_items_proj  
 
(cost=0.00..2.19 rows=55 width=0)                                Index Cond: (proj = 4737)                          ->
BitmapIndex Scan on i_apinv_items_proj  
 
(cost=0.00..2.19 rows=55 width=0)                                Index Cond: (proj = 4789)                          ->
BitmapIndex Scan on i_apinv_items_proj  
 
(cost=0.00..2.19 rows=55 width=0)                                Index Cond: (proj = 4892)                          ->
BitmapIndex Scan on i_apinv_items_proj  
 
(cost=0.00..2.19 rows=55 width=0)                                Index Cond: (proj = 4893)                          ->
BitmapIndex Scan on i_apinv_items_proj  
 
(cost=0.00..2.19 rows=55 width=0)                                Index Cond: (proj = 4894)                          ->
BitmapIndex Scan on i_apinv_items_proj  
 
(cost=0.00..2.19 rows=55 width=0)                                Index Cond: (proj = 4895)                          ->
BitmapIndex Scan on i_apinv_items_proj  
 
(cost=0.00..2.19 rows=55 width=0)                                Index Cond: (proj = 4933)                          ->
BitmapIndex Scan on i_apinv_items_proj  
 
(cost=0.00..2.19 rows=55 width=0)                                Index Cond: (proj = 4934)                          ->
BitmapIndex Scan on i_apinv_items_proj  
 
(cost=0.00..2.19 rows=55 width=0)                                Index Cond: (proj = 4935)              ->  Index Scan
usingapinv_hdr_pkey on apinv_hdr h  
 
(cost=0.00..5.97 rows=1 width=21)                    Index Cond: (("outer".vendid = h.vendid) AND 
(("outer".invnum)::text = (h.invnum)::text))        ->  Index Scan using vend_org_pkey on vend_org v  
(cost=0.00..5.56 rows=1 width=26)              Index Cond: (v.org_id = "outer".vendid) Nested Loop Left 
Join  (cost=19.73..4164.10 rows=7 width=85)  ->  Nested Loop  (cost=19.73..4139.08 rows=7 width=81)        ->  Nested
Loop (cost=19.73..4100.07 rows=7 width=63)              ->  Bitmap Heap Scan on apinv_items i  
 
(cost=19.73..1185.71 rows=487 width=55)                    Recheck Cond: ((proj = 4737) OR (proj = 4789) OR 
(proj = 4892) OR (proj = 4893) OR (proj = 4894) OR (proj = 4895) OR 
(proj = 4933) OR (proj = 4934) OR (proj = 4935))                    Filter: ((status = 'en'::bpchar) OR (status = 
'cl'::bpchar) OR (status = 'pd'::bpchar))                    ->  BitmapOr  (cost=19.73..19.73 rows=495 width=0)
                ->  Bitmap Index Scan on i_apinv_items_proj  
 
(cost=0.00..2.19 rows=55 width=0)                                Index Cond: (proj = 4737)                          ->
BitmapIndex Scan on i_apinv_items_proj  
 
(cost=0.00..2.19 rows=55 width=0)                                Index Cond: (proj = 4789)                          ->
BitmapIndex Scan on i_apinv_items_proj  
 
(cost=0.00..2.19 rows=55 width=0)                                Index Cond: (proj = 4892)                          ->
BitmapIndex Scan on i_apinv_items_proj  
 
(cost=0.00..2.19 rows=55 width=0)                                Index Cond: (proj = 4893)                          ->
BitmapIndex Scan on i_apinv_items_proj  
 
(cost=0.00..2.19 rows=55 width=0)                                Index Cond: (proj = 4894)                          ->
BitmapIndex Scan on i_apinv_items_proj  
 
(cost=0.00..2.19 rows=55 width=0)                                Index Cond: (proj = 4895)                          ->
BitmapIndex Scan on i_apinv_items_proj  
 
(cost=0.00..2.19 rows=55 width=0)                                Index Cond: (proj = 4933)                          ->
BitmapIndex Scan on i_apinv_items_proj  
 
(cost=0.00..2.19 rows=55 width=0)                                Index Cond: (proj = 4934)                          ->
BitmapIndex Scan on i_apinv_items_proj  
 
(cost=0.00..2.19 rows=55 width=0)                                Index Cond: (proj = 4935)              ->  Index Scan
usingapinv_hdr_pkey on apinv_hdr h  
 
(cost=0.00..5.97 rows=1 width=21)                    Index Cond: (("outer".vendid = h.vendid) AND 
(("outer".invnum)::text = (h.invnum)::text))        ->  Index Scan using vend_org_pkey on vend_org v  
(cost=0.00..5.56 rows=1 width=26)              Index Cond: (v.org_id = "outer".vendid)  ->  Seq Scan on acct a
(cost=0.00..3.54rows=1 width=4)        Filter: ((code)::text = 'ap'::text)  ->  Seq Scan on acct a  (cost=0.00..3.54
rows=1width=4)        Filter: ((code)::text = 'ap'::text)
 

I'm still stumped...



pgsql-sql by date:

Previous
From: "Aaron Bono"
Date:
Subject: Re: The Right Way to manage schemas in SCM systems
Next
From: Kyle Bateman
Date:
Subject: Re: Using bitmap index scans-more efficient