Using bitmap index scans-more efficient - Mailing list pgsql-sql

From Kyle Bateman
Subject Using bitmap index scans-more efficient
Date
Msg-id 44DF6582.4000806@actarg.com
Whole thread Raw
Responses Re: Using bitmap index scans-more efficient
List pgsql-sql
How can I use bitmap index scans more effectively? (version 8.1.0)

I have a financial ledger (actually a view, grouping several other tables) 
containing about a million records.  Each record contains an account code and 
a project code.  I can query for all the transactions belonging to any single
project code and it is very fast and efficient (milliseconds/project).

But projects are organized in a hierarchical structure, so I also need to 
query the ledger for transactions belonging to a particular project and/or all
its progeny.  Depending on the method, this is taking several seconds to
several minutes per project.

For testing purposes, I'll present results using a smaller version of the 
ledger with the following query times:


It is most efficient to enumerate the group of projects using "in" (0.144 seconds)
 select * from ledger where proj in (4737,4789,4892,4893,4894,4895,4933,4934,4935);

---------------------------------------------------------------------------Nested Loop Left Join  (cost=19.73..4164.10
rows=7width=85)  ->  Nested Loop  (cost=19.73..4139.08 rows=7 width=81)        ->  Nested Loop  (cost=19.73..4100.07
rows=7width=63)              ->  Bitmap Heap Scan on apinv_items i  (cost=19.73..1185.71 rows=487 width=55)
      Recheck Cond: ((proj = 4737) OR (proj = 4789) OR (proj = 4892) OR (proj = 4893) OR (proj = 4894) OR (proj = 4895)
OR(proj = 4933) OR (proj = 4934
 
) OR (proj = 4935))                    Filter: ((status = 'en'::bpchar) OR (status = 'cl'::bpchar) OR (status =
'pd'::bpchar))                   ->  BitmapOr  (cost=19.73..19.73 rows=495 width=0)                          ->  Bitmap
IndexScan on i_apinv_items_proj  (cost=0.00..2.19 rows=55 width=0)                                Index Cond: (proj =
4737)                         ->  Bitmap Index Scan on i_apinv_items_proj  (cost=0.00..2.19 rows=55 width=0)
                   Index Cond: (proj = 4789)                          ->  Bitmap Index Scan on i_apinv_items_proj
(cost=0.00..2.19rows=55 width=0)                                Index Cond: (proj = 4892)                          ->
BitmapIndex Scan on i_apinv_items_proj  (cost=0.00..2.19 rows=55 width=0)                                Index Cond:
(proj= 4893)                          ->  Bitmap Index Scan on i_apinv_items_proj  (cost=0.00..2.19 rows=55 width=0)
                           Index Cond: (proj = 4894)                          ->  Bitmap Index Scan on
i_apinv_items_proj (cost=0.00..2.19 rows=55 width=0)                                Index Cond: (proj = 4895)
              ->  Bitmap Index Scan on i_apinv_items_proj  (cost=0.00..2.19 rows=55 width=0)
   Index Cond: (proj = 4933)                          ->  Bitmap Index Scan on i_apinv_items_proj  (cost=0.00..2.19
rows=55width=0)                                Index Cond: (proj = 4934)                          ->  Bitmap Index Scan
oni_apinv_items_proj  (cost=0.00..2.19 rows=55 width=0)                                Index Cond: (proj = 4935)
     ->  Index Scan using apinv_hdr_pkey on apinv_hdr h  (cost=0.00..5.97 rows=1 width=21)                    Index
Cond:(("outer".vendid = h.vendid) AND (("outer".invnum)::text = (h.invnum)::text))        ->  Index Scan using
vend_org_pkeyon vend_org v  (cost=0.00..5.56 rows=1 width=26)              Index Cond: (v.org_id = "outer".vendid)  ->
SeqScan on acct a  (cost=0.00..3.54 rows=1 width=4)        Filter: ((code)::text = 'ap'::text)
 
---------------------------------------------------------------------------

Problem is, the project list has to be hard-coded into the SQL statement. 
What I really need is transactions belonging to "project 4737 and all its progeny."
So I've tried using a many-to-many table proj_prog that describes which projects 
are progeny of which other projects.  Unfortunately, the query time then goes up 
by a factor of 6 (to 0.85 seconds).

Examples: select * from ledger where proj = any (array(select prog_id from proj_prog where proj_id = 4737)); select *
fromledger where proj = any (array[4737,4789,4892,4893,4894,4895,4933,4934,4935]);"
 

---------------------------------------------------------------------------Nested Loop Left Join
(cost=13584.99..17647.39rows=850 width=85)  InitPlan    ->  Index Scan using proj_prog_pkey on proj_prog
(cost=0.00..38.04rows=21 width=4)          Index Cond: (proj_id = 4737)  ->  Merge Join  (cost=13543.42..17565.44
rows=850width=81)        Merge Cond: ("outer".vendid = "inner".org_id)        ->  Merge Join  (cost=13543.42..17405.05
rows=850width=63)              Merge Cond: (("outer".vendid = "inner".vendid) AND (("outer".invnum)::text =
"inner"."?column10?"))             ->  Index Scan using apinv_hdr_pkey on apinv_hdr h  (cost=0.00..3148.16 rows=51016
width=21)             ->  Sort  (cost=13543.42..13693.47 rows=60020 width=55)                    Sort Key: i.vendid,
(i.invnum)::text                   ->  Seq Scan on apinv_items i  (cost=0.00..7197.27 rows=60020 width=55)
           Filter: (((status = 'en'::bpchar) OR (status = 'cl'::bpchar) OR (status = 'pd'::bpchar)) AND (proj = ANY
($0)))       ->  Index Scan using vend_org_pkey on vend_org v  (cost=0.00..145.52 rows=1799 width=26)  ->  Materialize
(cost=3.54..3.55rows=1 width=4)        ->  Seq Scan on acct a  (cost=0.00..3.54 rows=1 width=4)              Filter:
((code)::text= 'ap'::text)
 

---------------------------------------------------------------------------

The worst case is the following types of queries (about 5 seconds):
 select * from ledger where proj in (select prog_id from proj_prog where proj_id = 4737); select l.* from ledger l,
proj_progp where l.proj = p.prog_id and p.proj_id = 4737;
 

---------------------------------------------------------------------------Hash Join  (cost=19032.47..23510.23 rows=6
width=85) Hash Cond: ("outer".proj = "inner".prog_id)  ->  Nested Loop Left Join  (cost=18994.38..23378.41 rows=1700
width=85)       ->  Hash Join  (cost=18990.84..23340.87 rows=1700 width=81)              Hash Cond: ("outer".vendid =
"inner".org_id)             ->  Merge Join  (cost=18935.35..23255.64 rows=1700 width=63)                    Merge Cond:
(("outer".vendid= "inner".vendid) AND (("outer".invnum)::text = "inner"."?column10?"))                    ->  Index
Scanusing apinv_hdr_pkey on apinv_hdr h  (cost=0.00..3148.16 rows=51016 width=21)                    ->  Sort
(cost=18935.35..19235.45rows=120041 width=55)                          Sort Key: i.vendid, (i.invnum)::text
            ->  Seq Scan on apinv_items i  (cost=0.00..4152.99 rows=120041 width=55)
Filter:((status = 'en'::bpchar) OR (status = 'cl'::bpchar) OR (status = 'pd'::bpchar))              ->  Hash
(cost=50.99..50.99rows=1799 width=26)                    ->  Seq Scan on vend_org v  (cost=0.00..50.99 rows=1799
width=26)       ->  Materialize  (cost=3.54..3.55 rows=1 width=4)              ->  Seq Scan on acct a  (cost=0.00..3.54
rows=1width=4)                    Filter: ((code)::text = 'ap'::text)  ->  Hash  (cost=38.04..38.04 rows=21 width=4)
   ->  Index Scan using proj_prog_pkey on proj_prog p  (cost=0.00..38.04 rows=21 width=4)              Index Cond:
(proj_id= 4737)
 
---------------------------------------------------------------------------

I would like to be able to get the best performance like in the first query but 
without having to enumerate the projects (i.e. using a single query).
The secret seems to be the bitmap index scans.

Any ideas about whether/how this can be done?

Thanks!

Kyle Bateman



---------------------------------------------------------------------------
BTW, The ledger view is built roughly as follows:

create view rp_v_api as   select       h.adate                                 as adate,       (i.price *
i.quant)::numeric(14,2)     as amount,       substring(v.org_name from 1 for 40)     as descr, 
 
       i.proj                  as proj,       i.acct                  as acct,       1                       as
cr_proj,      a.acct_id               as cr_acct 
 
   from (       apinv_hdr               h       join apinv_items        i       on i.vendid     = h.vendid and i.invnum
=h.invnum       join vend_org           v       on v.org_id     = h.vendid       left join acct          a       on
a.code      = 'ap'   )   where i.status in ('en','cl','pd');
 




pgsql-sql by date:

Previous
From: Andrew Sullivan
Date:
Subject: Re: Breaking up a query
Next
From: Florian Weimer
Date:
Subject: Re: Using bitmap index scans-more efficient