Thread: bitmap scan issues 8.1 devel

From:
"Merlin Moncure"
Date:

Hello,
Doing some testing on upcoming 8.1 devel and am having serious issues
with new bitmap index scan feature.  It is easy to work around (just
disable it) but IMO the planner is using it when a regular index scan
should be strongly favored.  The performance of the bitmapscan in my
usage is actually quite a bit worse than a full sequential scan.

here is a query which does this:
explain analyze execute
data1_read_next_product_structure_file_0('012241', '', '', '002', 1);

Here is the 8.0/bitmap off plan:
Limit  (cost=0.00..45805.23 rows=5722 width=288) (actual
time=0.070..0.072 rows=1 loops=1)
   ->  Index Scan using product_structure_file_pkey on
product_structure_file  (cost=0.00..45805.23 rows=5722 width=288)
(actual time=0.063..0.063 row
s=1 loops=1)
         Index Cond: ((ps_parent_code)::text >= ($1)::text)
         Filter: ((((ps_parent_code)::text > ($1)::text) OR
(ps_group_code >= $2)) AND (((ps_parent_code)::text > ($1)::text) OR
(ps_group_code > $2)
OR ((ps_section_code)::text >= ($3)::text)) AND (((ps_parent_code)::text
> ($1)::text) OR (ps_group_code > $2) OR ((ps_section_code)::text >
($3)::tex
t) OR ((ps_seq_no)::smallint > $4)))
 Total runtime: 0.185 ms

Here is the 8.1 with bitamp on:
Limit  (cost=3768.32..3782.63 rows=5722 width=288) (actual
time=2287.488..2287.490 rows=1 loops=1)
   ->  Sort  (cost=3768.32..3782.63 rows=5722 width=288) (actual
time=2287.480..2287.480 rows=1 loops=1)
         Sort Key: ps_parent_code, ps_group_code, ps_section_code,
ps_seq_no
         ->  Bitmap Heap Scan on product_structure_file
(cost=187.84..3411.20 rows=5722 width=288) (actual time=19.977..514.532
rows=47355 loops=1)
               Recheck Cond: ((ps_parent_code)::text >= ($1)::text)
               Filter: ((((ps_parent_code)::text > ($1)::text) OR
(ps_group_code >= $2)) AND (((ps_parent_code)::text > ($1)::text) OR
(ps_group_code
> $2) OR ((ps_section_code)::text >= ($3)::text)) AND
(((ps_parent_code)::text > ($1)::text) OR (ps_group_code > $2) OR
((ps_section_code)::text > ($3
)::text) OR ((ps_seq_no)::smallint > $4)))
               ->  Bitmap Index Scan on product_structure_file_pkey
(cost=0.00..187.84 rows=18239 width=0) (actual time=19.059..19.059
rows=47356 loo
ps=1)
                     Index Cond: ((ps_parent_code)::text >= ($1)::text)
 Total runtime: 2664.034 ms


Here is the prepared statement definition:
prepare data1_read_next_product_structure_file_0 (character varying,
character, character varying, int4, int4)
    as select 1::int4, * from data1.product_structure_file
    where ps_parent_code >= $1 and
        (ps_parent_code >  $1 or  ps_group_code >= $2) and
        (ps_parent_code >  $1 or  ps_group_code >  $2 or
ps_section_code >= $3) and
        (ps_parent_code >  $1 or  ps_group_code >  $2 or
ps_section_code >  $3 or  ps_seq_no >  $4)
    order by ps_parent_code, ps_group_code, ps_section_code,
ps_seq_no
    limit $5

Aside: this is the long way of writing
select 1::int4, * from data1.product_structure_file where
(ps_parent_code, ps_group_code, ps_section_code, ps_seq_no) > ($1, $2,
$3, $4) limit %5

which is allowed in pg but returns the wrong answer.

Merlin

From:
Tom Lane
Date:

"Merlin Moncure" <> writes:
> Doing some testing on upcoming 8.1 devel and am having serious issues
> with new bitmap index scan feature.  It is easy to work around (just
> disable it) but IMO the planner is using it when a regular index scan
> should be strongly favored.

I think blaming the bitmap code is the wrong response.  What I see in
your example is that the planner doesn't know what the LIMIT value is,
and accordingly is favoring a plan that isn't going to get blown out of
the water if the LIMIT is large.  I'd suggest not parameterizing the
LIMIT.

(But hmm ... I wonder if we could use estimate_expression_value for
LIMIT items, instead of handling only simple Consts as the code does
now?)

            regards, tom lane

From:
Josh Berkus
Date:

Merlin,

>    ->  Index Scan using product_structure_file_pkey on
> product_structure_file  (cost=0.00..45805.23 rows=5722 width=288)
> (actual time=0.063..0.063 row
> s=1 loops=1)

It appears that your DB is estimating the number of rows returned much too
high (5722 instead of 1).  Please raise the statistics on all columns to
about 500, analyze, and try your test again.

Thanks!

--
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco