Thread: BUG #2075: Strange choice of bitmap-index-scan

BUG #2075: Strange choice of bitmap-index-scan

From
"Arjen"
Date:
The following bug has been logged online:

Bug reference:      2075
Logged by:          Arjen
Email address:      acmmailing@tweakers.net
PostgreSQL version: 8.1.0
Operating system:   Gentoo linux (2.6.11 kernel)
Description:        Strange choice of bitmap-index-scan
Details:

I haven't tried a minimal test-case, but this table:

tweakers=# \d pwprodukten
                                     Table "public.pwprodukten"
     Column     |          Type          |                        Modifiers
----------------+------------------------+----------------------------------
------------------------
 id             | integer                | not null default
nextval('pwprodukten_id_seq'::regclass)
 upid           | character varying(40)  | not null default ''::character
varying
 naam           | character varying(110) | not null
 cat2           | smallint               | not null default 0::smallint
 grafiek        | character(1)           | default 'N'::bpchar
 trend          | smallint               | default 0::smallint
 image          | smallint               | default 0::smallint
 meuk           | smallint               | default 0::smallint
 views          | smallint               | default 0::smallint
 popuindex      | smallint               | default 0::smallint
 url            | character varying(255) |
 infoid         | smallint               | default 0::smallint
 zichtbaar      | boolean                | not null default false
 havereview     | character(1)           | default 'N'::bpchar
 prerelease     | character(1)           | not null default 'N'::bpchar
 havebenchmarks | character(1)           | default 'N'::bpchar
Indexes:
    "pwprodukten_pkey" PRIMARY KEY, btree (id)
    "pwprodukten_cat2_popuindex" btree (cat2, popuindex)
    "pwprodukten_cat2_zichtbaar" btree (cat2, zichtbaar)
    "pwprodukten_infoid" btree (infoid)
    "pwprodukten_upid2" btree (upid)
    "pwprodukten_zichtbaar" btree (zichtbaar)

With this query:
SELECT
*
FROM pwprodukten pr
WHERE pr.Cat2 = 51
AND pr.Zichtbaar = 'true';

yields this plan:
                                           QUERY PLAN
----------------------------------------------------------------------------
---------------------
 Bitmap Heap Scan on pwprodukten pr  (cost=5.62..9.63 rows=144 width=134)
   Recheck Cond: (cat2 = 51)
   Filter: zichtbaar
   ->  BitmapAnd  (cost=5.62..5.62 rows=1 width=0)
         ->  Bitmap Index Scan on pwprodukten_cat2_popuindex
(cost=0.00..2.50 rows=144 width=0)
               Index Cond: (cat2 = 51)
         ->  Bitmap Index Scan on pwprodukten_cat2_zichtbaar
(cost=0.00..2.86 rows=144 width=0)
               Index Cond: ((cat2 = 51) AND (zichtbaar = true))

So, it uses the correct index, but somehow decides to also use the other
cat2_... index, which it doesn't need of course.

Re: BUG #2075: Strange choice of bitmap-index-scan

From
Tom Lane
Date:
"Arjen" <acmmailing@tweakers.net> writes:
>    ->  BitmapAnd  (cost=5.62..5.62 rows=1 width=0)
>          ->  Bitmap Index Scan on pwprodukten_cat2_popuindex
> (cost=0.00..2.50 rows=144 width=0)
>                Index Cond: (cat2 = 51)
>          ->  Bitmap Index Scan on pwprodukten_cat2_zichtbaar
> (cost=0.00..2.86 rows=144 width=0)
>                Index Cond: ((cat2 = 51) AND (zichtbaar = true))

Hmmm ... I can reproduce that if *all* the rows in the table have
zichtbaar = true (or at least the ANALYZE stats say so) ... is that
the case in your data?

            regards, tom lane

Re: BUG #2075: Strange choice of bitmap-index-scan

From
Tom Lane
Date:
"Arjen" <acmmailing@tweakers.net> writes:
> So, it uses the correct index, but somehow decides to also use the other
> cat2_... index, which it doesn't need of course.

I've tweaked the heuristics in choose_bitmap_and to (hopefully) work a
bit better in scenarios like this.  Thanks for the example.

            regards, tom lane

Re: BUG #2075: Strange choice of bitmap-index-scan

From
Arjen van der Meijden
Date:
Hi Tom,

The "zichtbaar" as false is indeed a very rare case and appearantly
isn't occuring right now. There are indeed 46631 rows in total, and all
46631 have the "zichtbaar" as true. Which reminds me to adjust the index
anyway ;-)

It appears to be happening if the fraction of zichtbaar's is small
enough. With 1 and 8 as false, it happens, with 27 as false its not
happening.

Best regards,

Arjen

Tom Lane wrote:
> "Arjen" <acmmailing@tweakers.net> writes:
>>    ->  BitmapAnd  (cost=5.62..5.62 rows=1 width=0)
>>          ->  Bitmap Index Scan on pwprodukten_cat2_popuindex
>> (cost=0.00..2.50 rows=144 width=0)
>>                Index Cond: (cat2 = 51)
>>          ->  Bitmap Index Scan on pwprodukten_cat2_zichtbaar
>> (cost=0.00..2.86 rows=144 width=0)
>>                Index Cond: ((cat2 = 51) AND (zichtbaar = true))
>
> Hmmm ... I can reproduce that if *all* the rows in the table have
> zichtbaar = true (or at least the ANALYZE stats say so) ... is that
> the case in your data?
>
>             regards, tom lane
>

Re: BUG #2075: Strange choice of bitmap-index-scan

From
Tom Lane
Date:
Arjen van der Meijden <acmmailing@tweakers.net> writes:
> I found another example, in case you're interested:

Did you apply the patch?

            regards, tom lane

Re: BUG #2075: Strange choice of bitmap-index-scan

From
Arjen van der Meijden
Date:
Tom Lane wrote:
> Arjen van der Meijden <acmmailing@tweakers.net> writes:
>> I found another example, in case you're interested:
>
> Did you apply the patch?

No, it was just another test-case I stumbled upon. And I'd rather nog
install development versions of postgresql on the machine in question.
If you'd really like to know whether anything helped, I'll get a
cvs-version on another machine and test these queries on that installation.

Best regards,

Arjen

Re: BUG #2075: Strange choice of bitmap-index-scan

From
Arjen van der Meijden
Date:
Hi Tom,

I found another example, in case you're interested:

This query:
SELECT * FROM meuktracker m
    JOIN pwproduktrel p ON tabel = 'm' AND tabelid = m.id
WHERE m.id = (select min(id) from meuktracker where id > 7810);

It works ok if the subquery is replaced by the actual result, but this
one yields:

  Nested Loop  (cost=5.08..13.06 rows=9 width=1153)
    InitPlan
      ->  Result  (cost=0.73..0.74 rows=1 width=0)
            InitPlan
              ->  Limit  (cost=0.00..0.73 rows=1 width=4)
                    ->  Index Scan using meuktracker_pkey on meuktracker
  (cost=0.00..2168.16 rows=2981 width=4)
                          Index Cond: (id > 7810)
                          Filter: (id IS NOT NULL)
    ->  Index Scan using meuktracker_pkey on meuktracker m
(cost=0.00..3.88 rows=1 width=1140)
          Index Cond: (id = $1)
    ->  Bitmap Heap Scan on pwproduktrel p  (cost=4.34..8.36 rows=9
width=13)
          Recheck Cond: (($1 = tabelid) AND (tabel = 'm'::bpchar))
          ->  BitmapAnd  (cost=4.34..4.34 rows=1 width=0)
                ->  Bitmap Index Scan on pwproduktrel_tabelid_meuk
(cost=0.00..2.04 rows=9 width=0)
                      Index Cond: ($1 = tabelid)
                ->  Bitmap Index Scan on pwproduktrel_pkey
(cost=0.00..2.06 rows=9 width=0)
                      Index Cond: ((tabel = 'm'::bpchar) AND ($1 = tabelid))

With table structures:
                                    Table "public.meuktracker"
    Column    |          Type          |                        Modifiers
-------------+------------------------+----------------------------------------------------------
  id          | integer                | not null default
nextval('meuktracker_id_seq'::regclass)
  header      | character varying(40)  |
  message     | text                   |
  quote       | text                   |
  versie      | character varying(30)  |
  bron        | character varying(40)  |
  link        | character varying(255) |
  submitter   | integer                | not null
  filegrootte | integer                | not null
  licentieid  | smallint               | not null
  cat         | smallint               | not null
  authorid    | smallint               |
  time        | bigint                 | not null
  linksite    | character varying(255) | not null
  poll        | smallint               |
  embargo     | boolean                |
Indexes:
     "meuktracker_pkey" PRIMARY KEY, btree (id)
     "meuktracker_cat" btree (cat)
     "meuktracker_time" btree ("time")


and

               Table "public.pwproduktrel"
   Column   |     Type     |          Modifiers
-----------+--------------+-----------------------------
  tabel     | character(1) | not null
  tabelid   | integer      | not null
  produktid | integer      | not null
Indexes:
     "pwproduktrel_pkey" PRIMARY KEY, btree (tabel, tabelid, produktid)
     "pwproduktrel_produktid" btree (produktid)
     "pwproduktrel_produktid_meuk" btree (produktid) WHERE tabel =
'm'::bpchar
     "pwproduktrel_produktid_news" btree (produktid) WHERE tabel =
'n'::bpchar
     "pwproduktrel_tabel_image" btree (tabelid) WHERE tabel = 'i'::bpchar
     "pwproduktrel_tabel_produktid" btree (tabel, produktid)
     "pwproduktrel_tabelid" btree (tabelid)
     "pwproduktrel_tabelid_meuk" btree (tabelid) WHERE tabel = 'm'::bpchar
     "pwproduktrel_tabelid_news" btree (tabelid) WHERE tabel = 'n'::bpchar

(yes, I'm trying to figure out the best index combinations here ;-) )

Either of the two selected indexes is useable, but the
pwproduktrel_tabelid_meuk is obviously (much?) smaller in size and
therefore faster to look in, isn't it?

There are 10575 records in meuktracker and 146757 in pwproduktrel of
which 128513 are with tabel = 'm'.

Best regards,

Arjen

Tom Lane wrote:
> "Arjen" <acmmailing@tweakers.net> writes:
>> So, it uses the correct index, but somehow decides to also use the other
>> cat2_... index, which it doesn't need of course.
>
> I've tweaked the heuristics in choose_bitmap_and to (hopefully) work a
> bit better in scenarios like this.  Thanks for the example.
>
>             regards, tom lane
>