Odd performance / query plan with bitmasked field as opposed to equality - Mailing list pgsql-performance

From Frank Joerdens
Subject Odd performance / query plan with bitmasked field as opposed to equality
Date
Msg-id 7d10d2df0907131346p2e1446ddwf736f03b535381d1@mail.gmail.com
Whole thread Raw
Responses Re: Odd performance / query plan with bitmasked field as opposed to equality  (Robert Haas <robertmhaas@gmail.com>)
List pgsql-performance
I can't figure what is going on below; first of all, this count  which
returns 1.5 million from a ~2 million row table:

woome=# explain analyze SELECT COUNT(*) FROM "webapp_person" WHERE
"webapp_person"."permissionflags" =
B'0000000000001111111111111111111111111111'::"bit";
                                                           QUERY PLAN

---------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=125774.83..125774.84 rows=1 width=0) (actual
time=2976.405..2976.405 rows=1 loops=1)
   ->  Seq Scan on webapp_person  (cost=0.00..122041.10 rows=1493490
width=0) (actual time=0.019..2781.735 rows=1518635 loops=1)
         Filter: (permissionflags =
B'0000000000001111111111111111111111111111'::"bit")
 Total runtime: 2976.475 ms
(4 rows)

is slower than

woome=# explain analyze SELECT COUNT(*) FROM "webapp_person" WHERE
"webapp_person"."permissionflags" &
b'0000000000000000000000000000000000000100' =
b'0000000000000000000000000000000000000100';

            QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=49341.55..49341.56 rows=1 width=0) (actual
time=1035.226..1035.226 rows=1 loops=1)
   ->  Bitmap Heap Scan on webapp_person  (cost=26280.49..49316.11
rows=10174 width=0) (actual time=221.672..872.037 rows=1518630
loops=1)
         Recheck Cond: ((permissionflags &
B'0000000000000000000000000000000000000100'::"bit") =
B'0000000000000000000000000000000000000100'::"bit")
         ->  Bitmap Index Scan on
webapp_person_permissionflags_bitmasked0100_idx  (cost=0.00..26277.95
rows=10174 width=0) (actual time=186.596..186.596 rows=1574558
loops=1)
 Total runtime: 1035.328 ms
(5 rows)

with both a straight btree index on the permissionflags column, and a
conditional index that matches the where clause in the 2nd example.
Both queries return the same result because with current data, the
only one value in the table which matches the bitmask. How come the
more complicated one is 3x faster? Maybe the size of the conditional
index? But why does the first form not use the conditional index?

Now if I add a straight join with another ~300k row table:

woome=# explain analyze SELECT COUNT(*) FROM webapp_person join
webapp_report on webapp_person.id = webapp_report.reported_id WHERE
webapp_report.crime='IP_SPAM' and webapp_person.permissionflags =
b'0000000000001111111111111111111111111111';
                                                                 QUERY
PLAN

---------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=150870.81..150870.82 rows=1 width=0) (actual
time=4024.475..4024.475 rows=1 loops=1)
   ->  Hash Join  (cost=140740.92..150531.01 rows=135922 width=0)
(actual time=3601.576..4013.332 rows=91126 loops=1)
         Hash Cond: (webapp_report.reported_id = webapp_person.id)
         ->  Seq Scan on webapp_report  (cost=0.00..6579.06
rows=185180 width=4) (actual time=0.024..88.038 rows=183558 loops=1)
               Filter: ((crime)::text = 'IP_SPAM'::text)
         ->  Hash  (cost=122059.09..122059.09 rows=1494547 width=4)
(actual time=3600.877..3600.877 rows=1518724 loops=1)
               ->  Seq Scan on webapp_person  (cost=0.00..122059.09
rows=1494547 width=4) (actual time=0.011..2984.683 rows=1518724
loops=1)
                     Filter: (permissionflags =
B'0000000000001111111111111111111111111111'::"bit")
 Total runtime: 4043.415 ms
(9 rows)

it adds a couple of seconds but the execution time stays within the
same order of magnitude. However if I now replace the straight
equality comparison there with the bitmasked comparison, I get

woome=# explain analyze SELECT COUNT(*) FROM webapp_person join
webapp_report on webapp_person.id = webapp_report.reported_id WHERE
webapp_report.crime='IP_SPAM' and webapp_person.permissionflags &
b'0000000000000000000000000000000000000100' =
b'0000000000000000000000000000000000000100';

                  QUERY PLAN

--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=58927.28..58927.29 rows=1 width=0) (actual
time=58004.762..58004.762 rows=1 loops=1)
   ->  Hash Join  (cost=49558.94..58924.96 rows=926 width=0) (actual
time=1505.944..57978.469 rows=91132 loops=1)
         Hash Cond: (webapp_report.reported_id = webapp_person.id)
         ->  Seq Scan on webapp_report  (cost=0.00..6579.06
rows=185180 width=4) (actual time=0.030..201.187 rows=183564 loops=1)
               Filter: ((crime)::text = 'IP_SPAM'::text)
         ->  Hash  (cost=49431.68..49431.68 rows=10181 width=4)
(actual time=1505.462..1505.462 rows=1518756 loops=1)
               ->  Bitmap Heap Scan on webapp_person
(cost=26383.65..49431.68 rows=10181 width=4) (actual
time=225.114..1058.692 rows=1518756 loops=1)
                     Recheck Cond: ((permissionflags &
B'0000000000000000000000000000000000000100'::"bit") =
B'0000000000000000000000000000000000000100'::"bit")
                     ->  Bitmap Index Scan on
webapp_person_permissionflags_bitmasked0100_idx  (cost=0.00..26381.10
rows=10181 width=0) (actual time=188.089..188.089 rows=1579945
loops=1)
 Total runtime: 58004.897 ms
(10 rows)

Time: 58024.124 ms

It takes almost a minute to run. My first question is, why is the
actual execution time for the hash join in the last example 15x higher
with the bitmasked condition than with the straight equality version
of the query above? This being even more puzzling since the
performance relationship between bitmask and equality is reversed if I
omit the join altogether, presumably because the conditional index
gets used ... however if I define a conditional index on the column
for the value that matches the bitmask, it still doesn't get used with
equality ...

I'm generally interested in the behaviour/performance of such bitmask
fields with and without indexes as we've started using it a lot. The
above seems to suggest there a things to keep in mind and to observe.

Regards,

Frank

pgsql-performance by date:

Previous
From: "Haszlakiewicz, Eric"
Date:
Subject: Re: embedded sql regression from 8.2.4 to 8.3.7
Next
From: Marc Cousin
Date:
Subject: Re: Very big insert/join performance problem (bacula)