Thread: Odd performance / query plan with bitmasked field as opposed to equality
I can't figure what is going on below; first of all, this count which returns 1.5 million from a ~2 million row table: woome=# explain analyze SELECT COUNT(*) FROM "webapp_person" WHERE "webapp_person"."permissionflags" = B'0000000000001111111111111111111111111111'::"bit"; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=125774.83..125774.84 rows=1 width=0) (actual time=2976.405..2976.405 rows=1 loops=1) -> Seq Scan on webapp_person (cost=0.00..122041.10 rows=1493490 width=0) (actual time=0.019..2781.735 rows=1518635 loops=1) Filter: (permissionflags = B'0000000000001111111111111111111111111111'::"bit") Total runtime: 2976.475 ms (4 rows) is slower than woome=# explain analyze SELECT COUNT(*) FROM "webapp_person" WHERE "webapp_person"."permissionflags" & b'0000000000000000000000000000000000000100' = b'0000000000000000000000000000000000000100'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=49341.55..49341.56 rows=1 width=0) (actual time=1035.226..1035.226 rows=1 loops=1) -> Bitmap Heap Scan on webapp_person (cost=26280.49..49316.11 rows=10174 width=0) (actual time=221.672..872.037 rows=1518630 loops=1) Recheck Cond: ((permissionflags & B'0000000000000000000000000000000000000100'::"bit") = B'0000000000000000000000000000000000000100'::"bit") -> Bitmap Index Scan on webapp_person_permissionflags_bitmasked0100_idx (cost=0.00..26277.95 rows=10174 width=0) (actual time=186.596..186.596 rows=1574558 loops=1) Total runtime: 1035.328 ms (5 rows) with both a straight btree index on the permissionflags column, and a conditional index that matches the where clause in the 2nd example. Both queries return the same result because with current data, the only one value in the table which matches the bitmask. How come the more complicated one is 3x faster? Maybe the size of the conditional index? But why does the first form not use the conditional index? Now if I add a straight join with another ~300k row table: woome=# explain analyze SELECT COUNT(*) FROM webapp_person join webapp_report on webapp_person.id = webapp_report.reported_id WHERE webapp_report.crime='IP_SPAM' and webapp_person.permissionflags = b'0000000000001111111111111111111111111111'; QUERY PLAN --------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=150870.81..150870.82 rows=1 width=0) (actual time=4024.475..4024.475 rows=1 loops=1) -> Hash Join (cost=140740.92..150531.01 rows=135922 width=0) (actual time=3601.576..4013.332 rows=91126 loops=1) Hash Cond: (webapp_report.reported_id = webapp_person.id) -> Seq Scan on webapp_report (cost=0.00..6579.06 rows=185180 width=4) (actual time=0.024..88.038 rows=183558 loops=1) Filter: ((crime)::text = 'IP_SPAM'::text) -> Hash (cost=122059.09..122059.09 rows=1494547 width=4) (actual time=3600.877..3600.877 rows=1518724 loops=1) -> Seq Scan on webapp_person (cost=0.00..122059.09 rows=1494547 width=4) (actual time=0.011..2984.683 rows=1518724 loops=1) Filter: (permissionflags = B'0000000000001111111111111111111111111111'::"bit") Total runtime: 4043.415 ms (9 rows) it adds a couple of seconds but the execution time stays within the same order of magnitude. However if I now replace the straight equality comparison there with the bitmasked comparison, I get woome=# explain analyze SELECT COUNT(*) FROM webapp_person join webapp_report on webapp_person.id = webapp_report.reported_id WHERE webapp_report.crime='IP_SPAM' and webapp_person.permissionflags & b'0000000000000000000000000000000000000100' = b'0000000000000000000000000000000000000100'; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- Aggregate (cost=58927.28..58927.29 rows=1 width=0) (actual time=58004.762..58004.762 rows=1 loops=1) -> Hash Join (cost=49558.94..58924.96 rows=926 width=0) (actual time=1505.944..57978.469 rows=91132 loops=1) Hash Cond: (webapp_report.reported_id = webapp_person.id) -> Seq Scan on webapp_report (cost=0.00..6579.06 rows=185180 width=4) (actual time=0.030..201.187 rows=183564 loops=1) Filter: ((crime)::text = 'IP_SPAM'::text) -> Hash (cost=49431.68..49431.68 rows=10181 width=4) (actual time=1505.462..1505.462 rows=1518756 loops=1) -> Bitmap Heap Scan on webapp_person (cost=26383.65..49431.68 rows=10181 width=4) (actual time=225.114..1058.692 rows=1518756 loops=1) Recheck Cond: ((permissionflags & B'0000000000000000000000000000000000000100'::"bit") = B'0000000000000000000000000000000000000100'::"bit") -> Bitmap Index Scan on webapp_person_permissionflags_bitmasked0100_idx (cost=0.00..26381.10 rows=10181 width=0) (actual time=188.089..188.089 rows=1579945 loops=1) Total runtime: 58004.897 ms (10 rows) Time: 58024.124 ms It takes almost a minute to run. My first question is, why is the actual execution time for the hash join in the last example 15x higher with the bitmasked condition than with the straight equality version of the query above? This being even more puzzling since the performance relationship between bitmask and equality is reversed if I omit the join altogether, presumably because the conditional index gets used ... however if I define a conditional index on the column for the value that matches the bitmask, it still doesn't get used with equality ... I'm generally interested in the behaviour/performance of such bitmask fields with and without indexes as we've started using it a lot. The above seems to suggest there a things to keep in mind and to observe. Regards, Frank
On Mon, Jul 13, 2009 at 4:46 PM, Frank Joerdens<frank@joerdens.de> wrote: > I can't figure what is going on below; first of all, this count which > returns 1.5 million from a ~2 million row table: > > woome=# explain analyze SELECT COUNT(*) FROM "webapp_person" WHERE > "webapp_person"."permissionflags" = > B'0000000000001111111111111111111111111111'::"bit"; > QUERY PLAN > --------------------------------------------------------------------------------------------------------------------------------- > Aggregate (cost=125774.83..125774.84 rows=1 width=0) (actual > time=2976.405..2976.405 rows=1 loops=1) > -> Seq Scan on webapp_person (cost=0.00..122041.10 rows=1493490 > width=0) (actual time=0.019..2781.735 rows=1518635 loops=1) > Filter: (permissionflags = > B'0000000000001111111111111111111111111111'::"bit") > Total runtime: 2976.475 ms > (4 rows) There are two possibilities here: the planner thinks it CAN'T use the relevant index for this query, or it thinks that the index will be slower than just seq-scaning the whole table. To figure out which it is, try EXPLAIN ANALYZE again with enable_seqscan set to false (note: this is a bad idea in general, but useful for debugging). If you still get a seqscan anyway, then there's some reason why it thinks that it can't use the index (which we can investigate). If that makes it switch to an index scan, then you can try adjusting your cost parameters. But the first thing is to figure out which kind of problem you have. In any case, send the output to the list. Solving this problem will probably shed some light on the other things in your original email, so I'm not going to specifically address each one at this point. ...Robert