Re: Bitmap scans vs. the statistics views - Mailing list pgsql-hackers

From Tom Lane
Subject Re: Bitmap scans vs. the statistics views
Date
Msg-id 12175.1114198203@sss.pgh.pa.us
Whole thread Raw
In response to Re: Bitmap scans vs. the statistics views  (Josh Berkus <josh@agliodbs.com>)
Responses Re: Bitmap scans vs. the statistics views
Re: Bitmap scans vs. the statistics views
List pgsql-hackers
Josh Berkus <josh@agliodbs.com> writes:
> Well, technically a bitmapscan is a different operation.   So it should 
> probably have its own columns.  Unless you're talking about an overhaul of 
> the stats views more drastic than that?   If so, what?

That was basically what I was asking: do we expand all the stats support
to handle this as a separate path, and if so what does it look like
exactly?  I'm particularly unclear what to do at the level of the
functions described in
http://www.postgresql.org/docs/8.0/static/monitoring-stats.html#MONITORING-STATS-FUNCS-TABLE

I've never fully understood the distinction the stats make between
"tuples fetched" and "tuples returned", and it's even less obvious how
to apply it when the index and heap operations are decoupled.  In
particular the function design assumes that heap tuple fetches can be
tied to particular indexes, which is now a broken assumption.  You
might be amused by this test case I just finished debugging:

regression=# explain analyze select * from tenk1, int4_tbl where unique1 in (40,50,f1) and unique2 >= 3999 and unique2
<9999;                                                                         QUERY PLAN
 

---------------------------------------------------------------------------------------------------------------------------------------------------------------Nested
Loop (cost=9.07..66.56 rows=4 width=248) (actual time=16.266..77.452 rows=6 loops=1)  ->  Seq Scan on int4_tbl
(cost=0.00..1.05rows=5 width=4) (actual time=0.055..0.105 rows=5 loops=1)  ->  Bitmap Heap Scan on tenk1
(cost=9.07..13.08rows=1 width=244) (actual time=15.387..15.396 rows=1 loops=5)        Recheck Cond: (((tenk1.unique1 =
40)OR (tenk1.unique1 = 50) OR (tenk1.unique1 = "outer".f1)) AND (tenk1.unique2 >= 3999) AND (tenk1.unique2 < 9999))
  ->  BitmapAnd  (cost=9.07..9.07 rows=50 width=0) (actual time=15.353..15.353 rows=0 loops=5)              ->
BitmapOr (cost=6.52..6.52 rows=50 width=0) (actual time=0.152..0.152 rows=0 loops=5)                    ->  Bitmap
IndexScan on tenk1_unique1  (cost=0.00..2.17 rows=50 width=0) (actual time=0.059..0.059 rows=1 loops=5)
        Index Cond: (unique1 = 40)                    ->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..2.17 rows=50
width=0)(actual time=0.032..0.032 rows=1 loops=5)                          Index Cond: (unique1 = 50)
->  Bitmap Index Scan on tenk1_unique1  (cost=0.00..2.17 rows=50 width=0) (actual time=0.029..0.029 rows=0 loops=5)
                    Index Cond: (tenk1.unique1 = "outer".f1)              ->  Bitmap Index Scan on tenk1_unique2
(cost=0.00..2.30rows=50 width=0) (actual time=15.148..15.148 rows=6000 loops=5)                    Index Cond:
((unique2>= 3999) AND (unique2 < 9999))Total runtime: 78.369 ms
 
(15 rows)

What exactly do we want to count here?  The 6000 TIDs pulled from
tenk1_unique2 don't translate into much of anything at the heap
access stage.  (Shortly I'm going to add some logic to not bother
using very nonselective index conditions in BitAnd, but it's not there
right now.)

> I'm not clear on why bitmapscan doesn't bump tuples_returned.   Can you 
> explain?

Well, there was no such bump in the bits of code I cribbed to make
nodeBitmapHeapScan and friends ;-).  It is easy enough to add, once
we have a clear idea of what we want to count, but I don't feel that
I have that idea yet.
        regards, tom lane


pgsql-hackers by date:

Previous
From: Ron Mayer
Date:
Subject: Re: possible TODO: read-only tables, select from indexes only.
Next
From: Jan Wieck
Date:
Subject: Re: Bitmap scans vs. the statistics views