Thread: Bitmap reuse

Bitmap reuse

From
Jeff Janes
Date:
For some queries PostgreSQL can spend most of its time creating the exact same bitmap over and over.  For example, in the below case: (also attached as a file because line-wrapping is going to make a mess of it)

drop table if exists foo;
create table foo (x daterange, i int, t text);
insert into foo select daterange(x::date,x::date+3), random()*3000 from (select now()-interval '3 years'*random() as x from generate_series(1,1e6))foo;
vacuum analyze foo;
create index ON  foo using gist  ( x);
create index ON  foo   ( i);
explain (analyze, buffers) select * from generate_series(1,20) g(i), foo where x && '[2019-08-09,2019-08-11)' and g.i=foo.i;

                                                              QUERY PLAN                                                              
---------------------------------------------------------------------------------------------------------------------------------------
 Nested Loop  (cost=170.21..3563.24 rows=33 width=54) (actual time=1.295..24.890 rows=28 loops=1)
   Buffers: shared hit=543 read=8
   I/O Timings: read=0.040
   ->  Function Scan on generate_series g  (cost=0.00..0.20 rows=20 width=4) (actual time=0.007..0.014 rows=20 loops=1)
   ->  Bitmap Heap Scan on foo  (cost=170.21..178.13 rows=2 width=50) (actual time=1.238..1.240 rows=1 loops=20)
         Recheck Cond: ((i = g.i) AND (x && '[2019-08-09,2019-08-11)'::daterange))
         Heap Blocks: exact=28
         Buffers: shared hit=543 read=8
         I/O Timings: read=0.040
         ->  BitmapAnd  (cost=170.21..170.21 rows=2 width=0) (actual time=1.234..1.234 rows=0 loops=20)
               Buffers: shared hit=515 read=8
               I/O Timings: read=0.040
               ->  Bitmap Index Scan on foo_i_idx  (cost=0.00..6.92 rows=333 width=0) (actual time=0.031..0.031 rows=327 loops=20)
                     Index Cond: (i = g.i)
                     Buffers: shared hit=55 read=8
                     I/O Timings: read=0.040
               ->  Bitmap Index Scan on foo_x_idx  (cost=0.00..161.78 rows=5000 width=0) (actual time=1.183..1.183 rows=3670 loops=20)
                     Index Cond: (x && '[2019-08-09,2019-08-11)'::daterange)
                     Buffers: shared hit=460

Note that the fast bitmap index scan is parameterized to the other side of the nested loop, so has to be recomputed.  While the slow one is parameterized to a constant, so it could in principle just be reused.

What kind of infrastructure would be needed to detect this case and reuse that bitmap?

Cheers,

Jeff
Attachment

Re: Bitmap reuse

From
Tom Lane
Date:
Jeff Janes <jeff.janes@gmail.com> writes:
> For some queries PostgreSQL can spend most of its time creating the exact
> same bitmap over and over.  For example, in the below case: (also attached
> as a file because line-wrapping is going to make a mess of it)

Uh .... it's not the "exact same bitmap each time", because the selected
rows vary depending on the value of g.i.

If the output of the subplan was indeed constant, I'd expect the planner
to stick a Materialize node atop it.  That would almost certainly win
more than re-using the index output to scan the heap additional times.

            regards, tom lane



Re: Bitmap reuse

From
David Rowley
Date:
On Wed, 21 Jul 2021 at 11:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> Jeff Janes <jeff.janes@gmail.com> writes:
> > For some queries PostgreSQL can spend most of its time creating the exact
> > same bitmap over and over.  For example, in the below case: (also attached
> > as a file because line-wrapping is going to make a mess of it)
>
> Uh .... it's not the "exact same bitmap each time", because the selected
> rows vary depending on the value of g.i.

I imagined Jeff was meaning the bitmap from the scan of foo_x_idx, not
the combined ANDed bitmap from both indexes.

I didn't look in detail, but I'd think it would just be a matter of
caching the bitmap then in ExecReScanBitmapIndexScan(), if the
PlanState's chgParam indicate a parameter has changed, then throw away
the cache.  Then if the cache is still valid in
MultiExecBitmapIndexScan(), return it.  Not too sure about the memory
context part for the cache. As I said, I didn't look in detail.

David



Re: Bitmap reuse

From
Tom Lane
Date:
David Rowley <dgrowleyml@gmail.com> writes:
> On Wed, 21 Jul 2021 at 11:25, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>> Uh .... it's not the "exact same bitmap each time", because the selected
>> rows vary depending on the value of g.i.

> I imagined Jeff was meaning the bitmap from the scan of foo_x_idx, not
> the combined ANDed bitmap from both indexes.

To re-use that, you'd need a way to prevent the upper node from
destructively modifying the tidbitmap.

            regards, tom lane



Re: Bitmap reuse

From
David Rowley
Date:
On Wed, 21 Jul 2021 at 13:32, Tom Lane <tgl@sss.pgh.pa.us> wrote:
>
> David Rowley <dgrowleyml@gmail.com> writes:
> > I imagined Jeff was meaning the bitmap from the scan of foo_x_idx, not
> > the combined ANDed bitmap from both indexes.
>
> To re-use that, you'd need a way to prevent the upper node from
> destructively modifying the tidbitmap.

Yeah.  And that would slow things down in the case where it was just
executed once as we'd need to make a copy of it to prevent the cached
version from being modified regardless if it would ever be used again
or not.

Maybe the planner would need to be involved in making the decision of
if the bitmap index scan should tuck away a carbon copy of the
resulting TIDBitmap after the first scan.  That way on rescan we could
just make a copy of the cached version and return that.  That saves
having to modify the callers to tell them not to damage the returned
TIDBitmap.

David



Re: Bitmap reuse

From
David Rowley
Date:
On Thu, 22 Jul 2021 at 01:54, David Rowley <dgrowleyml@gmail.com> wrote:
> Maybe the planner would need to be involved in making the decision of
> if the bitmap index scan should tuck away a carbon copy of the
> resulting TIDBitmap after the first scan.  That way on rescan we could
> just make a copy of the cached version and return that.  That saves
> having to modify the callers to tell them not to damage the returned
> TIDBitmap.

Oh but, meh. Caching could blow out work_mem...  We might end up using
work_mem * 2.

David