Thread: Bitmap reuse
For some queries PostgreSQL can spend most of its time creating the exact same bitmap over and over. For example, in the below case: (also attached as a file because line-wrapping is going to make a mess of it)
drop table if exists foo;
create table foo (x daterange, i int, t text);
insert into foo select daterange(x::date,x::date+3), random()*3000 from (select now()-interval '3 years'*random() as x from generate_series(1,1e6))foo;
vacuum analyze foo;
create index ON foo using gist ( x);
create index ON foo ( i);
explain (analyze, buffers) select * from generate_series(1,20) g(i), foo where x && '[2019-08-09,2019-08-11)' and g.i=foo.i;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=170.21..3563.24 rows=33 width=54) (actual time=1.295..24.890 rows=28 loops=1)
Buffers: shared hit=543 read=8
I/O Timings: read=0.040
-> Function Scan on generate_series g (cost=0.00..0.20 rows=20 width=4) (actual time=0.007..0.014 rows=20 loops=1)
-> Bitmap Heap Scan on foo (cost=170.21..178.13 rows=2 width=50) (actual time=1.238..1.240 rows=1 loops=20)
Recheck Cond: ((i = g.i) AND (x && '[2019-08-09,2019-08-11)'::daterange))
Heap Blocks: exact=28
Buffers: shared hit=543 read=8
I/O Timings: read=0.040
-> BitmapAnd (cost=170.21..170.21 rows=2 width=0) (actual time=1.234..1.234 rows=0 loops=20)
Buffers: shared hit=515 read=8
I/O Timings: read=0.040
-> Bitmap Index Scan on foo_i_idx (cost=0.00..6.92 rows=333 width=0) (actual time=0.031..0.031 rows=327 loops=20)
Index Cond: (i = g.i)
Buffers: shared hit=55 read=8
I/O Timings: read=0.040
-> Bitmap Index Scan on foo_x_idx (cost=0.00..161.78 rows=5000 width=0) (actual time=1.183..1.183 rows=3670 loops=20)
Index Cond: (x && '[2019-08-09,2019-08-11)'::daterange)
Buffers: shared hit=460
create table foo (x daterange, i int, t text);
insert into foo select daterange(x::date,x::date+3), random()*3000 from (select now()-interval '3 years'*random() as x from generate_series(1,1e6))foo;
vacuum analyze foo;
create index ON foo using gist ( x);
create index ON foo ( i);
explain (analyze, buffers) select * from generate_series(1,20) g(i), foo where x && '[2019-08-09,2019-08-11)' and g.i=foo.i;
QUERY PLAN
---------------------------------------------------------------------------------------------------------------------------------------
Nested Loop (cost=170.21..3563.24 rows=33 width=54) (actual time=1.295..24.890 rows=28 loops=1)
Buffers: shared hit=543 read=8
I/O Timings: read=0.040
-> Function Scan on generate_series g (cost=0.00..0.20 rows=20 width=4) (actual time=0.007..0.014 rows=20 loops=1)
-> Bitmap Heap Scan on foo (cost=170.21..178.13 rows=2 width=50) (actual time=1.238..1.240 rows=1 loops=20)
Recheck Cond: ((i = g.i) AND (x && '[2019-08-09,2019-08-11)'::daterange))
Heap Blocks: exact=28
Buffers: shared hit=543 read=8
I/O Timings: read=0.040
-> BitmapAnd (cost=170.21..170.21 rows=2 width=0) (actual time=1.234..1.234 rows=0 loops=20)
Buffers: shared hit=515 read=8
I/O Timings: read=0.040
-> Bitmap Index Scan on foo_i_idx (cost=0.00..6.92 rows=333 width=0) (actual time=0.031..0.031 rows=327 loops=20)
Index Cond: (i = g.i)
Buffers: shared hit=55 read=8
I/O Timings: read=0.040
-> Bitmap Index Scan on foo_x_idx (cost=0.00..161.78 rows=5000 width=0) (actual time=1.183..1.183 rows=3670 loops=20)
Index Cond: (x && '[2019-08-09,2019-08-11)'::daterange)
Buffers: shared hit=460
Note that the fast bitmap index scan is parameterized to the other side of the nested loop, so has to be recomputed. While the slow one is parameterized to a constant, so it could in principle just be reused.
What kind of infrastructure would be needed to detect this case and reuse that bitmap?
Cheers,
Jeff
Attachment
Jeff Janes <jeff.janes@gmail.com> writes: > For some queries PostgreSQL can spend most of its time creating the exact > same bitmap over and over. For example, in the below case: (also attached > as a file because line-wrapping is going to make a mess of it) Uh .... it's not the "exact same bitmap each time", because the selected rows vary depending on the value of g.i. If the output of the subplan was indeed constant, I'd expect the planner to stick a Materialize node atop it. That would almost certainly win more than re-using the index output to scan the heap additional times. regards, tom lane
On Wed, 21 Jul 2021 at 11:25, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > Jeff Janes <jeff.janes@gmail.com> writes: > > For some queries PostgreSQL can spend most of its time creating the exact > > same bitmap over and over. For example, in the below case: (also attached > > as a file because line-wrapping is going to make a mess of it) > > Uh .... it's not the "exact same bitmap each time", because the selected > rows vary depending on the value of g.i. I imagined Jeff was meaning the bitmap from the scan of foo_x_idx, not the combined ANDed bitmap from both indexes. I didn't look in detail, but I'd think it would just be a matter of caching the bitmap then in ExecReScanBitmapIndexScan(), if the PlanState's chgParam indicate a parameter has changed, then throw away the cache. Then if the cache is still valid in MultiExecBitmapIndexScan(), return it. Not too sure about the memory context part for the cache. As I said, I didn't look in detail. David
David Rowley <dgrowleyml@gmail.com> writes: > On Wed, 21 Jul 2021 at 11:25, Tom Lane <tgl@sss.pgh.pa.us> wrote: >> Uh .... it's not the "exact same bitmap each time", because the selected >> rows vary depending on the value of g.i. > I imagined Jeff was meaning the bitmap from the scan of foo_x_idx, not > the combined ANDed bitmap from both indexes. To re-use that, you'd need a way to prevent the upper node from destructively modifying the tidbitmap. regards, tom lane
On Wed, 21 Jul 2021 at 13:32, Tom Lane <tgl@sss.pgh.pa.us> wrote: > > David Rowley <dgrowleyml@gmail.com> writes: > > I imagined Jeff was meaning the bitmap from the scan of foo_x_idx, not > > the combined ANDed bitmap from both indexes. > > To re-use that, you'd need a way to prevent the upper node from > destructively modifying the tidbitmap. Yeah. And that would slow things down in the case where it was just executed once as we'd need to make a copy of it to prevent the cached version from being modified regardless if it would ever be used again or not. Maybe the planner would need to be involved in making the decision of if the bitmap index scan should tuck away a carbon copy of the resulting TIDBitmap after the first scan. That way on rescan we could just make a copy of the cached version and return that. That saves having to modify the callers to tell them not to damage the returned TIDBitmap. David
On Thu, 22 Jul 2021 at 01:54, David Rowley <dgrowleyml@gmail.com> wrote: > Maybe the planner would need to be involved in making the decision of > if the bitmap index scan should tuck away a carbon copy of the > resulting TIDBitmap after the first scan. That way on rescan we could > just make a copy of the cached version and return that. That saves > having to modify the callers to tell them not to damage the returned > TIDBitmap. Oh but, meh. Caching could blow out work_mem... We might end up using work_mem * 2. David