Thread: initscan for MVCC snapshot

initscan for MVCC snapshot

From

Andy Fan

Date:

07 December 2020, 12:26:53

Hi:
I see initscan calls RelationGetwNumberOfBlocks every time and rescan calls
initscan as well. In my system, RelationGetNumberOfBlocks is expensive (the reason
doesn't deserve a talk.. ), so in a nest loop + Bitmap heap scan case, the

impact will be huge. The comments of initscan are below.

/*
* Determine the number of blocks we have to scan.
*
* It is sufficient to do this once at scan start, since any tuples added
* while the scan is in progress will be invisible to my snapshot anyway.
* (That is not true when using a non-MVCC snapshot. However, we couldn't
* guarantee to return tuples added after scan start anyway, since they
* might go into pages we already scanned. To guarantee consistent
* results for a non-MVCC snapshot, the caller must hold some higher-level
* lock that ensures the interesting tuple(s) won't change.)
*/

I still do not fully understand the comments. Looks we only need to call

multi times for non-MVCC snapshot, IIUC, does the following change reasonable?

===

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1b2f70499e..8238eabd8b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -211,6 +211,7 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
ParallelBlockTableScanDesc bpscan = NULL;
bool allow_strat;
bool allow_sync;
+ bool is_mvcc = scan->rs_base.rs_snapshot && IsMVCCSnapshot(scan->rs_base.rs_snapshot);

/*
* Determine the number of blocks we have to scan.
@@ -229,7 +230,8 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
scan->rs_nblocks = bpscan->phs_nblocks;
}
else
- scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_base.rs_rd);
+ if (scan->rs_nblocks == -1 || !is_mvcc)
+ scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_base.rs_rd);

/*
* If the table is large relative to NBuffers, use a bulk-read access
@@ -1210,6 +1212,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
else
scan->rs_base.rs_key = NULL;

+ scan->rs_nblocks = -1;
initscan(scan, key, false);

Best Regards

Andy Fan

Re: initscan for MVCC snapshot

From

Andy Fan

Date:

10 December 2020, 11:31:15

On Mon, Dec 7, 2020 at 8:26 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

Hi:
I see initscan calls RelationGetwNumberOfBlocks every time and rescan calls
initscan as well. In my system, RelationGetNumberOfBlocks is expensive (the reason
doesn't deserve a talk.. ), so in a nest loop + Bitmap heap scan case, the
impact will be huge. The comments of initscan are below.

/*
* Determine the number of blocks we have to scan.
*
* It is sufficient to do this once at scan start, since any tuples added
* while the scan is in progress will be invisible to my snapshot anyway.
* (That is not true when using a non-MVCC snapshot. However, we couldn't
* guarantee to return tuples added after scan start anyway, since they
* might go into pages we already scanned. To guarantee consistent
* results for a non-MVCC snapshot, the caller must hold some higher-level
* lock that ensures the interesting tuple(s) won't change.)
*/

I still do not fully understand the comments. Looks we only need to call
multi times for non-MVCC snapshot, IIUC, does the following change reasonable?

===

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1b2f70499e..8238eabd8b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -211,6 +211,7 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
ParallelBlockTableScanDesc bpscan = NULL;
bool allow_strat;
bool allow_sync;
+ bool is_mvcc = scan->rs_base.rs_snapshot && IsMVCCSnapshot(scan->rs_base.rs_snapshot);

/*
* Determine the number of blocks we have to scan.
@@ -229,7 +230,8 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
scan->rs_nblocks = bpscan->phs_nblocks;
}
else
- scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_base.rs_rd);
+ if (scan->rs_nblocks == -1 || !is_mvcc)
+ scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_base.rs_rd);

/*
* If the table is large relative to NBuffers, use a bulk-read access
@@ -1210,6 +1212,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
else
scan->rs_base.rs_key = NULL;

+ scan->rs_nblocks = -1;
initscan(scan, key, false);

--
Best Regards
Andy Fan

I have tested this with an ext4 file system, and I can get a 7%+ performance improvement

for the given test case. Here are the steps:

create table t(a int, b char(8000));

insert into t select i, i from generate_series(1, 1000000)i;

create index on t(a);

delete from t where a <= 10000;

vacuum t;

alter system set enable_indexscan to off;

select pg_reload_conf();

cat 1.sql

select * from generate_series(1, 10000)i, t where i = t.a;

bin/pgbench -f 1.sql postgres -T 300 -c 10

Without this patch: latency average = 61.806 ms

with this patch: latency average = 57.484 ms

I think the result is good and I think we can probably make this change. However, I'm not

sure about it.

Best Regards

Andy Fan

Re: initscan for MVCC snapshot

From

Andy Fan

Date:

10 December 2020, 11:58:46

On Thu, Dec 10, 2020 at 7:31 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

On Mon, Dec 7, 2020 at 8:26 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:
Hi:
I see initscan calls RelationGetwNumberOfBlocks every time and rescan calls
initscan as well. In my system, RelationGetNumberOfBlocks is expensive (the reason
doesn't deserve a talk.. ), so in a nest loop + Bitmap heap scan case, the
impact will be huge. The comments of initscan are below.

/*
* Determine the number of blocks we have to scan.
*
* It is sufficient to do this once at scan start, since any tuples added
* while the scan is in progress will be invisible to my snapshot anyway.
* (That is not true when using a non-MVCC snapshot. However, we couldn't
* guarantee to return tuples added after scan start anyway, since they
* might go into pages we already scanned. To guarantee consistent
* results for a non-MVCC snapshot, the caller must hold some higher-level
* lock that ensures the interesting tuple(s) won't change.)
*/

I still do not fully understand the comments. Looks we only need to call
multi times for non-MVCC snapshot, IIUC, does the following change reasonable?

===

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1b2f70499e..8238eabd8b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -211,6 +211,7 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
ParallelBlockTableScanDesc bpscan = NULL;
bool allow_strat;
bool allow_sync;
+ bool is_mvcc = scan->rs_base.rs_snapshot && IsMVCCSnapshot(scan->rs_base.rs_snapshot);

/*
* Determine the number of blocks we have to scan.
@@ -229,7 +230,8 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
scan->rs_nblocks = bpscan->phs_nblocks;
}
else
- scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_base.rs_rd);
+ if (scan->rs_nblocks == -1 || !is_mvcc)
+ scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_base.rs_rd);

/*
* If the table is large relative to NBuffers, use a bulk-read access
@@ -1210,6 +1212,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
else
scan->rs_base.rs_key = NULL;

+ scan->rs_nblocks = -1;
initscan(scan, key, false);

--
Best Regards
Andy Fan

I have tested this with an ext4 file system, and I can get a 7%+ performance improvement
for the given test case. Here are the steps:

create table t(a int, b char(8000));
insert into t select i, i from generate_series(1, 1000000)i;
create index on t(a);
delete from t where a <= 10000;
vacuum t;
alter system set enable_indexscan to off;
select pg_reload_conf();

cat 1.sql
select * from generate_series(1, 10000)i, t where i = t.a;

bin/pgbench -f 1.sql postgres -T 300 -c 10

Without this patch: latency average = 61.806 ms
with this patch: latency average = 57.484 ms

I think the result is good and I think we can probably make this change. However, I'm not
sure about it.

The plan which was used is below, in the rescan of Bitmap Heap Scan, mdnblocks will

be called 10000 times in current implementation, Within my patch, it will be only called

once.

postgres=# explain (costs off) select * from generate_series(1, 10000)i, t where i = t.a;
QUERY PLAN
------------------------------------------
Nested Loop
-> Function Scan on generate_series i
-> Bitmap Heap Scan on t
Recheck Cond: (a = i.i)
-> Bitmap Index Scan on t_a_idx
Index Cond: (a = i.i)

(6 rows)

Best Regards

Andy Fan