Thread: initscan for MVCC snapshot

initscan for MVCC snapshot

From: Andy Fan
Date: Mon, Dec 7, 2020
Hi:

I see that initscan calls RelationGetNumberOfBlocks every time, and rescan calls
initscan as well. On my system, RelationGetNumberOfBlocks is expensive (the reason
is not worth going into here), so in a nested loop + Bitmap Heap Scan case the
impact is huge. The comment in initscan is below.

/*
* Determine the number of blocks we have to scan.
*
* It is sufficient to do this once at scan start, since any tuples added
* while the scan is in progress will be invisible to my snapshot anyway.
* (That is not true when using a non-MVCC snapshot.  However, we couldn't
* guarantee to return tuples added after scan start anyway, since they
* might go into pages we already scanned.  To guarantee consistent
* results for a non-MVCC snapshot, the caller must hold some higher-level
* lock that ensures the interesting tuple(s) won't change.)
*/

I still do not fully understand this comment, but it looks like we only need to
recompute the block count for a non-MVCC snapshot. If I understand correctly, is the
following change reasonable?

===

diff --git a/src/backend/access/heap/heapam.c b/src/backend/access/heap/heapam.c
index 1b2f70499e..8238eabd8b 100644
--- a/src/backend/access/heap/heapam.c
+++ b/src/backend/access/heap/heapam.c
@@ -211,6 +211,7 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
        ParallelBlockTableScanDesc bpscan = NULL;
        bool            allow_strat;
        bool            allow_sync;
+       bool            is_mvcc = scan->rs_base.rs_snapshot && IsMVCCSnapshot(scan->rs_base.rs_snapshot);

        /*
         * Determine the number of blocks we have to scan.
@@ -229,7 +230,8 @@ initscan(HeapScanDesc scan, ScanKey key, bool keep_startblock)
                scan->rs_nblocks = bpscan->phs_nblocks;
        }
        else
-               scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_base.rs_rd);
+               if (scan->rs_nblocks == -1 || !is_mvcc)
+                       scan->rs_nblocks = RelationGetNumberOfBlocks(scan->rs_base.rs_rd);

        /*
         * If the table is large relative to NBuffers, use a bulk-read access
@@ -1210,6 +1212,7 @@ heap_beginscan(Relation relation, Snapshot snapshot,
        else
                scan->rs_base.rs_key = NULL;

+       scan->rs_nblocks = -1;
        initscan(scan, key, false);

--
Best Regards
Andy Fan

Re: initscan for MVCC snapshot

From: Andy Fan
Date: Thu, Dec 10, 2020


On Mon, Dec 7, 2020 at 8:26 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:

I have tested this with an ext4 file system, and I can get a 7%+ performance improvement
for the given test case. Here are the steps:

create table t(a int, b char(8000));
insert into t select i, i from generate_series(1, 1000000)i;
create index on t(a);
delete from t where a <= 10000;
vacuum t;
alter system set enable_indexscan to off;
select pg_reload_conf();

cat 1.sql
select * from generate_series(1, 10000)i, t where i = t.a;

bin/pgbench -f 1.sql postgres -T 300 -c 10

Without this patch: latency average = 61.806 ms
With this patch:    latency average = 57.484 ms

The result looks good, so I think we can probably make this change, though I am not
completely sure about it.



--
Best Regards
Andy Fan

Re: initscan for MVCC snapshot

From: Andy Fan
Date:


On Thu, Dec 10, 2020 at 7:31 PM Andy Fan <zhihui.fan1213@gmail.com> wrote:




The plan that was used is below. During the rescans of the Bitmap Heap Scan, mdnblocks
is called 10000 times in the current implementation; with my patch it is called only
once.

postgres=# explain (costs off) select * from generate_series(1, 10000)i, t where i = t.a;
                QUERY PLAN
------------------------------------------
 Nested Loop
   ->  Function Scan on generate_series i
   ->  Bitmap Heap Scan on t
         Recheck Cond: (a = i.i)
         ->  Bitmap Index Scan on t_a_idx
               Index Cond: (a = i.i)
(6 rows)
 


--
Best Regards
Andy Fan