Re: Reduce pinning in btree indexes - Mailing list pgsql-hackers

From Kyotaro HORIGUCHI
Subject Re: Reduce pinning in btree indexes
Date
Msg-id 20150227.143047.132686426.horiguchi.kyotaro@lab.ntt.co.jp
Whole thread Raw
In response to Re: Reduce pinning in btree indexes  (Kevin Grittner <kgrittn@ymail.com>)
Responses Re: Reduce pinning in btree indexes  (Andrew Gierth <andrew@tao11.riddles.org.uk>)
Re: Reduce pinning in btree indexes  (Kevin Grittner <kgrittn@ymail.com>)
List pgsql-hackers
Hello, I measured the performance of this patch considering
markpos/restorepos. The loss seems to be up to about 10%
unfortunately.

At Thu, 26 Feb 2015 17:49:23 +0000 (UTC), Kevin Grittner <kgrittn@ymail.com> wrote in
<440831854.629116.1424972963735.JavaMail.yahoo@mail.yahoo.com>
> Heikki Linnakangas <hlinnakangas@vmware.com> wrote:> On 02/15/2015 02:19 AM, Kevin Grittner wrote:
> > Hmm. Did your test case actually exercise mark/restore? The memcpy()s 
> > are not that small. Especially if it's an index-only scan, you're 
> > copying a large portion of the page. Some scans call markpos on every 
> > tuple, so that could add up.
> 
> 
> I have results from the `make world` regression tests and a 48-hour
> customer test.  Unfortunately I don't know how heavily either of those
> exercise this code.  Do you have a suggestion for a way to test whether
> there is a serious regression for something approaching a "worst case"?

ammarkpos/amrestrpos are called in merge joins. By the steps
shown below, I had 1M times of markpos and no restorepos for 1M
result rows, and had 500k times of markpos and the same number of
times of restorepos for 2M rows result by a bit different
configuration. I suppose we can say that they are the worst case
considering maskpos/restrpos. The call counts can be taken using
the attached patch.

Both binaries ware compiled with -O2. shared_buffers=1GB and all
shared pages used in the query were hit when measuring.

The numbers were taken 12 times for each cofiguration and took
averages and stddevs of 10 numbers excluding best and worst.

Case 1. 500k markpos/restorepos for 2M result rows.

Index scan: The patched loses about 2%. (1.98%) master:  6166 ms (std dev = 3.2 ms) Patched: 6288 ms (std dev = 3.7
ms)

IndesOnly scan: The patches loses about 2%. (2.14%) master:  5946 ms (std dev = 5.0 ms) Patched: 6073 ms (std dev = 3.3
ms)

The patched version is slower by about 2%. Of course all of it is
not the effect of memcpy but I don't know the distribution.


Case 2. 1M markpos, no restorepos for 1M result rows.

IndesOnly scan: The patches loses about 10%. master:  3655 ms (std dev = 2.5 ms) Patched: 4038 ms (std dev = 2.6 ms)

The loss is about 10%. The case looks a bit impractical but
unfortunately the number might be unignorable. The distribution
of the loss is unknown, too.


regards,

===========
CREATE TABLE t1 (a int);
CREATE TABLE t2 (a int);
delete from t1;
delete from t2;
-- This causes 1M times of markpos and no restorepos
INSERT INTO t1 (SELECT a FROM generate_series(0, 999999) a);
INSERT INTO t2 (SELECT a FROM generate_series(0, 999999) a);
-- This causes 500k times of markpos and the same number of restorepos
-- INSERT INTO t1 (SELECT a/2 FROM generate_series(0, 999999) a);
-- INSERT INTO t2 (SELECT a/2 FROM generate_series(0, 999999) a);
CREATE INDEX it1 ON t1 (a);
CREATE INDEX it2 ON t2 (a);
ANALYZE t1;
ANALYZE t1;
SET enable_seqscan to false;
SET enable_material to false;
SET enable_hashjoin to false;
SET enable_nestloop to false;
SET enable_indexonlyscan to false; -- omit this to do indexonly scan

EXPLAIN (ANALYZE) SELECT t1.a, t2.a FROM t1 JOIN t2 on (t1.a = t2.a);

                                     QUERY PLAN         
--------------------------------------------------------------------------Merge Join  (cost=2.83..322381.82
rows=3031231width=8) (actual time=0.013..5193.566 rows=2000000 loops=1)  Merge Cond: (t1.a = t2.a)  ->  Index Scan
usingit1 on t1  (cost=0.43..65681.87 rows=1666667 width=4) (actual time=0.004..727.557 rows=1000000
 
oops=1)  ->  Index Scan using it2 on t2  (cost=0.43..214642.89 rows=3031231 width=4) (actual time=0.004..1478.361
rows=19999loops=1)Planningtime: 25.585 msExecution time: 6299.521 ms
 
(6 rows)

-- 
Kyotaro Horiguchi
NTT Open Source Software Center
diff --git a/src/backend/access/nbtree/nbtree.c b/src/backend/access/nbtree/nbtree.c
index 3af462b..04d6cec 100644
--- a/src/backend/access/nbtree/nbtree.c
+++ b/src/backend/access/nbtree/nbtree.c
@@ -543,15 +543,19 @@ btendscan(PG_FUNCTION_ARGS)    PG_RETURN_VOID();}
+/* *    btmarkpos() -- save current scan position */
+int hoge_markpos_count = 0;
+int hoge_restrpos_count = 0;Datumbtmarkpos(PG_FUNCTION_ARGS){    IndexScanDesc scan = (IndexScanDesc)
PG_GETARG_POINTER(0);   BTScanOpaque so = (BTScanOpaque) scan->opaque;
 
+    hoge_markpos_count++;    /* we aren't holding any read locks, but gotta drop the pin */    if
(BTScanPosIsValid(so->markPos))   {
 
@@ -585,7 +589,7 @@ btrestrpos(PG_FUNCTION_ARGS){    IndexScanDesc scan = (IndexScanDesc) PG_GETARG_POINTER(0);
BTScanOpaqueso = (BTScanOpaque) scan->opaque;
 
-
+    hoge_restrpos_count++;    /* Restore the marked positions of any array keys */    if (so->numArrayKeys)
_bt_restore_array_keys(scan);
diff --git a/src/backend/executor/execMain.c b/src/backend/executor/execMain.c
index 33b172b..e7c1b99 100644
--- a/src/backend/executor/execMain.c
+++ b/src/backend/executor/execMain.c
@@ -258,14 +258,19 @@ standard_ExecutorStart(QueryDesc *queryDesc, int eflags) * *
----------------------------------------------------------------*/
 
+extern int hoge_markpos_count;
+extern int hoge_restrpos_count;voidExecutorRun(QueryDesc *queryDesc,            ScanDirection direction, long count){
+    hoge_markpos_count = 0;
+    hoge_restrpos_count = 0;    if (ExecutorRun_hook)        (*ExecutorRun_hook) (queryDesc, direction, count);
else       standard_ExecutorRun(queryDesc, direction, count);
 
+    elog(LOG, "markpos=%d, restrpos=%d", hoge_markpos_count, hoge_restrpos_count);}void

pgsql-hackers by date:

Previous
From: Andrew Gierth
Date:
Subject: Re: Renaming MemoryContextResetAndDeleteChildren to MemoryContextReset
Next
From: Andrew Gierth
Date:
Subject: Re: Reduce pinning in btree indexes