Re: FunctionCallN improvement. - Mailing list pgsql-hackers

From a_ogawa
Subject Re: FunctionCallN improvement.
Date
Msg-id PIEMIKOOMKNIJLLLBCBBEEJCCEAA.a_ogawa@hi-ho.ne.jp
In response to Re: FunctionCallN improvement.  (Tom Lane <tgl@sss.pgh.pa.us>)
Responses Re: FunctionCallN improvement.  (Tom Lane <tgl@sss.pgh.pa.us>)
List pgsql-hackers
Tom Lane wrote:
> Neil Conway <neilc@samurai.com> writes:
> > I agree; I think the macro is a nice improvement to readability.
> 
> But a dead loss for performance, since it does a MemSet *and* some other
> operations.  What's worse, it changes a word-aligned MemSet into a
> non-aligned one, knocking out all the optimizations therein.

Thanks for your advice.
I have changed the MemSet to a for loop in this macro.

I think FunctionCallInfoData is too large to initialize with MemSet.
MemSet is very fast in most cases. However, when only a part of a large
structure has to be initialized, it might be faster to initialize
those few members directly.

I wrote a test program to measure the effect of this macro.
The test program was:
---------------------------------------------------------------------------
#include "postgres.h"
#include "fmgr.h"
#include <stdio.h>
#include <stdlib.h>     /* atoi */
#include <string.h>     /* strcmp */

/*
 * Initialize minimum fields of FunctionCallInfoData that must be
 * initialized.
 */
#define InitFunctionCallInfoData(Fcinfo, Flinfo, Nargs) \
    do { \
        int     i_; \
        (Fcinfo)->flinfo = (Flinfo); \
        (Fcinfo)->context = NULL; \
        (Fcinfo)->resultinfo = NULL; \
        (Fcinfo)->isnull = false; \
        (Fcinfo)->nargs = (Nargs); \
        for (i_ = 0; i_ < (Nargs); i_++) \
            (Fcinfo)->argnull[i_] = false; \
    } while (0)

/*
 * dummyFunc is here to prevent excessive optimization.
 * If this function is not called from the loop, gcc might move the
 * initialization of FunctionCallInfoData outside of the loop.
 */
void dummyFunc(FunctionCallInfoData *fcinfo, int cnt)
{
    fcinfo->arg[0] = Int32GetDatum(cnt);
}

void TestMemSet(int cnt, int nargs)
{
    FunctionCallInfoData fcinfo;

    printf("test MemSet: %d\n", cnt);
    for (; cnt; cnt--) {
        MemSet(&fcinfo, 0, sizeof(fcinfo));
        dummyFunc(&fcinfo, cnt);
    }
}

void TestMacro(int cnt, int nargs)
{
    FunctionCallInfoData fcinfo;

    printf("test Macro: %d\n", cnt);
    for (; cnt; cnt--) {
        InitFunctionCallInfoData(&fcinfo, NULL, nargs);
        dummyFunc(&fcinfo, cnt);
    }
}

int main(int argc, char **argv)
{
    int     test_cnt;
    int     nargs;

    if (argc != 4) {
        printf("usage: fmgrtest -memset|-macro test_cnt nargs\n");
        return 1;
    }
    test_cnt = atoi(argv[2]);
    nargs = atoi(argv[3]);

    if (strcmp(argv[1], "-memset") == 0)
        TestMemSet(test_cnt, nargs);
    if (strcmp(argv[1], "-macro") == 0)
        TestMacro(test_cnt, nargs);

    return 0;
}
---------------------------------------------------------------------------

It was compiled like so:
  gcc -O2 -o test_fmgr -I ${PGSRC}/src/include/ test_fmgr.c

The MemSet test was run with:
  time ./test_fmgr -memset 10000000 9

The test of the macro that uses a for loop was run with:
  time ./test_fmgr -macro  10000000 9

Results:
(1) Linux kernel 2.4.9 (Pentium III 800MHz, gcc-3.4.1)
  MemSet          real 0m1.486s, user 0m1.480s, sys 0m0.000s
  Macro(nargs=9)  real 0m0.606s, user 0m0.600s, sys 0m0.000s
  Macro(nargs=3)  real 0m0.375s, user 0m0.370s, sys 0m0.000s
  Macro(nargs=2)  real 0m0.298s, user 0m0.290s, sys 0m0.000s
  (*) In the MemSet test, nargs has no effect on the result.

(2) Solaris 8 (UltraSPARC III 750MHz, gcc-2.95.3)
  MemSet          real 2.0s, user 2.0s, sys 0.0s
  Macro(nargs=9)  real 0.7s, user 0.7s, sys 0.0s
  Macro(nargs=3)  real 0.3s, user 0.3s, sys 0.0s
  Macro(nargs=2)  real 0.2s, user 0.2s, sys 0.0s

The effect of this macro can be seen in applications that output
a lot of data, such as psql and pg_dump. These applications increase
the load on FunctionCall3.

This is a result from pg_dump.
Environment: Linux kernel 2.4.9, Pentium III 800MHz,
             PostgreSQL 8.0.1, gcc-3.4.1, compile option: -O2,
             My database has about 400,000 tuples.
Results (time pg_dump > dump.sql):
  Original code:               real 0m5.369s, user 0m0.600s, sys 0m0.120s
  Using this macro in fmgr.c:  real 0m5.061s, user 0m0.550s, sys 0m0.120s
 

I think this macro improves both readability and performance.

regards,

---
A.Ogawa ( a_ogawa@hi-ho.ne.jp )


