Thread: FunctionCallN improvement.

FunctionCallN improvement.

From

a_ogawa

Date:

31 January 2005, 14:39:42

When SQL that returns many tuples with character code conversion
is executed, FunctionCall3/FunctionCall5 become a bottleneck.
Because MemSet is used to initialize FunctionCallInfoData in these
functions, a lot of cycles are spent there.

<test query>
set client_encoding to 'SJIS';
select * from pg_class, pg_amop;
(This SQL is used only to get a lot of tuples, and there is no 
logical meaning) 

<result of profile>
Each sample counts as 0.01 seconds.
  %   cumulative   self              self     total
 time   seconds   seconds    calls  s/call   s/call  name
22.91      1.29     1.29  1562351     0.00     0.00  FunctionCall5
18.29      2.32     1.03  1602006     0.00     0.00  FunctionCall3
 5.06      2.60     0.28  4892127     0.00     0.00  AllocSetAlloc
 4.88      2.88     0.28  9781322     0.00     0.00  AllocSetFreeIndex
 4.35      3.12     0.24  1587600     0.00     0.00  ExecEvalVar

Most calls to these functions come from printtup.
FunctionCall3 is used to generate the text.
FunctionCall5 is used for character code conversion.
(printtup -> pq_sendcountedtext -> pg_server_to_client -> perform_default_encoding_conversion -> FunctionCall5)

I think that we should initialize only the fields of 
FunctionCallInfoData that must be initialized. 
(Such as FunctionCall1)
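
The difference between the two initialization styles can be sketched as follows. This is only an illustration: MockFcinfo, MOCK_MAX_ARGS, init_full and init_minimal are hypothetical stand-ins, the struct layout is an assumption based on the fields named in this thread, and the array size of 32 is assumed rather than taken from the PostgreSQL headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MOCK_MAX_ARGS 32            /* assumed stand-in for FUNC_MAX_ARGS */

/* Hypothetical stand-in for FunctionCallInfoData; the layout is assumed. */
typedef struct MockFcinfo
{
    void   *flinfo;
    void   *context;
    void   *resultinfo;
    bool    isnull;
    short   nargs;
    long    arg[MOCK_MAX_ARGS];     /* stand-in for Datum */
    bool    argnull[MOCK_MAX_ARGS];
} MockFcinfo;

/* FunctionCall3/FunctionCall5 style: clear every byte of the struct first. */
void init_full(MockFcinfo *fc, void *flinfo, short nargs)
{
    memset(fc, 0, sizeof(*fc));
    fc->flinfo = flinfo;
    fc->nargs = nargs;
}

/* FunctionCall1 style: write only the fields the callee will read. */
void init_minimal(MockFcinfo *fc, void *flinfo, short nargs)
{
    assert(nargs >= 0 && nargs <= MOCK_MAX_ARGS);
    fc->flinfo = flinfo;
    fc->context = NULL;
    fc->resultinfo = NULL;
    fc->isnull = false;
    fc->nargs = nargs;
    for (int i = 0; i < nargs; i++)
        fc->argnull[i] = false;
}
```

With three arguments, init_minimal writes a few dozen bytes, while init_full touches the whole struct, most of which (the unused arg and argnull slots) is never read.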

I have two plans to modify the code. 
(a) Change FunctionCall3/FunctionCall5 like FunctionCall1. It is simple, minimum change.

(b) Define the macro that initialize FunctionCallInfoData, and use it 
instead of MemSet in all FunctionCallN, DirectFunctionCallN, 
OidFunctionCallN. This macro is the following. 

#define InitFunctionCallInfoData(Fcinfo, Flinfo, Nargs)     \
    do {                                                    \
        (Fcinfo)->flinfo = Flinfo;                          \
        (Fcinfo)->context = NULL;                           \
        (Fcinfo)->resultinfo = NULL;                        \
        (Fcinfo)->isnull = false;                           \
        (Fcinfo)->nargs = Nargs;                            \
        MemSet((Fcinfo)->argnull, 0, Nargs * sizeof(bool)); \
    } while(0)
 

I think that plan(b) is better, because source code consistency 
and efficiency improve.

Any comments?

regards, 

---
A.Ogawa ( a_ogawa@hi-ho.ne.jp )

Re: FunctionCallN improvement.

From

Neil Conway

Date:

31 January 2005, 23:00:50

On Mon, 2005-01-31 at 23:38 +0900, a_ogawa wrote:
> (b)Define the macro that initialize FunctionCallInfoData, and use it 
> instead of MemSet in all FunctionCallN, DirectFunctionCallN, 
> OidFunctionCallN.
>  This macro is the following. 
> 
> #define InitFunctionCallInfoData(Fcinfo, Flinfo, Nargs)     \
>     do {                                                    \
>         (Fcinfo)->flinfo = Flinfo;                          \
>         (Fcinfo)->context = NULL;                           \
>         (Fcinfo)->resultinfo = NULL;                        \
>         (Fcinfo)->isnull = false;                           \
>         (Fcinfo)->nargs = Nargs;                            \
>         MemSet((Fcinfo)->argnull, 0, Nargs * sizeof(bool)); \
>     } while(0)
> 
> I think that plan(b) is better, because source code consistency 
> and efficiency improve.

I agree; I think the macro is a nice improvement to readability. It
would be good to see some benchmarks once the patch is written to verify
that this really does improve performance, but I think it's a good idea.

-Neil

Re: FunctionCallN improvement.

From

Tom Lane

Date:

01 February 2005, 01:09:11

Neil Conway <neilc@samurai.com> writes:
> On Mon, 2005-01-31 at 23:38 +0900, a_ogawa wrote:
>> (b)Define the macro that initialize FunctionCallInfoData, and use it 
>> instead of MemSet in all FunctionCallN, DirectFunctionCallN, 
>> OidFunctionCallN.
>> This macro is the following. 
>> 
>> #define InitFunctionCallInfoData(Fcinfo, Flinfo, Nargs)     \
>> do {                                                    \
>> (Fcinfo)->flinfo = Flinfo;                          \
>> (Fcinfo)->context = NULL;                           \
>> (Fcinfo)->resultinfo = NULL;                        \
>> (Fcinfo)->isnull = false;                           \
>> (Fcinfo)->nargs = Nargs;                            \
>> MemSet((Fcinfo)->argnull, 0, Nargs * sizeof(bool)); \
>> } while(0)
>> 
>> I think that plan(b) is better, because source code consistency 
>> and efficiency improve.

> I agree; I think the macro is a nice improvement to readability.

But a dead loss for performance, since it does a MemSet *and* some other
operations.  What's worse, it changes a word-aligned MemSet into a
non-aligned one, knocking out all the optimizations therein.
        regards, tom lane
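
The alignment objection can be illustrated with a simplified model of the kind of test a word-at-a-time clear has to pass. This is a sketch, not the actual MemSet macro: word_fill_eligible, MockFcinfo and the helper functions are hypothetical, the struct layout is assumed, and the real macro may apply further conditions that are omitted here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MOCK_MAX_ARGS 32            /* assumed stand-in for FUNC_MAX_ARGS */

/* Hypothetical stand-in for FunctionCallInfoData; the layout is assumed. */
typedef struct MockFcinfo
{
    void   *flinfo;
    void   *context;
    void   *resultinfo;
    bool    isnull;
    short   nargs;
    long    arg[MOCK_MAX_ARGS];
    bool    argnull[MOCK_MAX_ARGS];
} MockFcinfo;

/*
 * Simplified model: a region can be cleared one long at a time only if
 * its start address and its length are both multiples of sizeof(long).
 */
bool word_fill_eligible(const void *start, size_t len)
{
    const uintptr_t mask = sizeof(long) - 1;

    assert((sizeof(long) & mask) == 0);     /* sizeof(long) is a power of 2 */
    return (((uintptr_t) start) & mask) == 0 && (len & mask) == 0;
}

/* Clearing the whole struct: aligned start, length a multiple of a word. */
bool whole_struct_eligible(const MockFcinfo *fc)
{
    return word_fill_eligible(fc, sizeof(*fc));
}

/* Clearing only nargs one-byte flags: the length is usually not a multiple. */
bool argnull_eligible(const MockFcinfo *fc, int nargs)
{
    return word_fill_eligible(fc->argnull, (size_t) nargs * sizeof(bool));
}
```

Under this model the whole-struct clear qualifies for the word loop, while clearing 3 or 5 argnull flags does not, which is the case the proposed macro would hit for FunctionCall3 and FunctionCall5.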

Re: FunctionCallN improvement.

From

a_ogawa

Date:

01 February 2005, 13:06:56

Tom Lane wrote:
> Neil Conway <neilc@samurai.com> writes:
> > I agree; I think the macro is a nice improvement to readability.
> 
> But a dead loss for performance, since it does a MemSet *and* some other
> operations.  What's worse, it changes a word-aligned MemSet into a
> non-aligned one, knocking out all the optimizations therein.

Thanks for your advice.
I changed the MemSet to a for loop in this macro.

I think FunctionCallInfoData is too large to initialize with MemSet.
MemSet is very fast in most cases. However, when only a part of a
large structure has to be initialized, it might be faster to initialize
the few members directly.

I made the test program to measure the effect of this macro. 
The test program was:
---------------------------------------------------------------------------
#include "postgres.h"
#include "fmgr.h"
#include <stdio.h>

/*
 * Initialize minimum fields of FunctionCallInfoData that must be
 * initialized.
 */
#define InitFunctionCallInfoData(Fcinfo, Flinfo, Nargs)              \
    do {                                                             \
        int     i_;                                                  \
        (Fcinfo)->flinfo = Flinfo;                                   \
        (Fcinfo)->context = NULL;                                    \
        (Fcinfo)->resultinfo = NULL;                                 \
        (Fcinfo)->isnull = false;                                    \
        (Fcinfo)->nargs = Nargs;                                     \
        for(i_ = 0; i_ < Nargs; i_++) (Fcinfo)->argnull[i_] = false; \
    } while(0)

/*
 * dummyFunc is to control excessive optimization.
 * When this function is not called from loop, the initialization of
 * FunctionCallInfoData might move outside of the loop by gcc.
 */
void dummyFunc(FunctionCallInfoData *fcinfo, int cnt)
{
    fcinfo->arg[0] = Int32GetDatum(cnt);
}

void TestMemSet(int cnt, int nargs)
{
    FunctionCallInfoData fcinfo;

    printf("test MemSet: %d\n", cnt);

    for(; cnt; cnt--) {
        MemSet(&fcinfo, 0, sizeof(fcinfo));
        dummyFunc(&fcinfo, cnt);
    }
}

void TestMacro(int cnt, int nargs)
{
    FunctionCallInfoData fcinfo;

    printf("test Macro: %d\n", cnt);

    for(; cnt; cnt--) {
        InitFunctionCallInfoData(&fcinfo, NULL, nargs);
        dummyFunc(&fcinfo, cnt);
    }
}

int main(int argc, char **argv)
{
    int     test_cnt;
    int     nargs;

    if(argc != 4) {
        printf("usage: fmgrtest -memset|-macro test_cnt nargs\n");
        return 1;
    }
    test_cnt = atoi(argv[2]);
    nargs = atoi(argv[3]);

    if(strcmp(argv[1], "-memset") == 0) TestMemSet(test_cnt, nargs);
    if(strcmp(argv[1], "-macro") == 0) TestMacro(test_cnt, nargs);

    return 0;
}
---------------------------------------------------------------------------

It was compiled like so:  gcc -O2 -o test_fmgr -I ${PGSRC}/src/include/ test_fmgr.c

Executed the test of MemSet:  time ./test_fmgr -memset 10000000 9

Executed the test of Macro that uses for loop:  time ./test_fmgr -macro  10000000 9

Results:
(1) linux Kernel 2.4.9 (Pentium III 800MHz, gcc-3.4.1)
    MemSet          real 0m1.486s, user 0m1.480s, sys 0m0.000s
    Macro(nargs=9)  real 0m0.606s, user 0m0.600s, sys 0m0.000s
    Macro(nargs=3)  real 0m0.375s, user 0m0.370s, sys 0m0.000s
    Macro(nargs=2)  real 0m0.298s, user 0m0.290s, sys 0m0.000s
    (*) In the test of MemSet, nargs is not related.

(2) Solaris8 (Ultra SPARC III 750MHz, gcc-2.95.3)
    MemSet          real 2.0s, user 2.0s, sys 0.0s
    Macro(nargs=9)  real 0.7s, user 0.7s, sys 0.0s
    Macro(nargs=3)  real 0.3s, user 0.3s, sys 0.0s
    Macro(nargs=2)  real 0.2s, user 0.2s, sys 0.0s
 

The effect of this macro can be seen in applications that output
a lot of data, such as psql and pg_dump. These applications increase
the load on FunctionCall3.

This is a result of pg_dump.
Environment: linux Kernel 2.4.9, Pentium III 800MHz, PostgreSQL 8.0.1,
             gcc-3.4.1, compile option: -O2,
             My database has about 400,000 tuples.
Results (time pg_dump > dump.sql):
    Original code:               real 0m5.369s, user 0m0.600s, sys 0m0.120s
    Using this macro in fmgr.c:  real 0m5.061s, user 0m0.550s, sys 0m0.120s
 

I think this macro is an improvement to readability and performance.

regards,

---
A.Ogawa ( a_ogawa@hi-ho.ne.jp )

Re: FunctionCallN improvement.

From

Tom Lane

Date:

01 February 2005, 21:24:16

a_ogawa <a_ogawa@hi-ho.ne.jp> writes:
> I made the test program to measure the effect of this macro.

Well, if we're going to be tense about this, let's actually be tense
about it.  Your test program isn't a great model for what's going to
happen in fmgr.c, because you've designed it so that Nargs cannot be
known at compile time.  In the fmgr routines, Nargs is certainly a
compile-time constant, and so implementations that can exploit that
will have an advantage.

Also, we can take advantage of some improvements in the MemSet macro
family that occurred since fmgr.c was last rewritten.  I see no reason
not to use MemSetLoop directly, since the fcinfo struct will have the
correct size and correct alignment.

In addition to your original macro, I tried two other variants: one
that uses MemSetLoop with a loop length rounded to the next higher
multiple of 4, and one that expects the argisnull settings to be written
out directly, in the same style as is currently done in FunctionCall1
and FunctionCall2.  (This amounts to unrolling the loop in the original
macro; something that could be done by the compiler given a constant
Nargs, but it seems not to be done by the compilers I tested.)
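
The "rounded to the next higher multiple of 4" length used by the MemSetLoop variant is plain integer arithmetic, and it matches the expression in the SetMacro version of the test program. A minimal sketch, assuming one byte per argnull flag and an array capacity that is itself a multiple of 4 (so the rounded length stays inside the array); rounded_clear_len is a hypothetical helper name:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/*
 * Round a count of one-byte null flags up to a whole number of 4-byte
 * words and return the number of bytes to clear.
 */
size_t rounded_clear_len(size_t nargs)
{
    const size_t word = sizeof(int32_t);

    assert(word == 4);
    return word * ((nargs + word - 1) / word);
}
```

For the two cases measured, NARGS = 2 clears 4 bytes and NARGS = 5 clears 8 bytes, so the variant always writes whole words at the cost of clearing a few flags nobody reads.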

I tested two cases: NARGS = 2, which is certainly the single most
critical case, and NARGS = 5, which is probably the largest number
of arguments that we really care too much about.  (You have to hand-edit
the test program and recompile to adjust NARGS, since the point is to
treat it as a compile-time constant.)

Here are wall-clock timings on the architectures and compilers I have at
hand:

NARGS = 2
                 MemSetLoop   OrigMacro   SetMacro    Unrolled
i386, gcc -O2    37.655s      6.411s      7.060s      6.362s
i386, gcc -O6    35.420s      1.129s      1.814s      0.567s
PPC, gcc -O2     54.033s      6.754s      11.138s     6.438s
HPPA, gcc -O2    58.82s       10.38s      9.79s       7.85s
HPPA, cc +O2     60.39s       13.43s      8.40s       7.31s

NARGS = 5
                 MemSetLoop   OrigMacro   SetMacro    Unrolled
i386, gcc -O2    37.566s      11.329s     7.688s      8.874s
i386, gcc -O6    32.992s      5.928s      2.881s      0.566s
PPC, gcc -O2     86.300s      19.048s     14.626s     8.751s
HPPA, gcc -O2    58.28s       15.09s      13.42s      14.37s
HPPA, cc +O2     58.23s       8.96s       12.88s      7.28s

(I used different loop counts on the different machines to get similar
overall times for the memset case; so it's OK to compare numbers across
a row but not down a column.)
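
Because only within-row comparisons are meaningful, one way to read the tables is as a per-row ratio against the MemSetLoop column. A small sketch; speedup is a hypothetical helper, and the inputs are the timings quoted above:

```c
#include <assert.h>

/* Ratio of the full-struct clear time to a variant's time on one platform. */
double speedup(double memset_seconds, double variant_seconds)
{
    assert(variant_seconds > 0.0);
    return memset_seconds / variant_seconds;
}
```

For example, the "i386, gcc -O2" row with NARGS = 2 gives speedup(37.655, 6.362), roughly 5.9 for the unrolled variant, and the "PPC, gcc -O2" row with NARGS = 5 gives speedup(86.300, 8.751), roughly 9.9.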

Based on this I think we ought to go with the "unrolled" approach, ie,
we'll create a macro to initialize the fixed fields of fcinfo but fill
in the arg and argisnull arrays with code like what's already in
FunctionCall2:

    fcinfo.arg[0] = arg1;
    fcinfo.arg[1] = arg2;
    fcinfo.argnull[0] = false;
    fcinfo.argnull[1] = false;
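
Put together, the "unrolled" shape might look like the following self-contained sketch. All Mock* names, InitMockFcinfo and mock_add are hypothetical stand-ins so the example compiles without the PostgreSQL headers; the struct layout is assumed, and the assert on isnull only stands in for whatever error reporting the real fmgr code performs.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define MOCK_MAX_ARGS 32                    /* assumed array capacity */

typedef long MockDatum;                     /* stand-in for Datum */
struct MockFcinfo;
typedef MockDatum (*MockFn) (struct MockFcinfo *fcinfo);

typedef struct MockFmgrInfo
{
    MockFn      fn_addr;                    /* function to call */
} MockFmgrInfo;

typedef struct MockFcinfo
{
    MockFmgrInfo *flinfo;
    void       *context;
    void       *resultinfo;
    bool        isnull;
    short       nargs;
    MockDatum   arg[MOCK_MAX_ARGS];
    bool        argnull[MOCK_MAX_ARGS];
} MockFcinfo;

/* Initialize only the fixed fields; arguments are filled in by the caller. */
#define InitMockFcinfo(Fcinfo, Flinfo, Nargs) \
    do { \
        (Fcinfo)->flinfo = (Flinfo); \
        (Fcinfo)->context = NULL; \
        (Fcinfo)->resultinfo = NULL; \
        (Fcinfo)->isnull = false; \
        (Fcinfo)->nargs = (Nargs); \
    } while (0)

MockDatum MockFunctionCall2(MockFmgrInfo *flinfo, MockDatum arg1, MockDatum arg2)
{
    MockFcinfo  fcinfo;
    MockDatum   result;

    InitMockFcinfo(&fcinfo, flinfo, 2);
    /* Unrolled per-argument setup: no loop, no MemSet. */
    fcinfo.arg[0] = arg1;
    fcinfo.arg[1] = arg2;
    fcinfo.argnull[0] = false;
    fcinfo.argnull[1] = false;

    result = flinfo->fn_addr(&fcinfo);
    assert(!fcinfo.isnull);                 /* stand-in for error reporting */
    return result;
}

/* Example callee: adds its two arguments. */
MockDatum mock_add(MockFcinfo *fcinfo)
{
    return fcinfo->arg[0] + fcinfo->arg[1];
}
```

Calling MockFunctionCall2 with a MockFmgrInfo whose fn_addr is mock_add and the arguments 2 and 3 returns 5; only the seven fixed fields and the two argument slots are written before the call.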

If anyone would like to try the results on other platforms, my test
program is attached.

            regards, tom lane

#include "postgres.h"
#include "fmgr.h"

#define NARGS 2                    /* Unrolled code can handle up to 10 */

/*
 * Initialize minimum fields of FunctionCallInfoData that must be
 * initialized.
 */
#define InitFunctionCallInfoData(Fcinfo, Flinfo, Nargs)              \
    do {                                                             \
        int     i_;                                                  \
        (Fcinfo)->flinfo = Flinfo;                                   \
        (Fcinfo)->context = NULL;                                    \
        (Fcinfo)->resultinfo = NULL;                                 \
        (Fcinfo)->isnull = false;                                    \
        (Fcinfo)->nargs = Nargs;                                     \
        for(i_ = 0; i_ < Nargs; i_++) (Fcinfo)->argnull[i_] = false; \
    } while(0)

/*
 * dummyFunc is to control excessive optimization.
 * When this function is not called from loop, the initialization of
 * FunctionCallInfoData might move outside of the loop by gcc.
 */
void dummyFunc(FunctionCallInfoData *fcinfo, int cnt)
{
    fcinfo->arg[0] = Int32GetDatum(cnt);
}

void TestMemSet(int cnt)
{
    FunctionCallInfoData fcinfo;

    printf("test MemSetLoop(%d): %d\n", NARGS, cnt);

    for(; cnt; cnt--) {
        MemSetLoop(&fcinfo, 0, sizeof(fcinfo));
        dummyFunc(&fcinfo, cnt);
    }
}

void TestOrigMacro(int cnt)
{
    FunctionCallInfoData fcinfo;

    printf("test OrigMacro(%d): %d\n", NARGS, cnt);

    for(; cnt; cnt--) {
        InitFunctionCallInfoData(&fcinfo, NULL, NARGS);
        dummyFunc(&fcinfo, cnt);
    }
}

#undef InitFunctionCallInfoData

#define InitFunctionCallInfoData(Fcinfo, Flinfo, Nargs)              \
    do {                                                             \
        (Fcinfo)->flinfo = Flinfo;                                   \
        (Fcinfo)->context = NULL;                                    \
        (Fcinfo)->resultinfo = NULL;                                 \
        (Fcinfo)->isnull = false;                                    \
        (Fcinfo)->nargs = Nargs;                                     \
        MemSetLoop((Fcinfo)->argnull, 0, \
                   sizeof(int32) * ((Nargs + sizeof(int32)-1) / sizeof(int32))); \
    } while(0)

void TestSetMacro(int cnt)
{
    FunctionCallInfoData fcinfo;

    printf("test SetMacro(%d): %d\n", NARGS, cnt);

    for(; cnt; cnt--) {
        InitFunctionCallInfoData(&fcinfo, NULL, NARGS);
        dummyFunc(&fcinfo, cnt);
    }
}

#undef InitFunctionCallInfoData

#define InitFunctionCallInfoData(Fcinfo, Flinfo, Nargs)              \
    do {                                                             \
        (Fcinfo)->flinfo = Flinfo;                                   \
        (Fcinfo)->context = NULL;                                    \
        (Fcinfo)->resultinfo = NULL;                                 \
        (Fcinfo)->isnull = false;                                    \
        (Fcinfo)->nargs = Nargs;                                     \
    } while(0)

void TestUnrolled(int cnt)
{
    FunctionCallInfoData fcinfo;

    printf("test Unrolled(%d): %d\n", NARGS, cnt);

    for(; cnt; cnt--) {
        InitFunctionCallInfoData(&fcinfo, NULL, NARGS);
#if NARGS > 0
        fcinfo.argnull[0] = false;
#endif
#if NARGS > 1
        fcinfo.argnull[1] = false;
#endif
#if NARGS > 2
        fcinfo.argnull[2] = false;
#endif
#if NARGS > 3
        fcinfo.argnull[3] = false;
#endif
#if NARGS > 4
        fcinfo.argnull[4] = false;
#endif
#if NARGS > 5
        fcinfo.argnull[5] = false;
#endif
#if NARGS > 6
        fcinfo.argnull[6] = false;
#endif
#if NARGS > 7
        fcinfo.argnull[7] = false;
#endif
#if NARGS > 8
        fcinfo.argnull[8] = false;
#endif
#if NARGS > 9
        fcinfo.argnull[9] = false;
#endif
        dummyFunc(&fcinfo, cnt);
    }
}

int main(int argc, char **argv)
{
    int     test_cnt;

    if(argc != 3) {
        printf("usage: fmgrtest -memset|-origmacro|-setmacro|-unrolled test_cnt\n");
        return 1;
    }
    test_cnt = atoi(argv[2]);

    if(strcmp(argv[1], "-memset") == 0) TestMemSet(test_cnt);
    if(strcmp(argv[1], "-origmacro") == 0) TestOrigMacro(test_cnt);
    if(strcmp(argv[1], "-setmacro") == 0) TestSetMacro(test_cnt);
    if(strcmp(argv[1], "-unrolled") == 0) TestUnrolled(test_cnt);

    return 0;
}

Re: FunctionCallN improvement.

From

Darcy Buskermolen

Date:

01 February 2005, 22:39:41

On February 1, 2005 01:23 pm, Tom Lane wrote:
> a_ogawa <a_ogawa@hi-ho.ne.jp> writes:
> > I made the test program to measure the effect of this macro.
>
> Well, if we're going to be tense about this, let's actually be tense
> about it.  Your test program isn't a great model for what's going to
> happen in fmgr.c, because you've designed it so that Nargs cannot be
> known at compile time.  In the fmgr routines, Nargs is certainly a
> compile-time constant, and so implementations that can exploit that
> will have an advantage.
>
> Also, we can take advantage of some improvements in the MemSet macro
> family that occurred since fmgr.c was last rewritten.  I see no reason
> not to use MemSetLoop directly, since the fcinfo struct will have the
> correct size and correct alignment.
>
> In addition to your original macro, I tried two other variants: one
> that uses MemSetLoop with a loop length rounded to the next higher
> multiple of 4, and one that expects the argisnull settings to be written
> out directly, in the same style as is currently done in FunctionCall1
> and FunctionCall2.  (This amounts to unrolling the loop in the original
> macro; something that could be done by the compiler given a constant
> Nargs, but it seems not to be done by the compilers I tested.)
>
> I tested two cases: NARGS = 2, which is certainly the single most
> critical case, and NARGS = 5, which is probably the largest number
> of arguments that we really care too much about.  (You have to hand-edit
> the test program and recompile to adjust NARGS, since the point is to
> treat it as a compile-time constant.)
>
> Here are wall-clock timings on the architectures and compilers I have at
> hand:
>
> NARGS = 2
>                  MemSetLoop   OrigMacro   SetMacro    Unrolled
> i386, gcc -O2    37.655s      6.411s      7.060s      6.362s
> i386, gcc -O6    35.420s      1.129s      1.814s      0.567s
> PPC, gcc -O2     54.033s      6.754s      11.138s     6.438s
> HPPA, gcc -O2    58.82s       10.38s      9.79s       7.85s
> HPPA, cc +O2     60.39s       13.43s      8.40s       7.31s
>
> NARGS = 5
>                  MemSetLoop   OrigMacro   SetMacro    Unrolled
> i386, gcc -O2    37.566s      11.329s     7.688s      8.874s
> i386, gcc -O6    32.992s      5.928s      2.881s      0.566s
> PPC, gcc -O2     86.300s      19.048s     14.626s     8.751s
> HPPA, gcc -O2    58.28s       15.09s      13.42s      14.37s
> HPPA, cc +O2     58.23s       8.96s       12.88s      7.28s

I see similar comparative times on an UltraSparc running Solaris.


>
> (I used different loop counts on the different machines to get similar
> overall times for the memset case; so it's OK to compare numbers across
> a row but not down a column.)
>
> Based on this I think we ought to go with the "unrolled" approach, ie,
> we'll create a macro to initialize the fixed fields of fcinfo but fill
> in the arg and argisnull arrays with code like what's already in
> FunctionCall2:
>
>     fcinfo.arg[0] = arg1;
>     fcinfo.arg[1] = arg2;
>     fcinfo.argnull[0] = false;
>     fcinfo.argnull[1] = false;
>
> If anyone would like to try the results on other platforms, my test
> program is attached.
>
>             regards, tom lane

-- 
Darcy Buskermolen
Wavefire Technologies Corp.
ph: 250.717.0200
fx:  250.763.1759
http://www.wavefire.com

Re: FunctionCallN improvement.

From

Mike Rylander

Date:

02 February 2005, 01:13:22

On Tue, 01 Feb 2005 16:23:56 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> a_ogawa <a_ogawa@hi-ho.ne.jp> writes:
> > I made the test program to measure the effect of this macro.
> 
> Well, if we're going to be tense about this, let's actually be tense
> about it.  Your test program isn't a great model for what's going to
> happen in fmgr.c, because you've designed it so that Nargs cannot be
> known at compile time.  In the fmgr routines, Nargs is certainly a
> compile-time constant, and so implementations that can exploit that
> will have an advantage.
> 

<big snip>

Here are some numbers for AMD64 (gcc -O2 -I
/opt/include/postgresql/server/ pg_test.c -o pg_test):

miker@weezie miker $ time ./pg_test -memset 1000000000
test MemSetLoop(2): 1000000000

real    1m15.896s
user    1m15.881s
sys     0m0.006s
miker@weezie miker $ time ./pg_test -origmacro 1000000000
test OrigMacro(2): 1000000000

real    0m4.217s
user    0m4.215s
sys     0m0.001s
miker@weezie miker $ time ./pg_test -setmacro 1000000000
test SetMacro(2): 1000000000

real    0m4.217s
user    0m4.216s
sys     0m0.001s
miker@weezie miker $ time ./pg_test -unrolled 1000000000
test Unrolled(2): 1000000000

real    0m4.218s
user    0m4.215s
sys     0m0.002s


and now with -O6:

miker@weezie miker $ time ./pg_test -memset 1000000000
test MemSetLoop(2): 1000000000

real    1m13.624s
user    1m13.542s
sys     0m0.001s
miker@weezie miker $ time ./pg_test -origmacro 1000000000
test OrigMacro(2): 1000000000

real    0m2.929s
user    0m2.926s
sys     0m0.001s
miker@weezie miker $ time ./pg_test -setmacro 1000000000
test SetMacro(2): 1000000000

real    0m2.929s
user    0m2.926s
sys     0m0.000s
miker@weezie miker $ time ./pg_test -unrolled 1000000000
test Unrolled(2): 1000000000

real    0m2.510s
user    0m2.508s
sys     0m0.001s


Now with NARGS = 5, -O2:

miker@weezie miker $ time ./pg_test -memset 1000000000
test MemSetLoop(5): 1000000000

real    1m15.204s
user    1m15.175s
sys     0m0.002s
miker@weezie miker $ time ./pg_test -origmacro 1000000000
test OrigMacro(5): 1000000000

real    0m10.027s
user    0m10.022s
sys     0m0.001s
miker@weezie miker $ time ./pg_test -setmacro 1000000000
test SetMacro(5): 1000000000

real    0m4.177s
user    0m4.177s
sys     0m0.000s
miker@weezie miker $ time ./pg_test -unrolled 1000000000
test Unrolled(5): 1000000000

real    0m5.013s
user    0m5.011s
sys     0m0.000s

And once more, with -O6:

miker@weezie miker $ time ./pg_test -memset 1000000000
test MemSetLoop(5): 1000000000

real    1m47.090s
user    1m46.972s
sys     0m0.000s
miker@weezie miker $ time ./pg_test -origmacro 1000000000
test OrigMacro(5): 1000000000

real    0m8.367s
user    0m8.358s
sys     0m0.000s
miker@weezie miker $ time ./pg_test -setmacro 1000000000
test SetMacro(5): 1000000000

real    0m3.349s
user    0m3.345s
sys     0m0.000s
miker@weezie miker $ time ./pg_test -unrolled 1000000000
test Unrolled(5): 1000000000

real    0m3.347s
user    0m3.343s
sys     0m0.000s


Hope the numbers help!

-- 
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org

Re: FunctionCallN improvement.

From

Mike Rylander

Date:

02 February 2005, 02:15:22

Sorry, forgot the compiler version.

gcc (GCC) 3.3.4 20040623 (Gentoo Linux 3.3.4-r1, ssp-3.3.2-2, pie-8.7.6)

On Wed, 2 Feb 2005 01:12:04 +0000, Mike Rylander <mrylander@gmail.com> wrote:
> On Tue, 01 Feb 2005 16:23:56 -0500, Tom Lane <tgl@sss.pgh.pa.us> wrote:
> > a_ogawa <a_ogawa@hi-ho.ne.jp> writes:
> > > I made the test program to measure the effect of this macro.
> >
> > Well, if we're going to be tense about this, let's actually be tense
> > about it.  Your test program isn't a great model for what's going to
> > happen in fmgr.c, because you've designed it so that Nargs cannot be
> > known at compile time.  In the fmgr routines, Nargs is certainly a
> > compile-time constant, and so implementations that can exploit that
> > will have an advantage.
> >
> 
> <big snip>
> 
> Here are some numbers for AMD64 (gcc -O2 -I
> /opt/include/postgresql/server/ pg_test.c -o pg_test):
> 
> miker@weezie miker $ time ./pg_test -memset 1000000000
> test MemSetLoop(2): 1000000000
> 
> real    1m15.896s
> user    1m15.881s
> sys     0m0.006s
> miker@weezie miker $ time ./pg_test -origmacro 1000000000
> test OrigMacro(2): 1000000000
> 
> real    0m4.217s
> user    0m4.215s
> sys     0m0.001s
> miker@weezie miker $ time ./pg_test -setmacro 1000000000
> test SetMacro(2): 1000000000
> 
> real    0m4.217s
> user    0m4.216s
> sys     0m0.001s
> miker@weezie miker $ time ./pg_test -unrolled 1000000000
> test Unrolled(2): 1000000000
> 
> real    0m4.218s
> user    0m4.215s
> sys     0m0.002s
> 
> and now with -O6:
> 
> miker@weezie miker $ time ./pg_test -memset 1000000000
> test MemSetLoop(2): 1000000000
> 
> real    1m13.624s
> user    1m13.542s
> sys     0m0.001s
> miker@weezie miker $ time ./pg_test -origmacro 1000000000
> test OrigMacro(2): 1000000000
> 
> real    0m2.929s
> user    0m2.926s
> sys     0m0.001s
> miker@weezie miker $ time ./pg_test -setmacro 1000000000
> test SetMacro(2): 1000000000
> 
> real    0m2.929s
> user    0m2.926s
> sys     0m0.000s
> miker@weezie miker $ time ./pg_test -unrolled 1000000000
> test Unrolled(2): 1000000000
> 
> real    0m2.510s
> user    0m2.508s
> sys     0m0.001s
> 
> Now with NARGS = 5, -O2:
> 
> miker@weezie miker $ time ./pg_test -memset 1000000000
> test MemSetLoop(5): 1000000000
> 
> real    1m15.204s
> user    1m15.175s
> sys     0m0.002s
> miker@weezie miker $ time ./pg_test -origmacro 1000000000
> test OrigMacro(5): 1000000000
> 
> real    0m10.027s
> user    0m10.022s
> sys     0m0.001s
> miker@weezie miker $ time ./pg_test -setmacro 1000000000
> test SetMacro(5): 1000000000
> 
> real    0m4.177s
> user    0m4.177s
> sys     0m0.000s
> miker@weezie miker $ time ./pg_test -unrolled 1000000000
> test Unrolled(5): 1000000000
> 
> real    0m5.013s
> user    0m5.011s
> sys     0m0.000s
> 
> And once more, with -O6:
> 
> miker@weezie miker $ time ./pg_test -memset 1000000000
> test MemSetLoop(5): 1000000000
> 
> real    1m47.090s
> user    1m46.972s
> sys     0m0.000s
> miker@weezie miker $ time ./pg_test -origmacro 1000000000
> test OrigMacro(5): 1000000000
> 
> real    0m8.367s
> user    0m8.358s
> sys     0m0.000s
> miker@weezie miker $ time ./pg_test -setmacro 1000000000
> test SetMacro(5): 1000000000
> 
> real    0m3.349s
> user    0m3.345s
> sys     0m0.000s
> miker@weezie miker $ time ./pg_test -unrolled 1000000000
> test Unrolled(5): 1000000000
> 
> real    0m3.347s
> user    0m3.343s
> sys     0m0.000s
> 
> 
> Hope the numbers help!
> 
> --
> Mike Rylander
> mrylander@gmail.com
> GPLS -- PINES Development
> Database Developer
> http://open-ils.org
> 


-- 
Mike Rylander
mrylander@gmail.com
GPLS -- PINES Development
Database Developer
http://open-ils.org

Re: FunctionCallN improvement.

From

a_ogawa

Date:

02 February 2005, 15:08:33

Tom Lane wrote:
> Based on this I think we ought to go with the "unrolled" approach, ie,
> we'll create a macro to initialize the fixed fields of fcinfo but fill
> in the arg and argisnull arrays with code like what's already in
> FunctionCall2:

I agree. The unrolled approach is a good result in most environments. 

I think that a new macro becomes the following:

#define InitFunctionCallInfoData(Fcinfo, Flinfo, Nargs) \
    do {                                                \
        (Fcinfo)->flinfo = Flinfo;                      \
        (Fcinfo)->context = NULL;                       \
        (Fcinfo)->resultinfo = NULL;                    \
        (Fcinfo)->isnull = false;                       \
        (Fcinfo)->nargs = Nargs;                        \
    } while(0)

I think that this macro would also be effective in other functions such as
ExecMakeFunctionResultNoSets. However, we should apply it there only after
actually examining the effect.

First of all, this macro will be applied only to fmgr.c, but I think 
we better define it in fmgr.h. 

regards,

---
A.Ogawa ( a_ogawa@hi-ho.ne.jp )

Re: FunctionCallN improvement.

From

Tom Lane

Date:

02 February 2005, 22:46:33

a_ogawa <a_ogawa@hi-ho.ne.jp> writes:
> Tom Lane wrote:
>> Based on this I think we ought to go with the "unrolled" approach,

> I agree. The unrolled approach is a good result in most environments. 

I have committed changes along this line in HEAD and 8_0 branches.

> First of all, this macro will be applied only to fmgr.c, but I think 
> we better define it in fmgr.h. 

For the moment I just put it in fmgr.c to have a minimally invasive
patch.  We can make it globally available if there's evidence it's
needed elsewhere.
        regards, tom lane