Thread: inlining

inlining

From
Bruce Momjian
Date:
Let me add, I am not inlining all the functions, but only the top part
of them that deals with cachoffsets and nulls.  These are the easy ones,
and the ones that get used most often.
--
Bruce Momjian
maillist@candle.pha.pa.us

Re: [HACKERS] inlining

From
"Vadim B. Mikheev"
Date:
Bruce Momjian wrote:
>
> Let me add, I am not inlining all the functions, but only the top part
> of them that deals with cachoffsets and nulls.  These are the easy ones,
> and the ones that get used most often.

fastgetattr() is called from a HUNDREDS places - I'm not sure that
this is good idea.

I suggest to inline _entire_ body of this func in the
execQual.c:ExecEvalVar() - Executor uses _only_ ExecEvalVar() to get
data from tuples.

(We could #define FASTGETATTR macro and re-write fastgetattr() as just
this macro "call".)

I don't know should we follow the same way for fastgetiattr() or not...

Vadim

Re: [HACKERS] inlining

From
"Vadim B. Mikheev"
Date:
Sorry - this is with valid charset...

Vadim B. Mikheev wrote:
>
> Bruce Momjian wrote:
> >
> > Let me add, I am not inlining all the functions, but only the top part
> > of them that deals with cachoffsets and nulls.  These are the easy ones,
> > and the ones that get used most often.
>
> fastgetattr() is called from a HUNDREDS places - I'm not sure that
> this is good idea.
>
> I suggest to inline _entire_ body of this func in the
> execQual.c:ExecEvalVar() - Executor uses _only_ ExecEvalVar() to get
> data from tuples.
>
> (We could #define FASTGETATTR macro and re-write fastgetattr() as just
> this macro "call".)
>
> I don't know should we follow the same way for fastgetiattr() or not...
>
> Vadim

Re: [HACKERS] inlining

From
Bruce Momjian
Date:
>
> Bruce Momjian wrote:
> >
> > Let me add, I am not inlining all the functions, but only the top part
> > of them that deals with cachoffsets and nulls.  These are the easy ones,
> > and the ones that get used most often.
>
> fastgetattr() is called from a HUNDREDS places - I'm not sure that
> this is good idea.

Here is the fastgetattr macro.  Again, I just inlined the cacheoffset
and null handlling at the top.  Doesn't look like much code, though the
?: macro format makes it look larger.  What do you think?

I did the same with fastgetiattr, which in fact just called
index_getattr, so that is gone now.  For getsysattr, I made an array of
offsetof(), and do a lookup into the array from heap_getattr, so that is
gone too.

---------------------------------------------------------------------------

#define fastgetattr(tup, attnum, tupleDesc, isnull) \
( \
    AssertMacro((attnum) > 0) ? \
    ( \
        ((isnull) ? (*(isnull) = false) : (dummyret)NULL), \
        HeapTupleNoNulls(tup) ? \
        ( \
            ((tupleDesc)->attrs[(attnum)-1]->attcacheoff > 0) ? \
            ( \
                (Datum)fetchatt(&((tupleDesc)->attrs[(attnum)-1]), \
                    (char *) (tup) + (tup)->t_hoff + (tupleDesc)->attrs[(attnum)-1]->attcacheoff) \
            ) \
            : \
            ( \
                ((attnum)-1 > 0) ? \
                ( \
                    (Datum)fetchatt(&((tupleDesc)->attrs[0]), (char *) (tup) + (tup)->t_hoff) \
                ) \
                : \
                ( \
                    nocachegetattr((tup), (attnum), (tupleDesc), (isnull)) \
                ) \
            ) \
        ) \
        : \
        ( \
            att_isnull((attnum)-1, (tup)->t_bits) ? \
            ( \
                ((isnull) ? (*(isnull) = true) : (dummyret)NULL), \
                (Datum)NULL \
            ) \
            : \
            ( \
                nocachegetattr((tup), (attnum), (tupleDesc), (isnull)) \
            ) \
        ) \
    ) \
    : \
    ( \
         (Datum)NULL \
    ) \
)
>
> I suggest to inline _entire_ body of this func in the
> execQual.c:ExecEvalVar() - Executor uses _only_ ExecEvalVar() to get
> data from tuples.
>
> (We could #define FASTGETATTR macro and re-write fastgetattr() as just
> this macro "call".)
>
> I don't know should we follow the same way for fastgetiattr() or not...
>
> Vadim
>


--
Bruce Momjian
maillist@candle.pha.pa.us

Re: [HACKERS] inlining

From
"Vadim B. Mikheev"
Date:
Bruce Momjian wrote:
>
> >
> > Bruce Momjian wrote:
> > >
> > > Let me add, I am not inlining all the functions, but only the top part
> > > of them that deals with cachoffsets and nulls.  These are the easy ones,
> > > and the ones that get used most often.
> >
> > fastgetattr() is called from a HUNDREDS places - I'm not sure that
> > this is good idea.
>
> Here is the fastgetattr macro.  Again, I just inlined the cacheoffset
> and null handlling at the top.  Doesn't look like much code, though the
> ?: macro format makes it look larger.  What do you think?

Try to gmake clean and gmake... Please compare old/new sizes for
debug version too.

Vadim

Re: [HACKERS] inlining

From
Bruce Momjian
Date:
>
> Bruce Momjian wrote:
> >
> > >
> > > Bruce Momjian wrote:
> > > >
> > > > Let me add, I am not inlining all the functions, but only the top part
> > > > of them that deals with cachoffsets and nulls.  These are the easy ones,
> > > > and the ones that get used most often.
> > >
> > > fastgetattr() is called from a HUNDREDS places - I'm not sure that
> > > this is good idea.
> >
> > Here is the fastgetattr macro.  Again, I just inlined the cacheoffset
> > and null handlling at the top.  Doesn't look like much code, though the
> > ?: macro format makes it look larger.  What do you think?
>
> Try to gmake clean and gmake... Please compare old/new sizes for
> debug version too.

OK, here it is, 'size' with two regression run timings:

OLD
    text    data    bss    dec    hex
    831488    155648    201524    1188660    122334
      151.12 real         4.66 user         8.52 sys
      141.70 real         1.28 user         7.44 sys

NEW
    text    data    bss    dec    hex
    864256    155648    201548    1221452    12a34c
      143.52 real         3.48 user         9.08 sys
      146.10 real         1.34 user         7.44 sys

These numbers are with assert and -g on.

Interesting that the 1st regression test is the greatest, and the 2nd is
the least, with the same no-inlining, but with standard optimizations.

Now, my test of startup times shows it saves 0.015 seconds on a 0.10
second test.  This 0.015 is the equvalent to the fork() overhead time.
This speedup is reproducable.

The inlining is a 3% increase in size, but provides a 15% speed increase
on my startup test.

Looks good to me.  I am going to apply the patch, and let people tell me
if they see a speedup worth a 3% binary size increase.

The only visible change is that heap_getattr() does not take a buffer
parameter anymore, thanks to the removal of time travel.

Vadim, I will send you the patch separately to look at.

--
Bruce Momjian
maillist@candle.pha.pa.us



Re: [HACKERS] inlining

From
Bruce Momjian
Date:
>
> Bruce Momjian wrote:
> >
> > Let me add, I am not inlining all the functions, but only the top part
> > of them that deals with cachoffsets and nulls.  These are the easy ones,
> > and the ones that get used most often.
>
> fastgetattr() is called from a HUNDREDS places - I'm not sure that
> this is good idea.

Here is the fastgetattr macro.  Again, I just inlined the cacheoffset
and null handlling at the top.  Doesn't look like much code, though the
?: macro format makes it look larger.  What do you think?

I did the same with fastgetiattr, which in fact just called
index_getattr, so that is gone now.  For getsysattr, I made an array of
offsetof(), and do a lookup into the array from heap_getattr, so that is
gone too.

---------------------------------------------------------------------------

#define fastgetattr(tup, attnum, tupleDesc, isnull) \
( \
    AssertMacro((attnum) > 0) ? \
    ( \
        ((isnull) ? (*(isnull) = false) : (dummyret)NULL), \
        HeapTupleNoNulls(tup) ? \
        ( \
            ((tupleDesc)->attrs[(attnum)-1]->attcacheoff > 0) ? \
            ( \
                (Datum)fetchatt(&((tupleDesc)->attrs[(attnum)-1]), \
                    (char *) (tup) + (tup)->t_hoff + (tupleDesc)->attrs[(attnum)-1]->attcacheoff) \
            ) \
            : \
            ( \
                ((attnum)-1 > 0) ? \
                ( \
                    (Datum)fetchatt(&((tupleDesc)->attrs[0]), (char *) (tup) + (tup)->t_hoff) \
                ) \
                : \
                ( \
                    nocachegetattr((tup), (attnum), (tupleDesc), (isnull)) \
                ) \
            ) \
        ) \
        : \
        ( \
            att_isnull((attnum)-1, (tup)->t_bits) ? \
            ( \
                ((isnull) ? (*(isnull) = true) : (dummyret)NULL), \
                (Datum)NULL \
            ) \
            : \
            ( \
                nocachegetattr((tup), (attnum), (tupleDesc), (isnull)) \
            ) \
        ) \
    ) \
    : \
    ( \
         (Datum)NULL \
    ) \
)
>
> I suggest to inline _entire_ body of this func in the
> execQual.c:ExecEvalVar() - Executor uses _only_ ExecEvalVar() to get
> data from tuples.
>
> (We could #define FASTGETATTR macro and re-write fastgetattr() as just
> this macro "call".)
>
> I don't know should we follow the same way for fastgetiattr() or not...
>
> Vadim
>


--
Bruce Momjian
maillist@candle.pha.pa.us


Re: [HACKERS] inlining

From
Bruce Momjian
Date:
> OLD
>     text    data    bss    dec    hex
>     831488    155648    201524    1188660    122334
>       151.12 real         4.66 user         8.52 sys
>       141.70 real         1.28 user         7.44 sys
>
> NEW
>     text    data    bss    dec    hex
>     864256    155648    201548    1221452    12a34c

I have the new size down to 852000, which is only 2.5% increase.


--
Bruce Momjian
maillist@candle.pha.pa.us

Re: [HACKERS] inlining

From
Bruce Momjian
Date:
>
> Bruce Momjian wrote:
> >
> > Let me add, I am not inlining all the functions, but only the top part
> > of them that deals with cachoffsets and nulls.  These are the easy ones,
> > and the ones that get used most often.
>
> fastgetattr() is called from a HUNDREDS places - I'm not sure that
> this is good idea.
>
> I suggest to inline _entire_ body of this func in the
> execQual.c:ExecEvalVar() - Executor uses _only_ ExecEvalVar() to get
> data from tuples.

I don't think I can do that easily.  Inlining the top of the the
function that uses attcacheoff or gets NULL's is easy, but after that,
lots of loops and stuff, which are hard to inline because you really
can't define your own variables inside a macro that returns a value.

Let's see that profiling shows after my changes, and how many times
nocache_getattr(), the new name for the remaining part of the function,
actually has to be called.

Also, there is nocache_getiattr(), and get_sysattr() is gone.  Just an
array lookup for the offset now.


--
Bruce Momjian
maillist@candle.pha.pa.us

Re: [HACKERS] inlining

From
"Vadim B. Mikheev"
Date:
Bruce Momjian wrote:
>
> >
> > Bruce Momjian wrote:
> > >
> > > Let me add, I am not inlining all the functions, but only the top part
> > > of them that deals with cachoffsets and nulls.  These are the easy ones,
> > > and the ones that get used most often.
> >
> > fastgetattr() is called from a HUNDREDS places - I'm not sure that
> > this is good idea.
> >
> > I suggest to inline _entire_ body of this func in the
> > execQual.c:ExecEvalVar() - Executor uses _only_ ExecEvalVar() to get
> > data from tuples.
>
> I don't think I can do that easily.  Inlining the top of the the
> function that uses attcacheoff or gets NULL's is easy, but after that,
> lots of loops and stuff, which are hard to inline because you really
> can't define your own variables inside a macro that returns a value.

Ok.

>
> Let's see that profiling shows after my changes, and how many times
> nocache_getattr(), the new name for the remaining part of the function,
> actually has to be called.

Ok.

>
> Also, there is nocache_getiattr(), and get_sysattr() is gone.  Just an
> array lookup for the offset now.

Nice.

Vadim