Thread: inlining
Let me add, I am not inlining all the functions, but only the top part of them that deals with cache offsets and nulls. These are the easy ones, and the ones that get used most often.

--
Bruce Momjian maillist@candle.pha.pa.us
Bruce Momjian wrote:
>
> Let me add, I am not inlining all the functions, but only the top part
> of them that deals with cache offsets and nulls. These are the easy ones,
> and the ones that get used most often.

fastgetattr() is called from HUNDREDS of places - I'm not sure that
this is a good idea.

I suggest inlining the _entire_ body of this func in
execQual.c:ExecEvalVar() - the executor uses _only_ ExecEvalVar() to get
data from tuples.

(We could #define a FASTGETATTR macro and rewrite fastgetattr() as just
this macro "call".)

I don't know whether we should do the same for fastgetiattr()...

Vadim
>
> Bruce Momjian wrote:
> >
> > Let me add, I am not inlining all the functions, but only the top part
> > of them that deals with cache offsets and nulls. These are the easy ones,
> > and the ones that get used most often.
>
> fastgetattr() is called from HUNDREDS of places - I'm not sure that
> this is a good idea.

Here is the fastgetattr macro. Again, I just inlined the cache offset
and null handling at the top. Doesn't look like much code, though the
?: macro format makes it look larger. What do you think?

I did the same with fastgetiattr, which in fact just called
index_getattr, so that is gone now. For getsysattr, I made an array of
offsetof() values and do a lookup into the array from heap_getattr, so
that is gone too.

---------------------------------------------------------------------------

#define fastgetattr(tup, attnum, tupleDesc, isnull) \
( \
    AssertMacro((attnum) > 0) ? \
    ( \
        ((isnull) ? (*(isnull) = false) : (dummyret)NULL), \
        HeapTupleNoNulls(tup) ? \
        ( \
            ((tupleDesc)->attrs[(attnum)-1]->attcacheoff > 0) ? \
            ( \
                (Datum)fetchatt(&((tupleDesc)->attrs[(attnum)-1]), \
                  (char *) (tup) + (tup)->t_hoff + (tupleDesc)->attrs[(attnum)-1]->attcacheoff) \
            ) \
            : \
            ( \
                ((attnum) == 1) ? \
                ( \
                    /* the first attribute always starts at offset 0 */ \
                    (Datum)fetchatt(&((tupleDesc)->attrs[0]), (char *) (tup) + (tup)->t_hoff) \
                ) \
                : \
                ( \
                    nocachegetattr((tup), (attnum), (tupleDesc), (isnull)) \
                ) \
            ) \
        ) \
        : \
        ( \
            att_isnull((attnum)-1, (tup)->t_bits) ? \
            ( \
                ((isnull) ? (*(isnull) = true) : (dummyret)NULL), \
                (Datum)NULL \
            ) \
            : \
            ( \
                nocachegetattr((tup), (attnum), (tupleDesc), (isnull)) \
            ) \
        ) \
    ) \
    : \
    ( \
        (Datum)NULL \
    ) \
)

>
> I suggest inlining the _entire_ body of this func in
> execQual.c:ExecEvalVar() - the executor uses _only_ ExecEvalVar() to get
> data from tuples.
>
> (We could #define a FASTGETATTR macro and rewrite fastgetattr() as just
> this macro "call".)
>
> I don't know whether we should do the same for fastgetiattr()...
>
> Vadim
>

--
Bruce Momjian maillist@candle.pha.pa.us
Bruce Momjian wrote:
>
> >
> > Bruce Momjian wrote:
> > >
> > > Let me add, I am not inlining all the functions, but only the top part
> > > of them that deals with cache offsets and nulls. These are the easy ones,
> > > and the ones that get used most often.
> >
> > fastgetattr() is called from HUNDREDS of places - I'm not sure that
> > this is a good idea.
>
> Here is the fastgetattr macro. Again, I just inlined the cache offset
> and null handling at the top. Doesn't look like much code, though the
> ?: macro format makes it look larger. What do you think?

Try gmake clean and gmake... Please compare old/new sizes for the
debug version too.

Vadim
>
> Bruce Momjian wrote:
> >
> > >
> > > Bruce Momjian wrote:
> > > >
> > > > Let me add, I am not inlining all the functions, but only the top part
> > > > of them that deals with cache offsets and nulls. These are the easy ones,
> > > > and the ones that get used most often.
> > >
> > > fastgetattr() is called from HUNDREDS of places - I'm not sure that
> > > this is a good idea.
> >
> > Here is the fastgetattr macro. Again, I just inlined the cache offset
> > and null handling at the top. Doesn't look like much code, though the
> > ?: macro format makes it look larger. What do you think?
>
> Try gmake clean and gmake... Please compare old/new sizes for the
> debug version too.

OK, here it is: 'size' output, plus two regression-run timings for each
build:

OLD
   text    data     bss      dec     hex
 831488  155648  201524  1188660  122334
 151.12 real   4.66 user   8.52 sys
 141.70 real   1.28 user   7.44 sys

NEW
   text    data     bss      dec     hex
 864256  155648  201548  1221452  12a34c
 143.52 real   3.48 user   9.08 sys
 146.10 real   1.34 user   7.44 sys

These numbers are with assert and -g on. Interestingly, the 1st
regression run is the slowest and the 2nd is the fastest, even though
both use the same non-inlined binary with standard optimizations.

Now, my test of startup times shows it saves 0.015 seconds on a 0.10
second test. This 0.015 is equivalent to the fork() overhead time. This
speedup is reproducible.

The inlining is a 3% increase in size, but provides a 15% speed increase
on my startup test. Looks good to me. I am going to apply the patch, and
let people tell me if they see a speedup worth a 3% binary size increase.

The only visible change is that heap_getattr() does not take a buffer
parameter anymore, thanks to the removal of time travel.

Vadim, I will send you the patch separately to look at.

--
Bruce Momjian maillist@candle.pha.pa.us
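The percentages can be checked directly from the numbers in this message, taking dec as the sum of the text, data and bss columns:

```latex
\begin{aligned}
\text{dec}_{\text{old}} &= 831488 + 155648 + 201524 = 1188660 \\
\text{dec}_{\text{new}} &= 864256 + 155648 + 201548 = 1221452 \\
\frac{864256 - 831488}{831488} &= \frac{32768}{831488} \approx 3.9\% \quad \text{(text)} \\
\frac{1221452 - 1188660}{1188660} &= \frac{32792}{1188660} \approx 2.8\% \quad \text{(dec)} \\
\frac{0.015\,\text{s}}{0.10\,\text{s}} &= 15\% \quad \text{(startup)}
\end{aligned}
```

So the roughly 3% figure matches the total image size (dec), while the text segment alone grew by about 3.9%, exactly 32768 bytes.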
> OLD
>    text    data     bss      dec     hex
>  831488  155648  201524  1188660  122334
>  151.12 real   4.66 user   8.52 sys
>  141.70 real   1.28 user   7.44 sys
>
> NEW
>    text    data     bss      dec     hex
>  864256  155648  201548  1221452  12a34c

I have the new text size down to 852000, which is only a 2.5% increase.

--
Bruce Momjian maillist@candle.pha.pa.us
>
> Bruce Momjian wrote:
> >
> > Let me add, I am not inlining all the functions, but only the top part
> > of them that deals with cache offsets and nulls. These are the easy ones,
> > and the ones that get used most often.
>
> fastgetattr() is called from HUNDREDS of places - I'm not sure that
> this is a good idea.
>
> I suggest inlining the _entire_ body of this func in
> execQual.c:ExecEvalVar() - the executor uses _only_ ExecEvalVar() to get
> data from tuples.

I don't think I can do that easily. Inlining the top of the function
that uses attcacheoff or gets NULLs is easy, but after that there are
lots of loops and stuff, which are hard to inline because you really
can't define your own variables inside a macro that returns a value.

Let's see what profiling shows after my changes, and how many times
nocache_getattr(), the new name for the remaining part of the function,
actually has to be called.

Also, there is nocache_getiattr(), and get_sysattr() is gone. It's just
an array lookup for the offset now.

--
Bruce Momjian maillist@candle.pha.pa.us
Bruce Momjian wrote:
>
> >
> > Bruce Momjian wrote:
> > >
> > > Let me add, I am not inlining all the functions, but only the top part
> > > of them that deals with cache offsets and nulls. These are the easy ones,
> > > and the ones that get used most often.
> >
> > fastgetattr() is called from HUNDREDS of places - I'm not sure that
> > this is a good idea.
> >
> > I suggest inlining the _entire_ body of this func in
> > execQual.c:ExecEvalVar() - the executor uses _only_ ExecEvalVar() to get
> > data from tuples.
>
> I don't think I can do that easily. Inlining the top of the function
> that uses attcacheoff or gets NULLs is easy, but after that there are
> lots of loops and stuff, which are hard to inline because you really
> can't define your own variables inside a macro that returns a value.

Ok.

>
> Let's see what profiling shows after my changes, and how many times
> nocache_getattr(), the new name for the remaining part of the function,
> actually has to be called.

Ok.

>
> Also, there is nocache_getiattr(), and get_sysattr() is gone. It's just
> an array lookup for the offset now.

Nice.

Vadim