Neil Conway <neilc@samurai.com> writes:
> tgl@postgresql.org (Tom Lane) writes:
>> code for more speed by marking some routines 'inline'. All together
>> these changes speed up simple sorts, like count(distinct int4column),
>> by about 25% on a P4 running RH Linux 7.2.
> I didn't know we were still doing optimizations / features for 7.3 :-)
Normally I wouldn't have, but it was a sufficiently big win for the
amount of effort put in that I figured I'd commit it now instead of
later...
> But very interesting results -- the 25% improvement is really
> surprising. Do you think there's more low-hanging fruit in this area?
Well, I'd known for a while that FunctionCall2 was a bit of a bottleneck,
so that change was on my to-do list. I was looking at count(distinct
foo) because someone had complained about it being slow, and I saw that
the overhead of calling the datatype-specific comparison function was
really a large fraction of the runtime in this case. Getting further
won't be so easy. With those commits applied, the top hotspots I'm
seeing for count(distinct int4column) are:

  %   cumulative   self                 self     total
 time   seconds   seconds      calls   ms/call  ms/call  name
39.61     30.07     30.07  289140554     0.00     0.00  comparetup_datum
16.36     42.49     12.42   16777216     0.00     0.00  tuplesort_heap_siftup
 5.22     46.45      3.96  289140558     0.00     0.00  btint4cmp
 4.37     49.77      3.32   25169576     0.00     0.00  AllocSetAlloc
 2.37     51.57      1.80   33554460     0.00     0.00  GetMemoryChunkSpace

(this is everything above 2% of the runtime). comparetup_datum includes
inlined ApplySortFunction and FunctionCall2 overhead here, and I'm not
sure how we can get it down much further.
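
To make the shape of that overhead concrete, here is a rough sketch of what a
per-comparison call through a generic function-call wrapper looks like. It is
not the actual fmgr or tuplesort code: every name ending in _s (Datum_s,
CallInfo_s, call2_s, int4cmp_s, compare_datum_s) is a hypothetical, simplified
stand-in, and the NULLs-sort-last rule is just an assumption for the example.
The point is only that each comparison pays for filling in a call-info struct
and an indirect call, even when the wrapper itself is inlined.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for a generic pass-by-value datum. */
typedef intptr_t Datum_s;

/* Hypothetical, much-reduced call-info struct filled in on every call. */
typedef struct CallInfo_s
{
    Datum_s arg[2];
    bool    argnull[2];
    bool    isnull;
} CallInfo_s;

typedef Datum_s (*CmpFn_s) (CallInfo_s *info);

/* Datatype-specific comparison: returns <0, 0 or >0 for two int32 values. */
static Datum_s
int4cmp_s(CallInfo_s *info)
{
    int32_t a = (int32_t) info->arg[0];
    int32_t b = (int32_t) info->arg[1];

    return (Datum_s) ((a > b) - (a < b));
}

/*
 * Generic two-argument call wrapper.  Inlining it removes one call level,
 * but the struct setup and the indirect call through fn remain.
 */
static inline Datum_s
call2_s(CmpFn_s fn, Datum_s a, Datum_s b)
{
    CallInfo_s info;

    info.arg[0] = a;
    info.arg[1] = b;
    info.argnull[0] = false;
    info.argnull[1] = false;
    info.isnull = false;
    return fn(&info);
}

/* Sort comparator: handle NULLs here, then defer to the datatype function. */
static inline int
compare_datum_s(CmpFn_s fn, Datum_s a, bool a_null, Datum_s b, bool b_null)
{
    if (a_null)
        return b_null ? 0 : 1;  /* assumption: NULLs sort last */
    if (b_null)
        return -1;
    return (int) call2_s(fn, a, b);
}
```

A sort would call compare_datum_s(int4cmp_s, x, false, y, false) once per
comparison, so whatever fixed cost sits in call2_s is multiplied by the number
of comparisons.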
> Also, is the use of inline functions encouraged instead of macros?
I think it's okay for stuff that you are willing to tolerate possibly
not having inlined. There is no portability issue because configure
takes care of providing a suitable "#define inline" if needed --- but
on some compilers it'll be #define'd as empty.
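
As a small illustration of the tradeoff, here is a sketch comparing a macro
with an inline function. NO_INLINE_KEYWORD, MAX_MACRO and max_inline are
made-up names for this example, and the #ifdef only imitates the kind of
fallback a configure-generated header might supply; it is not copied from any
real header.

```c
/*
 * Imitation of a configure-provided fallback: on a compiler with no inline
 * keyword, "inline" is defined as empty and the function below becomes an
 * ordinary static function.  NO_INLINE_KEYWORD is hypothetical.
 */
#ifdef NO_INLINE_KEYWORD
#define inline
#endif

/* Macro: always expanded in place, but may evaluate an argument twice. */
#define MAX_MACRO(a, b) ((a) > (b) ? (a) : (b))

/*
 * Inline function: arguments are evaluated exactly once and type-checked,
 * but whether the call is actually inlined is up to the compiler.
 */
static inline int
max_inline(int a, int b)
{
    return a > b ? a : b;
}
```

The code is correct either way; the only thing lost when inline expands to
nothing is the speedup. The macro, by contrast, is always expanded but
misbehaves with side effects: MAX_MACRO(i++, 0) increments i twice when i is
positive, whereas max_inline(i++, 0) increments it once.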
regards, tom lane