Thread: pgsql-server/src/backend executor/nodeAgg.c ut ...

pgsql-server/src/backend executor/nodeAgg.c ut ...

From: tgl@postgresql.org (Tom Lane)
CVSROOT:    /cvsroot
Module name:    pgsql-server
Changes by:    tgl@postgresql.org    02/10/04 13:19:55

Modified files:
    src/backend/executor: nodeAgg.c
    src/backend/utils/fmgr: fmgr.c
    src/backend/utils/sort: tuplesort.c

Log message:
    Tweak a few of the most heavily used function call points to zero out
    just the significant fields of FunctionCallInfoData, rather than MemSet'ing
    the whole struct to zero.  Unused positions in the arg[] array will
    thereby contain garbage rather than zeroes.  This buys back some of the
    performance hit from increasing FUNC_MAX_ARGS.  Also tweak tuplesort.c
    code for more speed by marking some routines 'inline'.  All together
    these changes speed up simple sorts, like count(distinct int4column),
    by about 25% on a P4 running RH Linux 7.2.
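A minimal sketch of the two initialization styles the log message contrasts. It assumes a simplified stand-in for FunctionCallInfoData. The struct name `CallInfo`, its fields, the `FUNC_MAX_ARGS` value of 32, and the callee `int4cmp_sketch` are illustrative, not PostgreSQL's actual definitions. `call2_memset` clears every arg[] slot on every call. `call2_partial` writes only the fields a two-argument call reads, in the spirit of the change to call points such as FunctionCall2. It leaves arg[2..] holding garbage, which is harmless as long as the callee reads only its first nargs entries.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Illustrative stand-ins, NOT PostgreSQL's real definitions.  PostgreSQL's
 * Datum is unsigned; it is signed here only to keep the sketch short. */
typedef intptr_t Datum;
#define FUNC_MAX_ARGS 32            /* assumed value for this sketch */

struct CallInfo;
typedef Datum (*CallFn) (struct CallInfo *fcinfo);

/* Simplified stand-in for FunctionCallInfoData. */
typedef struct CallInfo
{
    CallFn      fn;                 /* function being called */
    void       *context;            /* unused by this sketch */
    void       *resultinfo;         /* unused by this sketch */
    bool        isnull;             /* result-is-NULL flag */
    short       nargs;              /* number of valid entries in arg[] */
    Datum       arg[FUNC_MAX_ARGS];
    bool        argnull[FUNC_MAX_ARGS];
} CallInfo;

/* Hypothetical btint4cmp-like callee: reads only arg[0] and arg[1]. */
static Datum
int4cmp_sketch(CallInfo *fcinfo)
{
    int32_t     a = (int32_t) fcinfo->arg[0];
    int32_t     b = (int32_t) fcinfo->arg[1];

    return (Datum) ((a > b) - (a < b));
}

/* Old style: zero the whole struct, including every unused arg[] slot. */
static Datum
call2_memset(CallFn fn, Datum arg1, Datum arg2)
{
    CallInfo    fcinfo;

    memset(&fcinfo, 0, sizeof(fcinfo));
    fcinfo.fn = fn;
    fcinfo.nargs = 2;
    fcinfo.arg[0] = arg1;
    fcinfo.arg[1] = arg2;
    return fcinfo.fn(&fcinfo);
}

/* New style: set only the fields a two-argument call depends on;
 * arg[2..] and argnull[2..] are left uninitialized. */
static Datum
call2_partial(CallFn fn, Datum arg1, Datum arg2)
{
    CallInfo    fcinfo;

    fcinfo.fn = fn;
    fcinfo.context = NULL;
    fcinfo.resultinfo = NULL;
    fcinfo.isnull = false;
    fcinfo.nargs = 2;
    fcinfo.arg[0] = arg1;
    fcinfo.arg[1] = arg2;
    fcinfo.argnull[0] = false;
    fcinfo.argnull[1] = false;
    return fcinfo.fn(&fcinfo);
}
```

In this sketch the MemSet version touches every slot of both arrays on every call, so its cost grows with FUNC_MAX_ARGS. The targeted version's cost does not depend on FUNC_MAX_ARGS at all.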


Re: pgsql-server/src/backend executor/nodeAgg.c ut ...

From: Bruce Momjian
Tom Lane wrote:
> CVSROOT:    /cvsroot
> Module name:    pgsql-server
> Changes by:    tgl@postgresql.org    02/10/04 13:19:55
>
> Modified files:
>     src/backend/executor: nodeAgg.c
>     src/backend/utils/fmgr: fmgr.c
>     src/backend/utils/sort: tuplesort.c
>
> Log message:
>     Tweak a few of the most heavily used function call points to zero out
>     just the significant fields of FunctionCallInfoData, rather than MemSet'ing
>     the whole struct to zero.  Unused positions in the arg[] array will
>     thereby contain garbage rather than zeroes.  This buys back some of the
>     performance hit from increasing FUNC_MAX_ARGS.  Also tweak tuplesort.c
>     code for more speed by marking some routines 'inline'.  All together
>     these changes speed up simple sorts, like count(distinct int4column),
>     by about 25% on a P4 running RH Linux 7.2.

Wow, huge win.  Thanks, Tom.

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

Re: pgsql-server/src/backend executor/nodeAgg.c ut ...

From: Neil Conway
tgl@postgresql.org (Tom Lane) writes:
>     Tweak a few of the most heavily used function call points to zero out
>     just the significant fields of FunctionCallInfoData, rather than MemSet'ing
>     the whole struct to zero.  Unused positions in the arg[] array will
>     thereby contain garbage rather than zeroes.  This buys back some of the
>     performance hit from increasing FUNC_MAX_ARGS.  Also tweak tuplesort.c
>     code for more speed by marking some routines 'inline'.  All together
>     these changes speed up simple sorts, like count(distinct int4column),
>     by about 25% on a P4 running RH Linux 7.2.

I didn't know we were still doing optimizations / features for 7.3 :-)

But very interesting results -- the 25% improvement is really
surprising. Do you think there's more low-hanging fruit in this area?

Also, is the use of inline functions encouraged instead of macros? As
of C99 they're a standard part of the language, but I'm not sure how
many compilers implemented them before that (GCC did, of course).

Cheers,

Neil

--
Neil Conway <neilc@samurai.com> || PGP Key ID: DB3C29FC

Re: pgsql-server/src/backend executor/nodeAgg.c ut ...

From: Tom Lane
Neil Conway <neilc@samurai.com> writes:
> tgl@postgresql.org (Tom Lane) writes:
>> code for more speed by marking some routines 'inline'.  All together
>> these changes speed up simple sorts, like count(distinct int4column),
>> by about 25% on a P4 running RH Linux 7.2.

> I didn't know we were still doing optimizations / features for 7.3 :-)

Normally I wouldn't have, but it was a sufficiently big win for the
amount of effort put in that I figured I'd commit it now instead of
later...

> But very interesting results -- the 25% improvement is really
> surprising. Do you think there's more low-hanging fruit in this area?

Well, I'd known for a while that FunctionCall2 was a bit of a bottleneck,
so that change was on my to-do list.  I was looking at count(distinct
foo) because someone had complained about it being slow, and I saw that
the overhead of calling the datatype-specific comparison function was
really a large fraction of the runtime in this case.  Getting further
won't be so easy.  With those commits applied, the top hotspots for
count(distinct int4column) are:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 39.61     30.07    30.07 289140554     0.00     0.00  comparetup_datum
 16.36     42.49    12.42 16777216     0.00     0.00  tuplesort_heap_siftup
  5.22     46.45     3.96 289140558     0.00     0.00  btint4cmp
  4.37     49.77     3.32 25169576     0.00     0.00  AllocSetAlloc
  2.37     51.57     1.80 33554460     0.00     0.00  GetMemoryChunkSpace

(this is everything above 2% of the runtime).  comparetup_datum includes
inlined ApplySortFunction and FunctionCall2 overhead here, and I'm not
sure how we can get it down much further.
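For scale, the self-time column works out to roughly 104 ns per comparetup_datum call (30.07 s / 289,140,554 calls). btint4cmp costs about 14 ns per call (3.96 s / 289,140,558 calls). That fits the point above: the calling machinery, not the integer comparison, dominates.

The sketch below is a hypothetical, much-simplified model of that call pattern, not tuplesort.c. It computes count(DISTINCT) over ints by heap-sorting the input and counting value changes, routing every comparison through a function pointer. `compare_via_fn` stands in for comparetup_datum, `int4cmp` for btint4cmp, and `heap_remove_top` for roughly the job tuplesort_heap_siftup does. All names are illustrative.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical comparator type standing in for an fmgr-called btint4cmp. */
typedef int (*SortCmp) (int a, int b);

static long ncompares = 0;      /* number of indirect comparisons made */

static int
int4cmp(int a, int b)
{
    return (a > b) - (a < b);
}

/* Plays the role of comparetup_datum: every comparison goes through a
 * function pointer, so the call overhead is paid on every heap step. */
static inline int
compare_via_fn(SortCmp cmp, int a, int b)
{
    ncompares++;
    return cmp(a, b);
}

/* Insert v into a min-heap currently holding n elements (room for n+1). */
static void
heap_insert(int *heap, size_t n, int v, SortCmp cmp)
{
    size_t      j = n;

    while (j > 0)
    {
        size_t      parent = (j - 1) / 2;

        if (compare_via_fn(cmp, v, heap[parent]) >= 0)
            break;
        heap[j] = heap[parent];
        j = parent;
    }
    heap[j] = v;
}

/* Remove the top (smallest) element by re-sifting the last element down
 * from the root; returns the new heap size. */
static size_t
heap_remove_top(int *heap, size_t n, SortCmp cmp)
{
    int         last = heap[--n];
    size_t      i = 0;

    for (;;)
    {
        size_t      j = 2 * i + 1;

        if (j >= n)
            break;
        if (j + 1 < n && compare_via_fn(cmp, heap[j + 1], heap[j]) < 0)
            j++;
        if (compare_via_fn(cmp, last, heap[j]) <= 0)
            break;
        heap[i] = heap[j];
        i = j;
    }
    heap[i] = last;
    return n;
}

/* count(DISTINCT x): heap-sort the input, count changes between neighbors.
 * Returns 0 if allocation fails (sketch only). */
static size_t
count_distinct_int4(const int *vals, size_t n, SortCmp cmp)
{
    int        *heap = malloc((n > 0 ? n : 1) * sizeof(int));
    size_t      size = 0;
    size_t      distinct = 0;
    int         prev = 0;

    if (heap == NULL)
        return 0;
    for (size_t k = 0; k < n; k++)
        heap_insert(heap, size++, vals[k], cmp);
    while (size > 0)
    {
        int         top = heap[0];

        if (distinct == 0 || compare_via_fn(cmp, top, prev) != 0)
            distinct++;
        prev = top;
        size = heap_remove_top(heap, size, cmp);
    }
    free(heap);
    return distinct;
}
```

In this model each input value costs on the order of log2(n) indirect comparisons on the way in and again on the way out. Any fixed per-call overhead in `compare_via_fn` is therefore multiplied accordingly.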

> Also, is the use of inline functions encouraged instead of macros?

I think it's okay for stuff that you are willing to tolerate possibly
not having inlined.  There is no portability issue because configure
takes care of providing a suitable "#define inline" if needed --- but
on some compilers it'll be #define'd as empty.
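A minimal sketch of the fallback described above. As described, configure makes this decision itself; the `__STDC_VERSION__` check here is only a stand-in assumption. If `inline` ends up #define'd as empty, `int4cmp_inline` is still a correct static function that simply may not be inlined, which is the cost you must be willing to tolerate. The macro form Neil mentions is shown for contrast: it can evaluate an argument more than once, while the function evaluates each argument exactly once whether or not it is inlined. Both names are hypothetical.

```c
/* Stand-in for configure's decision (assumption): on a pre-C99 compiler,
 * make 'inline' vanish so the function below is an ordinary static one. */
#if !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L
#define inline
#endif

/* Macro version: an argument may be evaluated twice. */
#define INT4CMP_MACRO(a, b) ((a) > (b) ? 1 : ((a) < (b) ? -1 : 0))

/* Inline-function version: each argument evaluated exactly once, types
 * checked, and still correct if 'inline' expands to nothing. */
static inline int
int4cmp_inline(int a, int b)
{
    return (a > b) - (a < b);
}
```

With a side-effecting argument such as `x++`, the macro can increment `x` twice, while the function always increments it once.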

            regards, tom lane