Re: [PATCHES] MemSet inline for newNode - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [PATCHES] MemSet inline for newNode
Date
Msg-id 200211102106.gAAL6ir09610@candle.pha.pa.us
Whole thread Raw
Responses Re: [PATCHES] MemSet inline for newNode  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
OK, my compiler, gcc 2.95 does the optimization at -O2, so I ran the
tests here with the attached program, testing a constant of 256 and a
variable 'j' equal to 256.

I found a 1.7% slowdown with the variable compared to the constant.
The 256 value is rather large.  For a length of 32, the difference was
14%!

So, seems I should back out the palloc0 usage and let the MemSet inline
at each location.  Does that seems like the direction to head?

---------------------------------------------------------------------------

Bruce Momjian wrote:
> Tom Lane wrote:
> > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > I have applied this patch to inline MemSet for newNode.
> > > I will make another commit to make more general use of palloc0.
> >
> > Refresh my memory on why either of these was a good idea?
>
> You are talking about the use of palloc0 in place of palloc/MemSet(0),
> not the use of palloc0 in newNode, right?
>
> > > This was already discussed on hackers.
> >
> > And this was not the approach agreed to, IIRC.  What you've done has
> > eliminated the possibility of optimizing away the controlling tests
> > in MemSet.
>
> I see now:
>
>     http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=200210120006.g9C06i502964%40candle.pha.pa.us
>
> I thought someone had tested that there was little/no performance
> difference between these two statements:
>
>     MemSet(ptr, 0, 256)
>
> vs.
>
>     i = 256;
>     MemSet(ptr, 0, i)
>
> However, seeing no email reply, I now assume no test was done.
>
> Neil, can you run such a test and let us know.  It has to be with a
> compiler that optimizes out the MemSet length test when the len is a
> constant.  I can't remember what compiler version/settings that was, but
> it may have been -O3 gcc 3.20.
>
> I can back out my changes, but it would be easier to see if there is a
> benefit before doing that.  Personally, I am going to be surprised if a
> single conditional tests in MemSet (which is not in the assignment loop)
> will make any measurable difference.
>
> --
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 359-1001
>   +  If your life is a hard drive,     |  13 Roberts Road
>   +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
>
> ---------------------------(end of broadcast)---------------------------
> TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
>

--
  Bruce Momjian                        |  http://candle.pha.pa.us
  pgman@candle.pha.pa.us               |  (610) 359-1001
  +  If your life is a hard drive,     |  13 Roberts Road
  +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
#define INT_ALIGN_MASK (sizeof(int) - 1)
#include <stdlib.h>

/*
 * MemSet
 *    Exactly the same as standard library function memset(), but considerably
 *    faster for zeroing small word-aligned structures (such as parsetree nodes).
 *    This has to be a macro because the main point is to avoid function-call
 *    overhead.   However, we have also found that the loop is faster than
 *    native libc memset() on some platforms, even those with assembler
 *    memset() functions.  More research needs to be done, perhaps with
 *    platform-specific MEMSET_LOOP_LIMIT values or tests in configure.
 *
 *    bjm 2002-10-08
 */
#define MemSet(start, val, len) \
    do \
    { \
        int * _start = (int *) (start); \
        int        _val = (val); \
        size_t    _len = (len); \
\
        if ((((long) _start) & INT_ALIGN_MASK) == 0 && \
            (_len & INT_ALIGN_MASK) == 0 && \
            _val == 0 && \
            _len <= MEMSET_LOOP_LIMIT) \
        { \
            int * _stop = (int *) ((char *) _start + _len); \
            while (_start < _stop) \
                *_start++ = 0; \
        } \
        else \
            memset((char *) _start, _val, _len); \
    } while (0)

#define MEMSET_LOOP_LIMIT  1024

int
main(int argc, char **argv)
{
    int i,j;
    char *buf;

    buf = malloc(256);

    j = 32;
    for (i = 0; i < 1000000; i++)
        MemSet(buf, 0, j /* choose 256 or j here */);
    return 0;
}

pgsql-hackers by date:

Previous
From: Tom Lane
Date:
Subject: Implicit coercions, choosing types for constants, etc (yet again)
Next
From: GB Clark
Date:
Subject: Where is src/interfaces/perl5 in beta 5?