Re: [PATCHES] MemSet inline for newNode - Mailing list pgsql-hackers

From Bruce Momjian
Subject Re: [PATCHES] MemSet inline for newNode
Date
Msg-id 200211102307.gAAN7Y611251@candle.pha.pa.us
Whole thread Raw
In response to Re: [PATCHES] MemSet inline for newNode  (Bruce Momjian <pgman@candle.pha.pa.us>)
List pgsql-hackers
Further update:

I ran a test and compared the sizes of the postgres binary:  2926925
with palloc0, 2931293 without, a 0.14% decrease.  Not very much,
considering the 1-14% speedup of MemSet with a constant.

Tom, I really didn't think those conditional tests would be significant,
but clearly they are --- call me surprised.

Back out palloc0()?

---------------------------------------------------------------------------

Bruce Momjian wrote:
> 
> OK, my compiler, gcc 2.95 does the optimization at -O2, so I ran the
> tests here with the attached program, testing a constant of 256 and a
> variable 'j' equal to 256.
> 
> I found a 1.7% slowdown with the variable compared to the constant.
> The 256 value is rather large.  For a length of 32, the difference was
> 14%!
> 
> So, seems I should back out the palloc0 usage and let the MemSet inline
> at each location.  Does that seems like the direction to head?
> 
> ---------------------------------------------------------------------------
> 
> Bruce Momjian wrote:
> > Tom Lane wrote:
> > > Bruce Momjian <pgman@candle.pha.pa.us> writes:
> > > > I have applied this patch to inline MemSet for newNode.
> > > > I will make another commit to make more general use of palloc0.
> > > 
> > > Refresh my memory on why either of these was a good idea?
> > 
> > You are talking about the use of palloc0 in place of palloc/MemSet(0),
> > not the use of palloc0 in newNode, right?
> > 
> > > > This was already discussed on hackers.
> > > 
> > > And this was not the approach agreed to, IIRC.  What you've done has
> > > eliminated the possibility of optimizing away the controlling tests
> > > in MemSet.
> > 
> > I see now:
> > 
> >     http://groups.google.com/groups?hl=en&lr=&ie=UTF-8&selm=200210120006.g9C06i502964%40candle.pha.pa.us
> > 
> > I thought someone had tested that there was little/no performance
> > difference between these two statements:
> > 
> >     MemSet(ptr, 0, 256)
> > 
> > vs.
> > 
> >     i = 256;
> >     MemSet(ptr, 0, i)
> > 
> > However, seeing no email reply, I now assume no test was done.
> > 
> > Neil, can you run such a test and let us know.  It has to be with a
> > compiler that optimizes out the MemSet length test when the len is a
> > constant.  I can't remember what compiler version/settings that was, but
> > it may have been -O3 gcc 3.20.
> > 
> > I can back out my changes, but it would be easier to see if there is a
> > benefit before doing that.  Personally, I am going to be surprised if a
> > single conditional tests in MemSet (which is not in the assignment loop)
> > will make any measurable difference.
> > 
> > -- 
> >   Bruce Momjian                        |  http://candle.pha.pa.us
> >   pgman@candle.pha.pa.us               |  (610) 359-1001
> >   +  If your life is a hard drive,     |  13 Roberts Road
> >   +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073
> > 
> > ---------------------------(end of broadcast)---------------------------
> > TIP 1: subscribe and unsubscribe commands go to majordomo@postgresql.org
> > 
> 
> -- 
>   Bruce Momjian                        |  http://candle.pha.pa.us
>   pgman@candle.pha.pa.us               |  (610) 359-1001
>   +  If your life is a hard drive,     |  13 Roberts Road
>   +  Christ can be your backup.        |  Newtown Square, Pennsylvania 19073

> #define INT_ALIGN_MASK (sizeof(int) - 1)
> #include <stdlib.h>
> 
> /*
>  * MemSet
>  *    Exactly the same as standard library function memset(), but considerably
>  *    faster for zeroing small word-aligned structures (such as parsetree nodes).
>  *    This has to be a macro because the main point is to avoid function-call
>  *    overhead.   However, we have also found that the loop is faster than
>  *    native libc memset() on some platforms, even those with assembler
>  *    memset() functions.  More research needs to be done, perhaps with
>  *    platform-specific MEMSET_LOOP_LIMIT values or tests in configure.
>  *
>  *    bjm 2002-10-08
>  */
> #define MemSet(start, val, len) \
>     do \
>     { \
>         int * _start = (int *) (start); \
>         int        _val = (val); \
>         size_t    _len = (len); \
> \
>         if ((((long) _start) & INT_ALIGN_MASK) == 0 && \
>             (_len & INT_ALIGN_MASK) == 0 && \
>             _val == 0 && \
>             _len <= MEMSET_LOOP_LIMIT) \
>         { \
>             int * _stop = (int *) ((char *) _start + _len); \
>             while (_start < _stop) \
>                 *_start++ = 0; \
>         } \
>         else \
>             memset((char *) _start, _val, _len); \
>     } while (0)
> 
> #define MEMSET_LOOP_LIMIT  1024
> 
> int
> main(int argc, char **argv)
> {
>     int i,j;
>     char *buf;
> 
>     buf = malloc(256);
> 
>     j = 32;
>     for (i = 0; i < 1000000; i++)
>         MemSet(buf, 0, j /* choose 256 or j here */);
>     return 0;
> }

> 
> ---------------------------(end of broadcast)---------------------------
> TIP 3: if posting/reading through Usenet, please send an appropriate
> subscribe-nomail command to majordomo@postgresql.org so that your
> message can get through to the mailing list cleanly

--  Bruce Momjian                        |  http://candle.pha.pa.us pgman@candle.pha.pa.us               |  (610)
359-1001+  If your life is a hard drive,     |  13 Roberts Road +  Christ can be your backup.        |  Newtown Square,
Pennsylvania19073
 


pgsql-hackers by date:

Previous
From: GB Clark
Date:
Subject: Where is src/interfaces/perl5 in beta 5?
Next
From: "Christopher Kings-Lynne"
Date:
Subject: Beta 5 build report